Kernel Traffic

Kernel Traffic #41 For 1 Nov 1999

By Zack Brown


Anybody feel like hacking on 'mutt'? David Welton has been working on a patch to allow 'mutt' to reorganize broken mail threads dynamically. He's got it to the point where it's at least usable, and it makes a huge difference with Kernel Traffic.

Unfortunately the 'mutt' people have not been inspired to work it into shape for inclusion in an official release, maybe because its value is generally only visible to people tracking very high-traffic mailing lists.

Dave is willing to do more work on it, but he doesn't want to just keep updating it for each new release of 'mutt'. He'd like to get it into a form where it can be accepted in the official sources.

If you'd like to help out with this, contact Dave. I'd really appreciate it.

Mailing List Stats For This Week

We looked at 1140 posts in 4361K.

There were 431 different contributors. 188 posted more than once. 161 posted last week too.

1. File Server Optimization Discussion

12 Oct 1999 - 19 Oct 1999 (6 posts) Archive Link: "nonblocking disk read again"

People: Stephen C. Tweedie, Chuck Lever, Rik van Riel, Dan Kegel

Dan Kegel had an interesting problem with finding the most efficient way for a web server to send files to clients. If the server spawned a new thread for each file it had to transfer, this would have the benefit of not delaying the clients while it processed each request; on the other hand, spawning new threads would generate overhead in the context switching required by switching between them. Alternatively, if the web server used the sendfile() system call to handle all transfers from within the original thread, this would save context switching, but would block all the clients during any disk read. Since the server was dealing with file requests, disk reads would be fairly common. He suggested an optimization to minimize this blocking: if sendfile() would only read files from disk that were not already in memory, this would reduce the amount of disk reads, and hence the amount of blocking that would have to take place. His only question was, would the remaining disk-read overhead be less than the context switching involved in spawning a new thread for each transfer?

Stephen C. Tweedie suggested that putting the thread into lazy-TLB mode inside the sendfile() system call would prevent any hardware context switches until the end of the system call, which would reduce the overhead Dan was talking about.

Dan objected that unless the sockets themselves blocked as well, sendfile() calls to them were likely to be fairly short. Switching nonblocking sockets to blocking mode before transferring large files would keep sendfile() active during the transfer, but he pointed out that this would again require one thread per client, which would take them back to the original problem.
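The single-threaded approach Dan describes can be sketched roughly as below. This is only an illustration for a present-day Linux system, not code from the thread: `struct transfer`, `enum pump_result` and `pump()` are hypothetical names, and the client socket is assumed to already be in non-blocking mode. Each call pushes as much of the file as the socket will accept and then returns, which is why individual sendfile() calls tend to be short. Note that a disk read inside sendfile() can still stall the caller, which is the cost Dan was trying to weigh.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/types.h>

/* Hypothetical per-client state; these names are not from the thread. */
struct transfer {
    int in_fd;     /* file being served */
    int out_fd;    /* client socket, assumed non-blocking */
    off_t offset;  /* next byte to send; sendfile() advances it */
    off_t size;    /* total number of bytes to send */
};

enum pump_result { PUMP_DONE, PUMP_WOULD_BLOCK, PUMP_ERROR };

/* Send as much as possible without waiting on the socket. */
static enum pump_result pump(struct transfer *t)
{
    assert(t != NULL);
    while (t->offset < t->size) {
        ssize_t n = sendfile(t->out_fd, t->in_fd, &t->offset,
                             (size_t)(t->size - t->offset));
        if (n > 0)
            continue;                 /* made progress, try for more */
        if (n == 0)
            return PUMP_ERROR;        /* file ended before 'size' bytes */
        if (errno == EINTR)
            continue;                 /* interrupted, retry */
        if (errno == EAGAIN)
            return PUMP_WOULD_BLOCK;  /* socket full; serve another client */
        return PUMP_ERROR;
    }
    return PUMP_DONE;
}
```

A server loop would call `pump()` again for this client once select() or poll() reports the socket writable, handling other clients in the meantime.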

Rik van Riel came in at this point, suggesting a way to use asynchronous callbacks to move the functionality into the bottom-half, where hardware interrupts were handled. Stephen objected that pulling stuff from disks couldn't be done in the bottom half. He suggested reserving a small number of kernel threads to deal with the asynchronous callback, which could handle the disk-to-network pipeline.

Chuck Lever gave a pointer to a PostScript article, Connection Scheduling in Web Servers. It argued for a server scheduling policy that would favor connections that had the fewest bytes left to transmit. The point of the article was apparently that with this method, long connections pay very little penalty. Chuck felt the reference might be relevant to the discussion at hand.
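The policy summarized above (favor the connection with the fewest bytes left to transmit) can be sketched in a few lines. This is a minimal illustration and not code from the article or the thread; `struct conn` and `pick_next()` are hypothetical names, and ties are broken arbitrarily in favor of the earlier array entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection record; not taken from the article. */
struct conn {
    int fd;             /* client socket */
    size_t bytes_left;  /* bytes still to be transmitted */
};

/*
 * Return the index of the pending connection with the fewest bytes
 * left, or -1 if no connection has anything left to send.
 */
static int pick_next(const struct conn *conns, size_t n)
{
    int best = -1;

    assert(conns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (conns[i].bytes_left == 0)
            continue;  /* finished, nothing to schedule */
        if (best < 0 || conns[i].bytes_left < conns[(size_t)best].bytes_left)
            best = (int)i;
    }
    return best;
}
```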

End Of Thread.

2. Swapping Race; Implementation Discussion

14 Oct 1999 - 16 Oct 1999 (13 posts) Archive Link: "[PATCH] kanoj-mm17-2.3.21 kswapd vma scanning protection"

Topics: Virtual Memory

People: Kanoj Sarcar, Linus Torvalds, Manfred Spraul

Kanoj Sarcar posted a patch to fix a race that existed between code that traversed the vma list in order to pick processes to steal memory from, and other code that added or deleted elements from that list. He added that there should be some discussion about swapout() parameter passing, the point being that swapout() currently took a pointer to the Virtual Memory Area (vma) structure as its input, while that structure might be being deleted at the same time. To solve that race condition, he suggested passing only particular fields of the structure to swapout(), rather than a pointer to the structure itself.

Manfred Spraul objected that it was impossible to know beforehand which fields would be important. He suggested either locking the whole kernel during the call, or else grabbing a semaphore and sleeping with it inside the function. Aside from that, he felt swapout() could release the semaphore itself.

Regarding sleeping with the semaphore, Kanoj pointed out, "Process A runs short of memory, decides to swap_out its own mm. It victimizes vma V, whose swapout() routine goes to sleep with A's vmlist lock, waiting for a sleeping lock L. Meanwhile, lock L is held by process B, who runs short of memory, and decides to steal from A. But, A's vmlist lock is held. Deadlock, right?" Manfred agreed.

Kanoj felt that having swapout() release the semaphore itself would not be a realistic rule to expect every driver writer to adhere to. Manfred replied that if they didn't do it they'd get a system crash, so they'd get it right pretty quick. Kanoj pointed out that the system crash would only show up under heavy load and other conditions that might be missed during development.

Kanoj seemed to agree that locking the whole kernel would work instead, but he felt there was an even better solution. He explained, "Before invoking swapout(), and before loosing the vmlist_lock in try_to_swap_out, the vma might be marked with a flag that indicates that swapout() is looking at the vma. do_munmap will look at this flag and put itself to sleep on a synchronization variable. After swapout() terminates, the page stealer will wake up anyone waiting in do_munmap to continue destroying the vma."

The discussion continued briefly, until Linus Torvalds interrupted, with:

I am convinced that all these games are unnecessary, and that the problem is fundamentally different. Not fixing up the current code, but just looking at the problem differently - making the deadlock go away by virtue of avoiding the critical regions.

I think the suggestion to change the semantics of "swapout" is a good one. Now we have the mm layer passing down the vma to the IO layer, and that makes everything more complex. I would certainly agree with just changing that semantic detail, and changing swapout to something like

        .. hold a spinlock - we can probably just reuse the
           page_table_lock for this to avoid multiple levels of locking

        file = fget(vma->vm_file);
        offset = file->f_offset + (address - vma->vm_start);
        flush_tlb_page(vma, address);

        error = file->f_ops->swapout(file, offset, page);


and then the other requirement would be that whenever the vma chain is physically modified, you also have to hold the page_table_lock.

And finally, we rename the "page_table_lock" to the "page_stealer_lock", and we're all done.

Does anybody see anything fundamentally wrong here? It looks like it should fix the problem without introducing any new locks, and without holding any locks across the actual physical swapout activity.
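As a user-space illustration of the idea in the proposal above (capture everything the I/O layer needs while the lock is held, so the vma itself is never touched afterwards), here is a small sketch. It is not kernel code: `struct file_ref`, `struct mapping`, `struct swap_request` and `prepare_swapout()` are hypothetical stand-ins, the lock is not modelled (the caller is assumed to hold it), and a plain counter stands in for the reference taken by fget().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct file and the vma; not kernel types. */
struct file_ref {
    int refcount;               /* bumped so the file outlives the mapping */
    unsigned long base_offset;  /* file offset of the mapping's first byte */
};

struct mapping {
    unsigned long vm_start;     /* first address covered */
    unsigned long vm_end;       /* one past the last address covered */
    struct file_ref *file;      /* may be NULL, as in the shm case */
};

/* Everything the I/O layer needs, with no pointer back to the mapping. */
struct swap_request {
    struct file_ref *file;
    unsigned long offset;
};

/*
 * Assumed to be called with the lock held. Fills in 'req' and returns 0,
 * or returns -1 if there is no backing file or the address is outside
 * the mapping. After this returns, the lock can be dropped before the
 * slow I/O starts.
 */
static int prepare_swapout(struct mapping *m, unsigned long address,
                           struct swap_request *req)
{
    assert(m != NULL && req != NULL);
    if (m->file == NULL || address < m->vm_start || address >= m->vm_end)
        return -1;

    m->file->refcount++;  /* stand-in for taking a file reference */
    req->file = m->file;
    req->offset = m->file->base_offset + (address - m->vm_start);
    return 0;
}
```

The NULL check is only there to make the sketch safe to run; how the real code should treat mappings without a backing file is exactly what the following replies discuss.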

Manfred asked, "What about shm? vma->vm_file is NULL, this would oops." But Linus replied:

Well, considering that shm_swapout() currently looks like this:

        static int shm_swapout(struct vm_area_struct * vma, struct page * page)
        {
                return 0;
        }

I don't think the SHM case is all that problematic: we could easily just have a dummy vma->vm_file there. In fact, it probably should do so anyway: the SHM code _really_ does not need the private member.

There are strong arguments for saying that if the thing you're mapping actually _needs_ the vma in order to swap out, then the thing is broken. SHM certainly used to be horribly broken in this area, but that's no longer true.

Elsewhere, to Linus' original post, Kanoj replied, "This basically means that you are overloading the page_table_lock to do the work of the vmlist_lock in my patch. Thus vmlist_modify_lock/ vmlist_access_lock in my patch could be changed in mm.h to grab the page_table_lock. As I mentioned before, moving to a spinlock to protect the vma chain will need some changes to the vmscan.c code." Linus agreed. Kanoj went on to say, "The reason I think most people suggested a different lock, namely vmlist_lock, is to reduce contention on the page_table_lock, so that all the other paths like mprotect/mlock/mmap/munmap do not end up grabbing the page_table_lock which is grabbed in the fault path." But Linus explained:

There can be no contention on the page_table_lock in the absence of the page stealer. The reason is simple: every single thing that gets the page table lock has already gotten the mm lock beforehand, and as such contention is never an issue.

Contention _only_ occurs for the specific case when somebody is scanning the page tables in order to free up pages. And at that point it's not contention any more, at that point it is the thing that protects us from bad things happening.

As such, the hold time of the spinlock is entirely immaterial, and coalescing the page table lock and the vmlist lock does not increase contention, it only decreases the number of locks you have to get. At least as far as I can see.

Kanoj had added, "Let me know how you want me to rework the patch. Imo, we should keep the macros vmlist_modify_lock/vmlist_access_lock, even if we do decide to overload the page table lock." Linus replied:

Don't think of it as overloading the page table lock. Notice how the page table lock really isn't a page table lock - it really is just "protection against vmscan", and it's misnamed mainly because the only part we protected was the page tables (which isn't enough).

So think of it as a fix to the current protection, and as that fix makes it protect more than just the page tables (makes it protect everything that is required), the name should also change.

Kanoj replied, "Okay, in the next couple of days, I will try to use the lock currently known as "page_table_lock" for vma scan protection in the page stealing code and post the modified patch."

3. Bootsector Assembly Questions, Answers, And Anger

15 Oct 1999 - 19 Oct 1999 (21 posts) Archive Link: "bootsect.S changes"

Topics: Assembly

People: Chris Noe, Andrzej Krzysztofowicz, Albert D. Cahalan, Richard B. Johnson, Linus Torvalds

Andrzej Krzysztofowicz noticed that the i386 bootsector code evaluated to 32-bit code rather than 16-bit. Since this would add one byte per statement to parts of the code where every byte counted, he was curious to know the explanation. Richard B. Johnson remarked that the problem seemed to go much deeper: first, the boot code had been changed to use GAS instead of the AS86 assembler, and then there didn't seem to be any way to tell GAS to use 16-bit instructions instead of 32-bit. The way it looked to him, the source code had been tweaked to get the correct binary at the expense of having incorrect source code.

Chris Noe, who wrote the relevant code, offered this explanation:

All the (current) new bootcode was written with the idea of "binary level compatibility" firmly in mind. I had actually worked backward from the original as86 output to come up with the equivalent gas instructions when the assembly was questionable, simply because I knew that that encoding worked. I did that with the idea that 'hey, if it compiles to the same opcodes, we're fine' -- that was the sole plan at the moment: to have a 2.4 kernel that doesn't need as86/ld86 to build.

Yes that makes the source wrong on a pure "syntax" level in quite a few spots, but does it truly matter if (a) it assembles to the correct opcodes, (b) gives some much needed testing and real life usage of gas, so that it finally will do the right things when it comes to 16 bit asms, and (c) is planned on being (for lack of a better word) "optimized" back to correctness later on down the road (when everyone's using, or better yet are forced to use, the latest binutils).

You bring up some valid points, but I feel the intent of the patch was misjudged.

Just a few points on the latest binutils (which now might be a requirement of the next 2.3): The changes to the bootcode were intended to be included with a kernel that had a minimum binutils of, because I didn't feel it was worth a binutils upgrade to 2.9.5 to each and every person in order to have "syntactic perfection" (due to the fact that only the *latest* binutils produce the most correct code for most every case). I just wanted joe user to be able to still compile, but without the dependency on as86.

I've heard we might go to binutils 2.9.5 now, and if it happens I will happily rework the code to be more clearly, properly written.

Hope that explains my thoughts a bit.

Richard was happy for the explanation and offered to help out, but Albert D. Cahalan was very upset with the situation. He found using incorrect source to get correct binaries objectionable, and suspected that Chris had snuck those details of the patch past Linus Torvalds. He added that the upgrades Chris' changes forced were fairly important parts of the system, and that Chris acted out of prejudice against Intel syntax. By the end of his post, Albert was really fuming.

Chris replied that he was not prejudiced against AS86, but liked it and used it from time to time. He asserted, however, that it was an outside dependency, and was also out of date. Restricting the requirement to binutils seemed like the right thing to do. True, it required an upgrade, but it was an upgrade to a version that had already been recommended by Linus for 2.2 kernels. In light of that, he didn't think it was asking too much to make it a requirement for 2.4 kernels. He reiterated that the actual binary code was virtually identical to what it had been before. Only the source had changed, and moreover could be gradually brought into syntactic correctness, as systems migrated to the latest binutils.

4. Bigmem Patches Advancing

16 Oct 1999 - 21 Oct 1999 (5 posts) Archive Link: "SMP Extraversion"

Topics: Big Memory Support, Raw IO

People: Linus Torvalds, Alan Cox, Stephen C. Tweedie, Andrea Arcangeli, Kanoj Sarcar

The issue of supporting large quantities of RAM on Intel systems first came up in Issue #10, Section #10 (7 Mar 1999: Big Memory Machines), where Linus Torvalds grudgingly agreed that Stephen C. Tweedie's implementation ideas were feasible. The first tentative patch was not from Stephen but Kanoj Sarcar, and was covered in Issue #19, Section #4 (10 May 1999: Increasing Maximum Physical Memory On x86 Machines). It was not successful, and by Issue #31, Section #3 (30 Jul 1999: Large Memory Systems) there was still no report of any success with big memory systems. However, a week later in Issue #32, Article 12, came the first public breakthrough. Andrea Arcangeli and Gerhard Wichert published a patch for big memory systems, and Linus approved it for 2.3; at the time it was already clear that some other patches, to allow raw I/O, conflicted with the bigmem patch, but Linus opposed raw I/O. The conflicts between the two patches had only worsened by Issue #35, Section #2 (2 Sep 1999: bigmem Patch Conflicts With rawio Patch).

This time, in the course of discussion, Linus showed his willingness to bring the patches even more into the mainstream kernel:

Note that I'm probably going to remove the 1G split.

The 3GB user-space is better for users, and with BIGMEM support the advantages of 2G kernels are much less anyway. I don't see any horribly compelling reasons for showing that particular difference to users any more: the 2G split was a hack to avoid doing bigmem, but now...

This is especially true now that I have fairly clean patches from Ingo to take the BIGMEM stuff up to 64GB - the difference of whether you can directly map 1GB or 2GB of that physical memory is negligible, as 95% of the memory will be high anyway on the big boxes.

In a completely different discussion, under the Subject: Linux 2.2.13ac1, Alan Cox added, "Bigmem to 4Gig should be rock solid. In fact I believe SuSE shipped support with 6.2 (someone from SuSE can confirm this yes/no ?). Bigmem above 4Gig is definitely not stable."

5. OFFTOPIC: Color-Blindness And Viewing The Kernel Version History Page

16 Oct 1999 - 22 Oct 1999 (22 posts) Archive Link: "Colour blindness & the Linux Kernel Version History"

People: Riley Williams, Audin Malmin, David Ford, Alexander Viro, Tim Towers, Alan Cox, Brandon S. Allbery, David Weinehall

Riley Williams, who maintains the Linux Kernel Version History site, got some private email about his page being difficult for people with color-blindness to view. He was posting to the list because he'd lost the email during a power outage and wanted to receive it again. He added:

Some points I will make with regards to this point though, which you may wish to consider when resending it:

  1. I suffer from Red-Green colour blindness myself, with the result that I am not the ideal person to determine colour schemes for web pages. It is largely for this reason that I keep my pages reasonably simple, and don't normally use lavish colour schemes.
  2. Because of the above, I designed the pages to be readable easily by myself, and by people suffering from the same form of colour blindness. In general, I find pages on a Cyan background the easiest to read, which is why I have used that colour scheme on my pages.

I am of course willing to listen to any criticism on this front, and to try out any suggestions that come my way, but if I can't read the resulting pages myself, they will not go online. I consider this to be reasonable.

Elsewhere, he added:

the colour scheme I'm looking for is one where the text can remain either Black or NavyBlue, but the background colour can vary to indicate different things. Basically, I need four different background colours that satisfy ALL of the following requirements:

  1. Provides a good contrast to the text colour under all of the various colour blindness options.
  2. Does so equally well under all graphical web browsers, and especially under both Netscape and Internet Explod^Hrer.
  3. Does so equally well under all combinations of operating system for each of those browsers.

Any suggestions relating to this will be very much appreciated.

Brandon S. Allbery recommended simplicity, and gave a pointer to an example. Audin Malmin felt the background should be dark, and the text light. He bemoaned, "Arrgh... Why does everyone insist on bright backgrounds? A CRT is not a piece of paper, it's an illuminated display, bright backgrounds just mean that everyone has to squint to read it... The text is what is important, it should call attention to itself...the background should be just that, in the BACKGROUND, minding its own business."

David Ford replied, "it's proven that reading black text on a white background is easier on the human eye than white text on a black background. ask your local ophthalmologist"

Alexander Viro replied, "Umhm... While you are at it, don't forget to ask him about the effects of flicker and their dependency on the average brightness. It's more than enough to outweigh the effect you are referring to. CRT (any CRT) does flicker. While the normal vision can accommodate to 70-odd Hz flicker with no problems _if_ the large part of the field is dark. Make it bright and you will seriously increase the strain. It's enough to turn marginally tolerable situation into hell. I.e. people who have bad vision or bad lighting or have to look at the screen for many hours or have monitors with lower frame frequency, etc. will be very unhappy about the thing. And those things are pretty common ;-/ (and do correlate, BTW)."

Tim Towers added:

remembering back to when the BBC Micro was released in the UK. and the explanation of why it flew in the face of conventional wisdom by providing white writing on a black background. Vt100's, DOS windows and linux terminals do likewise whilst X11/MS windows have the reverse. I expect people want what they're used to, and the easiest way to make a "window" acceptable as the replacement of a piece of paper is to make it look like one.

I find light writing on a dark background to be easier.

It's cheaper to make books with black writing because white ink is harder to make (It has to be opaque, whereas black ink only has to stain).

Alan Cox added, "Remember also that every single computer using a TV by design and thus with stronger visual constraints used black writing on pale background." Several people jumped on this statement. David Weinehall pointed out that the C64 had light blue on dark blue; and the C128 had light green on dark green.

Okay, back to the kernel.

6. Some Explanation Of The Kernel Development Process

19 Oct 1999 - 25 Oct 1999 (63 posts) Archive Link: "PATCH 2.3.23 pre 2 compile fixes"

Topics: Backward Compatibility, Development Philosophy, Disks: IDE, Disks: SCSI, Microkernels, Networking, PCI, Real-Time, Version Control

People: Linus Torvalds, Donald Becker, Richard Dynes, David S. Miller, Gerard Roudier, Martin Dalecki, Alan Cox, Ingo Molnar

Alan Cox and Donald Becker were discussing some of Donald's PCI work, and at one point, Linus Torvalds interrupted with some criticism:

Don't make a monster. Make something that is simple, and expand on it as needed. I do not like the way you usually sit on the code until you are happy, and then release it in one big bunch without any input from others.

I prefer the incremental approach.

Donald replied:

I don't "sit on the code". The process is far from opaque. The updates are described on web pages, mailing lists and hypermail archives. Yes, many private test versions are sent to individuals and then have limited alpha releases. I do seek to minimize the number of "real" releases, so I don't have to track a zillion released driver versions.

But I wonder why I put in the effort -- all of my work is thrown away when you accept random, tested-only-on-one-machine patches from anywhere, and then refuse to put in the tested, updated drivers.

Linus replied:

Donald, your latest "tested" drivers have been a QA disaster, and you should know it.

The reason your updated drivers don't get included in the latest kernels are _very_ simple:

The "not send timely patches" thing applies to the PCI code in question. I have never EVER seen a patch from you to implement the PCI search function. Not now, not a year ago, not EVER.

And then you complain when I apply patches that ARE sent to me? Get a grip, Donald.

Regarding Linus' "timely patch" argument, Donald replied:

From my side, I send an updated driver, you report "it didn't work for someone" (and that person never sent a problem report to the driver mailing list), and it doesn't go in.

I'm trying to save you work, and have a broader early-test base, by having a separate test process that doesn't require people to update to the latest, probably-broken development kernel. You should *expect* that a driver has a three or six month old date on it.

Richard Dynes defended Donald and tried to find a middle ground, saying that Donald's methods could be a little more transparent, but that Linus shouldn't therefore go around applying other people's patches without the maintainer's acceptance. He concluded with, "If tulip (and other drivers) aren't being maintained, then say so. But AFAIK that isn't the case. Donald's the maintainer. He should be backed up."

Linus replied, "Donald is not the maintainer as far as I am concerned, because I never see any maintenance. He's the AUTHOR, but that is not necessarily the same thing (being the author does mean that he automatically gets the right to be the maintainer if he wants to)."

Linus added this explanation of his processes:

I basically never EVER search for patches on the net. If they don't come to me in email, I take that to mean that the author isn't interested in getting them into the standard kernel. The end.

And if the author cannot be bothered to send me timely updates, the author also looks damn silly if he then complains that I don't use his code.

Note the "timely". It's not enough to send me the patch once in a blue moon (Donald doesn't do even that - even though I've asked him face to face, and he always says he will). Again, if the author cannot be bothered to MAINTAIN his patches, it's not worth my time even trying to do it for him: not because I'm a lazy bastard (which I am), but simply because especially with drivers I need help with maintenance, and an old unmaintained driver with known problems is better than a new unmaintained driver with unknown problems.

For example, I do not generally apply large patches. Ask Alan Cox, for example, who does a hell of a good job of maintaining a large portion of the kernel (hats off to Alan), and then ALSO does a hell of a good job in splitting the patches up in their original components.

When I get something like the current "ac" patches from Alan, I don't get just one email with the patch - I _literally_ get about 50 patches that are independent - one per driver, one per subsystem, one per new feature like PCI detection. And then I apply about 48 of them, and explain to him about the two I didn't like and why I didn't apply them.

Most people don't see that kind of code maintenance - they see the full "ac" patch-set, and then some time later they see the patches in my kernel. And sometimes people wonder why the end result isn't the same as the "ac" tree - now you know why. It's either because I decided to not apply a part of it, or (quite often) it's because Alan didn't even send part of it to me, because he knew it had some other issues.

Donald replied:

In August you rejected all of the then-current updates because they had visible support for multiple kernel-support. You wanted all backwards compatibility code removed. Based on that I spent several weeks changing and testing over a dozen PCI drivers so that the base versions had support for 2.3 only. Now those changes are rejected.

I spend more than half of my working time working on Linux drivers, including the time-consuming, boring work of answering email and tracking down usually-spurious bug reports. I've done it in a way that minimizes your workload, and makes the updates and new drivers immediately available to those with stable kernels. And now I find out that you don't consider me to be involved with the Linux kernel work.

Linus replied:

Donald, read my emails.

They have NOT been rejected.

They haven't even been CONSIDERED. There's a big difference there.

And the reason they haven't even been considered is that you have not sent in any patches to me. I have not seen the patches, I have only seen the results of Alan applying the patches to the ac tree and various bug-reports.

And I will repeat my rule: I do not apply large patches with many separate changes. I apply patches that do one thing (like implement a PCI search function).

David S. Miller came in at this point, responding to Donald's statement that his (Donald's) method of semi-closed maintenance would minimize Linus' workload. David said:

As Linus has described, and what you seem to have problems understanding, is that _this_ very technique _maximizes_ Linus's workload.

This seems to be the source of your confusion. Batching up sets of fixes over a long (ie. not timely) period of time, and then sending them all at once for integration is about the worst thing you can do.

Let me give you two situations to show you why this is true:

situation 1) You fix a bug today, you send Linus just that small change to one specific driver, to fix that bug.

This change is fine and nobody complains of breakage. You run through this process about 3 or 4 times.

On the 5th change, again small and specific and incremental, Linus and others note that people begin to complain of problems. These people also proclaim that before this 5th change went in things were perfectly fine for them.

situation 2) You spend 4 months, and fix dozens of bugs, and also have reworked major sections of code in a driver to support new cards, or do whatever. You send one big fat patch to Linus after this 4 month period, many users complain that the driver suddenly stops working for them in a big way.

In situation #1 Linus needs only to back out patch #5 or seek a remedy from you for that specific change, the fixes in patches 1 through 4 are not backed out and not lost. Whereas in situation #2 he has to back the whole thing out, and all the work is gone until things are remedied, and you'll probably repeat the cycle as other bugs are found whilst you keep resubmitting this huge patch, and eventually Linus (like right now) will get disgusted with the whole mess.

Now can you see why your "external from the main tree for months" development process sucks and causes grief for everyone?
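David's first situation works because each small patch can be tested on its own. A rough sketch of how that narrows a regression down to a single patch is given below. It is an illustration only: `works_fn`, `first_bad_patch()` and `breaks_at()` are hypothetical names, and the search assumes that the unpatched tree works and that once a patch breaks things, every later state stays broken.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical test hook: returns nonzero if the tree works with the
 * first 'patches_applied' patches of the series applied.
 */
typedef int (*works_fn)(int patches_applied, void *ctx);

/*
 * Return the 1-based number of the first patch that breaks the tree,
 * or 0 if the tree still works with all n patches applied.
 * Assumes: works(0) is true, and breakage persists once introduced.
 */
static int first_bad_patch(int n, works_fn works, void *ctx)
{
    int lo = 0;  /* highest patch count known to work */
    int hi = n;  /* lowest patch count known to be broken */

    assert(n >= 1 && works != NULL);
    if (works(n, ctx))
        return 0;  /* nothing to back out */

    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (works(mid, ctx))
            lo = mid;
        else
            hi = mid;
    }
    return hi;
}

/* Example hook: the tree breaks as soon as patch number *ctx is applied. */
static int breaks_at(int patches_applied, void *ctx)
{
    return patches_applied < *(int *)ctx;
}
```

With one large patch there is only a single step to test, so the search can say nothing more precise than "the big patch broke it", which is the second situation.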

To David's 'situation 1', in which a single bug is fixed and sent to Linus, Donald replied:

That's almost never the case.

For drivers that support many cards I usually have ten different modifications outstanding, each one to attempt to fix a specific reported problem. Most changes either don't fix the problem, or are known to cause other problems. It's often a week or two before a test version is reported on, so many times there is no way to serialize the changes.

To Donald's "that's almost never the case," Linus replied, "BINGO! And that's what we're complaining about."

And to Donald's statement about having ten modifications outstanding, Linus went on:

Again, THIS is the problem. They shouldn't be outstanding. They should be out there, in the standard kernel. If they work, they work. If they don't, we find out. And if they don't work, then others can at least help. Right now, if they don't work, people usually don't even know, because the people for whom the old driver is fine will never even bother to try your internal test-of-the-day-driver.

Resulting in that when you _do_ think it's fine, it isn't. And at that point it's too dang hard to tell exactly what it was that broke.

The problem may be that you want to maintain a heck of a lot of drivers, and maybe you should try to delegate a bit, so that you can give the drivers you _really_ want to work on the priority they deserve.

David also replied to Donald, with:

Think how much faster, and more voluminous, reporting you would receive if your changes went into the main trees on a timely basis?

You can have the largest test lab, you can have thousands or millions of cards to verify things on, but it will never be any match for the cast of wacky configurations that users have who actively test and beat on the official release and pre-patch kernels done by Alan and Linus have.

And Linus added:

Think of me as CVS with a brain and with some taste. Nothing more, nothing less. Development is done all over the world, and something has to keep it reasonably coherent, so that when people want to find a kernel that compiles most of the time and does what they expect, they can do so.

Most of the stuff is completely independent of each other, and you can live in your own microkernel universe 99% of the time. But having a central repository means that when global changes are needed, they CAN be done. It's painful, and it breaks code, but at least it is technically possible - which it wouldn't be with everything scattered all over.

Elsewhere and earlier, Gerard Roudier replied to David's two situations and the idea of backing out broken changes, with:

My opinion is that a driver bug should not be backed out, given maintainer being known to be a serious guy, unless the maintainer agrees with, or ask for that. Anyway, a bug must be fixed and delaying it a long time is a bad option, in my opinion.

Note, that if it ever appears that the maintainer did the wrong thing, I trust him to understand how he has been wrong and change the way it will deliver changes for further updates. A maintainer that did waste time of users because he delivered broken stuff when this could be avoided is normally extremely sorry of that, and I am sure Donald has been so if this happened with some of his driver versions.

Just do not backout Donald's patches that seem broken and I bet you that everything will be just fine, or at least not worse than other breakages that sometimes occur during kernel development.

Linus replied:


It's not about "seems broken"

It's about the issue that there are real and definite bugs, and if a lot of things changed there is no good way to find out exactly what change caused the problem - especially not with the problem popping up for people who do not necessarily know C (or the device) enough to make an informed judgement other than "version X works for me, version Y does not".

The whole point of open source is to expose the development, and NOT have the mentality that "it will be fixed in the next release". There should be many small incremental releases, because whatever Donald or others say, especially with drivers you are often in the situation that you cannot from looking at the source see whether something is broken or not.

So it needs to be released often, and TESTED often. Which implies that the test-drivers should be part of the standard development kernel, because if they aren't, they aren't going to get very wide testing.

For example, what has happened multiple times is that the 1% for whom some particular old network driver does not work will try out Donald's new drivers, and what do you know? It works for them! And people think that that means that the new driver has to be much better than the old one, right?

Wrong. The new driver is NOT necessarily better at all. Not only has it been tested by much fewer people, it has been tested by a SELF-SELECTED group of people. Which may mean that the new driver fails horribly for a lot of people where the old driver was fine - because the new driver effectively has ZERO testing for common hardware that worked fine with the old driver.

This is not worth discussing further. Timely incremental changes are just so OBVIOUSLY better to anybody who has done any real maintenance that the argument is pointless. It's true in non-Linux settings too - why do you think commercial software companies have regression tests and a large testbed of different machines that are always active?

If Donald doesn't do the nice incremental patches, then somebody else will end up doing them. But that also means that Donald loses the right to then complain about others doing the work that he somehow considers "his".
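Linus's point about testers who can only report "version X works for me, version Y does not" can be illustrated with a small sketch. It assumes a series of small incremental releases in which a bug, once introduced, stays present; under that assumption a binary search over the releases needs only a handful of works/fails reports to name the first bad release. The release names and the `works` callback below are hypothetical, not taken from the thread.

```python
def first_bad_release(releases, works):
    """Return (first failing release, number of test runs).

    Assumes releases[0] works, releases[-1] fails, and that a bug
    stays present in every release after the one that introduced it.
    """
    lo, hi = 0, len(releases) - 1   # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if works(releases[mid]):
            lo = mid                # bug was introduced after mid
        else:
            hi = mid                # bug was introduced at or before mid
    return releases[hi], tests


# Hypothetical series of 17 small releases; the bug arrives in "pre12".
releases = [f"pre{i}" for i in range(1, 18)]
result = first_bad_release(releases, lambda r: int(r[3:]) < 12)
print(result)  # ('pre12', 4)
```

The smaller each release, the smaller the diff that the identified release points at. With one large merge there is only one step to search, and the whole merge is the suspect.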

Martin replied, "Why the hell do you feel responsible yourself? Let it in and forward everything to Donald. (procmail could even do it...) That's what maintainers are for."

Linus replied:

I have occasionally forwarded stuff to Donald. More often, I forward things to Alan and Davem, simply because they tend to know networking issues well, and I tend to get more response out of them ;)

And no, I _don't_ feel responsible for specific device drivers, and sure, I could take the approach that "what do I care if a driver is broken?". And in fact, without a maintainer, that is what I often end up doing, although if it turns out to be a problem (it isn't always) I try to prod people who are silly enough to send me patches to maybe become a maintainer..

But what would happen if all the drivers were somewhere out in web-land, and to find the drivers for your random combination of hardware you'd have to search five different web-sites from five different maintainers? Do you really think a system can survive that way and find more users? And ever be tested as a "sum of all parts"? I don't.

So in real life, you need to have a "standard base" that includes all the stuff most people are reasonably expected to use. It doesn't include some of the more esoteric stuff (hard realtime, specific drivers for hardware that exists only in rather special places etc), but it certainly has to have drivers for things like a random tulip card, would you not say?

And maintaining that "standard base" is what I do. Others do it too, notably SuSE and Redhat. Much of that "standard base" is based on their work, in addition to the obvious work by the actual people who create the actual drivers and new features.

But exactly because Linux is _NOT_ a "one entity does everything" proposition, it is NOT the case that I go out on the net and find everything I want to have in the full package. I very much depend on people like David Miller, Alan Cox, Ingo Molnar, and a hundred other people who not only maintain their own subsystems, but also help me in maintaining those subsystems as part of the larger whole.

The problem in question is when a maintainer maintains the subsystem, but does not try to help in maintaining the whole. See? Instant disconnect.

The conversation went on a bit more in the same vein, and then moved on to a technical discussion of splitting up the tulip driver into several smaller drivers that each handled some of the many tulip chips out there. Richard Dynes argued that keeping support for all chips in a single driver resulted in a huge source tree.

Linus replied:

Note that we've had that happen before for other drivers, and the sane way to handle it tends to be to split the driver up.

For example, see the IDE driver. It has core IDE functionality, but then multiple "sub-drivers" to do tuning etc. That may not be required for the tulip driver, but if the driver is starting to look fragile, then by all means at least give it some serious thought as an option. Often source duplication is a much lesser evil, and the evil of trying to handle everything with one approach can be a complete disaster.

Remember: the UNIX philosophy is to NOT try to create the "GNU Emacs of drivers". Do one thing, and do it well, even if it means that you need to have another driver for a card that isn't really all that similar any more.

See how the ncr53c7,8xx SCSI driver is handled: there are actually two drivers, where the newer driver only handles the newer cards, and as a result BOTH drivers are simpler (and the newer driver can be noticeably faster because it doesn't have to worry about issues that only happen with the old and arguably broken cards).

Splitting a driver tends to mean that the support for old cards stagnates. But that's fine - old cards do not change over time, and it makes it easier to change both the new and the old driver without having to worry about breaking the "other" one, because they are now independent.

For example, to just point out that this is not a problem just for network drivers: the same thing is definitely apparently happening with the PCMCIA host controllers. They are all just "similar enough" that one driver tries to handle them all, but as a result that one driver is quite fragile, and when somebody tries to fix an old ISA card that can break cardbus support and vice versa.

It's happened to many drivers over time, and if somebody really is willing to try to clean it up, I just want to say that at least as far as _I_ am concerned, I don't think it is a bad idea to have several drivers for similar hardware. You can try to share as much code as you reasonably can (see for example the 8390 driver that does this fairly successfully), without thinking that there has to be ONE driver that handles ALL 8390 cards.
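As a rough illustration of the structure Linus describes (several small drivers sharing code where that is reasonable), here is a minimal sketch. Every name in it is hypothetical, and the "padding quirk" is invented purely to show where a card-specific workaround would live; it is not modelled on the actual tulip, IDE, or 8390 code.

```python
class SharedCore:
    """Code both drivers reuse: a bounded transmit queue."""

    def __init__(self, ring_size):
        self.ring_size = ring_size
        self.queue = []

    def enqueue(self, frame):
        if len(self.queue) >= self.ring_size:
            return False            # ring full, caller must retry later
        self.queue.append(frame)
        return True


class OldCardDriver:
    """Driver for the older cards; the only place their quirk appears."""

    MIN_FRAME = 60  # hypothetical: old card cannot pad short frames itself

    def __init__(self):
        self.core = SharedCore(ring_size=4)

    def transmit(self, frame):
        if len(frame) < self.MIN_FRAME:
            frame = frame + bytes(self.MIN_FRAME - len(frame))
        return self.core.enqueue(frame)


class NewCardDriver:
    """Driver for the newer cards; no branches for old-card problems."""

    def __init__(self):
        self.core = SharedCore(ring_size=16)

    def transmit(self, frame):
        return self.core.enqueue(frame)
```

In this arrangement a change to `NewCardDriver.transmit` cannot break the old cards, while a fix in `SharedCore` still reaches both.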

7. Some Discussion Of Interrupt Request Hacking

19 Oct 1999 - 22 Oct 1999 (11 posts) Archive Link: "raw memory & PCI bus access"

People: Alan Cox, Rogier Wolff, Jamie Lokier

Pavel A. Sher was developing a userland application that would create a /dev/irq interrupt request device and wait for interrupts by calling select() on that device. He was curious how much latency he should expect between an IRQ occurring and the userland application's reaction to that IRQ. Alan Cox replied that the time could be infinite if more than one application was using the device. He suggested handling the IRQ in kernel space, including retrieving the IRQ status and clearing the IRQ state.
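The userland side of such a design can be sketched roughly as follows. This is only an illustration of waiting in select() and measuring wake-up latency: an ordinary pipe stands in for the /dev/irq device, a helper thread stands in for the hardware interrupt, and a POSIX system is assumed (select() on pipes is not available on Windows). The function names are hypothetical.

```python
import os
import select
import threading
import time


def wait_for_event(read_fd, timeout=1.0):
    """Block in select() until read_fd is readable.

    Returns the wake-up timestamp, or None if the timeout expired.
    """
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return None
    woke = time.perf_counter()
    os.read(read_fd, 1)             # consume the event byte
    return woke


def measure_latency():
    """Return seconds between the simulated event and the wake-up."""
    read_fd, write_fd = os.pipe()   # stand-in for the interrupt device
    sent = {}

    def fire():
        time.sleep(0.05)            # the "interrupt" arrives a bit later
        sent["t"] = time.perf_counter()
        os.write(write_fd, b"!")

    worker = threading.Thread(target=fire)
    worker.start()
    woke = wait_for_event(read_fd)
    worker.join()
    os.close(read_fd)
    os.close(write_fd)
    return None if woke is None else woke - sent["t"]


if __name__ == "__main__":
    print("wake-up latency (s):", measure_latency())
```

The number this prints depends on scheduler load and says nothing about a real interrupt path; it only shows where the latency Pavel asked about would be measured.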

Jamie Lokier pointed out that without documentation, clearing the IRQ state could become problematic. He said that in that case, disable_irq() was very helpful. A bit of dissent followed:

Alan replied, "Not if its shared," and Rogier Wolff explained in more detail:

Alan is a bit terse every now and then. In this case, that's because he's been explaining this over and over again.

It is kind of tricky, let me try to explain it.

You have two drivers that share an interrupt. Normally the two drivers both get called when an interrupt happens, they both check if there is something to do for their device, and they both clear the interrupt if their device was the one interrupting.

In this case, we have one real (e.g. disk) device, and one "user-device" on an interrupt. What can happen (but not the first time you try it!: The first time you try it it will work...) is that the user-device interrupts, disables the interrupt, and tries to wake the program. However, this program causes a pagefault which requires the disk. So the disk driver may then reenable interrupts. If it does, this will cause an immediate call of both interrupt routines, but the disk is not yet ready. The user-level driver will then redisable interrupts, to let the user-level program figure it out. Deadlock.
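The sequence Rogier describes can be walked through with a toy model. This is a sketch under simplifying assumptions, not kernel code: the line is treated as level-triggered, "masking" is a single boolean, and the class and handler names are hypothetical.

```python
class SharedIrqLine:
    """Toy model of one interrupt line shared by two devices."""

    def __init__(self):
        self.enabled = True
        self.asserting = set()      # devices currently raising the line
        self.handlers = []

    def deliver(self):
        """Call every handler if the line is enabled and asserted."""
        if self.enabled and self.asserting:
            for handler in list(self.handlers):
                handler(self)
            return True
        return False


def simulate():
    line = SharedIrqLine()
    state = {"disk_done": False, "wakeups": 0}

    def disk_handler(irq):
        if "disk" in irq.asserting:
            irq.asserting.discard("disk")   # disk driver can ack its device
            state["disk_done"] = True

    def user_stub_handler(irq):
        if "userdev" in irq.asserting:
            irq.enabled = False             # cannot ack the device: mask line
            state["wakeups"] += 1           # and wake the user program

    line.handlers = [disk_handler, user_stub_handler]

    # 1. The user device interrupts; the stub masks the line.
    line.asserting.add("userdev")
    line.deliver()

    # 2. The woken program page-faults. The disk driver starts I/O and
    #    re-enables the line. The user device is still asserting, so both
    #    handlers run at once; the disk is not ready; the stub masks again.
    line.enabled = True
    line.deliver()

    # 3. The disk finishes and raises the line, but the line is masked.
    line.asserting.add("disk")
    delivered = line.deliver()

    state["disk_irq_delivered"] = delivered
    state["line_enabled"] = line.enabled
    state["still_asserting"] = sorted(line.asserting)
    return state


print(simulate())
```

In the final state the disk completion is never delivered, so the page fault never finishes, so the user program never runs to acknowledge its device and unmask the line.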

Jamie replied that he had understood all this, but:

My point is that when developing a driver for a device without documentation, where you don't know how to acknowledge the interrupt (yet), this strategy works:

Of course this is slow, prone to failure, won't work everywhere and generally bad. No driver in the kernel should even do this.

But for a hack to test something, it does work.

Alan replied, "Its a good debugging aid yes. I've done this myself when trying to get the ESS maestro going and called the IRQ on the timer tick by hand 8)"

8. New Hardware Mailing List; Some Discussion Of The 810 Graphics Chipset

19 Oct 1999 - 20 Oct 1999 (15 posts) Archive Link: "linux-hardware"

Topics: Random Number Generation, Sound

People: Rik van Riel, Ben Castricum, Alan Cox, Michael Cummins, Jeff Garzik

Rik van Riel announced the creation of a new mailing list for discussion of Linux hardware compatibility. He said:

now more and more (IMHO off-topic) posts about what hardware to use are starting to appear on linux-kernel and linux-smp, I asked if there would be interest in a new list for the purpose of discussing Linux and hardware.

Since quite a bit of interest has been shown, I created the list as soon as I came back from ALS -- even before I have recovered from jetlag and resulting lack of sleep :)

He explained that the list was, and that to subscribe, anyone can send an email to, with "subscribe linux-hardware" in the body of the message.

Someone pointed out that a lot of driver maintainers were likely to hang out on linux-kernel and linux-smp, while far fewer were likely to monitor linux-hardware, so people would continue to ask their hardware questions on linux-kernel and linux-smp. Ben Castricum replied that linux-hardware would probably be the place where he could tell someone "that you shouldn't buy an Asus MEW board because the on-board sound card (AD1881) isn't supported but also that the on-board intel 810 graphics chipset is not supported."

The new topic was immediately taken up. Alan Cox replied, "The 810 has sound on the chipset. I guess the 1881 is the AC97 codec ? Right now Intel's docs for the 810 almost give enough info for the audio, don't cover the video sufficiently and point blank refuse to cover the random number generator. Other people's chipsets look brighter every day"

Michael Cummins added, "Especially others like Acer Labs International who have Linux patches/drivers on their web site. Some manufacturers must be listening to their customers???"

Jeff Garzik explained, "Intel funded development of the 810 video driver, which has been released to the XFree86 team for inclusion in 3.3.6. Until then you can download the XFCom X server for the 810 chipset at" In a reply to himself, he added, "Also note that apparently the 810 driver can't use more than 1MB of memory without the AGPGART kernel module. (available from glx cvs if nowhere else)"

9. Hardware Debugging

20 Oct 1999 - 22 Oct 1999 (8 posts) Archive Link: "2.2.13pre17 oops in find_buffer"

People: Miquel van Smoorenburg, Alexander Viro, Simon Kirby, Dan Hollis

Miquel van Smoorenburg was getting oopsen in del_timer(), select_dcache(), and find_buffer(), depending on the particular 2.2.13preX he was trying out. He replied to himself a day later, explaining that a little program Simon Kirby had sent him had found some bad RAM on his system. Miquel added, "Somehow if you have bad RAM, it is likely to result in oopses in find_buffer or select_dcache - perhaps a symptom to remember (I will)."

Alexander Viro explained, "Both traverse long lists and both do it very often."

Some folks were interested in getting ahold of Simon's little program. Simon replied:

It seems to find bad memory so easily with such a simple program.
*shrug* :)

I've put a slightly updated version up here:

I added a comment, fixed a spelling error and cosmetics in the fprintfs, but didn't change anything else (so hopefully it won't lose its magical powers). :)

Later, he added, "for a machine that is assumed to have a hardware problem, I wouldn't trust memtest.c's output simply if it doesn't print a failure in a few hours...other testers actually have some knowledge of the hardware they're running on built in to them, and so they theoretically should do a better job than mine. *shrug* :) Although, in Miquel van Smoorenburg's case, he had run a DOS memory tester for 48 hours before which didn't turn up anything, so maybe it is worth it..."

And Dan Hollis put in, "If you have ECC ram and a chipset with ECC support you can also run the stuff at And see if any bit errors get flagged."

10. Intel Releases Non-GPL (And Possibly Non-GPL-Compatible) Linux Drivers

22 Oct 1999 - 25 Oct 1999 (8 posts) Archive Link: "Intel's Pro/1000 driver: when/will it make it into the kernel?"

Topics: PCI

People: Jes Sorensen, Jeff Garzik, Dan Browning

Dan Browning gave a pointer to Intel's Linux 2.2.x driver for the PRO/1000 Gigabit Server Adapter, and asked when this would make it into the kernel. Florian Lohoff pointed out that it was actually impossible because the drivers were not GPLed, and the Intel license was not even compatible with the GPL.

After looking them over, Jes Sorensen added that the drivers were also poorly written. He said, "I wouldn't want to be the person who had to try making it run on non x86 hardware, they didn't even try to make it portable. References PCI device registers directly not using the readl/writel macros as one should, no memory barriers and it expects operations like ptr->foo++ to be atomic - yummy, lotsa fun. Oh and is the hardware really so bad that it requires spin locks to protect the tx queue? (hint, where are the public docs? ;-)"
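Jes's remark about ptr->foo++ refers to the fact that an increment is in general a read, an add, and a write, which two CPUs can interleave. The following sketch simulates that interleaving deterministically; it is a model of the hazard rather than anything taken from Intel's driver, and the two-CPU schedule format is a hypothetical construct for illustration.

```python
def run(schedule, start=0):
    """Execute the load/store halves of `foo++` for simulated CPUs.

    `schedule` is a list of (cpu, op) pairs, with op "load" or "store".
    A store writes back the value that CPU loaded earlier, plus one.
    """
    memory = {"foo": start}
    registers = {}
    for cpu, op in schedule:
        if op == "load":
            registers[cpu] = memory["foo"]
        elif op == "store":
            memory["foo"] = registers[cpu] + 1
    return memory["foo"]


# Each CPU finishes its increment before the other one starts.
serial = [("A", "load"), ("A", "store"), ("B", "load"), ("B", "store")]

# Both CPUs load the old value before either stores.
interleaved = [("A", "load"), ("B", "load"), ("A", "store"), ("B", "store")]

print(run(serial))       # 2
print(run(interleaved))  # 1, one increment is lost
```

Code that needs the first result on multiprocessor hardware has to use an atomic operation or a lock around the update instead of assuming the increment is indivisible.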

There was some discussion of Intel's licence. Since it's fairly short, I include it here:

Copyright (c) 1999, Intel Corporation

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.


The GPL is also a very interesting document, central to the Free Software or Open Source movement. If you haven't read it five times yet, go do that now.

Jeff Garzik didn't think Intel's license was incompatible with the GPL. Keeping copyright under Intel's name was not an additional restriction (which would be prohibited by the GPL) since copyright would be maintained anyway, and redistribution of modified sources was allowed by the licence as well.

Jes felt that the copyright statement was an additional restriction. He replied, "Having the license require that you print a vendor's license in accompanying documentation to products being sold is incompatible with the GPL, under which the kernel is distributed."

The discussion ended there.







Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.