Kernel Traffic #94 For 20 Nov 2000

By Zack Brown

Introduction

Mark Hahn issued a retraction to his statement about George France in Issue #88, Section #7 (27 Sep 2000: Angry Fighting In The ARM Tree). See that section for the text of the retraction.

Mailing List Stats For This Week

We looked at 1583 posts in 6724K.

There were 409 different contributors. 203 posted more than once. 155 posted last week too.

1. 2.2 Performance Fixes

24 Oct 2000 - 3 Nov 2000 (20 posts) Archive Link: "Strange performance behavior of 2.4.0-test9"

People: Linus Torvalds, Alan Cox, Andrew Morton

In the course of a performance discussion, Andrew Morton recommended that Apache work around a kernel 2.2 slowdown by avoiding the unserialized accept() call, using system V semaphores for serialization instead. For kernel 2.4, the problem didn't exist, so he recommended using the normal unserialized accept() in that case. But Linus Torvalds replied:

No.

Please use unserialized accept() _always_, because we can fix that.

Even 2.2.x can be fixed to do the wake-one for accept(), if required. It's not going to be any worse than the current apache config, and basically the less games apache plays, the better the kernel can try to accomodate what apache _really_ wants done. When playing games, you hide what you really want done, and suddenly kernel profiles etc end up being completely useless, because they no longer give the data we needed to fix the problem.

Basically, the whole serialization crap is all about the Apache people saying the equivalent of "the OS does a bad job on something we consider to be incredibly important, so we do something else instead to hide it".

And regardless of _what_ workaround Apache does, whether it is the sucky fcntl() thing or using SysV semaphores, it's going to hide the real issue and mean that it never gets fixed properly.

And in the end it will result in really really bad performance.

Instead, if apache had just done the thing it wanted to do in the first place, the wake-one accept() semantics would have happened a hell of a lot earlier.

Now it's there in 2.4.x. Please use it. PLEASE PLEASE PLEASE don't play games trying to outsmart the OS, it will just hurt Apache in the long run.

Dave Wagner thought this would leave current Apache users up a stump, since the fix wasn't there currently, and Alan Cox also said, "Do we really want to retrofit wake_one to 2.2. I know Im not terribly keen to try and backport all the mechanism. I think for 2.2 using the semaphore is a good approach. Its a hack to fix an old OS kernel. For 2.4 its not needed." Linus replied that a full backport wouldn't be necessary, they could just do the equivalent of the semaphore around just accept(). While it wouldn't be a generic solution, he felt the alternative would have "old binaries of apache lying around forever that do the wrong thing." Andrew Morton also felt the fix was not too hard, and posted a patch. He added exuberantly, "It's a 16-liner! I'll cheerfully admit that this patch may be completely broken, but hey, it's free. I suggest that _something_ has to be done for 2.2 now, because Apache has switched to unserialised accept()." Linus replied, "This is why I'd love to _not_ see silly work-arounds in apache: we obviously _can_ fix the places where our performance sucks, but only if we don't have other band-aids hiding the true issues." He also had some criticism of the patch, and there followed a brief back-and-forth leading to some revisions.
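To make the distinction concrete, here is a minimal sketch, not Apache's code. Several workers wait in accept() on one shared listening socket, either all at once (unserialized) or one at a time behind a lock (serialized, standing in for the SysV semaphore or fcntl() workaround). Python threads stand in for Apache's child processes, and the names worker and run_demo are hypothetical.

```python
import socket
import threading

def worker(listener, stop, accept_lock=None):
    """Accept connections on the shared listener until told to stop."""
    while not stop.is_set():
        try:
            if accept_lock is None:
                # Unserialized: every idle worker blocks in accept() at once
                # and the kernel decides which one gets the connection.
                conn, _ = listener.accept()
            else:
                # Serialized: only the lock holder may sit in accept().
                with accept_lock:
                    conn, _ = listener.accept()
        except OSError:
            continue  # accept() timed out; loop around and re-check stop
        with conn:
            conn.sendall(b"ok\n")

def run_demo(workers=4, clients=8, serialize=False):
    """Start the workers, make some client connections, return the replies."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # any free loopback port
    listener.listen(16)
    listener.settimeout(0.1)             # lets workers notice the stop flag
    stop = threading.Event()
    lock = threading.Lock() if serialize else None
    threads = [threading.Thread(target=worker, args=(listener, stop, lock))
               for _ in range(workers)]
    for t in threads:
        t.start()
    port = listener.getsockname()[1]
    replies = []
    for _ in range(clients):
        with socket.create_connection(("127.0.0.1", port), timeout=5) as c:
            replies.append(c.recv(16))
    stop.set()
    for t in threads:
        t.join()
    listener.close()
    return replies
```

Both modes serve every client. The difference Linus cared about is invisible at this level: with the lock, the kernel never sees more than one waiter in accept(), so kernel profiles cannot show how it behaves when many processes wait at once.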

2. Developer Discussion

31 Oct 2000 - 7 Nov 2000 (14 posts) Archive Link: "USB init order dependencies."

Topics: Disks: SCSI, Framebuffer, USB

People: David Woodhouse, Russell King, Jeff Garzik, Randy Dunlap, Linus Torvalds

David Woodhouse posted a patch to completely remove the dependency on USB device initialization order. He added, "Personally, I think this fix is less ugly than any of the alternatives I've seen so far." Randy Dunlap replied that he didn't like the patch, and preferred to go back to the way the code had been a few months before, which was to put initialization calls back into the main() function and make the module_init() function in 'usb.c' execute only if configured as a module (via #ifdef MODULE). This would seem to go against Linus Torvalds' plan as covered in Issue #87, Section #5 (16 Sep 2000: Removing Distinctions Between Modules And In-Kernel Drivers), and Russell King added, "that breaks the OHCI driver on ARM. Unless we're going to start putting init calls back into init/main.c so that we can guarantee the order of init calls which Linus will not like, you will end up with a lot of ARM guys complaining." Russell asked Linus to pipe up with some kind of statement, but it was not to be. Jeff Garzik replied historically:

Back when some of the initial USB initcall stuff started appearing, there were similar discussions, similar problems, and similar solutions. I was also wondering how fbdev (which needs to give you a console ASAP) would work with initcalls, etc. At the time (~6 months ago?), Linus' opinion was basically "if the link order hacking starts to get ugly, just put it in init/main.c" So, Randy really should be calling the quoted text above "Linus' suggestion" ;-)

Putting a call into init/main.c isn't a long term solution, but it should get us there for 2.4.x... init/main.c is also the best solution for ugly cross-directory link order dependencies. I would say the link order of foo.o's in linux/Makefile is the most delicate/fragile of all the Makefiles... touching linux/Makefile link order this close to 2.4.0 is asking for trouble. Compared to that, adding a few lines to init/main.c isn't so bad.

Russell felt this would lead to quite a few initialization calls going into that file, and "lots and lots" of #ifdefs as well. Jeff didn't think it would be that bad. Elsewhere, Russell also objected, "The problem for ARM is that Linux does a lot of the initialisation for some machines, which basically means the hardware isn't setup for access to the USB device if the USB initialisation was placed in init/main.c (this initialisation is done by the very first initcall on ARM). However, that said, we may be able to get away with only adding hw_sa1100_init() before the USB call, but this is only one family of the ARM machine types."
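The ordering problem under discussion can be illustrated with a toy model. This is only a sketch under loud assumptions: the list order stands in for link order, the state dictionary stands in for kernel state, and the function names are hypothetical rather than the real kernel symbols.

```python
def run_initcalls(initcalls):
    """Run init functions in list order, which stands in for link order."""
    state = {"usb_core_ready": False, "log": []}
    for call in initcalls:
        call(state)
    return state["log"]

def usb_core_init(state):
    """Bring up the (toy) USB core."""
    state["usb_core_ready"] = True
    state["log"].append("usb core: ready")

def ohci_init(state):
    """A host controller driver can only register once the core is up."""
    if state["usb_core_ready"]:
        state["log"].append("ohci: registered")
    else:
        state["log"].append("ohci: FAILED, core not initialized")

def boot_with_explicit_call(initcalls):
    """The init/main.c approach: call the core by hand, then run the rest."""
    rest = [c for c in initcalls if c is not usb_core_init]
    return run_initcalls([usb_core_init] + rest)

# An unlucky link order puts the driver ahead of the core.
unlucky_order = [ohci_init, usb_core_init]
```

With unlucky_order, run_initcalls() logs the OHCI failure first, while boot_with_explicit_call() brings the core up before any driver. The explicit call buys a guaranteed order at the price of hard-coded knowledge in one central file, which is the trade-off Russell and Jeff were arguing about.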

At this point things started to break down. Russell admitted not quite following the origins of the thread, and Randy also had some trouble understanding what exactly Russell objected to. At one point David tried to explain what Randy had been going for, and Randy apparently mistook David's explanation of Randy's own proposal, as David's.

Finally Randy said, "Sounds like we basically all want the same thing. :)" and the thread ended.

3. Approaching 2.4.0

31 Oct 2000 - 9 Nov 2000 (40 posts) Archive Link: "Linux-2.4.0-test10"

Topics: Disks: SCSI, FS: NFS, Kernel Release Announcement, Raw IO, USB

People: Linus Torvalds, Rik van Riel, Paul Mackerras, Thomas Molina, Mike Coleman, Marcelo Tosatti, Alan Cox, Randy Dunlap, Miles Lane, Jeff Garzik

Linus Torvalds announced Linux 2.4.0-test10, saying:

Ok, test10-final is out there now. This has no _known_ bugs that I consider show-stoppers, for what it's worth.

And when I don't know of a bug, it doesn't exist. Let us rejoice. In traditional kernel naming tradition, this kernel hereby gets anointed as one of the "greased weasel" kernel series, one of the final steps in a stable release.

We're still waiting for the Vatican to officially canonize this kernel, but trust me, that's only a matter of time. It's a little known fact, but the Pope likes penguins too.

Rik van Riel replied, "Lets just hope he doesn't need RAW IO ;)" and Linus rejoined, "Naah, he mainly just does some browsing with netscape, and (don't tell a soul) plays QuakeIII with the door locked."

Rik had gone on to describe the problem with raw IO "being done into a process' address space and the data arriving only after the page gets unmapped from the process," and Linus had replied:

Yes. But that doesn't count like a "show-stopper" for me, simply because it's one of those small details that are known, and never materialize under normal load.

Yes, it will have to be fixed before anybody starts doing RAW IO in a major way. And I bet it will be fixed. But it's not on my list of "I cannot release a 2.4.0 before this is done" - even if I think it will actually be fixed for the common case before that anyway.

(Note: I suspect that we may just have to accept the fact that due to NFS etc issues, RAW IO into a shared mapping might not really supported at all. I don't think any raw IO user uses it that way anyway, so I think the big and worrisome case is actually only the swap-out case).

Elsewhere, Miles Lane noticed that Linus had not included a Changelog for this announcement, and asked for it. Linus replied:

Sorry. Here it is..

Alan Cox also replied to Linus' initial announcement, saying he thought there were definitely show-stoppers, but that 2.4.0-test10 did seem very close to 2.4.0; and there was some discussion of the various problems.

4. Finer Grain Load Average Calculation

5 Nov 2000 - 7 Nov 2000 (7 posts) Archive Link: "Loadavg calculation"

People: Pavel Machek, Bert Hubert, Andi Kleen

Robert A. Yetman was working on a project that relied on automatically invoking new programs when the load average went below a certain threshold. But since Linux calculated the 'loadavg' only once per minute, it was possible for the computer to have up to a minute of wasted time before firing up the new programs. He asked if there were a way to reduce the 'loadavg' calculation interval to 15 seconds, or if there were some other data he should follow instead. There were several replies. Andi Kleen suggested recompiling the kernel with a smaller 'LOAD_FREQ' constant, though he felt this might break other programs. Bert Hubert suggested snooping the source of 'vmstat' to get the same data used in 'loadavg' calculation, and then calculating interim load averages that way. He also suggested calculating how much time was spent in the 'idle' task. Pavel Machek suggested simply 'cat'ing '/proc/loadavg' and using the data there. He explained that the first three values in that file "are loadavg averaged over different time. Select the right one and you are done."
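Pavel's suggestion can be sketched in a few lines. This is a minimal example, assuming the usual layout of '/proc/loadavg' in which the first three whitespace-separated fields are the 1-, 5- and 15-minute averages. The function names and the threshold check are hypothetical illustrations, not anything posted in the thread.

```python
def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    one, five, fifteen = (float(f) for f in fields[:3])
    return one, five, fifteen

def should_start_job(text, threshold):
    """True when the 1-minute load average is below the threshold."""
    return parse_loadavg(text)[0] < threshold

def read_loadavg(path="/proc/loadavg"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())
```

A job launcher could poll read_loadavg() every few seconds and compare the first value with its threshold, instead of patching 'LOAD_FREQ' in the kernel.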

5. Oops In 2.4.0-test10

5 Nov 2000 - 6 Nov 2000 (10 posts) Archive Link: "Kernel 2.4.0test10 crash (RAID+SMP)"

Topics: Disk Arrays: RAID, SMP

People: Neil Brown, Linus Torvalds

Someone reported frequent oopsen under 2.4.0-test10. They posted the text of an oops and mentioned that the problem only appeared when the kernel had been compiled for SMP support and used RAID. Neil Brown replied, "It looks like an interupt is happening while another interrupt is happening, which should be impossible... but it isn't." He posted a patch to 'raid1.c', and the original poster reported complete success. Later, Neil explained:

The b_end_io routine that raid1 attaches to io request buffer_heads that are used for resyncing had a side effect of re-enabling interrupts. As it is called from an interrupt context, this is clearly a bug. It allowed another interrupt to be serviced before a previous interrupt had been completed, which is a problem waiting to happen.

In this case, it became a real problem because the first interrupt had grabbed a spinlock (I didn't bother to discover which one) and the second interrupt tried to grab the same spinlock. This produced the deadlock which the NMI-Oopser detected and reported.

He added that he still wasn't sure he'd caught all the occurrences of this particular problem, but that when he felt he had, he'd send the patch to Linus Torvalds.
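The class of bug Neil describes can be modeled with a toy CPU that tracks only whether interrupts are enabled. The source does not say which locking primitives 'raid1.c' used or what the patch changed, so this is only a sketch of the general pattern: an unlock that unconditionally enables interrupts (as the kernel's spin_unlock_irq() does) versus one that restores the caller's saved state (as spin_unlock_irqrestore() does). The Cpu class and both function names are hypothetical.

```python
class Cpu:
    """Toy CPU that only tracks whether interrupts are enabled."""
    def __init__(self, irqs_enabled):
        self.irqs_enabled = irqs_enabled

def locked_section_irq(cpu):
    """Models spin_lock_irq()/spin_unlock_irq(): always re-enables on exit."""
    cpu.irqs_enabled = False   # lock: disable interrupts
    # ... critical section would run here ...
    cpu.irqs_enabled = True    # unlock: enable, whatever the caller had

def locked_section_irqsave(cpu):
    """Models spin_lock_irqsave()/spin_unlock_irqrestore()."""
    flags = cpu.irqs_enabled   # remember the caller's interrupt state
    cpu.irqs_enabled = False   # lock: disable interrupts
    # ... critical section would run here ...
    cpu.irqs_enabled = flags   # unlock: restore exactly what the caller had
```

Called from interrupt context, where interrupts are already disabled, the first variant leaves them enabled on return. That opens the window for a second interrupt to arrive and try to take a spinlock the first one still holds.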

6. Protecting Processes From 'OOM Killer'

7 Nov 2000 - 10 Nov 2000 (5 posts) Archive Link: "[PATCH] protect processes from OOM killer"

Topics: FS: sysfs, OOM Killer

People: Ingo Oeser, Chris Swiedler

Chris Swiedler posted his very first patch to linux-kernel, to protect select processes from the 'OOM killer', via setting flags in the /proc directory. Ingo Oeser suggested:

Please base it upon my OOM-Killer-API patch.

http://www.tu-chemnitz.de/~ioe/oom_kill_api.patch

This will reduce your patch to an simple module (but you have to manage refcounting yourself!) and give the user a choice, which one to use.

Elsewhere, under the Subject: [PATCH] oom_nice (http://www.uwsg.indiana.edu/hypermail/linux/kernel/0011.1/0453.html), Chris posted a nice long patch and announced:

Here's an updated version of the "oom_nice" patch. It allows a sysadmin to set the "oom niceness" for processes, either by PID or by process name. The oom niceness value factors into the badness() function called by Rik's OOM killer. Negative values decrease the chance that the process will be killed, and positive values increase it.

The usage is:

echo [PID|process_name]=oom_niceness > /proc/sys/vm/oom_nice

examples:

echo 418=-10 > /proc/sys/vm/oom_nice

echo netscape=20 > /proc/sys/vm/oom_nice

echo 1=- > /proc/sys/vm/oom_nice

In the first example, the process with PID 418 is 10 times less likely to be killed than it would have been. Likewise, in the second example, any processes named 'netscape' are 20 times more likely to be killed than otherwise. The last example protects the init process from being killed, no matter what.

cating oom_nice will show the current nice values for all processes.

By default the oom_nice proc entry is not world-readable or writable. For security reasons I would suggest that you give good (negative) oom nice values to processes by PID rather than process name. If any process named 'init' is protected, then it's easy for a user to just rename their executable and get around the oom killer.

To test the OOM killer algorithm I also inclued a proc entry /proc/sys/vm/oom_nice_test. On my machine 'cat /proc/sys/vm/oom_nice_test' produces:

"OOM killer would have killed process 516 (csh) with 496 points"

Compiling oom_kill.c with DEBUG defined and cating oom_nice_test will print out the points for all processes, including their oom_nice values and how they affected the final points.
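The input format Chris describes is simple enough to sketch. The parser below follows the '[PID|process_name]=oom_niceness' syntax from his announcement. The adjust_points() scaling is an assumption read off his "10 times less likely" and "20 times more likely" wording, since the patch's actual badness() arithmetic is not shown here. All names are hypothetical.

```python
NEVER_KILL = "never"  # marker for the "1=-" form, which protects a process outright

def parse_oom_nice(line):
    """Parse '[PID|process_name]=oom_niceness' into (kind, target, niceness)."""
    target, sep, value = line.strip().partition("=")
    if not sep or not target or not value:
        raise ValueError("expected [PID|process_name]=oom_niceness")
    kind = "pid" if target.isdigit() else "name"
    key = int(target) if kind == "pid" else target
    niceness = NEVER_KILL if value == "-" else int(value)
    return kind, key, niceness

def adjust_points(points, niceness):
    """Scale a badness score by the niceness (assumed scaling, see above)."""
    if niceness == NEVER_KILL:
        return 0                      # never selected
    if niceness < 0:
        return points // -niceness    # e.g. -10: ten times less likely
    if niceness > 0:
        return points * niceness      # e.g. 20: twenty times more likely
    return points
```

For instance, parse_oom_nice("418=-10") yields a PID entry with niceness -10, and under the assumed scaling a process scoring 496 points, like the csh in Chris's test output, would drop to 49.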

7. Virtual Memory Problems In 2.2

7 Nov 2000 - 8 Nov 2000 (6 posts) Archive Link: "continuing VM madness"

Topics: USB, Virtual Memory

People: Mikulas Patocka, Alan Cox, Michael Rothwell, Andrea Arcangeli

Michael Rothwell got swapping errors in 2.2.16 with USB patches applied, and at one point Mikulas Patocka said, "Sadly it is not a bug but a VM misdesign (and people are just making different workarounds that more or less work). I believe that this solution will break again, as it happened in 2.2.15 and 2.2.16. Go back to Linux 2.0 - it has the swapper implemented correctly :-)" Alan Cox replied, "2.2.15->16 was the major transition in getting stuff right. 2.2.18 should be pretty reasonable. With Andrea's additional patches its quite nice. If we add page aging then in theory it'll be as good as 2.2 but in practice who knows" Andrea Arcangeli agreed with this assessment, but Mikulas pressed with, "What about the possibility that kernel shoots processes when the machine is receiving too much packets and runs out of atomic memory? It didn't seem to go away in 2.2.16. 2.0 behaved correctly in this case." There was no reply.

8. Getting Started With Kernel Code

7 Nov 2000 (4 posts) Archive Link: "How to study linux kernel?"

People: Jeff V. Merkey, Rik van Riel, Alan Cox, Dave Jones

Su Hwan Hwang asked how to get started studying the Linux kernel. Dave Jones and Rik van Riel pointed him to http://kernelnewbies.org/, and Jeff V. Merkey also suggested:

  1. Go to http://www.linuxdoc.org and read the HOWTOs on the Linux kernel.
  2. Buy a coffee maker and 3000 lbs. of coffee beans. You will also need a coffee grinder to grind the beans so you can stay awake around the clock reviewing code.
  3. Grow a very thick skin and expect "baptism by fire" when asking any question on this list.
  4. Be very nice to Alan Cox. He will answer questions and if you are really wanting to help out.
  5. Buy a copy of "Unix - a practical implementation" and read it.
  6. "Linux Kernel Internals" is another great book, get it -- the basics are there.
  7. Grow a ponytail -- view it as your telepathic antenna to other Linux Kernel Developers.

9. Big IrDA Changes Accepted Into 2.4; Linus On Patch Submissions

7 Nov 2000 - 10 Nov 2000 (11 posts) Archive Link: "[RANT] Linux-IrDA status"

Topics: Ioctls, MAINTAINERS File, Networking, Version Control

People: Jean Tourrilhes, Russell King, Linus Torvalds, Michael Rothwell, Randy Dunlap, Jeff Garzik

An old story retold. Jean Tourrilhes reported that the IrDA stack (http://linux24.sourceforge.net/) was non-functional, and could even crash the kernel. He ranted:

Most might wonder why the IrDA stack is in such state of disrepair. Is there no maintainers and nobody who cares ?

The truth is that every 2 month, Dag Brattli, the official maintainer of the IrDA stack (see MAINTAINERS), collect all our patches and send the latest official Linux-IrDA patch to Linus.

And every time the patch never materialise in the Linux kernel. Of course, Dag never receive any answer, so doesn't know why his patches are going directly to /dev/null.

As we fix more bugs, the official IrDA patch get growing and growing. The patch that Dag sent last week to Linus was 320k. It has slowly accumulated over one year :-(

On the other hand, what never cease to amaze me is that some patches to the IrDA code gets into the kernel. Some of those patches make things better, some make things worse. Those patches certainly don't come from Dag or any of the most active Linux-IrDA hacker, and none of us see those patches in advance so that we get a chance to comment on them and test them.

I guess that some people have trouble reading the MAINTAINERS file :-( Or maybe there is another maintainer for the IrDA stack and none of us knows about it.

I think for us the only solution is to ignose what's happening in the 2.4.X kernel and have Dag maintaining Linux-IrDA separate from the kernel. I don't see why Dag should take the effort to send regular patch to Linus if they get ignored.

In other words, the chances to have IrDA working in kernel 2.4 are *very* slim at this point.

Jeff Garzik pointed out that this was very similar to the ISDN situation of 1999. This was first covered in Issue #14, Section #5 (6 Apr 1999: ISDN Difficulties Going Into The Main Tree), where at first it looked as though ISDN would not have any trouble getting into the main tree. But by Issue #19, Section #2 (10 May 1999: ISDN And The Kernel) other problems had crept in due to the large size of the ISDN patches. By Issue #30, Section #11 (3 Aug 1999: Imminent Feature Freeze) and Issue #31, Section #9 (3 Aug 1999: Code Freeze; ISDN Perennial Lateness) it seemed that Linus Torvalds was ready to accept a single large patch as the only way to sync up with ISDN given the situation, though he wanted to make sure such a thing never happened again. But by Issue #33, Section #6 (22 Aug 1999: ISDN Still Unsettled) it seemed ISDN had still not made it into the kernel. By Issue 35 however, although no direct reference was made, it seemed as though ISDN had finally made it into the kernel, and the entire situation passed into Linux lore.

Jean replied to Jeff's observation, saying that he was familiar with the ISDN legend, but "The *BIG* difference is that Dag has always sent his patch to Linus from the very start, when it was still small, whereas ISDN did stay on their patch from a long time." And Russell King also added:

I have even tried sending Linus small self-contained obviously correct patches for IrDA, but they just don't go in, and, dispite me asking several times for an explaination why they are not, I've never received an answer.

Its almost as though Linus is no longer interested in kernel support for IrDA. I really don't know why Linus doesn't drop the whole IrDA stuff out of the kernel if he's not willing to let people maintain it.

He and Michael Rothwell asked Linus to explain why he never accepted IrDA patches, and Linus replied that the patch hadn't been submitted as often as folks claimed, and he also hadn't gotten any explanation of what the patch did. He added, "at least the last patch I saw just the first screenful was so off-putting that I just went "Ok, I have real bugs to fix, I don't need this crap"." But Michael replied, "I'm not sure what you're saying here. It seems that the pople writing the IrDA code have gotten no feedback from you as to why their patch is never accepted -- could you clarify? They're apparently putting a lot of effort into writing and fixing IrDA for Linux, and have become very discouraged at the lack of feedback. "Crap" the code may be, but it seems like it would be a good idea to at least say something substantive about why their code keeps getting rejected."

Linus explained:

There's one _major_ reason why things never get accepted:

CVS trees

I'm not fed patches. I'm force-fed big changes every once in a while. I don't like it.

I like it even less when the very first screen of a patch is basically a stupid change that implies that somebody calls ioctl's from interrupts.

When I get a big patch like that, where the very first screen is bletcherous, what the hell am I supposed to do? I'm not going to waste my time on people who cannot send multiple small and well-defined patches, and who send be big, ugly, "non-maintained" (as far as I'm concerned) patches.

I'm surprised Alan rants about this. He knows VERY well how I work, and is (along with Jeff Garzik and Randy Dunlap) one of the people who are very good at sending me 25 separate patches with explanations of what they do.

Basically, if you send me a big patch with tons of changes, how the hell DO you expect me to answer them? Does anybodt really expect me to go through ten thousand lines of code that I do not know, and comment on it? Obviously not, as anybody with an ounce of sense would see.

So what choice do I have? Apply them blindly?

Quite frankly, I'd rather have a few people hate me deeply than apply stuff I don't like. If I just start blindly applying big patches, I can avoid nasty discussions. But I'd rather have people flame me. Maybe some day people will instead start sending me smaller commented patches.

I'm NOT going to do other peoples work for them. If people can't be bothered to send me well-specified patches ESPECIALLY now that we're close to 2.4.x, then I can't be bothered to apply them,

Live with it. Hat eme all you like. I do not care. Th ething I care about is not letting too much crap through unchecked.

In a different post, he pointed out that the only message he'd received from the IrDA people in recent weeks had been a 10000-line patch with the explanation, "Fixes IrDA in 2.4". He said:

That's it.

ONE message during the last month. ONE huge patch. From people who should have known about 2.4.x being pending for some time.

10,000+ lines of diff, with _no_ effort to split it up, or explain it with anything but

"Fixes IrDA in 2.4"

and these people expect me to reply, sending long explanations of why I don't like them? After they did nothing of the sort for the code they claim should have been applied? Nada.

Get a grip.

Russell again asked why smaller, "obviously correct" patches had also been rejected, but there was no reply. And Michael suggested Linus tell the IrDA people about those problems; but there was also no reply.

Elsewhere, Jean spent a day splitting the big patch into smaller pieces, which he attached in a private email to Linus. Linus replied publicly with an interesting description of his patch-handling activities:

When I say multiple mails, I mean multiple mails. NOT "26 attachements in one mail". In fact, not a single attachment at all, please. Send me patches as a regular text body, with the explanation at the top, and the patch just appended.

Why?

Attachements may look simple, but they are not. I end up having to open each and every one of them individually, remembering which one I've checked, save them off individually, remembering what the file name was, and then apply them each individually.

See the picture? Attachements are evil.

In contrast, imagine that you (and everybody else) sends me plain-text patches, with just an explanation on top. What do I do?

I see the explanation immediately when I open the mail (ie when I press the "n" key for "next email").

I can save it off with a simple "s../doit", which saves it in _one_ "doit" file appended to all the other pending stuff. Alternatively, I can skip it, or leave it pending, and let the _mail_software_ remember whether I answered that particular patch.

I can reply to it individually, and that patch (and nothing else) will be automatically set up for the reply so that I can easily quote whatever parts I want to point out.

I can apply all the patches that I have approved with a single

patch -p1 < ~/doit

without having to go through them individually.

None of the above works with attachments.

In Jean's same email, Jean had said, "If somebody send you 1000 lines in one go or as 100 times 10 lines, it doesn't matter, it is still 1000 lines of code to read through. Even small patches can be totally obscure for somebody not familiar with the code and what it is supposed to do." To which Linus replied:

You are WRONG.

10 emails with 1000-line patches are _much_ easier to handle. I can clearly see the parts that belong together (nothing is mixed up with other issues), and I can keep the explanation in mind. I do not have to remind myself what that particular piece is doing.

It has other advantages too. With a single 10000-line patch, if I don't like something, I have a hard time just removing THAT part. So I have to reject the whole f*cking patch, and the person who sent it to me has to fix up the whole thing (assuming I'd bother answering to it, poitning out the parts that I don't like from the large patch, which I will not).

With 10 1000-line emails, I can decide to apply 8 of them outright, apply one with comments, and discard one that does something particularly nauseating. And I can much more easily explain to the submitter which one I hate, without having to edit it down.

See?

Later, after receiving many small patches from Jean, Linus said, "Ok, thanks to the work of Jean, everything seems to be applied now."
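As a rough illustration of a mechanical first step toward the smaller patches Linus asked for, the sketch below cuts one large unified diff into one piece per file. It is only a starting point: Linus wanted logically separate changes, each with its own explanation, and grouping by file does not provide that. The function name and the sample file names are hypothetical.

```python
def split_patch_by_file(diff_text):
    """Split a unified diff into one chunk per file, keyed by the '+++' path."""
    lines = diff_text.splitlines(keepends=True)
    chunks = {}
    path = None
    header = []  # "diff ..." lines waiting for the file section they belong to
    for i, line in enumerate(lines):
        # A file section starts at a "--- " line directly followed by "+++ ".
        starts_file = (line.startswith("--- ") and i + 1 < len(lines)
                       and lines[i + 1].startswith("+++ "))
        if starts_file:
            path = lines[i + 1][4:].split()[0]
            chunks.setdefault(path, []).extend(header)
            header = []
        if path is None or line.startswith("diff "):
            header.append(line)
        else:
            chunks[path].append(line)
    return {p: "".join(body) for p, body in chunks.items()}

# Hypothetical two-file patch used to exercise the splitter.
SAMPLE = (
    "diff -u a/net/irda/a.c b/net/irda/a.c\n"
    "--- a/net/irda/a.c\n"
    "+++ b/net/irda/a.c\n"
    "@@ -1 +1 @@\n"
    "-old\n"
    "+new\n"
    "--- a/net/irda/b.c\n"
    "+++ b/net/irda/b.c\n"
    "@@ -1 +1 @@\n"
    "-x\n"
    "+y\n"
)
parts = split_patch_by_file(SAMPLE)
```

Each value in parts can then be mailed as plain text with its explanation on top, so the recipient can append the accepted ones to a single file and apply them in one go.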

10. 64-Bit 'printk()' in 2.2

7 Nov 2000 - 8 Nov 2000 (6 posts) Subject: "Linux 2.2.18pre20"

Topics: Kernel Release Announcement

People: David Weinehall, Michael Rothwell, Alan Cox

Alan Cox announced 2.2.18pre20, inviting folks to break it if they could, or it would be released as the official 2.2.18 kernel. Michael Rothwell requested that 64-bit printk() be included before the official 2.2.18 came out, and David Weinehall exhorted as well, "Please consider this one Alan, if not for v2.2.18, then at least for v2.2.19pre1." Alan asked for an explanation of why it was needed, and Michael replied:

To print 64-bit debugging output on 32-bit machines. I personally need it to aid with development of a 64-bit filesystem. We're maintaining our own 2.2.17 patched kernel here, but I figure that other people can make use of 64-bit printk in their efforts as well.

Perhaps a better question would be, why reject it? 2.4 supports 64-bit printk, right? It would be nice to have it on 2.2 as well, as it may be a while before 2.4 is widely used in production machines.

There was no reply, even after Michael replied to himself a day later, with the one-word post, "Hello?"
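The thread does not describe how the 64-bit printk() patch works. For context, a common workaround when a formatter only handles 32-bit integers is to split the value into two halves and print them back to back. The sketch below shows that arithmetic only; it is not the patch Michael was asking for, and both function names are hypothetical.

```python
def split_u64(value):
    """Split an unsigned 64-bit value into (high, low) 32-bit halves."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    return value >> 32, value & 0xFFFFFFFF

def format_u64_hex(value):
    """Format a 64-bit value as 16 hex digits using two 32-bit conversions."""
    high, low = split_u64(value)
    return "%08x%08x" % (high, low)
```

The zero padding on the low half matters: without it, a value such as 1 << 32 would print as "10" rather than as a 1 followed by eight zeros.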

11. The US Presidential Election

7 Nov 2000 - 8 Nov 2000 (4 posts) Archive Link: "national problems"

Topics: Patents

People: David Feuer, Andre Hedrick, David Schwartz, David Weinehall, Zack Brown

David Feuer asked, "Now that it seems that George Bush has won the presidency, I am wondering whether Linus and other members of the free software community intend to leave the U.S. and go to more friendly places. Imagine what G.W. Bush is going to do to export controls, free software, copyright law, patent law, etc.... Be afraid." Andre Hedrick replied, "You are an IDIOT! This is a place to develop the kernel not breed your simple small mind!" And David Schwartz also said of David, "Which is more probable? That the author of this is an idiot or that 48% of the American voting public are idiots? Actually, I'm not sure." David Weinehall concluded the thread, with:

Well... I don't think idiocy can be defined purely by who people voted for. From my political view, more than 90% of all Swedes would be idiots in that case :^)

Oh, and for those who didn't know, Bush hasn't won (yet); the votes are getting recounted in Florida as less the difference was about 700 votes.

(ed. [Zack Brown] That last was posted on November 8, just a day after the election. As this article is written, it is November 19 and the election remains undecided. Why did I cover this thread? A little historical context. Plus it's short. No flames please.)

12. Porting Between Versions

8 Nov 2000 (2 posts) Archive Link: "When I use kernel-2.4.0-test10,I got this problem on my server."

Topics: Forward Port, I2O

People: Alan Cox

Someone reported kernel panics in 2.4.0-test10, and Alan Cox replied, "My fault. 2.4test contains a forward port of some 2.2 experimenting that was backed out." He also included a patch to 'megaraid.c', to remove some code that attempted to prevent crashes on boot with AMI cards configured for I2O. There was no reply.

13. Porting Drivers From 2.2 To 2.4

9 Nov 2000 (4 posts) Archive Link: "Porting Linux v2.2.x Ethernet driver to v2.4.x?"

Topics: Ioctls, Networking, PCI

People: Steven Snyder, Jeff Garzik, Randy Dunlap

Steven Snyder wanted to port a 2.2 driver to 2.4, and asked, "Are there any documents which describe the differences in the device driver models (particularly PCI and Ethernet) of the 2 kernel versions?" Jeff Garzik replied:

Not all in one place. Read:

Documentation/pci.txt

Documentation/DMA-mapping.txt

Documentation/IO-mapping.txt

and the attached document, regarding netdevice member locking rules.

Your best reference is other PCI ethernet drivers. grep for 'pci_module_init' in drivers/net/*.c of the most recent 2.4.x kernel, for a good start.

Also... before you start thinking about gunking-up your driver with all sorts of backwards-compatibility code... remember that most 2.4.x drivers can easily be backported to 2.2.x with a few magic macros and static inline functions. I have an example of this sort of thing at http://gtf.org/garzik/drivers/kcompat24/

The attachment read:

Network Devices, the Kernel, and You!

Introduction

The following is a random collection of documentation regarding network devices.

struct net_device synchronization rules

dev->open:
Locking: Inside rtnl_lock() semaphore.
Sleeping: OK

dev->stop:
Locking: Inside rtnl_lock() semaphore.
Sleeping: OK

dev->do_ioctl:
Locking: Inside rtnl_lock() semaphore.
Sleeping: OK

dev->get_stats:
Locking: Inside dev_base_lock spinlock.
Sleeping: NO

dev->hard_start_xmit:
Locking: Inside dev->xmit_lock spinlock.
Sleeping: NO

dev->tx_timeout:
Locking: Inside dev->xmit_lock spinlock.
Sleeping: NO

dev->set_multicast_list:
Locking: Inside dev->xmit_lock spinlock.
Sleeping: NO
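The rules in the attachment amount to a small table: which lock is held when each callback runs, and whether that callback may sleep. As a rough illustration only, the table can be written down and queried as below. This is a user-space Python model, not kernel code, and the names NETDEV_RULES, may_sleep and callbacks_under are invented for the sketch.

```python
# Toy model of the net_device callback rules quoted above.
# NETDEV_RULES, may_sleep() and callbacks_under() are hypothetical
# names used only for this illustration; they are not kernel APIs.
NETDEV_RULES = {
    # callback:            (lock held by the caller, sleeping allowed?)
    "open":               ("rtnl_lock", True),
    "stop":               ("rtnl_lock", True),
    "do_ioctl":           ("rtnl_lock", True),
    "get_stats":          ("dev_base_lock", False),
    "hard_start_xmit":    ("dev->xmit_lock", False),
    "tx_timeout":         ("dev->xmit_lock", False),
    "set_multicast_list": ("dev->xmit_lock", False),
}


def may_sleep(callback):
    """Return True if the named callback is allowed to sleep."""
    _lock, sleeping_ok = NETDEV_RULES[callback]
    return sleeping_ok


def callbacks_under(lock_name):
    """Return the sorted names of callbacks invoked with lock_name held."""
    return sorted(
        name for name, (lock, _ok) in NETDEV_RULES.items() if lock == lock_name
    )


if __name__ == "__main__":
    for name in NETDEV_RULES:
        print(name, "may sleep" if may_sleep(name) else "must not sleep")
```

The pattern in the table is that callbacks run under the rtnl_lock() semaphore may sleep, while those run under a spinlock may not.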

And Randy Dunlap also recommended:

Search the lkml archives. Here are 2 instances to find:

from jamal, 2000-jan-6: [ANNOUNCE] SOFTNETing Network Drivers HOWTO

from kuznet, 2000-feb-14: "softnet" drivers: an attempt to clarify

from dave miller, 2000-feb-9: new network driver interface changes, README http://www.uwsg.indiana.edu/hypermail/linux/kernel/0002.1/0408.html

from jamal, 2000-feb-10: ditto http://www.uwsg.indiana.edu/hypermail/linux/kernel/0002.1/0461.html

from dave miller, 2000-feb-12: ditto

14. Devices Incorrectly Reported Open Multiple Times - And It's Not A Bug

9 Nov 2000 (10 posts) Archive Link: "Module open() problems, Linux 2.4.0"

People: Jeff Garzik, Richard B. Johnson, Olaf Titz, Brian Gerst, Christoph Hellwig

Richard B. Johnson reported that in the latest development kernels, 'lsmod' showed a device open twice when in fact it was only open once. Brian Gerst replied that this was harmless, and explained that it was part of the proper way for modules to handle locking. Richard replied that even if harmless, it was still a bug, and had the potential to cause a lot of problems on large systems. He explained that in his case, it took two people a long time to track down which process actually had a device open. If reporting had been handled correctly in the kernel, he said, that would not have been necessary. Brian, Christoph Hellwig, and Jeff Garzik explained that the module usage count had no relation to the number of open devices, and that the only significant distinction was between non-zero (in which case the device was in use) and zero (in which case the device was not in use and the module could be unloaded). Jeff added, "The kernel is free to bump the module reference count up and down as it pleases. For example, if a driver creates a kernel thread, that will increase its module usage count by one, for the duration of the kernel thread's lifetime." He explained that there had never been a guaranteed correlation between the reported number and the actual open count, but that pre-2.4 code just happened to work that way. Richard replied:

Look at what you just stated! This means that a reported value is now worthless.

To restate, somebody decided that we didn't need this reported value anymore. Therefore, it is okay to make it worthless.

I don't agree. The De-facto standard has been that the module usage count is equal to the open count. This became the standard because of a long established history.

Jeff and Christoph explained that the value had never corresponded to the open count, and Olaf Titz added, "Right. The module "use count" is not a use count, it's a lock count." As these folks explained, the apparent correspondence in old kernels had been completely accidental, and was not to be relied on, as the current situation showed. Richard did not reply.
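The distinction between a lock count and an open count can be sketched with a toy model. The following is a user-space Python illustration, not kernel code; the class ModuleModel and its methods are invented for this sketch, and the "kernel thread" pin simply follows Jeff's example of something other than open() raising the count.

```python
class ModuleModel:
    """Toy model of a module's usage count (hypothetical, not kernel code)."""

    def __init__(self, name):
        self.name = name
        self.use_count = 0   # the number a tool like lsmod would display
        self.open_files = 0  # real opens, tracked separately in this model

    def pin(self):
        """Any kernel path may take a reference to keep the module loaded."""
        self.use_count += 1

    def unpin(self):
        if self.use_count == 0:
            raise RuntimeError("use count would go negative")
        self.use_count -= 1

    def open(self):
        """An open() pins the module, but so can other things."""
        self.pin()
        self.open_files += 1

    def close(self):
        if self.open_files == 0:
            raise RuntimeError("close without matching open")
        self.open_files -= 1
        self.unpin()

    def can_unload(self):
        """Only zero versus non-zero matters for unloading."""
        return self.use_count == 0


if __name__ == "__main__":
    mod = ModuleModel("example")
    mod.pin()    # e.g. the driver started a kernel thread
    mod.open()   # one process opens the device
    print(mod.use_count, mod.open_files, mod.can_unload())
```

In this model a single open leaves the displayed count at 2. The count says only that the module cannot be unloaded yet, not how many processes hold the device open.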

15. Config File Documentation Fixes

12 Nov 2000 - 13 Nov 2000 (2 posts) Archive Link: "ppp.txt"

Topics: Networking

People: Mircea Damian, Francois Romieu

Mircea Damian reported, "I just want to say that the file 'Documentation/networking/ppp.txt' (as it is mentioned in Configure.help at CONFIG_PPP option) does not exists." Francois Romieu posted a patch, to refer only to the PPP HOWTO instead of both the HOWTO and 'ppp.txt'. End Of Thread.

16. OOM Killer Success

13 Nov 2000 (3 posts) Archive Link: "reliability of linux-vm subsystem"

Topics: OOM Killer, Virtual Memory

People: Erik Mouw

Someone posted some code which, when run repeatedly as a normal user process, seemed to cause a denial of service. He observed that the program would use up all available memory until all instances were killed by the VM subsystem. Until that happened, however, the system became unusable. As far as the poster knew, the system should have remained usable even under such high load. Erik Mouw replied that this all seemed like normal behavior to him, and that it even showed that the 'OOM Killer' worked, since it had killed off the proper processes when the system ran out of memory. He concluded, "If you don't enforce process limits, you allow a normal user to thrash the system."
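One way to read Erik's remark about process limits is as per-process resource limits of the kind set with 'ulimit' or setrlimit(). The thread does not say which limits he had in mind, so the following is only a sketch under that assumption. It uses Python's standard resource module to cap the address space (RLIMIT_AS) of a child process on Linux; the helper name run_with_memory_cap and the 512 MB figure are invented for the example.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(code, cap_bytes):
    """Run Python source `code` in a child whose address space is capped.

    Hypothetical helper for illustration; assumes a Linux system where
    RLIMIT_AS is enforced.
    """
    def apply_limit():
        # Runs in the child between fork() and exec(), so only the
        # child is affected. Only the soft limit is lowered.
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        new_soft = cap_bytes if hard == resource.RLIM_INFINITY else min(cap_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
    )


# A memory hog: keeps allocating until the limit refuses an allocation.
HOG = """
chunks = []
try:
    while True:
        chunks.append(bytearray(16 * 1024 * 1024))
except MemoryError:
    chunks.clear()
    print("allocation refused")
"""

if __name__ == "__main__":
    result = run_with_memory_cap(HOG, 512 * 1024 * 1024)
    print(result.stdout.strip())
```

With a cap in place, the runaway process is refused memory long before the machine as a whole runs out, so the OOM killer never has to get involved.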

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.