Kernel Traffic #91 For 30 Oct 2000

By Zack Brown

Mailing List Stats For This Week

We looked at 1093 posts in 4317K.

There were 371 different contributors. 179 posted more than once. 143 posted last week too.

1. Some Developer Interactions Toward 2.4

6 Oct 2000 - 16 Oct 2000 (7 posts) Archive Link: "tty_[un]register_devfs putting 3K structures on the stack"

Topics: User-Mode Linux

People: Jeff Dike, Theodore Y. Ts'o

Jeff Dike reported a stack overflow affecting the UML (User-Mode Linux) project; Theodore Y. Ts'o described a possible fix, and Jeff asked, "Are you willing to consider this a critical bug that deserves to be fixed before 2.4.0? If so, I'm willing to make the fix and send it to whoever wants it." Theodore replied:

If the problem only impacts User-mode Linux, it's hard for me to justify hanging the "critical" label on it. However I'm willing to look at the patch, bless it, and send it on to Linus (who as you know sometimes is a softy about such things. :-)

I'm pretty sure that we'd be able to get it into 2.4.x, x > 0 at the very least, since the patch is highly localized, won't break anything, and is easy to test for correctness.

Jeff posted a patch, and added, "I still think that treating it as critical deserves consideration because it's a potentially nasty problem, and one that could be tough to debug. As Alan pointed out, it's easier to fix the bug than it is to prove that it can't happen on any of the other arches."

End of thread.

2. iBCS For 2.4

13 Oct 2000 - 17 Oct 2000 (4 posts) Archive Link: "iBCS with 2.4?"

Topics: BSD: NetBSD, IBCS, Microsoft

People: Tigran Aivazian, Christoph Hellwig, David Woodhouse, David Ford

David Ford asked whether iBCS support would be in 2.4, and Tigran Aivazian replied with a historical summary:

The ABI (iBCS is just an Intel-specific standard so the old name iBCS was a misnomer while ABI is both platform and OS-independent) patch was maintained during 2.3.x development series jointly by David Woodhouse, myself and others at least until 2.3.99-preX. Then I left SCO and SCO collapsed (more or less) and thus was no longer considered as a strong player in OS market so I don't believe it is worth expending much effort to emulate their OSes. I hope the same fate awaits Sun together with their Solaris product (and any other commercial OS vendors, Microsoft will go next) so I doubt it is worth emulating Solaris either -- there are more exciting projects for the Linux kernel. So I personally am no longer interested in ABI. I suspect the unimportance of non-Linux (legacy) systems is what made Linus decide to not include it in the official kernel, but I cc'd all people mentioned here to correct me if I am wrong.

Anyway, the latest version of the patch I had you can download from:

http://www.moses.uklinux.net/patches/abi-bcp-2.3.99.patch.bz2

but ask Christoph Hellwig (of Caldera who bought SCO, I heard?) he may have a more recent copy.

He replied to himself to add that if Linus was interested in iBCS support, "I am willing to spend some time cleaning up the code, updating it for current interfaces and re-testing at least the small parts of it that I wrote myself :)". Christoph Hellwig replied that he thought there was definitely interest, and that he'd already done a lot of the work to clean up the code. He proposed, "If no one has a better idea I will setup a linux-abi projects at sourceforge and check Tigrans last patch and my changes in - everyone interested should join." And added, "My primary interests are SCO UnixWare and OpenServer now, but I would like to get some NetBSD binaries working under Linux sooner or later."

3. 'IDE-patch' In 2.2

15 Oct 2000 - 17 Oct 2000 (4 posts) Archive Link: "now that NFS V3 is in 2.2.18pre, could we *please* add the ide-patch"

Topics: Disks: IDE

People: Mike A. Harris, Chris Kloiber, Vojtech Pavlik, Andre Hedrick

Someone asked if the 'IDE-patch' could please be included in 2.2; perhaps marked as 'experimental'. But Mike A. Harris explained:

I personally *NEED* the IDE patches. That is NOT good enough reason to include them in the mainstream kernel however.

The 2.2.x IDE patches are unmaintained if I'm correct, unless someone else has stepped up to maintain them. Vojtech Pavlik (sp?) is maintaining the VIA portion I believe.

If the IDE patch were integrated, it would likely fall upon Alan to manage.

Chris Kloiber mentioned that Andre Hedrick didn't have time to backport the patches from 2.4 to 2.2, but Andre replied to Mike, saying that Alan wouldn't get the bug reports, he (Andre) would. Mike replied that this completely changed the whole picture, and added his vote for including the IDE patches in 2.2.

Maintenance of IDE support has been changing hands a bit recently, at least partially. See Issue #83, Section #10 (25 Aug 2000: New Maintainer For VIA vt82c586 IDE Driver), Issue #86, Section #8 (12 Sep 2000: Backporting The IDE Patch), and Issue #86, Section #10 (12 Sep 2000: New Maintainer For ATA Backport).

4. Tulip Driver Complexity Interferes With Development

15 Oct 2000 - 16 Oct 2000 (10 posts) Archive Link: "Problems with Tulip driver in 2.2 and 2.4"

Topics: Networking

People: David Rees, Alan Cox, Mike A. Harris, Eric S. Raymond, Linus Torvalds

J. S. Connell reported a problem with the Tulip driver in 2.2 and 2.4, and David Rees confirmed having a similar problem. He remarked, "I've noticed this behavior for a few kernel revisions now, up to and including 2.2.17. It would be nice to get this bug worked out before 2.2.18." But Alan Cox said, "I dont think that is likely to happen. Every time someone touches the tulip driver close to release they fix one card and break another 8(." Mike A. Harris suggested, "This might be a good reason to fork the driver code. Linus commented before that he'd prefer a fork if it prevented problems like this from occuring I believe." For more on the idea of splitting up code to ease complexity, see the world-famous thread in which Eric S. Raymond told Linus Torvalds to grow up, covered in Issue #83, Section #4 (21 Aug 2000: Driver Organization; Serial Devices And X; Sharing Code; Philosophy Of Development).

5. Some Details Of Getting Device Drivers Into The Official Sources

16 Oct 2000 - 18 Oct 2000 (29 posts) Archive Link: "Device Driver"

People: Mike McLeod, Keith Owens, Alan Cox, Miles Lane, John Alvord, Igmar Palsenberg

Mike McLeod said that Platypus Technology had developed some hardware and was keen to get its driver included in the official Linux kernel sources. He explained, "All of the code is open except for an image file that is loaded onto the card when the driver is installed that handles it's logic." Alan Cox suggested reading 'Documentation/SubmittingDrivers' (http://lxr.linux.no/source/Documentation/SubmittingDrivers?v=2.4.0-test9) in the source tree, and Keith Owens explained, "You need to generate a patch to add the code into the relevant kernel directory and update the Makefile accordingly." He offered to guide Mike through the process, since as he pointed out, it could get messy at times.

Miles Lane pointed out that if the firmware image file was not going to be available as Open Source, the driver should be able to handle its absence, or else Platypus should provide an Open Source version. But Keith explained, "Firmware for cards can be proprietary. It can either be installed by a userspace utility on startup (nothing to do with the kernel) or it can be installed by the kernel driver for the card during initialization. In the latter case, the image must be supplied in text format and converted to binary, no binary files in the kernel tarball."
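
One way to read Keith's "text format" requirement is that a firmware image ships as source text, for instance a C byte-array initializer, which the compiler turns back into binary. The following is a minimal sketch of that conversion and not a tool from the thread; the function name `firmware_to_c_array` and the array name `card_firmware` are hypothetical.

```python
def firmware_to_c_array(data: bytes, name: str = "card_firmware",
                        per_line: int = 12) -> str:
    """Render a binary image as C source text (a byte-array initializer).

    The array name and line width are illustrative assumptions.
    """
    lines = [f"static const unsigned char {name}[{len(data)}] = {{"]
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        # Each byte becomes a two-digit hex literal, comma separated.
        lines.append("\t" + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    return "\n".join(lines)
```

Calling `firmware_to_c_array(open("image.bin", "rb").read())` on a hypothetical image file would yield text that can sit in the kernel tarball, so the tarball itself carries no binary file.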

Igmar Palsenberg felt it was unacceptable to include any closed source software in the kernel, primarily for security reasons. John Alvord replied that he was pretty sure there were precedents for this already, and Alan Cox replied:

Quite a few. You can make the driver upload be done via a userspace app so its pretty clearly seperate from the kernel.

As to the security risk - yes if you run a military installation or something that is similarly paranoid. However its no different to the firmware burned into ROM or the BIOS having NSAkey like hooks.

This didn't sit entirely well with Igmar, but there wasn't much discussion.

6. 'OOM Killer' Code Evaluation

17 Oct 2000 - 21 Oct 2000 (5 posts) Archive Link: "OOM Test Case - Failed!"

Topics: OOM Killer

People: Stephen Tweedie, Byron Stanoszek, Rik van Riel

Byron Stanoszek complained that the current 'OOM Killer' code, aimed at killing memory hogs on out-of-memory conditions, seemed to kill all the wrong processes. In particular, when running a memory-intensive compile, the 'OOM Killer' chose to kill only the 'httpd' children of 'apache'. 'apache' then spawned off new children to fill the gaps, and the process repeated. He felt it would have been more intelligent to kill off either the 'apache' parent itself, or else the memory-hogging compilation. Stephen Tweedie replied that the compilation process, as displayed by Byron in his email, was owned by root. "Protecting root processes is an explicit design goal here," he explained. Rik van Riel added that since only 'httpd' children were killed, no actual work was lost on that system. Byron replied that his system had still been unusable during the out-of-memory condition, as verified by the fact that even logfiles stopped updating. There was no reply.
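
To see how protecting root processes can steer the killer toward 'httpd' children, consider a toy scoring model. This is only a sketch and not the kernel's actual heuristic: the `Proc` fields, the `ROOT_DISCOUNT` factor, and the sample numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    name: str
    uid: int
    mem_kb: int


ROOT_DISCOUNT = 4  # assumed factor; chosen only for this illustration


def badness(p: Proc) -> int:
    """Score a process; a higher score makes it a likelier victim."""
    score = p.mem_kb
    if p.uid == 0:
        # Root-owned processes are deliberately protected.
        score //= ROOT_DISCOUNT
    return score


def pick_victim(procs):
    """Return the process with the highest badness score."""
    return max(procs, key=badness)


# Made-up numbers: a root-owned compile using more memory than any httpd child.
procs = [
    Proc(100, "cc1", 0, 40000),
    Proc(201, "httpd", 99, 12000),
    Proc(202, "httpd", 99, 11000),
]
print(pick_victim(procs).name)
```

With these sample values the root-owned compile scores 10000 after the discount, below the 12000 of the largest 'httpd' child, so the child is chosen even though the compile uses the most memory.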

7. Filesystem Corruption Bug Revisited

17 Oct 2000 (7 posts) Archive Link: "[BUG]: Ext2 Corruption in test10pre3 (incl. Oops)"

Topics: FS: ext2

People: Alexander Viro, Udo A. Steinberg, Linus Torvalds

Udo A. Steinberg reported filesystem corruption with 2.4.0-test10-pre3, and posted an oops. Linus Torvalds and Alexander Viro took a look, but couldn't find anything wrong with the oops trace. This issue came up before (also involving Udo) and was covered in Issue #85, Section #4 (3 Sep 2000: 2.4.0-test8-pre2: Long-Time (Over A Year) Filesystem Corruption Bug Uncovered), Issue #85, Section #5 (4 Sep 2000: 2.4.0-test8-pre4: Filesystem Corruption Bug And Rolled Up Newspaper), Issue #85, Section #6 (5 Sep 2000: 2.4.0-test8-pre5: More Swipes At Filesystem Corruption Bug), Issue #85, Section #7 (6 Sep 2000: Still Swatting Filesystem Corruption Bug(s)), and Issue #85, Section #8 (6 Sep 2000: 2.4.0-test8-pre6: Filesystem Corruption Bug Seems Dead), at which point the problem seemed to have been solved. Apparently there may still be a bug lurking around somewhere.

8. Proposal To Speed Up Release Cycle

17 Oct 2000 - 19 Oct 2000 (15 posts) Archive Link: "three kernel trees?"

Topics: Assembly

People: Alan Cox, Jeff V. Merkey, Stephen Tweedie, Horst von Brand, David Fort

Mirko Kloppstech proposed having three separate kernel trees. 2.2 would represent the stable tree, 2.4 would represent a feature-frozen branch, and 2.5 would be for development. When 2.4 became the stable tree, 2.6 would become the feature-frozen branch, and 2.7 would become the unstable branch. He felt this would shorten the release cycle, but Alan Cox replied:

It requires too much people overhead. I have proposed another idea which is at about 10 months in or when seems appropriate we say 'ok which bits can we fairly reliably backport to 2.4 and call 2.6' then go on to make 2.5->2.7 and stabilise the big changes as 2.8

That would mean driver features and the like get a yearly cycle but deep magic gets what seems to be needed as a 2 year cycle

Jeff V. Merkey gave his opinion:

Were Linux to go totally modular in 2.5, development cycles will be reduced by 1/2 to 1/3. This is because you could always roll back to known good modules to post a release. The way you guys are going, if Linux stays monolithic, your cycles will get longer and longer. Modularity will allow multiple people to proceed in parallel without every patch and bug going through you and Linus all the time (which would mean you could enjoy more free time).

This does not solve the problem of integration testing, but eh solution here is to create an integration test group whose sole charter is to test modules in an integrated framework as they roll off the assembly line. I am speaking from my commercial software development experienes here. Linux is falling into the same rut NetWare did as it became successful -- too much work for mere mortals without some new structures put in place.

Stephen Tweedie pointed out, "Most of the big 2.4 module changes involved totally rethinking the interactions between the modules. If you're changing the APIs between modules, rolling one back is hard." Jeff replied, "Sooner or later we are going to have to bite the bullet on this one, though and sooner is better than later -- we really need to start thinking about it for 2.5."

There followed some discussion of whether this would actually be any help. Jeff advocated a stable binary interface for drivers, so that a given kernel version could 'ship' regardless of whether any particular driver was ready. This idea of stable APIs was something several other people also supported, while a number of others vocally opposed it. Horst von Brand said at one point, "the API isn't set in stone, as that would slow down development. Plus you need the machinery to build whatever pieces are extant and ignore the rest, you have to carve up the whole into separate pieces, ... This is a *lot* of work for very minimal gain and many future problems." In the same post, he added, "I don't see how chopping up the kernel will speed up development. Quite to the contrary, any change has to be propagated to all pieces, and that will take more time, and create a nightmare of "drivers bar-0.37.2 to 0.39.6 go with kernel-core 2.9.45 to 2.9.76, but work only with foo-3.65.x". Syncronization issues arise by the distributed development, which are now trivially solved by "this is _the_ official version of _all_ there is"."

One argument put forward by David Fort was, "as time will go by, we'll have more and more drivers, and we'll have to wait for ALL to be stable to release anything." To which Horst replied, "Why should we start doing that now? It works just fine as is: You release the driver as a patch for testing, if it works out for enough people and is done well enough it gets into the standard kernel, perhaps marked "EXPERIMENTAL", "DANGEROUS" and "DON'T USE"; the adjectives are slowly dropped as it shows its worth, and it becomes a standard part; sometimes it will fall into disrepair during development kernel series and not even compile; it might be removed, redone or fixed as may be when it happens."

9. kHTTPd Bug Fix

18 Oct 2000 (10 posts) Archive Link: "[patch] khttpd doesn't detach from the files of its parent"

Topics: Web Servers

People: Alessandro Rubini

Alessandro Rubini noticed that kHTTPd didn't detach itself from the 'files' structure of the parent process. This meant that, when run as a module, all files opened by 'insmod' would remain open. He posted a patch to fix this, and there was some technical discussion.

10. Developers Uncomfortable Expressing Discontent

19 Oct 2000 - 22 Oct 2000 (23 posts) Archive Link: ""Tux" is the wrong logo for Linux"

People: Lincoln Dale, Ragnar Hojland Espinosa, Gregory Maxwell, Mike A. Harris, Alex Buell, Zack Brown, Dan Hollis, Linus Torvalds

An (at least at first) anonymous troll appeared on the list, saying 2.4 was years late and broken; the poster concluded that 'tux' should be replaced as the Linux logo, by the "international symbol for the fucking retard", which the poster made available at their anonymous geocities page (http://www.geocities.com/kmfav/). This page berated Linus Torvalds for taking too long getting 2.4 out the door, and making bad decisions overall; and finally included a picture of Linus' face on top of some vertebrae (the caption being, "Buy Linus A Spine").

Alex Buell tracked down the person's IP number and provider, inviting everyone to send in complaints and have the person's account terminated. From the IP number, Lincoln Dale was able to identify the full name of the person by examining the linux-kernel list archives. Lincoln said, "the person has been foolish enough to post to this list from that ip-address before. i won't post details of that, but it is a relatively trivial exercise to go thru... perhaps we can all get on with real work now... :-)". Dan Hollis also found the physical location of the machine itself in Cary, NC, and invited folks to drive over and identify the poster in person.

Throughout the 'hunt', other folks were arguing that this was really a very unimportant event, and there was no need to cause trouble for the person. At one point, Ragnar Hojland Espinosa remarked, "Geez. What are you going to do next, post his picture on interpol.int? It's just a fscking troll, that's what /dev/null is for." Gregory Maxwell and Mike A. Harris had the most pointed remarks to make. Gregory said:

While the poster was obviously being an irrational jerk, his opnion wasn't totally inapporiately placed on this list. By getting people kicked off of systems because they have strong (and potentially stupid) ideas is not a good thing. Fear can be a more effective restraint on open discussion then laws.

Any developer thick-skinned enough to with the strong personalities on this list should have no problem ignoring the poster's inarticulate and childish complaints.

And Mike put it, "While I disagree with the poster's idiotic posting, and harsh comments, this is free speech, and he has every right to speak freely. A shame he hides from us, but to be removed from a list for such a troll is totalitarian IMHO."

(ed. [Zack Brown] My own personal opinion is that the initial post was more than just a troll to be ignored or tolerated or hunted down. The original poster turned out to be a kernel hacker who participates fairly regularly and helpfully on linux-kernel. Stripping away the rhetoric and emotion, the issues he raised were significant, and dealt with the fundamental nature of Linux development. Overly long release schedules seem to be universal among all large Open Source projects, and the proper way to shorten them is still very much an open question. None have even come close to a plausible solution, in spite of acknowledgements from project leaders that the problem exists and is important. Apparently it's reached the point where developers are blowing off steam by posting angry anonymous messages. All the more reason to continue to raise the issue publicly.)

11. Speeding Up Boot Process

19 Oct 2000 - 22 Oct 2000 (7 posts) Archive Link: "Proposal: driver initialization pipelining"

Topics: Disks: IDE, Disks: SCSI, USB

People: Felix von Leitner, Jeff Garzik, Andre Hedrick

Felix von Leitner proposed speeding up the Linux boot process by pipelining the driver initializations. He explained:

The idea is to split the initialization of drivers into two routines. This is only useful for drivers that reset hardware and then wait a while before continuing. My thought is: during that time, other drivers could work.

If we split the initialization into one "trigger the reset" routine and one "do the rest" routine, we could interleave initializations by first calling all the reset routines, then doing some static initializations and then call all the second halves of the initialization. Particularly SCSI and IDE scans need noticeable time and could possibly be done in parallel with the USB init

Jeff Garzik replied thoughtfully, "Some of the initialization can definitely be done in parallel, but there are all sorts of special cases, like devices which turn off interrupts during init (IDE), and other fun tricks... Some of the delays during init are timing sensitive, where you don't want to have to wait for the tasklet to be called for completion." Andre Hedrick pointed out obliquely that it would be very easy for errors to creep into initialization routines running in parallel, particularly with IDE, leading to data corruption or worse.
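
The potential gain from Felix's two-phase scheme can be sketched with a small virtual-time model. This is an illustration under stated assumptions and not kernel code: the `Driver` class, its method names, and the settle times are hypothetical, resets are treated as instantaneous, and the special cases Jeff and Andre raised (interrupts disabled during init, timing-sensitive delays) are ignored.

```python
class Driver:
    """Toy driver whose hardware needs `settle` time units after a reset."""

    def __init__(self, name, settle):
        self.name = name
        self.settle = settle
        self.ready_at = None

    def trigger_reset(self, now):
        # First half: kick off the reset and return immediately.
        self.ready_at = now + self.settle
        return now

    def finish_init(self, now):
        # Second half: wait (if still necessary) until the hardware settled.
        return max(now, self.ready_at)


def sequential(drivers):
    """Each driver resets and then waits before the next one starts."""
    now = 0
    for d in drivers:
        now = d.trigger_reset(now)
        now = d.finish_init(now)
    return now


def pipelined(drivers, static_work=0):
    """Trigger all resets, do static initialization, then finish each driver."""
    now = 0
    for d in drivers:
        now = d.trigger_reset(now)
    now += static_work
    for d in drivers:
        now = d.finish_init(now)
    return now


# Settle times are made-up numbers in arbitrary time units.
drivers = [Driver("scsi", 5), Driver("ide", 3), Driver("usb", 2)]
print(sequential(drivers), pipelined(drivers, static_work=1))
```

In this model the sequential boot costs the sum of the settle times (10 units), while the pipelined one costs only the longest single wait (5 units), because the shorter waits and the static work overlap with it.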

12. Minor Problems With 'linux-kernel'

19 Oct 2000 - 20 Oct 2000 (19 posts) Archive Link: "[ADMIN] some list related topics .."

People: Matti Aarnio, Marty Fouts, Jeff V. Merkey, David S. Miller, H. Peter Anvin, Alan Shutko, Alexander Viro, Philippe Troin

Matti Aarnio reported some issues regarding linux-kernel:

We (abuse@vger.kernel.org -> me & DaveM) got just reports that somebody is diverting incoming email to some sort of auto-responding ticket system.

The thing does not carry original message "Received:" headers in replies, and is reporting invalid URL.

Independent of that, people with supposedly working addresses are sometimes bouncing:

  1. because their backup MXes don't like the domain they have (configuration problem somewhere, you can choose in between the DNS data writer, and the backup MX admin)
  2. that ISP's system is for some reason loosing track of part of their user database.

    Couple days ago SGI.COM had apparently lost its external system primary alias file for a few hours... I have seen similar blunders here and there, cisco.com had blunder that some months ago too. -- things vary, but usually we refrain from deleting recipients if "a bit too many" are bouncing from some big site...

  3. some ISP systems yield 500 series errors with text:

    "system is temporarily busy"

    or something of that effect. Now THAT is really offensive stupidity by the ISP software folks...

For the subset 1 above, I am preparing to begin to run regularly (weekly very least, daily possibly) scanner which tries interactive testing of recipients address at all of user's domain's MX servers, and if any of the MX systems for user's domain gives a bad response for subscriber's address, that user gets the log report and some pointers on what to do.

Philippe Troin complained about getting reports from Matti's scanner, even though his system functioned properly. Matti explained:

There are now two things running.

The uninteded double-reports are due to me having the script running in two instances for a moment. I thought I SIGTERMed the first one immediately when it showed a nuisance detail in online monitoring feature, but alas, things were not so.

In the future the MX verifier will report only cases with problems to the users who are likely to encounter them.

Marty Fouts pointed out, in reply to item 3 of Matti's original post, that the SMTP protocol did not guarantee that all mail had to be accepted at all times. He went on, "SMTP is *not* designed to be a reliable delivery mechanism, let alone a first-time reliable delivery mechanism. Refusal to accept email because the receiving system is under high load is well understood, commonly accepted, and even codified in implementation practice. In my opinion, you are doing a GoodThing(tm) by trying to weed broken addresses from the mailing list. But please don't demand from the internet behavior it wasn't designed to provide." Matti, David S. Miller, and Alexander Viro all pointed him to RFC 821 (http://www.landfield.com/rfcs/rfc821.html) (Simple Mail Transfer Protocol) Appendix E. The ISPs reporting email errors were contradicting themselves, since they reported only a "temporary failure" to deliver the message, yet also returned series 500 error codes, which indicated a permanent failure that should not be attempted again.
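
The contradiction is easy to check mechanically, because in RFC 821 the first digit of a reply code carries its meaning: 4yz is a transient negative completion and 5yz a permanent one. The following is a minimal sketch and not Matti's scanner; the function names and the list of "temporary" keywords are assumptions.

```python
import re

# First-digit meanings of SMTP reply codes, per RFC 821.
CATEGORIES = {
    "1": "positive preliminary",
    "2": "success",
    "3": "intermediate",
    "4": "transient failure",
    "5": "permanent failure",
}

# Assumed phrases that suggest the sender should simply retry later.
TEMPORARY_HINTS = ("temporar", "busy", "try again", "try later")


def classify_reply(line: str) -> str:
    """Classify an SMTP reply line by the first digit of its code."""
    match = re.match(r"\s*(\d{3})", line)
    if not match:
        raise ValueError(f"no SMTP reply code in {line!r}")
    return CATEGORIES.get(match.group(1)[0], "unknown")


def looks_contradictory(line: str) -> bool:
    """True for a permanent (5yz) code whose text describes a temporary condition."""
    text = line.lower()
    return (classify_reply(line) == "permanent failure"
            and any(hint in text for hint in TEMPORARY_HINTS))
```

Under these assumptions a reply such as `550 system is temporarily busy` is flagged, while `451 system is temporarily busy` is not, since a 4yz code already tells the sender to retry.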

Elsewhere, Jeff V. Merkey replied to Matti's initial post:

A legal perspective.

It is a Felony (Federal) in the United States to divert or re-route emails on the internet, esspecially across interstate lines. It's also a Felony (Federal) to misuse and email address to disrupt interstate or international commerce.

I would suggest calling the FBI office in your area and file a formal complaint. You can be assured they will put anyone doing this in jail (and walk in and seize all their computer equipment).

Matti pointed out that there was no FBI office where he was in Finland, and David also said:

We have no interest in sending the FBI to go knock down someones doors, just removing the offending address and fixing the problem, thats all.

I used to live in New Jersey, where "run out and sue your brother and neighbour just because you can" was considered quite cool. I don't see things that way any more, in fact I think it's kind of lame.

But thanks for your legal perspective anyways. :-)

Jeff replied that it was actually illegal not to report the crime in the US, and reiterated that diverting email was a crime. Alan Shutko was unaware of this law, and Jeff tracked it down to the Electronic Industrial Espionage Act.

At one point H. Peter Anvin remarked, "This is downright silly. There is no "interfering with email routing" going on... just someone who has set up their email .forward to a broken address."

13. Searching For Dual-AGP-Slot Motherboards

19 Oct 2000 - 20 Oct 2000 (14 posts) Archive Link: "Any dual AGP slot motherboards?"

People: H. Peter Anvin, Joel Jaeggli, James Simmons, Dan Hollis

James Simmons asked if there were any dual-AGP-slot motherboards available. Joel Jaeggli gave a link to the Accelerated Graphics Port specification (http://www.intel.com/technology/agp/agp_index.htm) and replied that the AGP spec was only for a single device. H. Peter Anvin went on, "Technically, AGP is a "port", not a "bus", for this very reason. However, there is nothing that says a chipset can't provide more than one AGP port; it just means you can't connect more than one device to each port." Dan Hollis agreed that there was no reason there could not be multiple AGP buses. James replied that he'd heard of an Apple system with two AGP slots, and just wanted to know if Intel produced a similar item. But when asked for specifics on the Apple system, James was unable to find it, in spite of much searching. He replied that it must have been a rumor he'd picked up somewhere.

14. Virtual Memory Subsystem Still Shaky

20 Oct 2000 - 23 Oct 2000 (11 posts) Archive Link: "question wrt context switching during disk i/o"

Topics: Virtual Memory

People: Mike Galbraith, Mark Hahn, Andrea Arcangeli, Rik van Riel

Mike Galbraith reported, "I notice on my system that during disk write we do much context switching, but not during disk read." Mark Hahn explained, "bdflush is broken in current kernels. I posted to linux-mm about this, but Rik et al haven't shown any interest. I normally see bursts of up to around 40K cs/second when doing writes; I hacked a little premption counter into the kernel and verified that they're practially all bdflush..." Mike replied that in fact, his system was not happy with the VM subsystem in general, and would start to grind pretty badly as soon as it hit any swap. He couldn't figure out the problem, and added that Andrea Arcangeli's 'classzone' patch worked very well for him (though he felt he was probably beating a dead horse on that issue). There also followed some technical discussion of bdflush(), in which it looked like some problems were uncovered, though nothing conclusive came to the list, and Rik van Riel had no comment throughout the thread.
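
A context-switch rate like the one Mark quotes can be estimated from user space by sampling the cumulative `ctxt` counter in `/proc/stat` twice. This is only a sketch, not the method used in the thread (Mark patched a counter into the kernel), and the function names are hypothetical.

```python
import time


def parse_ctxt(stat_text: str) -> int:
    """Extract the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")


def switches_per_second(interval: float = 1.0, path: str = "/proc/stat") -> float:
    """Sample the counter twice and return the average rate over the interval."""
    with open(path) as f:
        before = parse_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_ctxt(f.read())
    return (after - before) / interval
```

Running `switches_per_second()` on a Linux system during a large write and again during a large read would give a rough comparison of the two cases. The figure is system-wide, so unlike Mark's counter it cannot attribute the switches to bdflush.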

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.