Kernel Traffic #282 For 1 Nov 2004

By Zack Brown

Mailing List Stats For This Week

We looked at 1883 posts in 10757K.

There were 463 different contributors. 248 posted more than once. 182 posted last week too.

1. Linux 2.4.28-pre4 Released; New Patch Acceptance Policy Clarification

8 Oct 2004 - 16 Oct 2004 (14 posts) Archive Link: "Linux 2.4.28-pre4"

Topics: Bug Tracking

People: Marcelo Tosatti, Brian Lazara, Manfred Spraul, Jeff Garzik, Martins Krikis

Marcelo Tosatti announced Linux 2.4.28-pre4, saying:

It contains a number of driver updates (pcnet, e1000, gdth, prism54), a network update from David, few more gcc3.4 warning fixes.

I'm happy that the number of updates is small, -pre3 has been released more than one month ago.

From now on can now change only what is necessary and let the 2.4 tree in peace :)

Brian Lazara from NVidia asked, "At some point, can we get forcedeth.c updated in 2.4.x? We've taken the latest from 2.6.8 and posted a patch against 2.4.27, but it isn't getting picked up. See" Manfred Spraul replied, "The driver in 2.6.8 contains a critical bug that prevents the operation on the non-GB board with a modularized driver. See It's now fixed, I've written a backport of the 0.29 driver, it's at But that backport was stopped due to an oddity in your original backport." And Jeff Garzik also said to Brian, "Nobody has submitted a forcedeth update to me for 2.4." Brian replied, "I figured as much. I've pinged Manfred a couple of times on this. It doesn't appear that he is interested in keeping the 2.4 version of the driver up-to-date."

Elsewhere, Martins Krikis asked Marcelo if this new freeze meant that "there is no hope for adding iswraid to the 2.4 kernel? It still applies cleanly to 2.4.28-pre4 as well... Please consider." Marcelo replied, "New drivers are OK, as long as they dont break existing setups, and if substantial amount of users will benefit from it." [...] "A review by someone with good knowledge on this area (arjan, bart, alan, ?) would also be a good point on getting it into the tree." Martins said he would very much appreciate a review of the driver, and Jeff Garzik said, "FWIW I ACK'd iswraid a while ago..." Martins said:

True, and it was very much appreciated. Jeff's comments induced many of the changes between iswraid versions 0.1.3 and 0.1.4.

But now, of course, the current iswraid version is, and nobody has reviewed it, AFAIK. As always, I'm looking forward to any feedback.

2. New Real-Time Patches For 2.6

8 Oct 2004 - 17 Oct 2004 (95 posts) Archive Link: "[ANNOUNCE] Linux 2.6 Real Time Kernel"

Topics: Microkernels, Real-Time, SMP

People: Sven-Thorsten Dietrich, Ingo Molnar, Scott Wood

Sven-Thorsten Dietrich said:

Announcing the availability of prototype real-time (RT) enhancements to the Linux 2.6 kernel.

We will submit 3 additional emails following this one, containing the remaining 3 patches (of 4) inline, with their descriptions.


Patches against the Linux-2.6.9-rc3 kernel are available at:

The patches are to be applied to the linux-2.6.9-rc3 kernel in the order listed above.

Subsequent announcements will include the links to the ftp site only, to reduce email bulk on the Linux kernel mailing list.


The purpose of this effort is to further reduce interrupt latency and to dramatically reduce task preemption latency in the 2.6 kernel series. Our broad objective is to achieve preemption latency bounded by the worst case IRQ disable.

We are in progress of porting to the 2.6.9-rc3-mm kernel series, and would like to present our work at this stage, to request general feedback, and interact with others working on similar kernel enhancements.

These RT enhancements are an integration of features developed by others and some new MontaVista components:


Our objective is to enable the Linux 2.6 kernel to be usable for high-performance multi-media applications and for applications requiring very fast, task level reliable control functions.

The AV industry is building HDTV related technology on Linux, and desktop systems are increasingly used for similar applications.

Cell phones, PDAs and MP3 players are converging into highly integrated devices requiring a large number of threads. These threads support a vast array of communications protocols (IP, Bluetooth, 802.11, GSM, CDMA, etc.). Especially the cellular-based protocols require highly deadline-sensitive operations to work reliably.

GPS processing, for example, requires hard real-time tasks and guaranteed KHz frequency interrupt processing. Linux-based remote controlled GPS stations at inaccessible or dangerous sites, like the inside of Mt. St. Helens, stream live data via IP.

Additionally, Linux is being increasingly utilized in traditional real-time control environments including radar processing, factory automation systems, "in the loop" process control systems, medical and instrumentation systems, and automotive control systems. Many times these systems have task level response requirements in the 10's to hundreds of microsecond ranges, which is a level of guaranteed task response not achievable with current 2.6 Linux technology.

Other precedent work:

There are several micro-kernel solutions available, which achieve the required performance, but there are two general concerns with such solutions:

  1. Two separate kernel environments, creating more overall system complexity and application design complexity.
  2. Legal controversy.

In line with the above mentioned previous Kernel enhancements, our work is designed to be transparent to existing applications and drivers.

Implementation Details:

We have substituted the definition of kernel spinlocks with a mutex abstraction that uses the P-mutex from the Bundeswehr University in Munich, Germany:

The spinlock definitions have been abstracted to invoke a crude but effective #define-based substitution of spin_lock to mutex_lock functions (in linux/kmutex.h).

We have abstracted the mutex layer to allow configuration and selection of the mutex implementation. We have used a simple mutex implementation, but intend to support use of other mutexes, for example the existing system semaphore, or third party plugins such as the FUSYN project.

Partitioning the Critical Sections:

A partitioning between critical sections protected by spinlocks and critical sections protected by mutexes has been established.

There are currently some overlaps (or holes) in the partitioning. It is possible for a task holding a spinlock to block on a mutex, causing a deadlock. These deadlocks are resolved for interactive tasks on UP by grace of the interactive scheduler.

We are eliminating this nesting of mutex-protected sections inside of spinlock-protected critical sections. Only a minimal set (teens) of the spinlocks will remain. This set will be composed of spinlocks necessary to protect immediate hardware, as well as minimal critical sections that would not benefit from mutex-based preemptability.

Our broad objective is to achieve preemption latency bounded by the worst case IRQ disable. Total response latency (i.e., time to initiate/complete an arbitrary system call) would still be bounded by the worst case spinlock protected critical region.


This experimental code requires further enhancement and is very much a work in progress.

The kernel is fairly stable, failing under high loads and in low memory conditions.

The kernel has not been extensively tested on SMP systems.

We are reluctant to publish any performance numbers until we have completed the mutex-spinlock partitioning and provisioned support for RW locks.

At that point, we expect the worst case preemption latencies to be in the hundreds of microseconds on a typical workstation.

We are acknowledging performance degradation due to the mutex debug code and the abstraction layer. We expect to be able to improve throughput as the code matures, and the RT kernel becomes more refined.


Please find additional documentation in the Documentation/rttReleaseNotes file.

Please see this document for a complete list of known problems and latest status.

Credits and Thanks:

We wish to acknowledge the precedent work that has allowed us to build this framework, as cited above.

We would also like to thank Dirk Grambow, Arnd Heursch, and Witold Jaworski of the Universitaet der Bundeswehr, Muenchen, Germany.

We are providing this kernel patch as waypoint on the course towards configurable responsiveness in the 2.6 Linux kernel.

Ingo Molnar replied:

cool! Basically the biggest problem is not the technology itself, but its proper integration into Linux. As it can be seen from the 2.4 RT patches (TimeSys and yours), just walking the path towards a fully preemptible kernel is not fruitful because it generates lots of huge, intrusive patches that end up being unmaintainable forks of the Linux tree.

the other approach is what i'm currently doing with the voluntary-preempt patchset: to improve the generic kernel for latency purposes without actually adding too many extra features. Here is what is happening in the -mm tree right now:

A couple of suggestions wrt. how to speed up the integration effort: you might want to rebase this stuff to the -mm tree. Also, what i dont see in your (and others') patches (yet?) is some of the harder stuff:

These are basic correctness issues that affect UP just as much as SMP. Without these the kernel is still not a "fully preemptible" kernel. These need infrastructure changes too, so they must precede any addition of a spinlock -> mutex conversion feature.

So the mutex patch will probably be the one that can go upstream _last_, which will do the "final step" of making the kernel fully preemptible.

Various folks began discussing the technical issues; and a subset of these were also upset to see yet another attempt at real-time patches, competing with their own. The discussion never reached flame-war calibre, and most of the talk focused on dealing with various technical issues. Clearly a lot of people want better real-time support in the kernel, while few agree on the best way to do it.
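To make the "#define-based substitution of spin_lock to mutex_lock functions" from Sven-Thorsten's announcement more concrete, here is a minimal userspace sketch of the general idea. It is not the MontaVista patch and not the contents of linux/kmutex.h: a pthread mutex stands in for the kernel mutex, and the names kmutex_t, kmutex_lock(), kmutex_unlock() and bump_counter() are assumptions invented for illustration.

```c
/* Userspace sketch only. A pthread mutex stands in for a kernel mutex.
 * kmutex_t, kmutex_lock(), kmutex_unlock() and bump_counter() are
 * hypothetical names, not taken from the patches discussed above. */
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t m;
} kmutex_t;

static inline void kmutex_lock(kmutex_t *l)
{
    int rc = pthread_mutex_lock(&l->m);   /* may sleep instead of spinning */
    assert(rc == 0);
    (void)rc;
}

static inline void kmutex_unlock(kmutex_t *l)
{
    int rc = pthread_mutex_unlock(&l->m);
    assert(rc == 0);
    (void)rc;
}

/* The substitution itself: callers keep writing spin_lock(), and the
 * preprocessor redirects them to the mutex implementation. */
typedef kmutex_t spinlock_t;
#define spin_lock(l)   kmutex_lock(l)
#define spin_unlock(l) kmutex_unlock(l)

static spinlock_t counter_lock = { PTHREAD_MUTEX_INITIALIZER };
static int counter;

/* Stand-in for existing code: its source is unchanged by the
 * substitution, but its critical section is now mutex-protected. */
static int bump_counter(void)
{
    int v;

    spin_lock(&counter_lock);
    v = ++counter;
    spin_unlock(&counter_lock);
    return v;
}
```

Calling bump_counter() twice returns 1 and then 2. The design point is that code written against the spinlock names does not have to change, which is presumably why the announcement calls the approach "crude but effective". It also hints at the hazard the announcement describes: any caller that must not sleep, such as code still holding a real spinlock, can no longer safely take one of the converted locks.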

3. Linux 2.6.9-rc4-mm1 Released

11 Oct 2004 - 20 Oct 2004 (79 posts) Archive Link: "2.6.9-rc4-mm1"

Topics: Version Control

People: Andrew Morton

Andrew Morton said:

4. ABI Stability

13 Oct 2004 - 17 Oct 2004 (10 posts) Archive Link: "Announcing Binary Compatibility/Testing"

Topics: Sound: OSS

People: Timothy D. Witham, Jeff Garzik, Robert Love, Linus Torvalds

Timothy D. Witham said:

Announcing Binary Compatibility/Testing

In talking to end users, distributions, OSS developers and large scale ISV's one issue kept popping up. And that is the fact that binaries keep breaking.

This is a real problem for large end users deploying Linux in that they like to be able to run/roll forward the same version of an application for 5 or so years. They can do this with their legacy operating systems and we need to be able to do this with Linux.

One of the big problems is that these ISV's release and test on a cycle that is measured in calendar quarters and of course the OSS cycle is measured in days. The idea is to move testing of these binary applications upstream to match the OSS development cycle. For this purpose I've started a mailing list to discuss how to accomplish this. I've got slides for anybody who is interested. (PDF.) (Follow binary testing for slides)

Let the flaming start. :-)

Jeff Garzik said, "Userland ABI compatibility has always been a strongly held value in Linux, I don't think we would flame any efforts to support that..." And Robert Love replied:

Yah. With the exception of maybe changing something in /proc (which has been rare, and hopefully will never happen with /sys) the kernel-to-user ABI is really stable.

I'd venture, in fact, to say that this effort is very important but does not affect the kernel at all. Current "fault" lies in things e.g. like the C++ ABI, which is constantly fluctuating (rightly so, to fix bugs, but still).

Any other incompatibility lies in libraries, but we have library versioning. There is nothing wrong with newer libs breaking compatibility so long as they have a different soname. Vendors just need to ship compat libs and ISV's need to make sure they request the right lib and don't touch internals.

Linus Torvalds said, of the library versioning:

No we don't.

Yes, we "have the technology". But it's not actually used for libc (which is most of the problematic stuff), so we do not actually have library versioning.

Instead, glibc tries very hard to be binary compatible, and invariably fails occasionally.

Oh, well.

5. New 'mini kernel dump' Tool

13 Oct 2004 - 17 Oct 2004 (5 posts) Archive Link: "Yet another crash dump tool"

People: Itsuro Oda, Robin Holt

Itsuro Oda said:

We released a crash dump tool called "mini kernel dump".

Please see the following URL to get the motivation and the overview of the mini kernel dump.

Robin Holt voiced his objections:

I am not sure why this is such a huge improvement. The one concern I have is you blindly are copying all of memory to the dump device. Can your dump device span multiple volumes? If I have a system using 1TB of physical memory, but 98% of that is allocated as huge TLB pages for users, do I _REALLY_ need to dump them all?

lkcd, and I would hope others, only dump kernel pages unless configured to do otherwise. More importantly lkcd can eliminate page cache and buffer cache pages. Those types of pages are seldom relevant to figuring out what actually went wrong.

Realistically, if the basic structures telling you whether pages are used by the kernel or not are so messed up you can not use them for dumping, they have probably been allocated to multiple users and will be riddled with inconsistent information.

Itsuro replied that yes, the mini kernel dump tool would dump all memory to the dump device. He explained, "Our target is customer's production system, not developing/debugging system. The chance of capturing fault analysis materials may be only one time. If a kernel destroy the memory using user process (page cache, buffer cache), looking the pattern of destroy is great helpful to analyze. (note that I have encountered such case many times) We also analyze user processes at the crash time from the dump." He acknowledged that the mini kernel dump tool was not the best solution for all purposes. Robin replied that even for commercial, production systems, "some of our customers have classified data. They require assurances that the minimal amount of their unclassified data is being sent outside their control to reduce the chance that someone can infer their methods." Itsuro said he and the other developers would consider this, but there was no further discussion.
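The disagreement between Robin and Itsuro comes down to a per-page policy decision: dump everything, or dump only kernel pages. The sketch below illustrates that choice in plain C. It is not code from lkcd or from the mini kernel dump tool; the page_kind and dump_policy enums and both function names are hypothetical, and a real dumper would derive the page classification from kernel data structures rather than from an array.

```c
/* Illustrative sketch only. The enums and function names are
 * hypothetical and do not come from lkcd or mini kernel dump. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum page_kind {
    PAGE_KERNEL,
    PAGE_PAGE_CACHE,
    PAGE_BUFFER_CACHE,
    PAGE_USER,
    PAGE_FREE
};

enum dump_policy {
    DUMP_KERNEL_ONLY,   /* the selective behaviour Robin describes */
    DUMP_ALL            /* the dump-everything behaviour Itsuro describes */
};

/* Decide whether a single page belongs in the dump. */
static bool should_dump(enum page_kind kind, enum dump_policy policy)
{
    if (policy == DUMP_ALL)
        return true;
    return kind == PAGE_KERNEL;
}

/* Count how many of n classified pages a given policy would write out. */
static size_t count_dumped(const enum page_kind *pages, size_t n,
                           enum dump_policy policy)
{
    size_t dumped = 0;

    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (should_dump(pages[i], policy))
            dumped++;
    return dumped;
}
```

The trade-off from the thread shows up directly. DUMP_KERNEL_ONLY keeps the dump small and keeps user data out of it, but it depends on the page classification being trustworthy at crash time. DUMP_ALL makes no such assumption, at the cost of size and of exposing user data.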

6. inotify Updates; Some Conflict With dnotify

14 Oct 2004 - 18 Oct 2004 (10 posts) Archive Link: "[RFC][PATCH] inotify 0.14"

People: John McCutchan, Stefanos Harhalakis, Stephen Rothwell, Robert Love

John McCutchan said:

Here is release 0.14.0 of inotify. Attached is a patch to

New in this version

Stefanos Harhalakis replied, "AFAICS this patch adds inotify and removes dnotify. I believe that the addition of inotify to 2.6 series (if it is going to happen) should leave dnotify intact since there may be programs that rely on it (kde for example)." But John McCutchan corrected, "This patch makes both inotify and dnotify conditional features. It does not remove dnotify." Robert Love confirmed this, but Stephen Rothwell pointed out, "But you have removed the sysctl that allows enabling and disabling of dnotify at run time. And you create setattr_mask_dnotify for which I can find no caller." John admitted that there may have been bugs in the way dnotify was made optional, and said he'd gladly accept patches to fix them. He remarked, "It is debatable whether or not the inotify patch should carry this dnotify config patch as well. I don't see it being that large of a burden on maintaining or using the patch that includes the dnotify config changes." Stephen suggested, "You should probably submit the patch making dnotify optional as a completely separate patch as it is logically a separate issue." Robert agreed, saying he had submitted such a patch in the past, and would submit it again.

7. udev 039 Released

15 Oct 2004 (1 post) Archive Link: "[ANNOUNCE] udev 039 release"

Topics: FS: devfs, FS: sysfs, Hot-Plugging, Version Control

People: Greg KH

Greg KH said:

I've released the 039 version of udev. It can be found at:

udev allows users to have a dynamic /dev and provides the ability to have persistent device names. It uses sysfs and /sbin/hotplug and runs entirely in userspace. It requires a 2.6 kernel with CONFIG_HOTPLUG enabled to run. Please see the udev FAQ for any questions about it:

For any udev vs devfs questions anyone might have, please see:

And there is a general udev web page at:

This release fixes a few major bugs:

Thanks to everyone who has sent me patches for this release, a full list of everyone, and their changes is below.

udev development is done in a BitKeeper repository located at:

Daily snapshots of udev from the BitKeeper tree can be found at: If anyone ever wants a tarball of the current bk tree, just email me.

8. forcedeth Backport To 2.4

17 Oct 2004 (1 post) Archive Link: "[CFT,PATCH] new forcedeth backport to 2.4"

Topics: Networking

People: Manfred Spraul

Manfred Spraul said:

Jeff and Christoph found a few bugs in the previous backport, thus I've decided to start a new backport from the latest driver (0.30) from the 2.6 -mm tree.


It's a new backport, not based on the backport from Jane Liu.

Please test it - it works on my nForce 250 Gb, but I don't have a non-gigabit board to test the media detection changes.

9. Developers Unhappy With Linus' Kernel Versioning Anomalies

18 Oct 2004 - 19 Oct 2004 (6 posts) Archive Link: "Enough with the ad-hoc naming schemes, please"

People: Matt Mackall, Russell King, Martin J. Bligh, Geert Uytterhoeven, Cliff White, Christoph Hellwig, Linus Torvalds

Matt Mackall said to Linus Torvalds:

I can't help but notice you've broken all the tools that rely on a stable naming scheme TWICE in the span of LESS THAN ONE POINT RELEASE.

In both cases, this could have been avoided by using Marcelo's 2.4 naming scheme. It's very simple: when you think something is "final", you call it a "release candidate" and tag it "-rcX". If it works out, you rename it _unmodified_ and everyone can trust that it hasn't broken again in the interval. If it's not "final" and you're accepting more than bugfixes, you call it a "pre-release" and tag it "-pre". Then developers and testers and automated tools all know what to expect.

Cliff White, speaking on behalf of OSDL's automated testing team, seconded this. Russell King put in his 'Aye' of assent, saying he had also broached the matter privately with Linus, and adding, "I, for one, no longer believe in any naming scheme associated with mainline." Geert Uytterhoeven was also happy to hear people objecting to Linus' version naming pattern, as were Christoph Hellwig and Martin J. Bligh. Martin also suggested, "Perhaps we could document whatever the standard is going to be somewhere, then stick to it."
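The reason automated tools care is that they typically classify a release by its suffix. Below is a small C sketch of such a classifier, following the scheme Matt describes ("-rcX" for release candidates, "-pre" for pre-releases, no suffix for a final release). It is not taken from any real tool; release_kind, tag_matches() and classify_release() are hypothetical names, and treating any other suffix as unknown is an assumption of this sketch.

```c
/* Illustrative sketch only. The names and the strict "anything else is
 * unknown" rule are assumptions, not taken from an existing tool. */
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum release_kind {
    REL_FINAL,      /* e.g. "2.6.9" */
    REL_CANDIDATE,  /* e.g. "2.6.9-rc4": bugfixes only */
    REL_PRE,        /* e.g. "2.4.28-pre4": more than bugfixes accepted */
    REL_UNKNOWN     /* any suffix outside the scheme */
};

/* Return 1 if s is exactly prefix followed by one or more digits. */
static int tag_matches(const char *s, const char *prefix)
{
    size_t n = strlen(prefix);

    if (strncmp(s, prefix, n) != 0)
        return 0;
    s += n;
    if (!isdigit((unsigned char)*s))
        return 0;
    while (isdigit((unsigned char)*s))
        s++;
    return *s == '\0';
}

static enum release_kind classify_release(const char *version)
{
    const char *dash;

    assert(version != NULL);
    dash = strchr(version, '-');
    if (dash == NULL)
        return REL_FINAL;
    if (tag_matches(dash, "-rc"))
        return REL_CANDIDATE;
    if (tag_matches(dash, "-pre"))
        return REL_PRE;
    return REL_UNKNOWN;
}
```

A tool built this way handles "2.6.9", "2.6.9-rc4" and "2.4.28-pre4", but any ad-hoc suffix falls through to REL_UNKNOWN. That is the kind of breakage the thread complains about when the naming scheme changes without notice.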

10. Software Suspend Version 2.1 Available For 2.6.9 Kernel; 2.4 Support To Follow

19 Oct 2004 - 20 Oct 2004 (3 posts) Archive Link: "Announce: Software Suspend 2.1 for 2.6.9."

Topics: Software Suspend

People: Nigel Cunningham

Nigel Cunningham said:

I'm pleased to announce that Software Suspend 2.1 is now available for the 2.6.9 kernel.

I hope to make a version for 2.4 available reasonably quickly. This release is intended to be the last, apart from bug fixes and updates for new releases, for the 2.4 kernel.

There are tons of changes since 2.0, the main one being that suspend can now be built as modules and loaded from an initrd. For more details on configuring this feature, please see the web site:

A direct link to the download is:







Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.