Kernel Traffic
Latest | Archives | People | Topics

Kernel Traffic #200 For 13 Jan 2003

By Zack Brown


Tomorrow (January 14) will be the 4-year anniversary of Kernel Traffic. Wow! Today is the 200th issue, and yesterday was my mother's birthday. So... this issue is brought to you by the numbers 4 and 200, and by the letter M.

The Kernel Traffic pages are now hosted by the generous folks at Tux.Org, for which I am extremely grateful. The official KT URL has changed, although the old zork address will continue to work for now. Please update your bookmarks, as the zork address may eventually go away.

Oh yeah, and the quotes indices are back. Check out the links in the top nav bar, or go to the quotes index. I'd appreciate comments. Note that this feature is different from what it used to be. It no longer tracks instances where a given person was only mentioned in a thread, but not quoted. I'm working on adding that feature back in, but it won't be for a little while. I'm also working on a way to combine similar spellings of people's names, like "Stephen C. Tweedie" and "Stephen Tweedie". Finally, I'm also working on putting a list of names at the top of each summary, by the list of topics. Each name will link to that person's unique index page, in turn linking back to each summary that quotes them, in the full KT issue archive.

Let me know how I can make this more useful.

Mailing List Stats For This Week

We looked at 1662 posts in 8939K.

There were 442 different contributors. 235 posted more than once. 156 posted last week too.


1. Memory Management Updates For 2.5

28 Dec 2002 - 1 Jan 2003 (3 posts) Archive Link: "2.5.53-mm2"

Topics: FS: ext2, Virtual Memory

People: Andrew Morton

Andrew Morton announced:

Mainly stability work:

2. Possible Replacement For devfs

31 Dec 2002 - 2 Jan 2003 (9 posts) Archive Link: "RFC/Patch - Implode devfs"

Topics: FS: devfs, FS: procfs, FS: ramfs, FS: sysfs

People: Adam J. Richter, Christoph Hellwig, Richard Gooch

Adam J. Richter announced, "The following patch replaces devfs with a ramfs-derived implementation which is under one quarter the size although it eliminates certain functionality." He added:

Further minor reductions may be possible from some additional clean-ups and perhaps a simplification to the programming interface.

This code is probably very buggy. I've just gotten it working on the machine on which I'm composing this message. It is not ready for integration yet. I would particularly appreciate help with the dentry manipulation and locking, which I think I've probably botched.

The notable differences between devfs and mini-devfs are:

  1. devfsd has been replaced with the user mode helper /sbin/devfs_helper which is exec'ed for registration and initial lookup events (I could catch all the events that devfs does, but I don't think that's really necessary). I have yet to port devfsd to this interface. When sysfs support is added to the new kernel parameter system, I will make this a kernel parameter, so it can be off initially, and only turned on by systems that use it when the boot process is ready for it. This will also eliminate the problem where the boot process currently tries to do 200+ execs as each pseudo-terminal is registered.
  2. The removable device code that cleared and reread partition tables from removable media all the time preventing use of userland partitioning with devfs is eliminated. (Non-devfs systems never had this problem.)
  3. devfs_handle_t is now a synonym for struct dentry*.
  4. A lot of the devfs routines are unimplemented. I haven't noticed much code that uses them, and I'm not sure that any code really should. I think arch/ia64/sn uses devfs_get_first_child, devfs_get_next_sibling. I need to understand what if any of the other routines are really necessary and why (for example, why can't we use struct dentry). My computer seems to run fine without them.


First of all, I'd like to debug this code and I'd welcome any help. The only malfunction (aside from routines that aren't written) that I've noticed is that previously xdm would give up after trying to start the X server about five times (it is not configured). With this smaller devfs, xdm keeps trying to start the X server. Also, I'm pretty sure that my code is at least releasing and reacquiring dcache_lock when it shouldn't, and I think there may be some similar inode->i_sem issues.

In the future, I'd like to shrink the devfs interface to devfs_{create,delete}. If we prohibit file renaming in devfs, then drivers can be sure they've removed the devfs nodes that they've created when they delete the paths that they've created. A side-effect of this would be that devfs_handle_t would be eliminated. I still want the ability to pass a struct file_operations* to devfs_create as it may enable the elimination of fixed device numbers.

I think I'd like to change fs/super.c slightly to make it easier to statically allocate the struct super_block for filesystems that can have only one instance even if they are mounted in multiple locations (devfs, procfs, sysfs, usbdevfs, etc.).

I started hacking on this code by making an approximately ten line change to ramfs just to have it call a user level program on lookups and file creation. I had hoped to change the devfs routines to just generically operate on whatever was mounted on /dev. If the devfs API were shrunk substantially, it might be worth trying that approach again.

Later that day he also said:

I have made a new version of my mini-devfs (attached below). I have also made a first version of my devfs_helper program to handle the functionality of devfsd on systems that use my mini-devfs. devfs_helper is available from the following location:

This version of devfs_helper only supports "EXECUTE" and "MODLOAD" actions, which may be the only events that are really necessary. In the future, I envision eliminating the "MODLOAD" action, since it is equivalent to "EXECUTE modprobe -C /etc/modprobe.devfs $devname".

devfs_helper is currently 211 lines of C code. In comparison, devfsd-1.3.25/*.[ch] is 3143 lines.

Also, here is version 2 of my mini-devfs. I am embarrassed to say that I omitted Richard Gooch's copyright notice on his code that I copied into mini-devfs/numspace.c. This patch corrects that and fixes a variety of other little problems. It also changes the directory of this facility to fs/mini-devfs/. One change that is necessary for operation with devfs_helper is that the event names are now in all capitals ("REGISTER" and "LOOKUP") to match the format of /etc/devfsd.conf.

Christoph Hellwig was thrilled to see this, but remarked, "I just wonder where viro is hiding the last weeks, he promised more devfs API cleanups that likely clash with your changes." Adam replied, "I don't know what devfs API changes he wants to make, but I imagine that I can probably follow them. The one devfs quirk that I really like that one might be tempted to eliminate is that you can pass a file_operations pointer to devfs_register, which enables the possibility in future of being able to build systems that just use devfs names for identifying devices." Elsewhere, Adam offered another update:

Just to keep everyone up to date, here is a third iteration of my patch converting devfs to a ramfs-based file system. This one uses an almost unmodified version of devfs/util.c to restore automatic device number allocation and devfs_{,un}register_tape().

This code now tries to implement almost all of the devfs functionality that anything outside of arch/ia64/sn uses. The most significant exception that I'm aware of is the ability to create a plain file with custom file operations, which is done by the Memory Type Range Register code, but that code also provides a proc interface for the same thing, and I think the proc interface is what everyone uses right now anyhow.

If Christoph's patch for deleting a bunch of unused stuff from devfs gets into 2.5.54, that should make my patch smaller, and I'll post a new version then. If nobody objects, then perhaps I'll make that version replace fs/devfs rather than creating a separate fs/mini-devfs.
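To make the helper model in this thread a little more concrete, here is a minimal user-space sketch in Python of how a program exec'ed once per event might map a "REGISTER" or "LOOKUP" event to an "EXECUTE" or "MODLOAD" action. It is not Adam's devfs_helper: the rule table, the function names, and the way the event and device name reach the program are all assumptions made for illustration, and the rule format is deliberately simpler than the real /etc/devfsd.conf syntax. The only detail taken from the thread is that "MODLOAD" is treated as shorthand for "EXECUTE modprobe -C /etc/modprobe.devfs $devname".

```python
import re
import shlex

# Hypothetical, simplified rule table: (event, device-name regex, action, command template).
# This is NOT the real /etc/devfsd.conf syntax; it only illustrates the idea.
RULES = [
    ("LOOKUP", r"^sound/.*", "MODLOAD", ""),
    ("REGISTER", r"^tts/.*", "EXECUTE", "/bin/chmod 660 /dev/$devname"),
]


def expand_modload(devname):
    """Rewrite MODLOAD as the equivalent EXECUTE command quoted in the thread."""
    return ["modprobe", "-C", "/etc/modprobe.devfs", devname]


def command_for(event, devname, rules=RULES):
    """Return the argv list a helper would exec for this event, or None if no rule matches."""
    for rule_event, pattern, action, template in rules:
        if rule_event != event or not re.match(pattern, devname):
            continue
        if action == "MODLOAD":
            return expand_modload(devname)
        if action == "EXECUTE":
            # Split the template first, then substitute, so a device name is
            # always passed as part of a single argument.
            return [arg.replace("$devname", devname) for arg in shlex.split(template)]
    return None


if __name__ == "__main__":
    print(command_for("LOOKUP", "sound/dsp"))
    print(command_for("REGISTER", "tts/0"))
```

Because the helper is exec'ed per event rather than running as a daemon, all of its state lives in the rule table; that is one plausible reason such a program can be far shorter than devfsd.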

3. More Memory Management Updates

1 Jan 2003 (4 posts) Archive Link: "2.5.53-mm3"

Topics: FS: ext3, Virtual Memory

People: Andrew Morton

Andrew Morton announced:

Later he added, "2.5.54-mm1 is at it is identical to 2.5.53-mm3."

4. Linux 2.5.54 Released

1 Jan 2003 - 5 Jan 2003 (46 posts) Archive Link: "Linux v2.5.54"

Topics: Framebuffer, Kernel Build System, User-Mode Linux

People: Linus Torvalds, James Simmons, Udo A. Steinberg

Linus Torvalds announced 2.5.54:

Happy new year to you all, hopefully most of you are back from the dead and the hangovers are all long gone. And if not, I'm told reading a large kernel patch is _just_ the medication for whatever ails you.

The 2.5.54 patch is largely mainly a big collection of various small things, all over the place (diffstat shows a long list of small changes, with some noticeable activity in UML, the MPT fusion driver and some of the fbcon drivers).

Various module updates (deprecated functions, updated loaders etc), usb, m68k, x86-64 updates, kbuild stuff etc etc.

As is usual, a number of people replied with compile-time or run-time problems, and various folks discussed the oopsen and errors. Among these, Udo A. Steinberg had a problem with the riva framebuffer driver, and James Simmons replied, "I'm working on a new improved driver right now. Can you give me another day."

5. Support For .config Values In The Kernel Binary

2 Jan 2003 - 5 Jan 2003 (11 posts) Archive Link: "kernel .config support?"

People: Alan Cox, Robert P. J. Day

Robert P. J. Day asked about a feature he remembered from 2.4, in which the .config file was included in the kernel itself. He wanted to know if 2.5 would support the same feature.

A number of folks replied that the feature had never been in the standard kernel, though some vendors shipped with it. Alan Cox also added, "The facility has been in the -ac kernel, and was recently submitted for consideration in 2.5."

6. IPMI Driver Update For 2.4 And 2.5

2 Jan 2003 (1 post) Archive Link: "[PATCH] Version 16 of the IPMI driver"

People: Corey Minyard

Corey Minyard announced:

This is yet another release of the IPMI driver for Linux. This release cleans up the rather broken locking that was in the driver and fixes the linux command line parsing so the driver may be properly compiled into the kernel. Patches are relative to 2.4.20 and 2.5.54.

This release adds minor bugfixes to the watchdog and fixes for handling buggy hardware. It adds the ability to have the user be notified of a pretimeout if not using NMIs for pretimeout notification.

As usual, you can get the drivers from SourceForge. The home page is gets you directly to the page with the info.

PS - In case you don't know, IPMI is a standard for system management, it provides ways to detect the managed devices in the system and sensors attached to them. You can get more information at

7. More Memory Management Updates

4 Jan 2003 - 5 Jan 2003 (12 posts) Archive Link: "2.5.54-mm3"

Topics: FS: autofs, FS: devfs, FS: ext2, Real-Time

People: Andrew Morton

Andrew Morton announced more memory management updates:

Several patches here which fix pretty much the last source of long scheduling latency stalls in the core kernel - long-held page_table_lock during pagetable teardown.

The preemptible kernel now achieves around 500 microsecond worst-case latency on a 500MHz PIII (with a slow memory system). This is about as good as the 2.4 low-latency patch. Maybe better.

This is with ext2, and only with ext2. Other filesystems need work to reach that level of performance.

Non-preemptible kernels will benefit as well. This sort of means that preemptibility is only really needed for specialised multimedia/control type apps. Opinions vary ;)

Filesystem mount and unmount is a problem. Probably, this will not be addressed. People who have specialised latency requirements should avoid using automounters and those gadgets which poll CDROMs for insertion events.

This work has broken the shared pagetable patch - it touches the same code in many places. I shall put Humpty together again, but will not be including it for some time. This is because there may be bugs in this patch series which are accidentally fixed in the shared pagetable patch. So shared pagetables will be reintegrated when these changes have had sufficient testing.

Hugh, could you please closely review these changes sometime? Thanks.

Steven Barnhart was upset about the automount situation, since he couldn't get it working in 2.5.54; but he did recognize that such things happen in a development cycle. Andrew replied, "autofsv4 has been working fine across the 2.5 series. You'll need to send a (much) better report." Steven gave more information and they went back and forth a little, but Steven had trouble figuring out exactly what information to post, and the thread petered out. At one point Andrew guessed, "There is a devfs mounting problem in 2.5.54. If you're using devfs you may find that will help"

8. IDE Still Hard To Develop For In 2.5

4 Jan 2003 - 5 Jan 2003 (6 posts) Archive Link: "[PATCH] Make ide-probe more robust to non-ready devices"

Topics: Disks: IDE

People: Benjamin Herrenschmidt, Alan Cox, Eric W. Biederman

Benjamin Herrenschmidt posted a patch and explained:

I've needed this patch (well, this is a cleaned up version of what I used actually) for some time on PPC and on some embedded platforms. The issue that typically happens is when the kernel is booted with an IDE device still doing its POST sequence (or just being reset, that is with no firmware or a firmware that doesn't wait for the device to be ready before booting the kernel).

The patch just waits up to 35 seconds (30 seconds per spec, plus a small margin to deal with a couple of bogus drives I saw that took 31 seconds) for the BUSY bit to go away on an HWIF.

It's mandatory in the IDE spec to pull-down D7 to ground on an interface, so that an interface with no drive connected should return a value with bit BUSY 0x80 cleared, thus will not trigger this wait loop. I did a sanity check against 0xff anyway to deal with a couple of bogus interfaces I encountered though.

I don't expect this patch to break any existing working configuration, so please send to Linus for 2.5. If you accept it, I'll then send a 2.4 version to Marcelo as well. This has been around for some time and, imho, should really get in now.

Alan Cox replied:

There is a ton of stuff pending for 2.5 IDE. Unfortunately 2.5 isn't in a state I can do any usable testing so it will have to wait. The Marcelo 2.4 tree is current and I'm doing the work in 2.4 first now.

Rusty seems to have a lot of the module stuff in hand so hopefully I'll get back onto 2.5 after LCA

And Benjamin said:

Well, actually, I'd like to see this patch in 2.4 asap too ;) It should apply "as is" with some offset.

As Eric W. Biederman noticed, it may not be enough for some really broken devices, but will not harm neither on these, and will fix the problem on a whole lot of better ones. It's definitely necessary with some WD hard disks and the "combo" DVD/CDRW drive shipped by Apple on some ibooks (Apple firmware typically does a reset of all drives just before booting the kernel, without waiting)
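For readers who want to see the shape of the wait loop Benjamin describes, here is a small simulation in Python rather than kernel C. It is only a sketch under stated assumptions: `read_status` and `sleep_ms` are hypothetical callables injected so the logic can be exercised without hardware, the 50 ms polling interval is invented, and treating a status of 0xff as "nothing usable here, don't wait" is one reading of the sanity check mentioned in the post. The 0x80 BUSY bit and the 35 second limit come from the post itself.

```python
BUSY = 0x80          # BUSY bit in the IDE status register (from the post)
TIMEOUT_MS = 35000   # 30 s per spec plus a margin for slow drives (from the post)
POLL_MS = 50         # polling interval: an assumption for this sketch


def wait_not_busy(read_status, sleep_ms, timeout_ms=TIMEOUT_MS, poll_ms=POLL_MS):
    """Poll a status register until BUSY clears.

    read_status and sleep_ms are hypothetical callables standing in for
    the hardware read and the delay. Returns True once BUSY is clear,
    False on timeout or when the status reads 0xff.
    """
    waited = 0
    while True:
        status = read_status()
        if status == 0xFF:
            # Sanity check: an all-ones read suggests a bogus or empty
            # interface, so don't sit in the loop waiting on it.
            return False
        if not (status & BUSY):
            return True
        if waited >= timeout_ms:
            return False
        sleep_ms(poll_ms)
        waited += poll_ms


if __name__ == "__main__":
    # A drive that is busy for two polls and then reports ready.
    readings = iter([0x80, 0x80, 0x50])
    print(wait_not_busy(lambda: next(readings), lambda ms: None))
```

An interface that pulls D7 to ground with no drive attached reads back with BUSY clear, so it falls straight through the loop without any delay.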

9. Status Of Page Coloring Patch For 2.4

4 Jan 2003 - 5 Jan 2003 (6 posts) Archive Link: "[PATCH] rewritten page coloring for 2.4.20 kernel"

Topics: Disks: IDE

People: Jason Papadopoulos, William Lee Irwin III

Jason Papadopoulos announced:

After a year in stasis, I've completely rebuilt my kernel patch that implements page coloring. Improvements include:

Right now the actual page coloring algorithm is the same as in previous patches, and performs the same. In the next few weeks I'll be trying new ideas that will hopefully reduce fragmentation and increase performance. This is an early attempt to get some feedback on mistakes I may have made.

lmbench shows no real gains or losses compared to an unpatched kernel; some of the page fault and protection fault times are slightly slower, but it's close to the rounding error over five lmbench runs.

Here are all the performance results I have for the patch:

  1. Compile of 2.4.20 kernel with gcc 3.1.1 on 466MHz DS10 Alphaserver with 2MB cache: repeatable 1% speedup (573 sec vs. 579 sec)
  2. 1000x1000 matrix multiply: 10% speedup on Athlon II with 512kB cache (Dieter Nützel)
  3. Without page coloring, the alpha gets 80% of max theoretical bandwidth for working sets at most 1/8 the size of its L2 cache. For larger working sets than that the achieved bandwidth is only 30%-50% of max. With page coloring, the 80% figure applies to the entire L2 cache.
  4. FFTW (alpha): 30% speedup for 64k-point FFTs, 20% speedup for 1M-point FFTs

Patch is available at

William Lee Irwin III asked pragmatically, "Any chance for a 2.5.x-mm port? This is a bit feature-ish for 2.4.x." Jason replied, "I know. The problem is that 2.5.53 cannot finish booting on the Alpha I have here (IDE issues). While I can port the patch over, I'm not comfortable being unable to test it at all." In a subsequent post he asked (perhaps tongue-in-cheek), "Is 2.4 really in bug-fix mode now? 2.4.19 and 2.4.20 were huge patches." William tried to help him on the 2.5 IDE issues, but they didn't get far, and the thread ended.
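For anyone unfamiliar with the idea, the following Python sketch shows the textbook form of page coloring. It is not Jason's algorithm, which the thread does not describe; the function names are made up, and the direct-mapped cache (ways=1) in the example is an assumption. The 2MB cache size comes from the DS10 result above, and 8KB is the Alpha page size. The point is only that pages whose numbers are equal modulo the number of colors compete for the same cache lines, so an allocator can try to hand out a physical page whose color matches that of the virtual page being mapped.

```python
def num_colors(cache_bytes, page_bytes, ways=1):
    """Number of distinct page colors for a physically indexed cache."""
    return cache_bytes // (page_bytes * ways)


def page_color(page_number, colors):
    """Pages with the same color map onto the same region of the cache."""
    return page_number % colors


def pick_page(free_pfns, virtual_page, colors):
    """Prefer a free physical page whose color matches the virtual page's color.

    Falls back to the first free page if no match exists, and returns
    None if there are no free pages at all.
    """
    wanted = page_color(virtual_page, colors)
    for pfn in free_pfns:
        if page_color(pfn, colors) == wanted:
            return pfn
    return free_pfns[0] if free_pfns else None


if __name__ == "__main__":
    # 2MB cache, 8KB pages, assumed direct-mapped.
    colors = num_colors(2 * 1024 * 1024, 8 * 1024)
    print(colors)
    print(pick_page([512, 513, 770], virtual_page=2, colors=colors))
```

With matching colors, a contiguous virtual working set spreads evenly over the cache instead of piling onto a few sets, which is consistent with the bandwidth result Jason reports for working sets up to the full L2 size.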

10. Status Of DM Filesystem In 2.5

5 Jan 2003 - 6 Jan 2003 (3 posts) Archive Link: "dm fs?"

Topics: Device Mapper, FS: sysfs, Ioctls

People: Jeff Garzik, Joe Thornber, Greg KH, Andrew Morton

Jeff Garzik asked:

What is the status of dmfs going into mainline?

I saw that Greg KH posted a patch with some corrections to dmfs for 2.5.50?

IMO it would be nice to have a kernel config option that makes the ioctl method optional when dmfs is set to y or m in kernel config. That will not only save a bit of code space, but it will also serve to encourage use of dmfs. :)

Joe Thornber replied:

The last version I released is here:

There are still a couple of easy to fix issues with it (eg. the kmalloc while a spin lock is held that Andrew Morton pointed out :).

Both Andrew Morton and Greg KH expressed concerns with the way I've mapped the dm semantics onto the filesystem. So Greg is currently trying to get a sysfs interface working.

We need to get a consensus of opinion in the community as to what is a good interface. I'm not going to be rushed into including something in dm that could cause criticism for years to come. dmfs is what Alasdair Kergon and I have proposed, we're just waiting for an alternative to kick off the discussions ATM.

Greg KH confirmed he was working on the sysfs interface, and added, "Yes, and hopefully I'll have something that works later this week, after I've dug out under this mount of email..."

11. More Work On devfs Replacement; Maybe Too Late For 2.5

5 Jan 2003 - 8 Jan 2003 (7 posts) Archive Link: "Patch(2.5.54): devfs shrink - integration candidate"

Topics: Code Freeze, FS: devfs, Spam

People: Adam J. Richter, Andi Kleen, Andrew Walrond, John Bradford, H. Peter Anvin, Richard Gooch

Adam J. Richter announced:

The sixth iteration of my devfs code shrink is available here:

I believe the deletions make the patch so big that the linux-kernel mailing list filters prevent me from submitting an email that includes it.

This patch reduces include/linux/devfs*.h and fs/devfs from 3655 lines to 1239, a reduction of 2450 lines, nearly a factor three. That may not be as impressive as the original 5X reduction, but that is mostly because I've restored a bunch of functionality that I hope to eliminate in the future.

I'd like to thank Richard Gooch for writing devfs. I think it was a great idea and the effort involved in implementing it, especially when it was not clear that it could work well, was probably about 30-100 times my effort in shrinking it. I immediately became a convert within a day of trying it. I'd be happy for Richard to take over this code and continue maintaining devfs. If he doesn't want to, I'm willing to and I'm also happy to let someone else do it if they want.

This is nearly the same patch that I attempted to post on January 2, but apparently some well intentioned spam filter blocked it. I had this problem once before, also when submitting a big patch with a lot of deletions sent as a MIME attachment. This time I'm submitting the patch as part of the text of my message.

There are no code changes between this version and the one that I tried to post on January 2. In the meantime, I've used it and stared at it more, and now I'm posting this as a candidate for integration into Linus's kernel.

The January 2 version introduced two significant changes: isolating the filesystem driver to a separate file that only exports two symbols (devfs_vfsmount and init_devfs_fs), and making that patch a change to fs/devfs rather than a new filesystem. (If anyone would prefer that I submit this as a separate file system, please let me know.)

If you want devfsd functionality (well, at least the "REGISTER" and "LOOKUP" events), you can get my user level program devfs_helper, which is a reduced functionality replacement program for devfsd from the following URL.

devfs_helper is a program that is exec'ed on each event rather than being a daemon that waits on events. When the new module_param code is further developed, I will default the devfs_helper to be turned off until a user level program sets the name of the program.

Finally, I'd like to move forward toward getting this into Linus's kernel. Any blessings, curses, requests for changes or advice on the best way to proceed would be appreciated.

Andi Kleen replied, "Me thinks you're two to three months too late for 2.5. This looks definitely not like a good idea to merge in a feature/code freeze, when the main goal should be to finally get 2.6 out. Submit it when 2.7 opens." Andrew Walrond remarked, "This would be a shame as it's excellent work and really just a tidy up of existing code rather than a new feature?" John Bradford supported this, "Especially as we are not in a code freeze yet."

On a technical note, H. Peter Anvin asked, "Do we have any idea what the impact of this is on runtime data size? I seem to remember devfs playing lots of tricks to reduce its working set. If this code size reduction ends up pinning large data structures like dentries and inodes which wouldn't otherwise have been pinned, this could be a significant lose." But there was no reply.

12. Status Of The New 2.5 Driver Model

6 Jan 2003 - 8 Jan 2003 (12 posts) Archive Link: "status on the new driver model?"

Topics: Code Freeze, Disks: IDE, Disks: SCSI, FS: sysfs, PCI, USB

People: Louis Garcia, Anders Fugmann, Patrick Mochel, Greg KH

Louis Garcia asked, "What is the status of the new driver model? Are the drivers being ported over in a timely fashion? Is this process going to be complete before the code freeze/2.6? I've heard the PCI bus and drivers are not yet converted to struct devices? Oh, is the old LDM or whatever it was being ripped out or left for compatibility?" Greg KH replied that the driver model was currently working, and Louis should mount sysfs and take a look for himself. Drivers were indeed being ported in a timely fashion, and there was definitely hope of finishing things up before 2.6. Greg suggested Louis look at the 2.5 code, as all his questions were answered there. He also suggested looking through the Documentation/driver-model directory in the kernel sources.

Anders Fugmann said, "I'm volunteering to try and make some porting/cleanup. Are there some good small modules that need porting (Lets start easy)?" And Patrick Mochel replied:

Great, glad to hear it.

I would suggest at least browsing Documentation/driver-model/*.txt and reading the appended document. (I've responded to several similar emails in the past, and realized there was no master document for describing this, so I decided to write one.)

Hopefully, this is useful to you and others. I'll be adding this to Documentation/driver-model/.

In general, the driver model infrastructure (and kobject infrastructure) are all about using generically defined objects and routines, rather than duplicating them again and again. Because of that, most of the changes happen at a high level (i.e. in the bus driver) rather than at the low-level (i.e. in the driver modules).

The appended document describes how to convert a bus driver to the new model, which covers the representation of devices and device drivers. It is a gradual process that can be done in several steps.

If you don't feel up to taking on an entirely unconverted driver, please feel free to help out with ones that are already underway, including:

Alternatively, if someone doesn't feel up to converting the drivers, something that would be really handy would be a list of all bus types and device classes that the kernel supports, and the drivers that belong to each. It's not necessarily an easy list to compile, but again something that can happen gradually. ;)

13. Linux 2.4.21-pre3 Released

6 Jan 2003 - 7 Jan 2003 (5 posts) Archive Link: "Linux 2.4.21-pre3"

People: Marcelo Tosatti

Marcelo Tosatti announced 2.4.21-pre3.

14. Status Of Adaptec 79xx Support In 2.4; Status Of rmap In -ac Tree

7 Jan 2003 - 8 Jan 2003 (7 posts) Archive Link: "Question for Marcelo"

Topics: Disks: IDE, Virtual Memory

People: Samuel Flory, Alan Cox, Tomas Szepe

Someone asked if support for the Adaptec 79xx would appear in the main 2.4 tree anytime soon. The driver from Adaptec's web site seemed to be working fine. Samuel Flory replied, "I believe that he would prefer that it get tested in the ac tree 1st. Alan seemed receptive to including it, but he's not doing much with the 2.4 ac kernel any more." And Alan Cox said, "I've been working on merging a lot of stuff with Marcelo and cleaning up the other changes. 2.4.21pre-ac should be out today, and it's a lot smaller than before as Marcelo has almost all the apic stuff, IDE updates etc. I've also dropped rmap out for now."

Tomas Szepe raised an eyebrow at this last, and asked why Alan had dropped rmap from his tree. Alan explained, "15a wasn't working very well, the base VM isn't too bad now and it's a _lot_ easier to do merging with Marcelo without rmap. The other related bits are separated out but present (vm overcommit handling, fixed shmem, removepage callback)"

15. Status Of NTFS Write Support In 2.4

8 Jan 2003 (3 posts) Archive Link: "status of ntfs write-support in 2.4.20"

Topics: FS: NTFS

People: Joshua M. Kwan, Pawel Kot

Folkert Vanheusden asked about the status of NTFS write support for 2.4; Joshua M. Kwan replied, "I believe it is still very unsafe. It *can* be done but you have to mess with scandisk everytime you reboot back to's very very dirty and quite hackish. I wouldn't think about risking data on an NTFS partition through the limited NTFS driver in Linux 2.4 (even 2.5.)." Pawel Kot added:

Well, the ntfs driver from the 2.4.20 vanilla kernel has really dangerous write support for the ntfs partitions. It is strongly discouraged to use it. You can use though the backport driver from the 2.5 kernel series (aka ntfs-tng). It allows you to overwrite the files using mmap() and write(). So, neither size changes, nor attribute changes. You'll find more details along with the driver itself at
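To illustrate the kind of write Pawel describes (overwriting existing bytes through write() or mmap() while leaving the file size alone), here is a short Python sketch. It is ordinary file-handling code, not anything specific to ntfs-tng: the function names are invented for this example, and the explicit size check is an assumption added so the sketch refuses any write that would grow the file.

```python
import mmap
import os


def overwrite_in_place(path, offset, data):
    """Overwrite bytes inside an existing file with write(), keeping its size fixed."""
    size = os.path.getsize(path)
    if offset < 0 or offset + len(data) > size:
        raise ValueError("write would change the file size")
    # "r+b" opens for reading and writing without truncating the file.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def overwrite_with_mmap(path, offset, data):
    """Same overwrite, done through a shared memory mapping of the file."""
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        if offset < 0 or offset + len(data) > size:
            raise ValueError("write would change the file size")
        # Length 0 maps the whole file; the mapping cannot outgrow it.
        with mmap.mmap(f.fileno(), 0) as m:
            m[offset:offset + len(data)] = data
            m.flush()
```

Neither function creates, extends, truncates, or renames a file, which matches the "neither size changes, nor attribute changes" restriction in the quote.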







Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at Tux.Org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.