Kernel Traffic
Kernel Traffic #205 For 14 Feb 2003

By Zack Brown

Mailing List Stats For This Week

We looked at 1350 posts in 6821K.

There were 395 different contributors. 183 posted more than once. 137 posted last week too.

The top posters of the week were:

1. Ancient Race Condition On 2.0, 2.2, 2.4, And 2.5 Kernels

2 Feb 2003 - 10 Feb 2003 (14 posts) Archive Link: "2.0, 2.2, 2.4, 2.5: fsync buffer race"

Topics: FS: ext2

People: Mikulas Patocka, Andrew Morton, Linus Torvalds, Pavel Machek

Mikulas Patocka reported:

There's a race condition in the filesystem.

Let's have two inodes that are placed in the same buffer.

call fsync on inode 1
it goes down to ext2_update_inode [update == 1]
it calls ll_rw_block at the end
ll_rw_block starts to write buffer
ext2_update_inode waits on buffer

while the buffer is writing, another process calls fsync on inode 2
it goes again to ext2_update_inode
it calls ll_rw_block
ll_rw_block sees buffer locked and exits immediately
ext2_update_inode waits for buffer
the first write finishes, ext2_update_inode exits, and the changes made by the second process to inode 2 ARE NOT WRITTEN TO DISK.

This bug means that when you simultaneously fsync two inodes in the same buffer, only the first will really be written to disk.
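The sequence Mikulas describes can be replayed in a small user-space model. This is only a sketch, not kernel code: the `Buffer` class, `complete_io()` and `simulate_race()` are hypothetical names, and it assumes the in-flight write carries the buffer contents as they were when the write was submitted.

```python
class Buffer:
    """Toy model of one disk buffer holding two inodes (not kernel code)."""

    def __init__(self):
        self.memory = {}       # in-memory inode contents
        self.disk = {}         # contents that have reached the disk
        self.locked = False    # True while a write is in flight
        self.in_flight = None  # snapshot taken when the write was submitted


def ll_rw_block(buf):
    """Start a write, but return at once if the buffer is already locked."""
    if buf.locked:
        return False  # the behaviour the report complains about
    buf.locked = True
    buf.in_flight = dict(buf.memory)  # assumption: write sees a snapshot
    return True


def complete_io(buf):
    """Finish the in-flight write and unlock the buffer."""
    buf.disk.update(buf.in_flight)
    buf.in_flight = None
    buf.locked = False


def simulate_race():
    buf = Buffer()
    buf.memory["inode1"] = "new-1"
    started_1 = ll_rw_block(buf)    # fsync(inode 1) starts the write
    buf.memory["inode2"] = "new-2"  # second process dirties inode 2
    started_2 = ll_rw_block(buf)    # fsync(inode 2): buffer locked, no write
    complete_io(buf)                # both waiters wake up on this one write
    return started_1, started_2, buf.disk
```

In this model both callers return once the single write completes, yet only inode 1's change is on the modelled disk.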

Andrew Morton confirmed this, and said, "This is a general weakness in the ll_rw_block() interface. It is not suitable for data-integrity writeouts, as you've pointed out." He proposed a fix for 2.5, but Mikulas didn't think that would be satisfactory; and they discussed the various implications. Elsewhere, Pavel Machek suggested the bug should also be fixed in the 2.4 tree, but Mikulas said:

It should, but it is a hazard. The problem is that every use of ll_rw_block has this bug, not only the one in ext2 fsync. The cleanest thing would be to modify ll_rw_block to wait until the buffer becomes unlocked, but no one knows whether that could produce some weird things.

Even Linus didn't know what he was doing, see this comment around the buggy part in 2.2, 2.0 and previous kernels.

        /* Uhhuh.. Nasty dead-lock possible here.. */
        if (buffer_locked(bh))
        /* Maybe the above fixes it, and maybe it doesn't boot. Life is interesting */

Elsewhere, he asked Linus Torvalds what he'd meant by that comment, and Linus replied:

Nope, that's a _long_ time ago. It may well have been simply a MM bug that got fixed long since, ie something like

He and Andrea and Andrew hunted around for a bit, and the thread ended.

2. SysFS Interface For ZT5550 Redundant Host Controller In 2.5

4 Feb 2003 - 6 Feb 2003 (11 posts) Archive Link: "[PATCH][2.5.59-bk]Sysfs interface for ZT5550 Redundant Host Controller"

Topics: FS: sysfs, PCI, Version Control

People: Rusty Lynch, Greg KH, Scott Murray

Rusty Lynch announced:

Last week I finally got access to a decent (but old) technical specification for the ZT5550 redundant host controller. The document was published for the ZT5550C, but I am hoping that newer versions of the RHC just add more functionality to all the documented reserved bits in the document I am looking at.

The following patch adds a sysfs interface to most of the bits accessible via the indirect register (through the HCINDEX and HCDATA addresses in the Command and Status Register (CSR)). The only bits I did not add access to were the ones that are cleared by reading. There are a lot of bits to get access to, which makes this patch a little bigger than I first expected, so I created a new config option so only people who actually want to mess with the RHC would pay for it.

Enabling this code will cause a new directory called zt5550_rhc to be created in the root of sysfs

He showed an extensive directory tree, saying:

This provides all kinds of rope to hang yourself with, but it was fun messing with it. This also points out other areas where an interested party could further enable this particular board, for example adding the code to respond to a fault condition (after enabling fault handling through sysfs).

This patch was generated off of today's bk tree with the previously posted patch by Scott to fix the deadlock issue and the patch posted by Stanley to add sysfs support. Both of these patches are attached as a single patch to this email message.

Greg KH said that the root of sysfs was not the place for this directory, and suggested "putting this directory either under the pci device that is the zt5550 (if it is a pci device), or at the least, under the devices/ directory." Rusty said he'd go with the former location. Later, Rusty posted an updated patch with more advanced features, and after a recommendation by Scott Murray, Greg accepted the patch, and aimed it up to Linus.

3. klibc Update

4 Feb 2003 - 6 Feb 2003 (5 posts) Archive Link: "klibc update"

Topics: Executable File Format, FS: initramfs, FS: ramfs, Hot-Plugging, Klibc, Version Control

People: Greg KH, Arnd Bergmann, H. Peter Anvin

Greg KH explained:

For those wondering what's happening with klibc, here's an update...

I have it building relatively well within the kernel, and have modified the usr/gen_init_cpio.c file to add files to the cpio "blob". That all seems to work, but I don't seem to be able to extract the files properly (or at least that's what I'm guessing is happening).

If anyone wants to see the current progress, there's a big patch against 2.5.59 at:

and a bk tree with the different changes broken down into "logical" chunks at:


Any help with trying to debug init/initramfs.c to figure out what is going wrong would be greatly appreciated.

H. Peter Anvin was impressed, and Arnd Bergmann offered his initramfs experience:

I've managed to mount the initramfs with MS_BIND into my root fs and found why /sbin/hotplug cannot be run currently. There is some off-by-one bug during file extraction that causes the first byte of the file to get left out. I.e. the file starts with "ELF\001" instead of "\177ELF".

This may or may not be related to another off-by-one bug that I'm seeing sometimes when unpacking initramfs on s390x ("panic: length error").

The patch below is how I hacked prepare_namespace() to keep initramfs visible after boot.

Later he explained, "I found what kept initramfs from working here: While creating initramfs_data.cpio.gz, the padding between a file header and the file contents was wrong, which can be verified by unpacking the archive by hand. The trivial patch below fixed this for me." Greg liked that patch and incorporated it into his tree.
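For readers unfamiliar with the padding in question, here is a minimal sketch of how one entry of a "newc" cpio archive (the format initramfs uses) is laid out. It is not the code from gen_init_cpio.c or from Arnd's patch; the helper names `pad4()`, `newc_entry()` and `data_offset()` are made up for illustration. The rule shown is that the 110-byte header plus the NUL-terminated name is padded to a multiple of 4 bytes before the file data begins, and the data is padded the same way.

```python
HEADER_LEN = 110  # 6-byte magic plus 13 fields of 8 hex digits each


def pad4(n):
    """Number of NUL bytes needed to round n up to a multiple of 4."""
    return (4 - n % 4) % 4


def data_offset(name):
    """Offset of the file data within an entry for the given file name."""
    name_len = len(name.encode()) + 1  # name is stored with a trailing NUL
    return HEADER_LEN + name_len + pad4(HEADER_LEN + name_len)


def newc_entry(name, data, mode=0o100644):
    """Build one newc cpio entry: header, name, padding, data, padding."""
    name_bytes = name.encode() + b"\0"
    # Field order: ino, mode, uid, gid, nlink, mtime, filesize,
    # devmajor, devminor, rdevmajor, rdevminor, namesize, check
    fields = [0, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(name_bytes), 0]
    header = b"070701" + b"".join(b"%08X" % f for f in fields)
    entry = header + name_bytes
    entry += b"\0" * pad4(len(entry))  # pad before the file contents
    entry += data
    entry += b"\0" * pad4(len(entry))  # pad after the file contents
    return entry
```

An unpacker that gets either padding wrong by one byte starts reading file contents at the wrong offset, which is the kind of symptom reported above.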

4. Journalling Support For IDE In 2.4

5 Feb 2003 - 6 Feb 2003 (5 posts) Archive Link: "[PATCH] ide write barriers"

Topics: Disks: IDE, FS: ReiserFS, FS: ext3, Version Control

People: Jens Axboe, Marc-Christian Petersen

Jens Axboe said, "The attached patch implements write barrier operations in the block layer and for IDE, specifically. The goal is to make the use of write back cache enabled ide drives safe with journalled file systems. Patch is against 2.4.21-pre4-bk as of today, and includes a small patch to enable it on ext3. Chris has a patch for reiserfs as well." Marc-Christian Petersen was happy about this, but asked if Jens could make another patch against 2.4.20; Jens replied, "Sure, I had that one already. BTW, I discovered that the default io scheduler forgets to honor the cmd_flags, it's supposed to break like the noop does (see very first hunk in very first file). Must have removed that by mistake some time ago... This applies both to the 2.4.21-pre4 patch posted and this one."

Marc tried out the new version and remarked, "I don't have benchmarks handy yet but as far as I can _feel_, this is a _MUST_ (I repeat: a _MUST_ for 2.4.21). And I am very good in feeling slowdowns for interactivity :)"
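The ordering idea behind a write barrier can be sketched in a few lines. This is an illustration only, under the assumption that an I/O scheduler may sort requests by sector but must never move a request across a barrier; `dispatch_order()` is a hypothetical helper, and the sketch says nothing about how the patch deals with the drive's write-back cache.

```python
def dispatch_order(requests):
    """Return sectors in dispatch order.

    requests is a list of (sector, is_barrier) pairs in submission order.
    Requests between barriers are sorted by sector; a barrier stays in
    place, after everything submitted before it and before everything
    submitted after it.
    """
    ordered = []
    segment = []  # requests seen since the last barrier
    for sector, is_barrier in requests:
        if is_barrier:
            ordered.extend(sorted(segment))  # flush what came before
            ordered.append(sector)           # then the barrier itself
            segment = []
        else:
            segment.append(sector)
    ordered.extend(sorted(segment))
    return ordered
```

A journalling filesystem would, for example, mark its commit block as the barrier so that the journal data submitted earlier is dispatched before it.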

5. syscalltrack 0.82 Released

5 Feb 2003 (1 post) Subject: "ANN: syscalltrack 0.82 "Minty Chinchilla" released"

People: Muli Ben-Yehuda, Muli

Muli Ben-Yehuda announced, "syscalltrack is made of a pair of Linux kernel modules and supporting user space environment which allow interception, logging and possibly taking action upon system calls that match user defined criteria. syscalltrack can operate either in "tweezers mode", where only very specific operations are tracked, such as "only track and log attempts to delete /etc/passwd", or in strace(1) compatible mode, where all of the supported system calls are traced. syscalltrack can do things that are impossible to do with the ptrace mechanism, because its core operates in kernel space." He said:

syscalltrack-0.82, the 14th alpha release of the Linux kernel system call tracker, is now available. syscalltrack supports version 2.4.x of the Linux kernel on the i386 platform.

This release contains several new features, bug fixes and cleanups.

New in version 0.82, "Minty Chinchilla"

6. XFS Patches For 2.4.20

5 Feb 2003 (1 post) Archive Link: "Announce: XFS split patches for 2.4.20 - respin"

Topics: Access Control Lists, FS: XFS

People: Keith Owens

Keith Owens announced:

The xfs patches for 2.4.20 have been respun as of 2003-02-05 23:39 UTC, including kdb v3.0.

For some time the XFS group have been producing split patches for XFS, separating the core XFS changes from additional patches such as kdb, xattr, acl, dmapi. The split patches are released to the world with the hope that developers and distributors will find them useful.

Read the README in each directory very carefully, the split patch format has changed over a few kernel releases. Any questions that are covered by the README will be ignored. There is even a 2.4.21/README for the terminally impatient :).

7. Replacing pcihpfs With sysfs In 2.5

5 Feb 2003 (17 posts) Archive Link: "[BK PATCH] PCI Hotplug changes for 2.5.59"

Topics: Bug Tracking, FS: sysfs, Hot-Plugging, PCI

People: Greg KH

Greg KH announced:

Here's a series of PCI Hotplug patches, a few related PCI core patches, and two small, related sysfs patches.

The hotplug driver patches consist of a lot of bug fixes due to problems found by the smatch and checker projects, and a big patch to remove pcihpfs and use sysfs instead from Stanley Wang. I've also moved the few functions in drivers/hotplug/pci_hotplug_util.c to drivers/pci/hotplug.c which is a better place for them.

There are some sysfs updates for pci devices from Dan Stekloff and a new function was added to sysfs to support the move from pcihpfs to sysfs. This sysfs patch was blessed by Pat Mochel.

8. Discontigmem Support For The IBM x440

5 Feb 2003 - 8 Feb 2003 (3 posts) Archive Link: "[PATCH][RFC] Discontigmem support for the x440"

Topics: Hyperthreading, Power Management: ACPI

People: Patricia Gaughen, Andrew Grover, John Stultz, James Cleverdon

Patricia Gaughen from IBM announced:

This patch provides discontigmem support for the IBM x440. This code has passed through the hands of several developers: Chandra Seetharaman, James Cleverdon, John Stultz, and last to touch it, me :-) This patch requires full acpi support.

I've tested this patch on an 8-way x440 with 16 GB of RAM, with and without HT (acpi=off).

Andrew Grover pointed out that part of her patch broke ACPI event handling, and that it would need to be fixed; but remarked, "Other than that, thumbs up. SRAT support is a good thing to have."

9. Linux Test Project 20030206 Released

6 Feb 2003 (1 post) Archive Link: "[ANNOUNCE] LTP-20030206"

Topics: Bug Tracking, Hyperthreading, Version Control

People: Robert Williamson

Robert Williamson announced:

The Linux Test Project test suite LTP-20030206.tgz has been released. Visit our website to download the latest version of the testsuite that contains 950+ tests for the Linux OS. The site also contains other information such as: test results, a Linux test tools matrix, technical papers and HowTos on Linux testing, and a code coverage analysis tool. There is also a list of test cases that are expected to fail.

The highlights of this release are:

We encourage the community to post results, patches, or new tests on our mailing list, and to use the CVS bug tracking facility to report problems that you might encounter. More details available at our web-site.

10. ACPI License Change

6 Feb 2003 (1 post) Archive Link: "ACPI Licensing change"

Topics: BSD, Power Management: ACPI

People: Andrew Grover

Andrew Grover reported:

As of the next release, we will be adding the option to license the ACPI AML interpreter (drivers/acpi/*/*.c) under the BSD license, as well as the current, GPL license.

While this will nominally increase your rights w.r.t. the code, the real reason for this is for us to more easily accept external contributors' changes into the interpreter's code (a good thing for everyone).

The Linux-specific ACPI code (drivers/acpi/*.c) is not affected by this change (i.e. it is still GPL-only).

This was mentioned a couple of months ago, but we're now finally getting around to doing it. :)

11. klibc Update For 2.5

6 Feb 2003 - 9 Feb 2003 (5 posts) Archive Link: "[RFC] klibc for 2.5.59 bk"

Topics: FS: initramfs, FS: ramfs, Klibc, Version Control

People: Greg KH, Arnd Bergmann, H. Peter Anvin

Greg KH announced:

Thanks to Arnd Bergmann, it looks like the klibc and initramfs code is now working. I've created a patch against Linus's latest bk tree and put it at:

(I can't get to right now, sorry) and there's a bk tree at:


I'd really like to send this to Linus now, but I'm going to be away from email for about a week, so I'll wait till I get back. If anyone has any issues with this patch, please let me know.

H. Peter Anvin was happy for the delay, since it meant more time to pound on it, and asked what version of klibc it was based on. Greg replied, "klibc-0.72. Ugh, I see you've now released a few versions since then :( I'll sync up to the latest version before sending the patch on to Linus, thanks for making me look."

12. More devfs, initramfs, And Scheduler Work

7 Feb 2003 - 8 Feb 2003 (7 posts) Archive Link: "2.5.59-mm9"

Topics: FS: devfs, FS: initramfs, FS: ramfs, Real-Time

People: Andrew Morton, Daniel Jacobowitz, Adam J. Richter

Andrew Morton announced:

An hour or so later, he took the patch down, and reported, "there's something bad in the signal changes in Linus's current tree. mozilla won't display, and is unkillable." Daniel Jacobowitz confirmed, "Yeah, I'm seeing hangs in rt_sigsuspend under GDB also. Thanks for saying that they show up without ptrace; I hadn't been able to reproduce them without it. Something is causing realtime signals to drop." Later that day, Andrew said:

OK. Looks like Linus is hot on the trail.

BTW, some nice people have been sending in smalldevfs testing results (successful). I've put that patch back up at

for other testers. It applies to 2.5.59 base.

13. Possible License Violation By Castle Technology Ltd, UK

7 Feb 2003 - 10 Feb 2003 (5 posts) Archive Link: "The Linux Kernel and Castle Technology Ltd, UK"

Topics: PCI, USB

People: Russell King, Alan J. Wylie

Russell King said:

I'm afraid that I have to bring this news to linux-kernel; people who have written code for the Linux kernel need to know about this, and we need to come to a decision about the action we wish to take. Taking no action sends a message that "we don't care what you do with kernel code, even if you violate the terms of the license."

It would appear that Castle Technology Limited, UK, have taken some of the Linux kernel 2.5 code, and incorporated it into their own product, "RISC OS", which is distributed in binary ROM form built into machines they sell. This code is linked with other proprietary code.

I have a detailed description which shows how the Linux source code can be slightly modified to produce the disputed code, with reasons for each modification. This will be provided to people upon private email request.

Having discussed this with Linus, Linus is of the opinion that a public letter should be written to Castle Technology Ltd, copied to lkml and various news sites. However, I'd like to get this issue into the minds of people who have touched any of the following code:

The guy who reported the problem to me has already tried to contact the company concerned to ask for the source under the terms of the GPL, and this resulted in the "function signatures" being removed in the next version of the product, while the actual code remained. No other response was forthcoming.

Subsequently, during the first week of January, the guy contacted the company again asking for the source covering the disputed code, this time copying me with the email. Again, no response from Castle Technology has been forthcoming to date.

Someone pointed out the following quote from Castle Technology's web site: "Note that the source code for many of the Linux PCI device drivers is publicly available on the Internet and may be useful in developing the corresponding RISC OS device driver." Alan J. Wylie also pointed out:

There is more than one player involved with RISC OS:

Pace Micro, of Saltaire, West Yorkshire

seem to own the Copyright.

And RISCOS Ltd., of Cardiff, Wales

are involved as well.

Russell replied:

I'd like to point out, however, that Pace sub-license RISC OS. As far as I am aware at present, RISCOS Ltd do not distribute the code in question, and neither does Pace Microtechnology Ltd.

It has come to my attention that some people are trying to implicate the above companies in this.

I would strongly suggest people do not start to make (unfounded) claims against either RISCOS Ltd or Pace Microtechnology Ltd unless they have proof, in which case those holding such proof should first contact the appropriate copyright holders concerned.

A couple days later, Russell said, "Castle Technology Limited ask me to post this press release to the Linux Kernel mailing list. By posting this press release, I wish to make it absolutely clear that I am not expressing any views either way with respect to this press release, merely passing the information on as requested." He included the press release:


10th February 2003

Castle Technology Limited note with interest the recent discussion regarding their IYONIX computer, the world's first desktop computer to use the Intel XScale processor.

Following discussions with Russell King and with this in mind, Castle should like to respond to claims originally proposed in Justin Fletcher's "/kernel-traffic/ReadMe.txt" file and Russell King's subsequent posting to the Linux Kernel Mailing List.

The RISC OS 5.00 kernel did not contain work taken from or derived from the ARM-Linux or Linux kernel.

The RISC OS 5.01 kernel did not contain work taken from or derived from the ARM-Linux or Linux kernel.

The RISC OS 5.02 kernel does not contain work taken from or derived from the ARM-Linux or Linux kernel.

There are no plans to use GPL derived code in any part of the RISC OS kernel in the future.

For the avoidance of doubt, the hardware abstraction layer (roughly analogous to a PC's BIOS) has its PCI allocation and bridge setup based in part on the following functions from the Linux kernel sources:


Any company or individual wishing to receive a copy of the source code to this component should apply in writing to:

The Managing Director
Castle Technology Ltd
Ore Trading Estate
Woodbridge Road
IP13 9LL

enclosing a formatted 3.5" floppy diskette and return postage stamps, or international reply coupons for those outside the United Kingdom.

These sources will also form an integral part of a forthcoming Linux port to the IYONIX.

With the tough goal of fitting all of the supporting software and applications for the IYONIX computer into just 4Mbytes of ROM, later issues of the supporting software have had to have function names removed (along with a strategy of tokenising textual messages and compressing binaries) to make room for, in particular, the support for the 'boot keyboard' USB drivers.

Issued by Mike Williams on behalf of Castle Technology Ltd

14. perfctr 2.4.5 Released

9 Feb 2003 (1 post) Archive Link: "perfctr-2.4.5 released"

Topics: Profiling

People: Mikael Pettersson

Mikael Pettersson announced:

perfctr-2.4.5 is now available at the usual place:

This is just a minor maintenance release, before the API fixes and extensions which are scheduled for perfctr-2.5.

Version 2.4.5, 2003-02-09

15. Experiments In Disk I/O Scheduling

10 Feb 2003 - 11 Feb 2003 (4 posts) Archive Link: "[PATCH] SFQ disk scheduler"

Topics: Disks: IDE, Disks: SCSI, Version Control

People: Jens Axboe, Andrea Arcangeli

Jens Axboe announced:

Here's a simple stochastic fairness queueing disk scheduler, for current 2.5.59-BK. It has known limitations right now, mainly because I didn't bother making it complete. But it should suffice for some rudimentary testing, at least.

I'm not going to go into great detail about how it works, see Andrea's initial post of the paper referenced. This version may not be completely true to the SFQ concept, but should be close enough I think. It divides traffic into a fixed number of buckets (64 per default), and perturbs the hash every 5 seconds (hash shamelessly borrowed from networking atm, see comment).

To avoid too many disk seeks, when it's time to dispatch requests to the driver, we round robin all non-empty buckets and grab a single request from each. These requests are sorted into the dispatch queue.

For performance reasons, io scheduler request merging is still a per-queue function (and not per-bucket).

In closing, let me stress that this version has not really been tested all that much. It passes simple SCSI and IDE testing, should work on any hardware basically.
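The bucket-and-round-robin scheme Jens describes can be sketched as a toy model. This is not his patch: the class and method names are invented, requests are plain (sector, pid) tuples, Python's built-in `hash()` stands in for the hash borrowed from networking, and re-filing queued requests when the hash is perturbed is an assumption of this sketch.

```python
from collections import deque


class SFQScheduler:
    """Toy stochastic fairness queueing: hash each submitter into a bucket."""

    def __init__(self, num_buckets=64):
        self.buckets = [deque() for _ in range(num_buckets)]
        self.perturbation = 0  # mixed into the hash, changed periodically

    def bucket_of(self, pid):
        # Stand-in hash; hashing a tuple of ints is deterministic in CPython.
        return hash((pid, self.perturbation)) % len(self.buckets)

    def add_request(self, pid, sector):
        self.buckets[self.bucket_of(pid)].append((sector, pid))

    def perturb(self):
        """Change the hash and re-file queued requests (sketch assumption)."""
        pending = [req for bucket in self.buckets for req in bucket]
        for bucket in self.buckets:
            bucket.clear()
        self.perturbation += 1
        for sector, pid in pending:
            self.add_request(pid, sector)

    def dispatch_round(self):
        """Take one request from every non-empty bucket, sorted by sector."""
        batch = [bucket.popleft() for bucket in self.buckets if bucket]
        return sorted(batch)
```

Two submitters that hash to the same bucket share its turn until the next perturbation, which is the "stochastic" part of the fairness.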

He and Andrea Arcangeli talked it over briefly, and then Jens said:

In the nature of taking this concept to the extreme, here's a CFQ disk scheduler (it should be obvious by now, that I'm simply making up a TLA as I see fit :-), or Complete Fair Queueing. It never suffers from queue collisions.

So how does it work? As with SFQ, a hash of busy queues is maintained. If a queue for a given process doesn't exist, one is simply allocated. The actual queueing of requests works like the SFQ scheduler I sent out yesterday, with a little twist: we try to put at least cfq_quantum number of requests on the dispatch queue. If only a small number of processes are waiting for io, then this significantly helps throughput by minimizing the time spent between finishing one request and starting a new one.

Other changes/fixes from SFQ:

Interestingly, dbench results show little variance between runs with this CFQ scheduler. Another point of interest to folks may be that it would be trivial to add process io priorities on top of CFQ (or SFQ for that matter, but I consider CFQ to be the superior scheduler).

If you play with this, let me know how it fares. Patch is against 2.5.60.
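The CFQ variation can be sketched the same way. Again this is a toy model with invented names rather than the posted patch: it assumes one queue per process, created on demand, and keeps making round-robin passes over the busy queues until at least `quantum` requests (standing in for cfq_quantum) have been gathered or nothing is left.

```python
from collections import deque


class CFQScheduler:
    """Toy per-process fair queueing with a minimum dispatch batch size."""

    def __init__(self, quantum=4):
        self.queues = {}        # pid -> deque of pending sectors
        self.quantum = quantum  # stand-in for cfq_quantum

    def add_request(self, pid, sector):
        # A queue is allocated the first time a process submits io,
        # so two processes never share (collide in) a queue.
        self.queues.setdefault(pid, deque()).append(sector)

    def dispatch(self):
        """Round robin over busy queues until at least quantum requests."""
        batch = []
        while self.queues and len(batch) < self.quantum:
            for pid in list(self.queues):  # one full pass over busy queues
                batch.append(self.queues[pid].popleft())
                if not self.queues[pid]:
                    del self.queues[pid]   # queue is no longer busy
        return sorted(batch)               # dispatch queue is sector-sorted
```

With a single busy process, one call drains up to `quantum` of its requests at once instead of one per round, which is the throughput effect described above.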

16. Open POSIX Test Suite 0.2.0 Released

10 Feb 2003 (1 post) Archive Link: "[ANNOUNCE] Open POSIX Test Suite 0.2.0 Released"

Topics: POSIX

People: Julie N Fleischer, Jim Houston

Julie N Fleischer announced:

Release 0.2.0 of the Open POSIX Test Suite is now available. This second release contains POSIX conformance tests for 50-80% of the POSIX functions of threads, signals, and semaphores. It also contains the full timers suite (tags TMR and CS) released in 0.1.0 with bug fixes.

The release notes that appear on download describe how to compile and run these tests.

The README page and the Open POSIX Test Suite website (above) give more information on the project goals and progress as well as information on how to contribute or contact us if you are interested.

Many thanks to Jim Houston and other members of the POSIX testing community for their bug fixes, patches, and suggestions on how to improve the 0.1.0 suite.

The Open POSIX Test Suite is an open source test suite with the goal of creating conformance test suites, as well as potentially functional and stress test suites, to the functions described in the IEEE Std 1003.1-2001 System Interfaces specification. Initial work is focusing on timers, threads, semaphores, signals, and message queues.

Feel free to contact us if you would like further information.

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.