Kernel Traffic

Kernel Traffic #120 For 28 May 2001

By Zack Brown



Mailing List Stats For This Week

We looked at 1372 posts in 5102K.

There were 415 different contributors. 190 posted more than once. 161 posted last week too.


1. SMP Race In ext2

26 Apr 2001 - 18 May 2001 (91 posts) Archive Link: "[PATCH] SMP race in ext2 - metadata corruption."

Topics: FS: ReiserFS, FS: ext2, SMP

People: Alexander Viro, Linus Torvalds, Chris Mason, Andrea Arcangeli

Alexander Viro reported:

Ext2 does getblk+wait_on_buffer for new metadata blocks before filling them with zeroes. While that is enough for single-processor, on SMP we have the following race:

getblk gives us unlocked, non-uptodate bh
wait_on_buffer() does nothing
read from device locks it and starts IO
we zero it out.
on-disk data overwrites our zeroes.
we mark it dirty
bdflush writes the old data (_not_ zeroes) back to disk.

Result: crap in metadata block. Proposed fix: lock_buffer()/unlock_buffer() around memset()/mark_buffer_uptodate() instead of wait_on_buffer() before them.
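The interleaving Alexander describes can be replayed in a toy model. The sketch below is plain Python, not kernel code: the `BufferHead` class, the helper names, and the dictionary standing in for the disk are all assumptions made for illustration, and the "other CPU" is simulated by calling its steps at the point where they would sneak in.

```python
# Toy model of the race described above. Not kernel code: every name here
# is an illustrative assumption.

ZEROES = b"\x00" * 4
OLD_DISK_DATA = b"OLD!"


class BufferHead:
    """Very small stand-in for a buffer: contents plus three state flags."""

    def __init__(self):
        self.data = b"????"      # contents undefined until filled
        self.locked = False
        self.uptodate = False
        self.dirty = False


def start_device_read(bh):
    """A reader of the raw device locks the buffer and starts I/O.

    Returns True if a read was started, False if the buffer was already
    locked or already up to date (so no read is issued).
    """
    if bh.locked or bh.uptodate:
        return False
    bh.locked = True
    return True


def complete_device_read(bh, disk):
    """I/O completion: on-disk data lands in the buffer, lock is dropped."""
    bh.data = disk["block"]
    bh.uptodate = True
    bh.locked = False


def racy_new_block(disk):
    """Old sequence: wait_on_buffer(), then zero, with a read sneaking in."""
    bh = BufferHead()                       # getblk: unlocked, not uptodate
    # wait_on_buffer(): the buffer is not locked, so there is nothing to wait for
    read_started = start_device_read(bh)    # other CPU locks it, starts I/O
    bh.data = ZEROES                        # we zero it out
    if read_started:
        complete_device_read(bh, disk)      # on-disk data overwrites the zeroes
    bh.uptodate = True
    bh.dirty = True                         # we mark it dirty
    disk["block"] = bh.data                 # write-back of the dirty buffer
    return bh


def locked_new_block(disk):
    """Proposed sequence: hold the buffer lock across zeroing and uptodate."""
    bh = BufferHead()
    bh.locked = True                        # lock_buffer()
    read_started = start_device_read(bh)    # reader finds the buffer locked
    bh.data = ZEROES                        # memset()
    bh.uptodate = True                      # mark_buffer_uptodate()
    bh.locked = False                       # unlock_buffer()
    if not read_started:
        # A retry after the unlock sees the buffer is up to date: no I/O.
        read_started = start_device_read(bh)
    if read_started:
        complete_device_read(bh, disk)
    bh.dirty = True
    disk["block"] = bh.data
    return bh
```

In this model, `racy_new_block({"block": OLD_DISK_DATA})` ends with the old data back on the "disk", while `locked_new_block` ends with zeroes, because the reader never gets to start I/O on a buffer that is either locked or already up to date.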

Andrea Arcangeli confirmed the race, and speculated that other filesystems were probably affected as well. Chris Mason confirmed that reiserfs had the race; he added that he was working on a fix, but had to make sure he didn't hurt the balancing code.

Linus Torvalds got into the discussion, saying he saw the race, but didn't see how it could ever be triggered. It looked like a non-issue to him. He said, "Now, I don't disagree with your patch (it's just obviously cleaner to lock it properly), but I don't think this is a real bug." He added, "We used to have "breada()" do physical read-ahead that could have triggered this, but we've long since gotten rid of that. Or am I overlooking something?" Alexander pointed out that simply doing a 'dd' from the disk could trigger the race. True, he acknowledged, in that particular case the 'dd' could only be expected to produce garbage, and so no one in their right mind would do it, but he said:

Suppose /dev/hda1 is owned by root.disks and permissions are 640. It is mounted read-write.

Process foo belongs to pfy.staff. PFY is included into disks, but doesn't have root. I claim that he should be unable to cause fs corruption on /dev/hda1.

Currently foo _can_ cause such corruption, even though it has nothing resembling write permissions for device in question.

IMO it is wrong. I'm not saying that it's a real security problem. I'm not saying that PFY is not idiot or that his actions make any sense. However, I think that situation when he can do that without write access to device is just plain wrong.

Andrea agreed. They had a little back-and-forth, and at one point Linus remarked:

Note that I think all these arguments are fairly bogus. Doing things like "dump" on a live filesystem is stupid and dangerous (in my opinion it is stupid and dangerous to use "dump" at _all_, but that's a whole 'nother discussion in itself), and there really are no valid uses for opening a block device that is already mounted. More importantly, I don't think anybody actually does.

The fact that you _can_ do so makes the patch valid, and I do agree with Al on the "least surprise" issue. I've already applied the patch, in fact. But the fact is that nobody should ever do the thing that could cause problems.

The discussion went on, with various suggestions of legitimate actions that might trigger the race; some of which caused peals of laughter on both sides.

2. Serious Problems With Current 2.4 Kernels

9 May 2001 - 15 May 2001 (14 posts) Archive Link: "2.4.4 kernel freeze for unknown reason"

Topics: Virtual Memory

People: Vincent Stemen, Alan Cox

Jacky Liu kept experiencing what appeared to be random lockups every couple of days under 2.4.4; Vincent Stemen confirmed:

I have been experiencing these same problems since version 2.4.0. Although, I think it has improved a little in 2.4.4, it still locks up. The problem seems to be related to memory management and/or swap, and it seems to do it primarily on machines with over 128Mb of RAM. Although, I have not tested systematically enough to confirm this.

I have been monitoring the memory usage constantly with the gnome memory usage meter and noticed that as swap grows it is never freed back up. I can kill off most of the large applications, such as netscape, xemacs, etc, and little or no memory and swap will be freed. Once swap is full after a few days, my machine will lock up.

If I turn swap off all together or turn it off and back on periodically to clear the swap before it gets full, I do not seem to experience the lockups.

I am running on an AMD K6-400 with 256 Mb RAM but we have experienced these problems with various other machines as well.

Alan Cox replied cryptically, "The swap handling in 2.4 is somewhat hosed at the moment." He added, "I can give you a tiny patch that should fix the lockups and instead it will kill processes out of memory but thats obviously not the actual fix 8)" Later, he also said, "I switched my desktop box back to 2.2 a while back until the VM works." Yikes!

3. LVM Development Policy

11 May 2001 - 16 May 2001 (19 posts) Archive Link: "LVM 1.0 release decision"

Topics: Disk Arrays: LVM

People: Heinz Mauelshagen, Jeff Garzik, Alan Cox

Heinz Mauelshagen explained:

As most of you probably know, we've got criticism a couple of weeks ago about our Linux kernel patch policy causing the LVM vanilla kernel code to differ from the one we release directly.

In order to avoid this difference we provide smaller patches more often now. We have started already with a subset of about 50 necessary patches.

Even though we get kind support from Alan Cox to get those QAed and integrated, the pure amount of patches will take at least a couple of weeks to make it in.

This leads to the dilemma, that trying to avoid further differences between our LVM releases and the stock kernel code would force us into postponing the pending LVM 1.0 release accordingly which OTOH is inconvenient for the LVM user base.

In regard to this situation we'd like to know about your opinion on the following request: is it acceptable to release 1.0 soon *before* all patches to reach the 1.0 code status are in vanilla (presumed that we provide them with our release as we always did before)?

We'll gather your answers for some days and will send the conclusion to the lists.

Jeff Garzik replied, "Are you sending them all in one batch (50 e-mails to Linus at once), or trickling them to Linus a few at a time? It might be faster to send him a batch (though not necessarily 50), noting with each e-mail, after each patch's description, that the particular patch depends patches C, F, and H that have come before it. That way Linus can apply 8 out of 10 patches, and then you synchronize with him and start the cycle over again."
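Jeff's batching idea depends on each patch naming the earlier patches it needs, so that a partial batch can still be applied safely. A minimal sketch of that bookkeeping follows; the function name, the patch letters, and the data layout are hypothetical and do not describe any tool the LVM developers or Linus actually used.

```python
def applicable(patches, rejected):
    """Return the names of patches that can be applied, in submission order.

    patches:  list of (name, set_of_dependency_names), in the order sent
    rejected: set of names the maintainer declined
    A patch is skipped if it was rejected, or if any patch it depends on
    was not applied.
    """
    applied = []
    for name, deps in patches:
        if name in rejected:
            continue
        if all(dep in applied for dep in deps):
            applied.append(name)
    return applied


# Hypothetical batch: E needs both B and D, and D needs C.
batch = [
    ("A", set()),
    ("B", {"A"}),
    ("C", set()),
    ("D", {"C"}),
    ("E", {"B", "D"}),
]
```

With this batch, rejecting only `C` also holds back `D` and `E`, which would then be carried over to the next round of synchronization.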

4. Fix For Exar ST16C654 In Stable Series

11 May 2001 - 17 May 2001 (8 posts) Archive Link: "[PATCH] drivers/char/serial.c bug in ST16C654 detection"

People: Val Henson, Stuart MacDonald

Val Henson posted a patch, and explained:

This fixes a bug in the autoconfig_startech_uarts function in serial.c. The problem is that 0's are written to the baud rate registers in order to detect an XR16C850 or XR16C854. This makes the Exar ST16C654 go kablooey. Saving and restoring the baud rate registers after the test fixes it.

I'm assuming that the XR16C85[04] detection works as is and doesn't need the original baud rate restored. If I'm wrong, I'll rewrite the patch.
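The save-and-restore pattern Val describes can be sketched against a toy register file. This is a model only: `ToyUart`, its register names, and `probe_preserving_divisor` are assumptions for illustration, and real hardware access in serial.c looks quite different.

```python
class ToyUart:
    """Stand-in for a UART that exposes only the two baud rate registers."""

    def __init__(self, dll, dlm):
        self.regs = {"DLL": dll, "DLM": dlm}

    def read(self, name):
        return self.regs[name]

    def write(self, name, value):
        self.regs[name] = value & 0xFF   # registers are 8 bits wide


def probe_preserving_divisor(uart):
    """Run the zero-write probe, then put the original divisor back."""
    saved_dll = uart.read("DLL")
    saved_dlm = uart.read("DLM")

    uart.write("DLL", 0)
    uart.write("DLM", 0)
    # In this toy model the registers simply read back the zeroes; the
    # point is only that the probe clobbers the divisor.
    probe_result = (uart.read("DLM"), uart.read("DLL"))

    uart.write("DLL", saved_dll)
    uart.write("DLM", saved_dlm)
    return probe_result
```

After `probe_preserving_divisor(ToyUart(0x0C, 0x00))` the divisor registers hold `0x0C` and `0x00` again, whereas a probe without the two restoring writes would leave both at zero.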

Stuart MacDonald, who'd worked on the code in question, asked for clarification and offered some criticisms of the patch. In particular, he wanted to know the version numbers for the serial driver, the kernel, and the distribution Val used to get the Exar to "go kablooey". He felt Val's patch was good, but he wanted to understand the behavior Val saw. Val replied that she was using the serial driver version 5.05a (2001-03-20) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled, and kernel version 2.4.5-pre1. They went back and forth, but Stuart was unable to reproduce the crash.

5. Device Numbers; Developer Discontent

14 May 2001 - 21 May 2001 (365 posts) Archive Link: "LANANA: To Pending Device Number Registrants"

Topics: FS: devfs, FS: ramfs, Virtual Memory

People: H. Peter Anvin, Jeff Garzik, Alan Cox, Richard Gooch, Linus Torvalds, Rik van Riel

H. Peter Anvin explained:

Linus Torvalds has requested a moratorium on new device number assignments. His hope is that a new and better method for device space handing will emerge as a result.

Alan Cox has requested that I maintain a forked registry for his -ac kernel patch tree. I have agreed to do so once I have forked off the "final" version of the registry for Linus' tree. At that time I will process the backlog for the benefit of the -ac registry only. Please have patience until I can get that to happen.

Please note that this is not my decision (in fact, I have serious concerns with it.) In particular, /dev namespace coordination still applies.

Jeff Garzik replied:

Here's my suggestion for a solution.

Once I work through a bunch of net driver problems, I want to release a snapshot block device driver (freezes a blkdev in time). For this, I needed a block major. After hearing about the device number freeze, I was wondering if this solution works:

Register block device using existing API, and obtain a dynamically assigned major number. Export a tiny ramfs which lists all device nodes. Mounted on /dev/snap, /dev/snap/0 would be the first blkdev for snap's dynamically assigned major. (Al Viro said he has skeleton code to create such an fs, IIRC)

This solution

  1. keeps from grot-ing up /proc even more [I had considered proc_mknod() until viro talked me out of it]
  2. does not require centrally assigned majors and minors.
  3. does not require devfs. most distros ship without it afaik, and switching to it is not an overnight process, and requires devfsd to be useful in the real world.
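Jeff's scheme has two parts: ask for whatever major number is free, then publish the resulting device nodes through a small filesystem instead of a central registry. The toy model below illustrates both. `ToyRegistry`, `device_nodes`, the 256-entry major space, and the top-down search order are all assumptions of this sketch, not a description of the kernel's registration API.

```python
MAX_MAJORS = 256  # assumption: an 8-bit major number space


class ToyRegistry:
    """Toy table of major numbers, some of them already statically taken."""

    def __init__(self, taken=()):
        self.owners = {major: "static" for major in taken}

    def register(self, name, major=0):
        """Register a driver; major=0 asks for any free number.

        Returns the major number that was assigned.
        """
        if major == 0:
            # Search from the top of the range down to 1 for a free slot.
            for candidate in range(MAX_MAJORS - 1, 0, -1):
                if candidate not in self.owners:
                    major = candidate
                    break
            else:
                raise RuntimeError("no free major numbers")
        elif major in self.owners:
            raise RuntimeError(f"major {major} already in use")
        self.owners[major] = name
        return major


def device_nodes(major, count):
    """Entries a per-driver filesystem could list: node name -> (major, minor)."""
    return {str(minor): (major, minor) for minor in range(count)}
```

With majors 3, 8 and 255 already taken, `ToyRegistry(taken=[3, 8, 255]).register("snap")` hands back 254, and `device_nodes(254, 2)` gives the two entries that would appear as `0` and `1` under a directory such as /dev/snap. User space never needs to know the number in advance, which is the property Jeff was after.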

H. Peter replied that Jeff's proposal did not, however, "manage permissions, nor does it provide for a sane namespace (it exposes too many internal implementation details in the interface -- in particular, the driver becomes part of the namespace, and devices move around between drivers regularly.)" But Jeff replied that this was taken care of by devfs and devfsd. But Alan Cox said, "As to devfsd well Al Viro was reporting races in it long ago that I don't believe Richard has had time to fix nor has anyone else fixed. What is the state on devfs there ?" Richard Gooch replied, "Actually, it was devfs, not devfsd that Al was complaining about. Fortunately these races are hard to trigger without deliberately trying to trigger them, otherwise I'd be inundated with bug reports :-/" He added regarding status, "Getting very close now. This last weekend was my first time for ages that I've had an uninterrupted weekend to hack on Linux and didn't have other really urgent stuff to deal with."

At this point the discussion of increasing the size of dev_t came up again, with Linus starting it off:

a 32-bit (or 64-bit) dev_t does NOT make it any easier to manage permissions or anything like that anyway. Look at the current mess /dev is. Imagine it an order of magnitude worse.

Big device numbers are _not_ a solution. I will accept a 32-bit one, but no more, and I will _not_ accept a "manage by hand" approach any more. The time has long since come to say "No". Which I've done. If you can't make it manage the thing automatically with a script, you won't get a hardcoded major device number just because you're lazy.

End of discussion.

To the last sentence, Rik van Riel replied:

I've been doubting whether to work on both the -ac kernels and the -linus tree, but this is a pretty good argument for sticking with -ac and just ignoring the -linus tree...

Lets see what happens...

Alan Cox replied:

Time will make that decision. Linus kindly gave us all the power to vote with our feet. One thing I absolutely refuse to do is to let a disagreement over some specific device implementation turn into an excuse for a wider difference in the trees.

So yes -ac might have static majors but the rest of it I intend to keep merging with Linus and tracking closely to his tree. Certainly not ignoring the -linus tree.

Rik replied, "Agreed. However, if this thing means I cannot use the -linus tree without devfs, then it will also mean my VM stuff only gets tested on -ac kernels..."

Alan also replied to Linus directly, regarding device numbers, saying, "on that issue I'm so convinced you are wrong I'm prepared to maintain sensible Unix device behaviour in the -ac pretty much indefinitely." He added that he also didn't like Linus' "end of discussion" approach. He said (over the course of several posts) that Linus was making a similar mistake to the Plan9 OS -- a beautiful system, in his opinion, but with no significant user base because it had compatibility problems. As far as vendors' willingness to go along with Linux incompatibilities, he said, "I bet the vendors in question dont think the sun shines out of linus backside any more." This did not impress Linus, who replied at one point, "I know what I want, and I've let the current mess go on for too long. If it takes some pain to fix it, then so be it. It needs to be fixed, even if people suddenly start thinking that the light of my a** dimmed a bit. That's ok."

Elsewhere there followed a long discussion over the best ways to handle device numbers, devfs, and compatibility.

6. New rootfs For 2.5

16 May 2001 (21 posts) Archive Link: "[PATCH] rootfs (part 1)"

Topics: FS: ramfs, FS: rootfs

People: Alexander Viro, Alan Cox, Linus Torvalds

Alexander Viro posted a patch and explained:

Linus, patch is the first chunk of rootfs stuff. I've tried to get it as small as possible - all it does is addition of absolute root on ramfs and necessary changes to mount_root/change_root/sys_pivot_root and follow_dotdot. Real root is mounted atop of the "absolute" one.

More interesting stuff lives in the next parts - once we have rootfs we can get rid of a lot of cruft in fs/super.c around mounting real root and switching it after initrd. In particular, we can get rid of the umount_root flag in do_umount() and kill_super() which allows much cleaner handling of vfsmounts. I'll try to feed this stuff in small and obvious pieces.

It's transparent to userland - the only visible effect is an extra line in /proc/mounts. Moreover, it's transparent to the kernel - the only functions that really care are those that do the first mount.

One point that might be better done differently - since we need ramfs for boot I've just made fs/ declare CONFIG_RAMFS as define_bool CONFIG_RAMFS y. If ramfs grows (e.g. gets resource limits patches from -ac) we might be better off doing a minimal variant permanently in kernel (calling it rootfs) and making ramfs use rootfs methods. It's completely separate issue, so I've done it the simplest way for the time being.

Linus Torvalds and Alan Cox felt this was definitely 2.5 material, though Linus added that the patch itself looked OK. Alexander felt the patch was local enough to make it into 2.4, though he didn't have strong feelings either way. Elsewhere, he added:

I think that it's OK for 2.4, but then I'm obviously biased (mostly by the fact that I know how much it allows to clean up without breaking any compatibility, including binary compatibility in the kernel). Up to you, indeed.

There was a bit of technical discussion about the patch itself, and the thread ended.

7. Linux IA-32 Init Doc Available

18 May 2001 (1 post) Archive Link: "[announce] Linux 2.4 x86 init."

People: Randy Dunlap

Randy Dunlap announced:

I've documented x86/i386/IA-32 Linux kernel init (after loaders). It's fairly large, so there may be too much detail here and I may cut back on it some if that seems to be needed.

Comments, feedback, corrections, and additions are welcome. As I say in the intro, hopefully some of you (or us) will find this useful/helpful.

It's viewable at

There was no reply.

8. JFS 0.3.2 Available

18 May 2001 (1 post) Archive Link: "Announcing Journaled File System (JFS) release 0.3.2 available"

Topics: FS: JFS

People: Steve Best

Steve Best announced:

Release 0.3.2 of JFS was made available today.

Drop 32 on May 18, 2001 (jfs-0.3.2-patch.tar.gz) includes fixes to the file system and utilities.

Function and Fixes in release 0.3.2

For more details about the problems fixed, please see the README.

There was no reply.

9. Very Old Config Bug; CML2 Discussion

19 May 2001 (8 posts) Archive Link: "Brown-paper-bag bug in m68k, sparc, and sparc64 config files"

Topics: Kernel Build System

People: Eric S. Raymond, John Levon, Mike Galbraith, Benedict Bridgwater, Miles Lane, Keith Owens

Eric S. Raymond posted a patch and reported:

This bug unconditionally disables a configuration question -- and it's so old that it has propagated across three port files, without either of the people who did the cut and paste for the latter two noticing it.

This sort of thing would never ship in CML2, because the compiler would throw an undefined-symbol warning on BLK_DEV_ST. The temptation to engage in sarcastic commentary at the expense of people who still think CML2 is an unnecessary pain in the butt is great. But I will restrain myself. This time.
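The kind of check Eric has in mind, flagging a symbol that is referenced in a rule but never defined anywhere, is simple to sketch. The snippet below is neither CML1 nor CML2: `undefined_symbols`, the `CONFIG_` naming pattern, and the sample rule text are assumptions made up for illustration, and the rule shown is not the actual line from the affected port files.

```python
import re

# Assumption: configuration symbols look like CONFIG_SOMETHING.
SYMBOL = re.compile(r"\bCONFIG_[A-Z0-9_]+\b")


def undefined_symbols(definitions, rules):
    """Return, sorted, every symbol that the rules mention but nothing defines."""
    defined = set(definitions)
    referenced = set()
    for rule in rules:
        referenced.update(SYMBOL.findall(rule))
    return sorted(referenced - defined)


# Hypothetical inputs: one defined tape symbol, one mistyped guard.
definitions = ["CONFIG_SCSI", "CONFIG_CHR_DEV_ST"]
rules = [
    'if [ "$CONFIG_BLK_DEV_ST" != "n" ]; then ask CONFIG_CHR_DEV_ST; fi',
]
```

Here `undefined_symbols(definitions, rules)` reports `CONFIG_BLK_DEV_ST`. A guard on a symbol that can never be set silently hides the question it protects, and copies of the rule inherit the problem unless something checks references against definitions.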

John Levon added, "in fact it was originally in i386 too. I noticed and fixed it, didn't even think about the other archs." Mike Galbraith suggested, "Erm.. if this bug is _that old_ and nobody noticed, isn't the right fix to just delete the dead option?" There was no reply to this, but elsewhere Benedict Bridgwater objected sarcastically:

So a shortcoming of the CML1 tools justifies the CML2 language?

I guess the next bug found in the Python2 interpreter will justify writing CML3 in FORTRAN.

Miles Lane came down on Benedict in measured tones, with, "IIRC, Eric floated the CML2 idea over a year ago, provided some rudimentary code and requested feedback. There has seemed, for a long time, to be agreement amongst most kernel developers that there were serious problems with CML1 and that it needed to be junked. There are many things that CML1 was not going to allow us to do that will be possible with CML2 (subsetting of the code tree, etc). I don't think Eric's statement about this particular brown-paper-bag bug means that he thinks that this alone justifies migrating to CML2. There are a lot of good reasons for the migration. It isn't, perhaps, a perfect solution, but it is one that Eric has implemented with a year's worth of effort, with full knowledge of the kernel development community and with an open invitation for contributions and feedback. To rag on it now seems belated and unhelpful. It would be more useful if you helped Eric improve CML2, since there appears to be agreement that it will be used in 2.5." But Benedict argued, "CML2 itself may be well justified (although not on grounds of a diagnostic that CML1 tools could be changed to generate!), but there's no reason the tools utilizing the CML2 based rules shouldn't present a *superset* of the existing functionality. To present a dumbed down UI targeted for "Aunt Millie" or whoever against the protests of the mainstream kernel tool audience makes zero sense to me." But Keith Owens chided, "How many times do we have to say this? CML2 supports everybody from Aunt Millie (novice mode) through non-standard machine configurations (expert mode) through Linus (vi .config, make oldconfig). Pick the level of configuration that you need." End of thread.

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.