Kernel Traffic

Kernel Traffic #264 For 25 Jun 2004

By Zack Brown

Mailing List Stats For This Week

We looked at 4102 posts in 21023K.

There were 876 different contributors. 481 posted more than once. 317 posted last week too.

1. Abortive Attempt To Enhance dnotify

9 May 2004 - 21 May 2004 (29 posts) Archive Link: "[RFC/PATCH] inotify -- a dnotify replacement"

Topics: Ioctls

People: John McCutchan, Chris Wedgwood, Davide Libenzi, Alexander Viro, Alexander Larsson

John McCutchan posted a patch and said:

I have been working on inotify a dnotify replacement.

inotify is a char device that has two ioctls: INOTIFY_WATCH and INOTIFY_IGNORE Which expect an inode number and an inode device number. I know that on some file systems the inode number and inode device number are not guaranteed to be unique. This driver is only meant for file systems that have unique inode numbers.

The two biggest complaints people have about dnotify are

1) dnotify delivers events using signals.

2) dnotify needs a file to be kept open on the device, causing problems during unmount.

As for 1) -- since inotify is a char device the events are delivered when the user calls read (). This easily integrates in to an applications select() loop.

I have a solution for 2) but I haven't implemented it yet, as I don't have the expertise. What I want to do is distinguish between some one holding a 'real' reference on an inode, and a reference that is used only for watching. This way in the unmount code we can make sure that no one is holding a real reference to an inode on the device, then while invalidating the inodes, clear the inodes watch list and send UNMOUNT events to the inotify device. Changes would have to be made so that inodes are kept in the inode cache when a watching reference is held.

I would like some pointers on how to implement 2). I am not sure the code path that takes place on unmount, or the best place to change the inode cache semantics.

I would also appreciate comments on the design in general and code review (this is my first kernel project).
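To make the read()/select() point concrete, here is a minimal sketch of a select loop that waits for an event source to become readable and then reads from it. It is only an illustration: a pipe stands in for the inotify character device, and the helper name `read_events` and the event text are hypothetical, since the post does not describe the device path or event format.

```python
import os
import select

def read_events(fd, timeout=1.0, max_bytes=4096):
    """Wait until fd is readable, then return the bytes that are available.

    Returns b"" if nothing arrives before the timeout expires.
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    if fd in readable:
        return os.read(fd, max_bytes)
    return b""

# Stand-in for the event source: a pipe, NOT the real inotify device.
r, w = os.pipe()
os.write(w, b"MODIFY watched-file\n")   # hypothetical event payload
events = read_events(r)
os.close(r)
os.close(w)
```

Because the descriptor is just another entry in the list passed to select(), an application can wait on it alongside its sockets and other files, which is the integration John is describing.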

In addition to John's two objections to dnotify, Chris Wedgwood added a third: "dnotify cannot easily watch changes for a directory hierarchy." John replied that this didn't seem to be a concern for dnotify users like Alexander Larsson, maintainer of nautilus and gnome-vfs, but Davide Libenzi countered that "A dnotify replacement that does not have the ability to watch for a hierarchy is pretty much useless." John said he did plan to add the feature eventually; it just didn't have a high priority. At one point he explained, "Inotify will support watching a hierarchy. The reason it was not implemented yet is because the one app that I really care about is nautilus and the maintainer of it says he doesn't care." He added, "The big feature that inotify is trying to provide is not having to keep a file open (So that unmounting is not affected). I asked for some guidance from people more familiar with the kernel so that I can implement this feature, it requires changes made to the inode cache, and how unmounting is done." At this point, however, Alexander Viro came down hard on the whole idea, saying:

First of all, on quite a few filesystems inumbers are stable only when object is pinned down. What's more, if it's not pinned down you've got nothing even remotely resembling a reliable way to tell if two events had happened to the same object - inumbers can be reused.

Besides, your "doesn't pin down" is racy as hell - not to mention the way you manage the lists, pretty much every function is an exploitable hole. Hell, just take a look at your "find inode" stuff - you grab superblock, find an inode by inumber (great idea, that - especially since on a bunch of filesystems it will get you BUG() or equivalent) then drop refernce to superblock (at which point it can be destroyed by umount()) _and_ do iput() (which will do lovely, lovely things if that umount did happen). Moreover, you return a pointer to inode, even though there's nothing to hold it alive anymore. And dereference that pointer later on, not caring if it had been freed/reused/whatever.

Overall: hopeless crap. And that's a direct result of your main feature - it's really broken by design.
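Alexander's first point, that an inode number only identifies an object reliably while the object is pinned, can be seen from userspace as well. The sketch below is an illustration under assumptions, not kernel code: the helper names are hypothetical, and it relies on an ordinary local POSIX filesystem where an open file descriptor keeps the inode allocated.

```python
import os
import tempfile

def identity(st):
    """(device, inode) pair, the kind of key the proposed ioctls would take."""
    return (st.st_dev, st.st_ino)

def still_same_object(fd, path):
    """True if path currently names the object that fd holds open."""
    try:
        return identity(os.fstat(fd)) == identity(os.stat(path))
    except FileNotFoundError:
        return False

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "watched")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    same_before = still_same_object(fd, path)    # fd and path agree
    os.unlink(path)
    # Recreate the name. While fd stays open the old inode is still
    # allocated, so the new file cannot be confused with it.
    os.close(os.open(path, os.O_CREAT | os.O_RDWR, 0o600))
    same_after = still_same_object(fd, path)
    os.close(fd)
```

Once the descriptor is closed, nothing stops the filesystem from handing the old number to a new file, so a watcher that stored only the number could not tell the two objects apart.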

John objected that the feature he was trying to implement "is the only reasonable solution to a problem that needs fixing. You can't simply say that a file manager needing to be notified when directories change is broken. How would you solve this problem?" But the discussion petered out quickly after this.

2. Status Report On Serial ATA (SATA)

10 May 2004 - 17 May 2004 (7 posts) Archive Link: "Serial ATA (SATA) on Linux status report"

Topics: Disk Arrays: RAID, Disks: IDE, Hot-Plugging, PCI, Serial ATA

People: Jeff Garzik

Jeff Garzik posted a status report for Linux support of Serial ATA (SATA). He said:

Hardware support

Intel ICH5 and ICH5-R

Summary: No TCQ. Looks like a PATA controller, but with a few added, non-standard SATA port controls. Hardware does not support hotplug. "Coldplug" support is potentially feasible.

libata driver status: Production, but see issue #2, #3.

drivers/ide driver status: Production, but see issue #1, #2.

Issue #1: Depending on BIOS settings, IDE driver may lock up computer when probing drives.

Issue #2: Excessive interrupts are seen in some configurations.

Issue #3: "Enhanced mode" or "SATA-only mode" may need to be set in BIOS.

Intel ICH6 ("AHCI")

Summary: Per-device queues, full SATA control including hotplug and PM.

libata driver status: "looks like ICH5" support available in ata_piix. Preliminary driver with full AHCI support now exists, and is being integrated into libata mainline.

Promise TX2/TX4/SX4

Summary: Per-host queues on all controllers. Full SATA control including hotplug and PM on all but one controller.

libata TX2/TX4 driver status: Production, but see issue #5.

libata SX4 driver status: Production, but see issue #6.

Issue #5: Some boards appear to have PATA as well as SATA ports. PATA is not currently supported, and no plans have yet been made to rectify this. Ideally drivers/ide would drive PATA, but if they are the same PCI device, that would not be feasible.

Issue #6: The SX4 hardware is not fully utilized by the Linux kernel driver. The SX4 hardware includes an on-board DIMM and hardware XOR offload. Using the on-board DIMM as cache, and issuing each RAID transaction once (instead of once for each disk), will result in increased performance, but the driver doesn't do that yet. SX4 hardware is very "RAID friendly", particularly RAID1/5. Users may wish to use the Promise driver to fully utilize the hardware.

Silicon Image 3112/3114

Summary: No TCQ. Looks like a PATA controller, but with full SATA control including hotplug and PM.

libata driver status: Beta.

drivers/ide driver status: Production, but see issue #4.

Issue #4: Need to have the most recent fixes posted to lkml, for stable operation and full performance (where possible).

Silicon Image 3124

Soon, hopefully. Silicon Image has made documentation and sample hardware available to me (jgarzik) for development. Some code exists internally.


Summary: Huge per-device queues, full SATA control including hotplug and PM for the "Frodo4" and "Frodo8" boards. Apple K2 SATA, which also uses this chipset, has all the feature of Frodo4/8 save the host DMA queueing feature ("QDMA").

libata driver status: Beta, but no QDMA support yet.


Summary: No TCQ. Looks like a PATA controller, but with full SATA control including hotplug and PM.

libata driver status: Beta.


libata driver status: Beta


libata driver status: Beta


libata driver status: in progress

Software support

Basic Serial ATA support

The "ATA host state machine", the core of the entire driver, is considered production-stable.

The error handling is _very_ simple, but at this stage that is an advantage. Error handling code anywhere is inevitably both complex and sorely under-tested. libata error handling is intentionally simple. Positives: Easy to review and verify correctness. Never data corruption. Negatives: if an error occurs, libata will simply send the error back the block layer. There are limited retries by the block layer, depending on the type of error, but there is never a bus reset.

Or in other words: "it's better to stop talking to the disk than compound existing problems with further problems."

As Serial ATA matures, and host- and device-side errata become apparent, the error handling will be slowly refined. I am planning to work with a few (kind!) disk vendors, to obtain special drives/firmwares that allow me to inject faults, and otherwise exercise error handling code.

Queueing support

Even though some SATA host controllers on the market already support command queueing (a.k.a. "TCQ"), libata does not yet support it.

However, libata was designed from the ground-up to support queueing, so I need only change a few lines of code, and write two functions, to enable this behavior.

Queueing will be enabled in libata soon, but to do so requires a long stretch of testing on a large variety of controllers and drives. This is very time-intensive, and is the largest part of this task.

Tangent: Host-based queueing and Native Command Queueing

Queueing is the process of sending multiple commands to a single device, without waiting for prior commands to finish. This increases performance and reduces latency. There are three types of queueing in the ATA world:

1) "legacy TCQ" -- some PATA devices support this. Just ignore it, it's going away.

2) "host-based TCQ" -- the host controller supports a queue of drive commands, whether or not the drive supports it.

3) "Native Command Queueing" -- both host and drive cooperate in the queueing and execution of drive commands. This should provide the highest performance and lowest latency of all three options.

#1 is support by drivers/ide _only_. libata will not support this.
#2 will soon be supported by libata.
#3 will be supported by libata when hardware is available from drive manufacturers.
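As a rough illustration of why having several commands outstanding can help, the toy model below compares total head movement when requests are served one at a time in arrival order against a device that may pick the nearest of several queued requests. The nearest-first policy, the function name, and the request addresses are all assumptions made for the example; the report does not describe how any controller or drive actually reorders commands.

```python
def head_travel(requests, depth, start=0):
    """Total head movement when up to `depth` commands are outstanding.

    With depth=1 requests are served strictly in arrival order. With a larger
    depth the (hypothetical) device serves whichever outstanding request is
    nearest to the current head position.
    """
    pending = list(requests)
    outstanding = []
    pos, travel = start, 0
    while pending or outstanding:
        # Top up the device queue from the arrival-ordered backlog.
        while pending and len(outstanding) < depth:
            outstanding.append(pending.pop(0))
        nxt = min(outstanding, key=lambda lba: abs(lba - pos))
        outstanding.remove(nxt)
        travel += abs(nxt - pos)
        pos = nxt
    return travel

requests = [900, 10, 850, 20, 800, 30]       # made-up block addresses
unqueued = head_travel(requests, depth=1)    # no queueing
queued = head_travel(requests, depth=4)      # four commands outstanding
```

In this model the queued run covers far less distance because the device can group nearby requests. Real hardware weighs other factors too, so the numbers show the direction of the effect, not its size.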

Hotplug support

All SATA is hotplug.

libata does not support hotplug... yet.

Power Management support

Over and above the power management specified in the ATA/ATAPI specification, one can aggressively control the power consumption of SATA hosts, the SATA bus, and the SATA device.

SMART support

Soon. Requires the capability to directly submit ATA commands from userspace to the low-level device, which must be added with care.

3. Reworking Capability Support In 2.6

11 May 2004 - 14 May 2004 (6 posts) Archive Link: "[PATCH 0/2] capabilities"

Topics: Backward Compatibility, FS: ext3

People: Andy Lutomirski, Andrew Morton, Chris Wright

Andy Lutomirski said:

This reintroduces useful capabilities.

In this model, the inheritable mask is a limit on what capabilities the task or any of its children may have. Permitted and effective masks have their old meanings.

Init gets all capabilities (even CAP_SETPCAP) since it seems absurd to do otherwise.

Part 1 shouldn't change user-visible behavior; it just moves the vfs capability logic into fs/exec.c (where it IMHO belongs) and the setuid logic into apply_creds (where it IMHO belongs). I made no effort to make apply_creds pretty since it all gets deleted in the next patch anyway.

Part 2 redoes the capability logic.

This code is paranoid about setuid. A later patch will allow that to be overridden by a sysctl so that securelevel can be done usefully.

I left cap_bset in, even though I can't see a use for it anymore.

I put some user tools at; I've used themn to exercise this code somewhat.

It applies to 2.6.6-mm1. It does _not_ apply to 2.6.6 vanilla, but the fix should be trivial (it conflicts with something in fs/exec.c). It will not apply to earlier versions, as it depends on the compute_creds race fix.

Please let me know what you think and if you see any holes in this code or possibilities of exposing / introducing bugs in other programs (think Sendmail bug).

This should also eliminate the Oracle magic-group mess :)

Andrew- is this sufficiently non-scary for -mm?

Where this is going:
There is probably some dead code now in sys_capset. I'll check on that. Also, I have an ext3 xattr caps patch lying around somewhere. I can try to get it working again.

Chris Wright was very happy to see this, and said he'd test it out. Andrew Morton also replied to Andy's question about whether the patch was too scary to be included in the -mm tree. Andrew said, "Scares the shit out of me, frankly ;)". He clarified some of his objections, saying:

What if there are existing applications which are deliberately or inadvertently relying upon the current behaviour? That seems unlikely, but the consequences are gruesome.

If I'm right in this concern, the fixed behaviour should be opt-in. That could be via a new prctl() thingy but I think it would be better to do it via a kernel boot parameter. Because long-term we should have the fixed semantics and we should not be making people change userspace for some transient 2.6-only kernel behaviour.

In the course of a very brief discussion, it became clear that Andrew was willing to consider Andy's patch as long as it preserved enough compatibility that anything broken by Andy's changes would already have been broken in one way or another. Andy's changes would thus have to walk a tightrope of backward compatibility: anything that worked under the old system had to keep working, while anything pathological under the old system was free to break in new ways.

4. DID Support For ICH6 And 6300ESB

12 May 2004 - 14 May 2004 (5 posts) Archive Link: "[PATCH] ICH6/6300ESB i2c support"

Topics: Disks: IDE, I2C, Sound: i810

People: Jason Gaston, Jean Delvare, Linus Torvalds, Andrew Morton, Greg KH

Jason Gaston said, "This patch adds DID support for ICH6 and 6300ESB to i2c-i801.c(SMBus). In order to add this support I needed to patch pci_ids.h with the SMBus DID's. To keep things orginized I renumbered the ICH6 and ESB entries in pci_ids.h. I then patched the piix IDE and i810 audio drivers to reflect the updated #define's. I also removed an error from irq.c; there was a reference to a 6300ESB DID that does not exist. This patch is against the 2.6.6 kernel." Greg KH applied the patch to his tree, queued for Andrew Morton and Linus Torvalds. Jean Delvare asked Jason to produce a patch against the 2.4 kernels as well, and Jason said he'd give this a whirl.

5. Linux 2.6.6-mm2 Released

13 May 2004 - 17 May 2004 (66 posts) Archive Link: "2.6.6-mm2"

Topics: Kernel Release Announcement, POSIX, Virtual Memory

People: Andrew Morton

Andrew Morton announced Linux 2.6.6-mm2.

6. Capabilities; The Saga Continues

13 May 2004 - 24 May 2004 (55 posts) Archive Link: "[PATCH] capabilites, take 2"

Topics: Capabilities, POSIX

People: Andy Lutomirski, Chris Wright, Valdis Kletnieks, Olaf Dietsche

Andy Lutomirski said:

This implements working capabilities.

Changes from the previous version:

The whole thing is now a module parameter -- specify commoncap.newcaps=1 if capabilities is built-in or newcaps=1 in modprobe or insmod if not.

Known issues:

  1. setxuid emulation is wrong (keepcaps doesn't work for setre(s)uid). I haven't fixed this because I don't what to change the semantics of PR_SET_KEEPCAPS. I'll probably add PR_SET_REALLY_KEEPCAPS to really keep capabilities.
  2. When newcaps=0 (the default), linuxrc gets all capabilities. This is different from the old behavior. Init and programs started from linuxrc still match the old behavior. I couldn't think of any remotely clean way around that without breaking linuxrc for newcaps=1
  3. I haven't tried it, but I imagine that unloading commoncap and then reloading it in a different mode may do unexpected things. So read the code before you try that. I see no reason to fix this one because the whole thing should go away eventually.

When newcaps=1, this extends the LSM interface: bprm->secflags & BINPRM_SEC_NO_ELEVATE means that privileges should not be elevated. So stacking modules (e.g. selinux) should set this before calling to the lower module and test it again before deciding whether to elevate privileges. That way, either all privs are elevated or none are.

I also added a flag (BINPRM_SEC_SECUREEXEC) with the obvious meaning. Otherwise cap_bprm_secureexec would have been a mess.

Chris Wright had some objections. He said:

I think it still needs more work. Default behavoiur is changed, like Inheritble is full rather than clear, setpcap is enabled, etc. Also, why do you change from Posix the way exec() updates capabilities? Sure, there is no filesystem bits present, so this changes the calculation, but I'm not convinced it's as secure this way. At least with newcaps=0.

I believe we can get something functional with fewer changes, hence easier to understand the ramifications. In a nutshell, I'm still not comfortable with this.

Also, it breaks my tests which try to drop privs and keep caps across execve() which is really the only issue we're trying to solve ATM.

Valdis Kletnieks replied:

The last time the "capabilities" thread reared its head a while ago, Andy made a posting that pretty conclusively showed that the Posix way was totally b0rken if you ever intended to support filesystem bits. So if you wanted to ever have a snowball's chance of supporting something like:

chcap cap_net_raw+ep /bin/ping

so you could get rid of the set-UID bit on 'ping', you had to toss the Posix propogation rules out the window. So we need to do either:

  1. Toss the Posix out the window
  2. Toss all the filesystems capabilities support out the window.

(I'm assuming that a suggestion that we make the choice a Kconfig option will be met with the sound of many kernel hackers either retching in disgust or screaming in horror ;)
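For readers unfamiliar with the propagation rules under dispute, the following sketch applies the exec-time formula usually quoted for the withdrawn POSIX.1e draft and documented for Linux kernels of this era. It is not Andy's patch, whose exact rules are not given here, and it takes no side in the argument. The function name is hypothetical, and treating the file's effective set as a full mask is a simplification.

```python
CAP_NET_RAW = 1 << 13        # bit 13, as numbered in <linux/capability.h>
ALL_CAPS = (1 << 32) - 1     # 32-bit capability sets, as in 2.6-era kernels

def exec_transform(p_inheritable, f_permitted, f_inheritable, f_effective,
                   cap_bset=ALL_CAPS):
    """Return (permitted, inheritable, effective) for the process after exec.

    P'(permitted)   = (F(permitted) & cap_bset) | (F(inheritable) & P(inheritable))
    P'(effective)   = P'(permitted) & F(effective)
    P'(inheritable) = P(inheritable)
    """
    permitted = (f_permitted & cap_bset) | (f_inheritable & p_inheritable)
    effective = permitted & f_effective
    return permitted, p_inheritable, effective

# An unprivileged shell running a ping binary marked cap_net_raw+ep.
permitted, inheritable, effective = exec_transform(
    p_inheritable=0,
    f_permitted=CAP_NET_RAW, f_inheritable=0, f_effective=CAP_NET_RAW)
```

Under this formula the ping process ends up with CAP_NET_RAW permitted and effective without a set-UID bit, which is the outcome Valdis's `chcap` example is after; the thread's disagreement is over what happens to the other sets and to children once filesystem bits exist.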

Chris replied that the desired behavior still wasn't clear, and "it's very uncomfortable to change mainline in subtle ways that could break security during stable series." Olaf Dietsche added that he'd done some work on supporting the POSIX way himself, and gave a URL. He said, "This supports filesystem capabilities with the current (POSIX?) implementation. So, whatever Andy has shown, it has at least one counter evidence." But Valdis went to the page and noticed that it said at the top, "This implementation is likely *not* POSIX compatible." He asked what the deal was, and Olaf replied, "This refers to the tools I provide. I should emphasize this on the page, thank you. My patch doesn't change the rules, how the capability bits are mingled."

A long technical debate ensued close by, with no real resolution.

7. High Precision Event Timer (HPET) Driver

13 May 2004 - 19 May 2004 (26 posts) Archive Link: "[PATCH] HPET driver"

Topics: FS: procfs, FS: sysfs

People: Robert Picco, Jeff Garzik, Andrew Morton

Robert Picco said, "The driver supports the High Precision Event Timer. The driver has adopted a similar API to the Real Time Clock driver. It can support any number of HPET devices and the maximum number of timers per HPET device. For further information look at the documentation in the patch. Thanks to Venki at Intel for testing the driver on X86 hardware with HPET." Andrew Morton was unhappy with the lack of code comments or other documentation; and he and Jeff Garzik had other technical comments to make, including Jeff's preference for SysFS over Robert's choice of ProcFS for his driver. Jeff also pointed to a place to find HPET documentation (Chapter 5.18).

Robert, some days later, posted an updated patch which he hoped addressed some of Andrew's and Jeff's concerns; and Jeff was very pleased, and gave his OK for inclusion. There was a bit more technical discussion, but not much, and the thread petered out.

8. USB Updates For 2.6.6

14 May 2004 - 19 May 2004 (11 posts) Archive Link: "[BK PATCH] USB changes for 2.6.6"

Topics: USB, Version Control

People: Greg KH, Linus Torvalds, Olaf Hering

Greg KH said:

Here are USB patches for 2.6.6. There are a bunch of different things here:

All of these (with the exception of a few minor patches from today) have been in the -mm tree for quite some time.

Please pull from: bk://

Patches will be posted to linux-usb-devel as a follow-up thread for those who want to see them.

Olaf Hering had some problems compiling the code on top of the current official tree. After some intervention by Linus Torvalds ("Replace all 'led' with 'cytherm'. The code was crap, and would never have compiled with debugging on anyway."), Greg said he'd post a new patch with clean fixes for this.

9. Guidelines For Writing IDE Drivers

15 May 2004 - 16 May 2004 (10 posts) Archive Link: "[RFC][DOC] writing IDE driver guidelines"

Topics: Disks: IDE, PCI

People: Bartlomiej Zolnierkiewicz, Jeff Garzik

Bartlomiej Zolnierkiewicz posted some guidelines for writing IDE drivers:

general rules:

new architecture:

architecture specific drivers:

PCI drivers:

Jeff Garzik offered some feedback. In particular, he felt that the handling of the ide_init_hwif_ports() situation was backwards from what it should be. Since it was an obsolete function, he reasoned, it would be better to make it available only to drivers that defined IDE_ARCH_OBSOLETE_INIT (as opposed to defining IDE_ARCH_NO_OBSOLETE_INIT for new drivers, as Bartlomiej had it). This way new drivers wouldn't have to have an essentially garbage definition just because certain old drivers did things a different way. Bartlomiej thought about this and said he'd change it to be as Jeff suggested.

Another of Jeff's comments related to the need to define ide_default_irq(), ide_init_default_irq() and ide_default_io_base() to (0) for new architectures. He suggested providing generic definitions, so that folks coding for new architectures wouldn't need to worry about this unless it was actually relevant to them. At one point in the discussion, he explained, "Your document appears to imply that each new arch should define the above three symbols. My suggestion is to devise a method by which new arches don't have to care about those symbols at all, unless required to do so by the underlying hardware." Bartlomiej said he'd look into how this could be done.

10. New Book On Linux Virtual Memory

15 May 2004 - 18 May 2004 (3 posts) Archive Link: "VM documentation and book"

Topics: Virtual Memory

People: Mel Gorman, Marc-Christian Petersen, Bruce Perens

Mel Gorman said:

A long time ago, I told a number of people that I was asked to write a book based on the Linux VM documentation I had up on . I am happy to announce that this book is finished and now available in online stores ( Yes, this is a plug! It is published under the OPL as per the criteria of the Bruce Perens Open Book Series ( meaning it will be available for free download after 90 days. In other words, I intended for it to be easily available like the online documentation was but the option of having your own shiny printed copy is there :)

The information on the web site will remain as it is for people that want to read it but the book has a lot more information in it including TLB management, shared memory filesystem, a lot more code commentary and extra material and clarifications throughout the whole book. Each chapter also has information on the 2.6 kernel as was known around about 2.6.0-test4 which will help anyone trying to figure out the more recent code for themselves.

It has been fun and I hope people enjoy the final result.

Marc-Christian Petersen voiced the feelings of many, when he replied, "I can't say how much I appreciate this and your effort. Thanks alot!"

11. PC9800 Sub-Architecture To Be Dropped From 2.6

15 May 2004 - 18 May 2004 (21 posts) Archive Link: "[patch] kill off PC9800"

Topics: Disks: IDE, Version Control

People: Randy Dunlap, Andrew Morton, Norman Diamond, James Bottomley, Jeff Garzik

Randy Dunlap posted a patch to remove the entire PC9800 sub-architecture from the Linux kernel. By way of justification, he said that it was "incomplete, hackish (at least in IDE), maintainers don't reply to emails and haven't touched it in awhile. Can't even config it to try to build it without other patches to the kernel tree." Andrew Morton added, "the hardware is obsolete, isn't it? Does anyone know when they were last manufactured, and how popular they are?" Norman Diamond said, "they aren't popular any more, but the last ones were still respectably powerful and can run stuff like Windows 2000 Server." James Bottomley also replied to Andrew:

Hey, just being obsolete is no grounds for eliminating a subarchitecture...

However, I would have to say that being unmaintained is. Because of the penchant of x86 people to go "it compiles on my PC, ship it", the x86 subarchitectures are about the fastest bitrotting pieces of the kernel there are.

Since mach-pc9800 cannot currently be compiled and there's no evidence that it actually was, I'd remove it unless someone steps up quickly to maintain it (and get it to the point where it's actually compileable).

Andrew replied:

Well it's a question of whether we're likely to see increasing demand for it in the future. If so then it would be prudent to put some effort into fixing it up rather than removing it.

Seems that's not the case. I don't see a huge rush on this but if after this discussion nobody steps up to take care of the code over the next few weeks, it's best to remove it.

Jeff Garzik was sad to see this happening, and at one point he said, "The PC9800 people spent a good long while working with Alan and others to get what little bits got merged into the kernel." But he acknowledged, "I suppose disappearing and not maintaining the code is the overriding factor here..."

Elsewhere in the thread, James Bottomley remarked on how best to find a maintainer for the code: "I think the best way of making someone sit up and take notice is simply to remove it. After all, given that we have the kernel under source control it's not like it's going to be hard to put it back if someone actually does notice and screams..."

12. Linux 2.6.6-mm3 Released; Status Of KGDB

16 May 2004 - 25 May 2004 (31 posts) Archive Link: "2.6.6-mm3"

Topics: FS: sysfs, Virtual Memory

People: Andrew Morton, Tom Rini, Robert Picco, Adrian Bunk, Amit S. Kale

Andrew Morton announced Linux 2.6.6-mm3.

As discussed in Issue #264, Section #11 (15 May 2004: PC9800 Sub-Architecture To Be Dropped From 2.6), Adrian Bunk posted several patches to remove portions of the PC9800 subarchitecture.

Regarding Andrew's question about KGDB, Tom Rini replied:

No one asked the ia64 folks who did that work "Hey, have you looked at the grand unified kgdb project on ?" would be my guess.

Having said that, if you're willing to go with a slightly late initalizing (I saw part of the early_param work get dropped again I think, so I'm gonna guess you don't wanna deal with that again yet) KGDB for i386 and PPC32, I can whip something up vs 2.6.6 in a day or so.

Robert Picco replied, "I did the ia64 port and started with Andrew's 2.6.4-mm2 i386 sources. I'm assuming the long term strategy is to move to a unified kgdb being done on sourceforge? If so, I'll take a look at this." Tom replied, "My long term strategy is to get everyone using the version on sourceforge that splits out the common portions of the stub from the arch-specific portions. If you could go ahead and get ia64 working on this as well I'd appreciate it. Right now it's still vs 2.6.5, but I'm going to try and fix that today or tomorrow to be vs 2.6.6." An hour and a half later he posted, saying he'd done the update to 2.6.6; Amit S. Kale was pleased to hear it.

13. Linux 2.4.27-pre3

18 May 2004 - 20 May 2004 (4 posts) Archive Link: "Linux 2.4.27-pre3"

Topics: FS: JFS, PCI, Power Management: ACPI, Sound: i810

People: Marcelo Tosatti

Marcelo Tosatti announced Linux 2.4.27-pre3, saying, "It contains network driver fixes, nForce2 hang PCI workaround, i810 audio fixes, JFS update, ACPI, sh64/ia64 arch updates, amongst others."

14. UMSDOS May Be Dropped From 2.6

19 May 2004 - 27 May 2004 (4 posts) Archive Link: "2.6: future of UMSDOS?"

Topics: Backward Compatibility, FS: UMSDOS, FS: ext2, Small Systems

People: Jan-Benedict Glaw, Adrian Bunk

Adrian Bunk asked if there were any users or developers of UMSDOS, or if it should be removed from the 2.6 kernel. Jan-Benedict Glaw said it would be nice to keep it around for historical reasons, as one of the old-time ways of showing Linux to DOS users; but he acknowledged, "However, one can achieve the same (with a lot more work) by placing a loop-mountable ext2 FS and start it from an initrd. Much more complicated, not that flexible (loop-mounted files don't typically grow:)". Mark Beyer pointed out that some embedded systems still used it, so keeping it around for backward compatibility might be good; but Adrian replied that it was already broken, and that whether anyone stepped up to maintain it would be a significant factor in deciding whether to drop it.

15. SquashFS Version 2.0 ALPHA Released

21 May 2004 (1 post) Archive Link: "[announce] Squashfs2.0-alpha released (compressed filesystem)"

Topics: Compression, FS: SquashFS

People: Phillip Lougher

Phillip Lougher said:

I'm pleased to announce the first version of Squashfs 2.0. A lot of changes to the Squashfs filesystem have been made under the bonnet (hood), to improve compression. Squashfs 2.0 has added the concept of fragment blocks and has increased the block size to 64K. This achieves a 5 - 20% compression saving, and allows Squashfs to achieve better compression than Cloop while retaining the I/O efficiency of a compressed filesystem. In addition, the maximum number of UIDs and GIDs has been increased to 256. This allows Squashfs to better support live CDs.

For a description of fragment blocks and the other changes, please go to the project page
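The intuition behind packing small pieces of data into shared blocks can be shown with a small experiment: compressing many tiny files one by one versus compressing them together as a single block. This is only a sketch of the general effect using zlib on made-up file contents. It does not reproduce the Squashfs 2.0 on-disk format or its fragment-selection rules, which the announcement leaves to the project page.

```python
import zlib

BLOCK_SIZE = 64 * 1024   # block size mentioned in the announcement

# Hypothetical small files, each far smaller than one block.
small_files = [
    ("# settings for service %d\nport=%d\nenabled=yes\n" % (i, 8000 + i)).encode()
    for i in range(40)
]

# Each file compressed on its own.
separate = sum(len(zlib.compress(data)) for data in small_files)

# All files packed into one shared block and compressed once.
packed_block = b"".join(small_files)
assert len(packed_block) <= BLOCK_SIZE
packed = len(zlib.compress(packed_block))

saving = 1 - packed / separate   # fraction saved by packing
```

Compressing the shared block avoids paying per-file compression overhead and lets the compressor exploit redundancy across files. The size of the saving depends entirely on the data, so the figure from this toy input says nothing about the 5 - 20% quoted above.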

16. Linux 2.6.6-mm5 Released; SATA Code Rough Around The Edges

22 May 2004 - 26 May 2004 (34 posts) Archive Link: "2.6.6-mm5"

Topics: Disk Arrays: RAID, Disks: IDE, Disks: SCSI, FS: ReiserFS, FS: ext3, Forward Port, Kernel Release Announcement, SMP, Serial ATA

People: Andrew Morton, Jeff Garzik, Christoph Hellwig

Andrew Morton announced Linux 2.6.6-mm5.

Regarding the state of the SATA code, Jeff Garzik said:

It's not too bad... but it looks more like a 2.2 driver forward ported to 2.4, than a 2.6.x driver. Needs some luvin' from the 2.6 scsi api crew.

Overall, it appears to be a message-based firmware engine like drivers/block/carmel.c, that hides the SATA details in the firmware.

Christoph Hellwig also pointed out that the driver submissions should always go to the linux-scsi mailing list, instead of just linux-kernel. Andrew and Adam Radford (who'd submitted the patch) confirmed that this had in fact been done, but the patch had been so big that the linux-scsi list software had rejected it.

17. Emulating Old CPUs

22 May 2004 - 27 May 2004 (28 posts) Archive Link: "i486 emu in mainline?"

Topics: Assembly, SMP

People: Christoph Hellwig, Willy Tarreau, Jeff Garzik, Alan Cox, Andrew Morton, Denis Vlasenko

Christoph Hellwig said, "These days gcc uses i486+ only instruction by default in libstdc++ so most modern distros wouldn't work on i386 cpus anymore. To make it work again Debian merged Willy Tarreau's patch to trap those and emulate them on real i386 cpus. The patch is extremely non-invasive and would certainly be usefull for mainline. Any reason not to include it?" Several folks including Andrew Morton looked over the patch and came back with criticisms, including potential OOPSes that could be caused by the code; and these were discussed briefly. Elsewhere, Willy Tarreau (the author of the patch) also replied to Christoph, with some of his own reservations:

I have mixed feelings about this because :

Other than that, I'm happy that someone found it useful, and happy too that someone did the 2.6 port :-)

Jeff Garzik disagreed with the opinion Willy attributed to Alan Cox, that this patch was a problem distributions had to deal with, and shouldn't ever be part of the kernel. Jeff said, "I want to add "old Alpha" emulation code, so that older Alphas can run binaries built on the newer alphas." Alan (in a now-rare linux-kernel post) replied:

Well it always depends on the platform. cmov emulation isnt useful because i686 gcc generates too many for it to be of value. For alpha it depends on the commonness of the emulated instructions and the emulation cost.

Either way it is a user space problem in almost all situations. See

There were other responses to Willy's post, and a technical discussion ensued, with Alan taking a lead role; but not much conclusive came out of it.
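
The thread does not list which instructions the patch traps. For orientation, the i486 added BSWAP, CMPXCHG and XADD to the i386 instruction set, and the sketch below models only their register-level results. It is not Willy's code: a real emulator also has to catch the invalid-opcode trap, decode operands, and preserve atomicity, none of which is shown here. All function names are illustrative.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register


def bswap32(value):
    """BSWAP: reverse the byte order of a 32-bit register value."""
    value &= MASK32
    return int.from_bytes(value.to_bytes(4, "little"), "big")


def xadd32(dest, src):
    """XADD: return (new_dest, new_src).

    The destination receives the sum; the source receives the old destination.
    """
    return (dest + src) & MASK32, dest & MASK32


def cmpxchg32(accumulator, dest, src):
    """CMPXCHG: compare the accumulator (EAX) with the destination.

    Returns (new_accumulator, new_dest, zero_flag). On a match the source is
    stored in the destination; otherwise the destination is loaded into the
    accumulator.
    """
    accumulator &= MASK32
    dest &= MASK32
    if accumulator == dest:
        return accumulator, src & MASK32, True
    return dest, dest, False
```

Alan's point about emulation cost applies directly: each of these is a single instruction on an i486, while a trapped-and-emulated version costs a full exception round trip.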

18. Linux 2.6.7-rc1; Developer Concern Over Destabilization

22 May 2004 - 24 May 2004 (12 posts) Archive Link: "Linux 2.6.7-rc1"

Topics: Kernel Release Announcement, Virtual Memory

People: Linus Torvalds, Horst von Brand, Jeff Garzik, Alan Cox, Hugh Dickins, Andrea Arcangeli, Andrew Morton

Linus Torvalds announced Linux 2.6.7-rc1, saying, "This is stuff all over the map, but most interesting (or at least most "core") is probably the merging of the NUMA scheduler and the anonvma rmap code. The latter gets rid of the expensive pte chains, and instead allows reverse page mapping by keeping track of which vma (and offset) each page is associated with. Special kudos to Andrea Arcangeli and Hugh Dickins." Horst von Brand replied, "Not wanting to start a flamewar, but this sort of massive changes in a _stable_ series has got me quite confused... either 2.6.0 was premature, or the "just stabilize 2.6, new stuff only into 2.7 (when it opens)" got lost somewhere." He added later, "It looks like Andrew Morton is playing the role Alan Cox had during part of 2.5: Checking/filtering/testing "not quite ready" stuff, when he was supposed to be like Marcelo Tosatti: Keeper of the stable series, don't let anything too risky get even near it."

Jeff Garzik said that the VM Subsystem would always be a work in progress, but that for the stable series, development should quiet down in order to actually become stable. He added later, "You've got all the major Linux vendors preparing (or releasing) 2.6.x-based product and IMO the 2.6 kernel is still a moving target, with non-trivial behavior (and sometimes API) changes every couple of kernel versions."

At one point, Linus argued that the changes that had been going into the 2.6 kernel had not been complete rewrites, but more "implementation details", and were thus OK.

There was some small discussion, but only enough to clarify the opposing positions.
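
The reverse-mapping change Linus describes in the announcement can be pictured with a toy model. Instead of each page carrying a chain of every page-table entry that maps it, the page records one shared anchor plus an offset, and the mappings are recomputed on demand by walking the vmas attached to that anchor. The class and field names below are simplified stand-ins chosen for illustration, not the kernel's actual structures.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for the example


@dataclass
class Vma:
    """A contiguous virtual mapping in one address space."""
    owner: str      # which process the mapping belongs to
    start: int      # first virtual address covered
    num_pages: int  # length of the mapping in pages
    pgoff: int      # offset, in pages, of `start` within the backing object


@dataclass
class AnonVma:
    """Shared anchor listing every Vma that may map a group of pages."""
    vmas: list = field(default_factory=list)


@dataclass
class Page:
    anon_vma: AnonVma  # per-page state is a single reference ...
    index: int         # ... plus a page offset, rather than a chain of ptes


def reverse_map(page):
    """Return (owner, virtual_address) for every place the page may be mapped."""
    hits = []
    for vma in page.anon_vma.vmas:
        if vma.pgoff <= page.index < vma.pgoff + vma.num_pages:
            address = vma.start + (page.index - vma.pgoff) * PAGE_SIZE
            hits.append((vma.owner, address))
    return hits
```

The per-page cost stays constant no matter how many processes share the page; the price is a walk over the vma list whenever the mappings are actually needed.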

19. Linus Proposes New Patch Attribution Convention

22 May 2004 - 27 May 2004 (82 posts) Archive Link: "[RFD] Explicitly documenting patch submission"

Topics: Version Control

People: Linus Torvalds, Davide Libenzi, La Monte H.P. Yarroll, Ben Collins, Albert Cahalan, Paul Mackerras, Arjan van de Ven

It's unusual for Linus Torvalds to actually begin a discussion himself, unless it is a new kernel announcement; this time, he started off with:


This is a request for discussion..

Some of you may have heard of this crazy company called SCO (aka "Smoking Crack Organization") who seem to have a hard time believing that open source works better than their five engineers do. They've apparently made a couple of outlandish claims about where our source code comes from, including claiming to own code that was clearly written by me over a decade ago.

People have been pretty good (understatement of the year) at debunking those claims, but the fact is that part of that debunking involved searching kernel mailing list archives from 1992 etc. Not much fun.

For example, in the case of "ctype.h", what made it so clear that it was original work was the horrible bugs it contained originally, and since we obviously don't do bugs any more (right?), we should probably plan on having other ways to document the origin of the code.

So, to avoid these kinds of issues ten years from now, I'm suggesting that we put in more of a process to explicitly document not only where a patch comes from (which we do actually already document pretty well in the changelogs), but the path it came through.

Why the full path, and not just originator?

These days, most of the patches in the kernel don't actually get sent directly to me. That not just wouldn't scale, but the fact is, there's a lot of subsystems I have no clue about, and thus no way of judging how good the patch is. So I end up seeing mostly the maintainers of the subsystem, and when a bug happens, what I want to see is the maintainer name, not a random developer who I don't even know if he is active any more. So at least for me, the _chain_ is actually mostly more important than the actual originator.

There is also another issue, namely the fact that when I (or anybody else, for that matter) get an emailed patch, the only thing I can see directly is the sender information, and that's the part I trust. When Andrew sends me a patch, I trust it because it comes from him - even if the original author may be somebody I don't know. So the _path_ the patch came in through actually documents that chain of trust - we all tend to know the "next hop", but we do _not_ necessarily have direct knowledge of the full chain.

So what I'm suggesting is that we start "signing off" on patches, to show the path it has come through, and to document that chain of trust. It also allows middle parties to edit the patch without somehow "losing" their names - quite often the patch that reaches the final kernel is not exactly the same as the original one, as it has gone through a few layers of people.

The plan is to make this very light-weight, and to fit in with how we already pass patches around - just add the sign-off to the end of the explanation part of the patch. That sign-off would be just a single line at the end (possibly after _other_ peoples sign-offs), saying:

Signed-off-by: Random J Developer <>

To keep the rules as simple as possible, and yet making it clear what it means to sign off on the patch, I've been discussing a "Developer's Certificate of Origin" with a random collection of other kernel developers (mainly subsystem maintainers). This would basically be what a developer (or a maintainer that passes through a patch) signs up for when he signs off, so that the downstream (upstream?) developers know that it's all ok:

Developer's Certificate of Origin 1.0

By making a contribution to this project, I certify that:

    a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or

    b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or

    c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.

This basically allows people to sign off on other peoples patches, as long as they see that the previous entry in the chain has been signed off on. And at the same time it makes the "personal trust" explicit to people who don't necessarily understand how these things work.

The above also allows for companies that have "release criteria" to have the company "release person" sign off on a patch, so that a company can easily incorporate their own internal release procedures and see that all the patches have gone through the right channel. At the same time it is meant to _not_ cause anybody to have to change how they work (ie there is no "extra paperwork" at any point).

Comments, improvements, ideas? And yes, I know about digital signatures etc, and that is _not_ what this is about. This is not about proving authorship - it's about documenting the process. This does not replace or preclude things like PGP-signed emails, this is _documenting_ how we work, so that we can show people who don't understand the open source process.

Arjan van de Ven suggested having developers register a public key with Linus, such that whenever they submitted a patch that had been signed with that key, there would be an implicit attestation similar to the three items listed above. Linus replied:

One reason that I'd prefer not to is simply the question of "who maintains the certificates?"

I certainly don't want to maintain any stateful paperwork with lots of people. This is why I personally would prefer it all to be totally state-less.

Also, there is a _fundamental_ problem with signing a patch in a global setting: the patches _do_ get modified as they move through the system (maybe just bug-fixes, maybe adding a missing piece, maybe removing a controversial part). So the signature ends up being valid only on your part of the communication, and then after that it needs something else.

And what I do _not_ want to see is a system where if somebody makes a trivial change, it then has to go back to you to be re-signed. That just would be horrible.

With those (pretty basic) caveats in mind, I don't see any fundamental problem in a PGP key approach, if it's a "local" thing between developers. In fact, I think PGP-signed patches are something we may want to look at from a "trust the email" standpoint, but I think it should be a _local_ trust. And part of that "local" trust might be a private agreement between developers that "it's ok to add the sign-off line for Arjan when the patch has come with that PGP signature" when the patch is passed on.

So to me, the sign-off procedure is really about documenting the path, and if a PGP key is there in certain parts of the path, then that would be a good thing, but I think it's a separate thing from what I'm looking for.

Davide Libenzi also said to Linus, "Andrew already puts the "From:" thing in the patch comment, so this should be simply a matter of replacing "From:" with "Signed-off-by:", preserving it in logs, and documenting the thing in the patch submission doc. No?" Linus replied:

Yes and no.

Right now it is _Andrew_ that does the From: line from you. In the sign-off procedure, it would be _you_ who add the "Signed-off-by:" line for yourself.

(And then Andrew would sign off on the fact that you signed off).

Not a big difference, I agree.

La Monte H.P. Yarroll replied, "Andrew's From comment is already a little lossy, e.g. most LKSCTP patches show up as from Sridhar or DaveM even though there's a whole subproject of developers working behind Sridhar. I think the proposed process will increase the amount of explicit credit being recognized--a very good thing IMHO, since this is the core currency of our gift culture."

Elsewhere, in the course of discussion, Linus remarked, "I don't expect this process to start taking effect for a while. Not only do we need to come to some level of agreement about it, but we need to give people the time to learn about it _without_ rejecting patches in the meantime. There is no real "flag-day" (and it's certainly not today), although I'm hoping that by the time I start up 2.7.x we'd have this in place."

Elsewhere, Ben Collins also replied to Linus' original post, asking, "Say the patch comes to me from some patch collection maintainer, who got it from the original author. So the original person never put a Signed-off-by, and neither did the person who sent me the patch, should I still add the explicit Signed-off-by's to the patch, and add myself, before sending it to you?" Linus replied:

You should never sign off for somebody else.

You _can_ sign off as yourself, and just add a note of "From xxxx". That's what the (b) case is all about (ie "to the best of my knowledge it's already under a open-source license").

Of course, if it's a _big_ work with lots of original content, and you're unsure of exactly what the original author wanted to do with this, you obviously should _not_ sign off on it. But you knew that.

Elsewhere, under the subject: [PATCH][PPC64] Don't clear MSR.RI in do_hash_page_DSI, Paul Mackerras submitted a patch, using the "Signed-off-by" form suggested in Linus' original post. Linus replied:

Ok, looks like somebody has bought into the sign-off procedure. Great.

Except that I (and my tools) expected for the "Signed-off-by:" line to go into the comment section _before_ the patch (and after the "explanation") and obviously didn't make that part very clear.

The reason for that is partly because that's how all the current source control helper tools work by extracting the changeset comments (but that could certainly be changed), but more importantly because with a large patch, it's very very easy to overlook the sign-off at the end of the patch.

I only noticed after I applied this, so now you didn't get the distinction of being the first changeset ever to have the sign-off thing recorded ;^)

.. and the race is on.

(Seriously, while nobody has actually complained about the suggested rules, I don't think anybody should feel compelled to do the sign-off before we've had more time to let people argue over it. People who feel comfortable with the suggestion are obviously encouraged to start asap, though).
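
Linus's placement rule (sign-off after the explanation, before the patch) is easy to check mechanically. The sketch below is a hypothetical helper, not one of the tools he mentions: it assumes the patch begins at the first line that looks like a diff header, and reports whether a sign-off was found before or after that point.

```python
def split_comment_and_patch(mail_body):
    """Split a mailed patch into (comment_lines, patch_lines).

    Assumption: the patch starts at the first line that looks like a
    diff header ("diff ", "--- " or "Index: ").
    """
    lines = mail_body.splitlines()
    for index, line in enumerate(lines):
        if line.startswith(("diff ", "--- ", "Index: ")):
            return lines[:index], lines[index:]
    return lines, []


def signoff_placement(mail_body):
    """Classify the sign-off as 'before-patch', 'after-patch', or 'missing'."""
    comment, patch = split_comment_and_patch(mail_body)
    if any(line.startswith("Signed-off-by:") for line in comment):
        return "before-patch"
    if any(line.startswith("Signed-off-by:") for line in patch):
        return "after-patch"
    return "missing"
```

A submission like Paul's, with the sign-off trailing the diff, would be reported as 'after-patch', which is exactly the case Linus says is easy to overlook by eye.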

Albert Cahalan replied:

You're not known for bureaucracy.

The wordy mix-case aspect is kind of annoying, and for all that we don't get to differentiate actions. I count:

  1. came up with the design ideas
  2. wrote the original patch
  3. reviewed and passed on
  4. modified
  5. blindly passed on

Maybe "blindly passed on" needs nothing. So I'm thinking, if we must bother with all this...


Add "pirated:" if you like, so that searching for pirated code is easier than checking the evil bit.

Linus replied:

I actually really really don't want to differentiate actions. There's really no reason to try to separate things out, and quite often the actions are mixed anyway. Besides, if they all end up having the same technical meaning ("I have the right to pass on this patch") having separate flags is just sure to confuse the process.

So what I want is something _really_ simple. Something that is unambiguous, and cannot be confused with something else. And in particular, I want that sign-off line to be "strange" enough that there is no possibility of ever writing that line by mistake - so that it is clear that the only reason anybody would write something like "Signed-off-by:" is because it meant _that_ particular thing.

In contrast, your suggestion of "modified:" is something that people might actually write when they write a changelog entry.

One reason for uniqueness is literally for automatic parsing - having scripts that pick up on this, and send ACK messages, or do statistics on who patches tend to go through etc etc.
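
As a minimal sketch of the kind of script Linus has in mind, the following extracts sign-off lines from changelog text and counts how often each name appears. The regular expression and function names are assumptions made for illustration; the only thing taken from the thread is the "Signed-off-by: Name <address>" form.

```python
import re
from collections import Counter

# Matches a sign-off line: the tag, a name, and an address in angle brackets.
# The address may be empty, as in the example line quoted in the proposal.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by:\s*(?P<name>[^<]+?)\s*<(?P<email>[^>]*)>\s*$"
)


def signoffs(changelog):
    """Return the (name, email) pairs of all sign-off lines, in order."""
    found = []
    for line in changelog.splitlines():
        match = SIGNOFF_RE.match(line.strip())
        if match:
            found.append((match.group("name"), match.group("email")))
    return found


def path_statistics(changelogs):
    """Count the sign-off lines attributed to each name across changelogs."""
    counts = Counter()
    for changelog in changelogs:
        for name, _email in signoffs(changelog):
            counts[name] += 1
    return counts
```

Because the tag is unlikely to occur in ordinary prose, a line such as "modified: cleaned up locking" in the same changelog is simply ignored, which is the uniqueness property Linus is arguing for.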

20. linux-libc-headers Version Released

23 May 2004 (1 post) Archive Link: "[ANNOUNCE] linux-libc-headers"

People: Mariusz Mazur

Mariusz Mazur announced linux-libc-headers version, saying:

Available at Changes:

I should be releasing now, but I do not have enough time till the end of May, and the last bug in the above list proved to be a little too popular (at least that's what the number of bug reports would suggest). Since I've accidentally introduced it myself in the previous release, I can't force all those people to patch the thing by hand because of my mistakes :)

21. CREDITS File Maintainership No Longer Needed

23 May 2004 - 27 May 2004 (3 posts) Archive Link: "[patch] John A. Martin no longer maintains the CREDITS file"

Topics: CREDITS File, MAINTAINERS File, Version Control

People: John A. Martin, Adrian Bunk

Adrian Bunk posted a patch to remove John A. Martin from the MAINTAINERS file, where he'd been listed as maintainer of the CREDITS file. John replied:

I cannot remember the last time I've seen a request for CREDITS file maintenance. The frequency of requests seemed to drop off sharply at some point several years ago, perhaps when access to CVS was broadened, IIRC.

I'd be more than happy to continue/resume maintenance of the CREDITS file if that is wanted.

Adrian replied:

The CREDITS file changes usually come from the people listed, and new people are most often added as part of the patch that adds the code they've written.

Your work in the past is appreciated, but effectively Linus and Andrew maintain the CREDITS file today.

End Of Thread.

22. User-Mode Linux Version 2.6.6-1 Released

24 May 2004 (1 post) Archive Link: "uml-patch-2.6.6-1"

Topics: User-Mode Linux, Version Control

People: Jeff Dike

Jeff Dike said:

This patch updates UML to 2.6.6. Aside from the update, there were a few small bug fixes.

The 2.6.6-1 UML patch is available at

BK users can pull my 2.5 repository from

For the other UML mirrors and other downloads, see

Other links of interest:

The UML project home page :
The UML Community site :

23. megaraid Driver Version 2.20.0-rc2 Released

24 May 2004 (1 post) Archive Link: "[ANNOUNCE]: megaraid driver version 2.20.0.rc2"

Topics: Disk Arrays: RAID, PCI

People: Atul Mukker, Marcelo Tosatti, James Bottomley, Matthew Wilcox, Arjan van de Ven, Matt Domsch, Christoph Hellwig

Atul Mukker said:

We are pleased to announce the megaraid release candidate (since it is still in test labs at LSI) for lk 2.6

This driver incorporates the inputs from Paul Wagland, James Bottomley, Matt Domsch, Christoph Hellwig, Arjan van de Ven, Matthew Wilcox, Marcelo Tosatti, and many others on the scsi and kernel lists. As always, the feedback is greatly appreciated.

Highlight of this release

  1. Fully qualified PCI identifiers to identify MegaRAID controllers
  2. PCI shutdown notification routine with hba and devices sync
  3. Support for random drive deletion
  4. Fully re-entrant hot-path w/ data structures protected by their respective locks. No longer rely on "host_lock". Should boost performance by 5-10% and hopefully better CPU utilization
  5. Better abort and reset handling.

The patch for lk 2.6.6 and the driver is available at

24. Comprehensive Kernel Contributor List

25 May 2004 (5 posts) Archive Link: "[RFC] Kernel origins and maintainers"

Topics: CREDITS File, MAINTAINERS File, Spam, Version Control

People: Peter A. Van Tassell, Linus Torvalds

Peter A. Van Tassell said:

I'm busy putting together a list of all the contributors and maintainers that I can find, using the CREDITS and MAINTAINERS files. I have not seen this done in one place before; is it redundant effort? The people at Grokline (affiliated with Groklaw) are interested in this project, in an effort to trace and verify the origin of everything that ever happened.

Please see here:

and here:

How does this overlap with the new Developer's Certificate of Origin? I have the distinct impression that these projects should be working closer together.

Please cc me off the list in addition to on-the-list, just in case; my spam filter occasionally screws up. Thanks to all in advance for any input or advice.

Linus Torvalds replied that he hadn't seen a project like this before, adding, "For the last two years, you can find a lot of email addresses in BK, and you can clean them up with the name translations found in the "shortlog" script (part of "BK-tools" at" Regarding the new sign-off idea proposed by Linus in Issue #264, Section #19 (22 May 2004: Linus Proposes New Patch Attribution Convention), he also said, "Well, right now you won't get any real information from the sign-off lines, since only a few people have started using it (7 people at the time of this writing ;), so you're better off just doing statistics on the output of "bk changes -a" or something."

25. VIA Velocity Gigabit Ethernet Driver Available

26 May 2004 (3 posts) Archive Link: "VIA "Velocity" Gigabit ethernet driver"

Topics: Networking

People: Alan Cox

Alan Cox said, "A cleaned up and 2.6 ported VIA velocity driver is now available on ftp://people/ This is an initial clean up and port to 2.6. It isn't by any means polished yet. Please send test results and patches to me (I don't currently read linux-kernel). It should work on both 32 and 64bit little endian, it won't work on big endian boxes yet."

26. Linux 2.6.7-rc1-mm1 Released

27 May 2004 - 31 May 2004 (8 posts) Archive Link: "2.6.7-rc1-mm1"

Topics: Device Mapper, Kernel Release Announcement, Kexec, SMP

People: Andrew Morton, Eric W. Biederman

Andrew Morton announced Linux 2.6.7-rc1-mm1, saying:

Eric W. Biederman, slightly dismayed, asked, "What happened to the kexec reserve system call number patch that was in mm4? I thought we had that all straightened out." But Andrew reassured him, "It was merged into 2.6.7-rc2." Eric was relieved.

27. Linux 2.4.27-pre4 Released

30 May 2004 (2 posts) Archive Link: "Linux 2.4.27-pre4"

Topics: FS: JFS, FS: XFS, Networking

People: Marcelo Tosatti

Marcelo Tosatti announced Linux 2.4.27-pre4, saying, "It contains a TCP update (backporting some 2.6.x work), XFS/JFS updates, some architecture updates, driver fixes, the usual..."

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.