Kernel Traffic #275 For 2 Oct 2004

By Zack Brown


1. New blktool Tool To Replace hdparm

15 Aug 2004 - 19 Aug 2004 (20 posts) Archive Link: "new tool: blktool"

Topics: Disk Arrays: RAID, Disks: IDE, Disks: SCSI, I2O, Version Control

People: Jeff Garzik, Alan Cox

Jeff Garzik said:

I just posted "blktool" on my SF page, http://sourceforge.net/projects/gkernel/ and in BitKeeper at bk://gkernel.bkbits.net/blktool

blktool aims to be an easier to use, and more generic version of the existing utility 'hdparm'. For example,

        $ hdparm -c1 /dev/hda
                becomes
        $ blktool /dev/hda pio-data 32-bit

        and

        $ hdparm -L0 /dev/hda
                becomes
        $ blktool /dev/hda media unlock

The utility is currently still fairly specific to IDE devices (as hdparm is), but that will change in the coming weeks as SCSI, I2O, and possibly some bits of hardware RAID control are added.

The audience for this application, like hdparm, is fairly narrow, specific to people who tweak their storage devices and _know what they are doing_. Improper use of this tool, like hdparm, can turn your disk into a doorstop.

Alan Cox objected to the command line format, saying, "So you've replaced hdparm's weird but unixish command line with an even more demented non linuxish one that doesn't handle regexps for drive names?" He suggested a '--option=value' format. Jeff said he preferred his original mechanism because it seemed to have more structure, especially as the list of potential commands grew larger. But he said he could implement Alan's preference as well.

There seemed to be general agreement that Jeff's basic idea was sound, and folks started hacking on it.
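To make the two command-line styles concrete, here is a minimal Python sketch. It covers only the two hdparm-to-blktool mappings Jeff quoted in his announcement. The '--option=value' spelling in `to_option_style` is purely an assumption for illustration: the thread did not settle on option names, and the helper names are hypothetical.

```python
# Only the two hdparm -> blktool mappings quoted in Jeff's announcement.
HDPARM_TO_BLKTOOL = {
    "-c1": ("pio-data", "32-bit"),
    "-L0": ("media", "unlock"),
}


def to_blktool(hdparm_flag, device):
    """Return the blktool argv equivalent to `hdparm <flag> <device>`."""
    command, value = HDPARM_TO_BLKTOOL[hdparm_flag]
    return ["blktool", device, command, value]


def to_option_style(hdparm_flag, device):
    """Hypothetical '--option=value' spelling in the style Alan suggested.

    The option names here are assumed; blktool did not define them in
    the thread.
    """
    command, value = HDPARM_TO_BLKTOOL[hdparm_flag]
    return ["blktool", "--%s=%s" % (command, value), device]
```

Calling `to_blktool("-c1", "/dev/hda")` reproduces the first example from the announcement as an argument list.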

2. New waitid() System Call For POSIX Conformance (Or Improvement)

15 Aug 2004 - 24 Aug 2004 (13 posts) Archive Link: "[PATCH] waitid system call"

Topics: BSD, POSIX

People: Roland McGrath, Andi Kleen

Roland McGrath said:

This patch adds a new system call `waitid'. This is a new POSIX call that subsumes the rest of the wait* family and can do some things the older calls cannot. A minor addition is the ability to select what kinds of status to check for with a mask of independent bits, so you can wait for just stops and not terminations, for example. A more significant improvement is the WNOWAIT flag, which allows for polling child status without reaping. This interface fills in a siginfo_t with the same details that a SIGCHLD for the status change has; some of that info (e.g. si_uid) is not available via wait4 or other calls.

I've added a new system call that has the parameter conventions of the POSIX function because that seems like the cleanest thing. This patch includes the actual system call table additions for i386 and x86-64; other architectures will need to assign the system call number, and 64-bit ones may need to implement 32-bit compat support for it as I did for x86-64. The new features could instead be provided by some new kludge inventions in the wait4 system call interface (that's what BSD did). If kludges are preferable to adding a system call, I can work up something different.

I added a struct rusage field si_rusage to siginfo_t in the SIGCHLD case (this does not affect the size or layout of the struct). This is not part of the POSIX interface, but it makes it so that `waitid' subsumes all the functionality of `wait4'. Future kernel ABIs (new arch's or whatnot) can have only the `waitid' system call and the rest of the wait* family including wait3 and wait4 can be implemented in user space using waitid. There is nothing in user space as yet that would make use of the new field.

Most of the new functionality is implemented purely in the waitid system call itself. POSIX also provides for the WCONTINUED flag to report when a child process had been stopped by job control and then resumed with SIGCONT. Corresponding to this, a SIGCHLD is now generated when a child resumes (unless SA_NOCLDSTOP is set), with the value CLD_CONTINUED in siginfo_t.si_code. To implement this, some additional bookkeeping is required in the signal code handling job control stops.

The motivation for this work is to make it possible to implement the POSIX semantics of the `waitid' function in glibc completely and correctly. If changing either the system call interface used to accomplish that, or any details of the kernel implementation work, would improve the chances of getting this incorporated, I am more than happy to work through any issues.

Andi Kleen and others offered some criticism, and Roland posted an updated patch the next day. Michael Kerrisk offered his criticism, particularly that POSIX conformance might not be the best goal to have here. Michael said that various other systems like Solaris and HP-UX had chosen to avoid full POSIX compliance in this area. Roland asked for some more complete descriptions of how these other systems behaved, and Michael provided some results of his experiments. The discussion petered out at this point, but it seemed that Roland was interested in possibly modifying his approach, if there were truly a way to improve on POSIX in that case.
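The WNOWAIT behaviour Roland described can be tried from user space on a system that provides waitid(). The following is a minimal sketch using Python's `os.waitid` wrapper rather than the C interface from the patch; the function name `peek_then_reap` and the exit status 7 are arbitrary choices made for this illustration.

```python
import os


def peek_then_reap():
    """Fork a child that exits with status 7, peek at it, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: terminate immediately

    # WNOWAIT: collect the status but leave the child in a waitable state.
    peek = os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    # Without WNOWAIT the same status is reported again and the child is reaped.
    reap = os.waitid(os.P_PID, pid, os.WEXITED)

    try:
        os.waitid(os.P_PID, pid, os.WEXITED)
        reaped = False
    except ChildProcessError:
        reaped = True  # ECHILD: nothing is left to wait for

    return peek.si_pid == pid, peek.si_status, reap.si_status, reaped
```

The returned result object also carries `si_uid`, `si_signo` and `si_code`, matching the siginfo_t fields mentioned in the announcement.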

3. Linux 2.6.8.1-mm1 Released

16 Aug 2004 - 19 Aug 2004 (38 posts) Archive Link: "2.6.8.1-mm1"

Topics: Disk Arrays: RAID, Disks: IDE, Hot-Plugging, Kernel Release Announcement, Profiling

People: Andrew Morton, Sam Ravnborg, Bartlomiej Zolnierkiewicz, Alan Cox

Andrew Morton announced Linux 2.6.8.1-mm1, saying:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.8.1/2.6.8.1-mm1

A bunch of folks reported bugs and other issues. Sam Ravnborg said:

The following two patches can be dropped from -mm. The functionality has moved to scripts/mksysmap

handle-undefined-symbols.patch
Fail if vmlinux contains undefined symbols

sparc32-ignore-undefined-symbols-with-3-or-more-leading-underscores.patch
sparc32: ignore undefined symbols with 3 or more leading underscores

Bartlomiej Zolnierkiewicz also said that an ITE RAID driver patch in -mm should not be pushed upstream, because "it duplicates _a lot_ of functionality present in drivers/ide / libata (Alan Cox has native drivers/ide driver, although I would still prefer libata based driver) and contains code for RAID metadata handling which should belong to user-space." But Alan Cox replied:

Some of the metadata handling is needed kernel side. I'm hoping we can avoid most of it with the drive hotplug code. The corner case causing the problem is when no arrays are configured so there is no device/hwif present.

I looked at the libata stuff - it's part of the reason I sent Jeff the error dump/translate patch but right now libata is woefully ignorant of a large number of IDE/EIDE/ATA considerations.

4. Some ext3 Documentation Updates

17 Aug 2004 - 19 Aug 2004 (8 posts) Archive Link: "[RFC] ext3 documentation (lack of)"

Topics: Access Control Lists, FS: ext3

People: Diego Calleja

Diego Calleja said, "Lots of people think that ext3 is very slow. While I'm not claiming that ext3 is the fatest fs in the world, I told some people to look at Documentation/filesystem/ext3.txt and try to tweak it before doing some benchmarks. To my surprise, several ext3 mount options were not documented (not even in the source) except in some sites spread across the internet, so it's not a surprise that lots of people ignores some mount options when doing benchmarks, like the commit interval." He posted a documentation patch, saying, "This documents commit (or it tries, sorry for my english), groups the journal-related options in the same place of the document and adds other mount options without documenting them (like the ones related to acl, xattr, resizing, reservations, barriers)." Various folks offered suggestions and further data, which Diego collected and converted into new versions of the document. These he posted throughout the discussion.
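As a small illustration of the commit interval Diego mentioned, the sketch below extracts the `commit=` value from mount option strings in the /proc/mounts layout. It assumes ext3's usual default of 5 seconds when the option is absent; the function names and the sample data in the usage note are made up for this example.

```python
def commit_interval(mount_options, default=5):
    """Return the journal commit interval, in seconds, from an options string."""
    for option in mount_options.split(","):
        name, _, value = option.partition("=")
        if name == "commit" and value:
            return int(value)
    return default  # assumed ext3 default when commit= is not given


def ext3_commit_intervals(mounts_text):
    """Map each ext3 mount point to its commit interval.

    `mounts_text` uses the /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    intervals = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "ext3":
            intervals[fields[1]] = commit_interval(fields[3])
    return intervals
```

For example, a line such as `/dev/hda1 / ext3 rw,commit=30 0 0` yields an interval of 30 for `/`, which is the kind of setting a benchmark report would need to state.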

5. VFS Mount Option Extensions

18 Aug 2004 - 20 Aug 2004 (7 posts) Archive Link: "[PATCH] Bind Mount Extensions 0.05"

People: Herbert Poetzl, Christoph Hellwig

Herbert Poetzl said, "The following patch extends the 'noatime', 'nodiratime' and last but not least the 'ro' (read only) mount option to the vfs --bind mounts, allowing them to behave like any other mount, by honoring those mount flags (which are silently ignored by the current implementation in 2.4.x and 2.6.x)". Several folks expressed a lot of interest in getting this patch into the main kernel tree; but Christoph Hellwig said the patch had problems with its interface, and would not be accepted in its current form. The feature, he affirmed, was good; and he expected to see it in the main kernel within a year; though perhaps not in the 2.6 timeframe.

6. dmraid Version 1.0.0-rc3 Released

18 Aug 2004 - 19 Aug 2004 (3 posts) Archive Link: "*** Announcement: dmraid 1.0.0-rc3 ***"

Topics: Disk Arrays: RAID

People: Heinz Mauelshagen

Heinz Mauelshagen said:

dmraid 1.0.0-rc3 is available at http://people.redhat.com:/~heinzm/sw/dmraid/ in source, source rpm and i386 rpm.

dmraid (Device-Mapper Raid tool) discovers, [de]activates and displays properties of software RAID sets (ie. ATARAID) and contained DOS partitions using the device-mapper runtime of the 2.6 kernel.

The following ATARAID types are supported on Linux 2.6:

Highpoint HPT37X
Highpoint HPT45X
Intel Software RAID
Promise FastTrack
Silicon Image Medley

This ATARAID type is only basically supported in this version (I need better metadata format specs; please help): LSI Logic MegaRAID

Please provide insight to support those metadata formats completely.

Thanks.

See files README and CHANGELOG, which come with the source tarball for prerequisites to run this software, further instructions on installing and using dmraid!

7. Manuel Estrada Sainz, Firmware Loader Maintainer, Deceased

18 Aug 2004 - 20 Aug 2004 (5 posts) Archive Link: "[PATCH] Firmware Loader is orphan"

Topics: CREDITS File

People: Jesse Barnes, Horst von Brand

Ramón Rey Vicente said that Manuel Estrada Sainz, the kernel firmware loader maintainer, had died that May. He posted a patch to list the feature as orphaned, and asked if anyone else was going to maintain it. Horst von Brand offered his condolences; and folks agreed to add Manuel to the CREDITS file along with the late Leonard Zubkoff. Jesse Barnes remarked, "Adding the fact that they're deceased is probably a good idea, lest people email them, expect a response and try to flame them or something (as happened recently with Leonard iirc)."

8. Linux 2.6.8.1-mm2 Released; Includes Reiser4

19 Aug 2004 - 25 Aug 2004 (47 posts) Archive Link: "2.6.8.1-mm2"

Topics: Compression, FS: ReiserFS, FS: ext2, Kernel Release Announcement

People: Andrew Morton, Ryan Cumming, Rik van Riel, Hans Reiser, Tony Luck, Chris Wedgwood, William Lee Irwin III

Andrew Morton announced Linux kernel 2.6.8.1-mm2, saying:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.8.1/2.6.8.1-mm2/

Ryan Cumming took umbrage at the language of the ReiserFS help text. He quoted the help text, "ReiserFS V3 is the stablest Linux filesystem, and V4 is the fastest. In regards to claims by ext2 that they are the de facto standard Linux filesystem, the most polite thing to say is that many persons disagree, and it is interesting that those persons seem to include the distros that are growing in market share. See http://www.namesys.com/benchmarks.html for why many disagree." Ryan said, "These statements seem inflammatory at best, and bloat an already large help text. Looks like it could use a little editing." Chris Wedgwood said that this was just 'Hans-speak', and that folks who didn't like it should just ignore it.

Rik van Riel took a look at the ReiserFS code and had some problems with it. He said:

The reiser 4 system call sys_reiserfs seems to need an additional patch, which is craftily hidden inside reiser4-only.patch

That patch creates fs/reiser4/linux-5_reiser4_syscall.patch, which I can only assume reiser 4 users should apply...

Kind of ugly.

Looking further, the horrors only increase. It looks like sys_reiser4() is an interface to load programs into the kernel, with reiserfs4 containing an interpreter.

I'll leave aside the issues of having a scripting language inside the kernel, since I'm sure other people will comment on it.

However, I am absolutely flabbergasted that Hans Reiser is using a syscall here, instead of a filesystem interface.

Furthermore, why do the parsing in the kernel, instead of compiling the human-readable strings in userspace and loading something easy to use into the kernel, like the selinux subsystem does?

Since this code is bound to be horribly controversial, it may be an idea to remove this from the reiserfs4 core patch. That way the battles over the filesystem, and its interactions with the rest of the kernel can be fought first, without having the whole reiserfs4 filesystem strand in the quicksand of "why do we need an interpreted language with completely new filesystem semantics in the kernel?"

William Lee Irwin III was also troubled by having a scripting language in the kernel, but there was no further discussion on that point. Close by, regarding the sys_reiser4() system call, Andrew said:

It's my understanding that sys_reiser4() is basically defunct at this point.

It will probably be revived at some time in the future but we'd be best off crossing that bridge when we arrive at it, and ignoring the syscall part of the code at this time.

For review purposes it would be better if the syscall code and all the namesys debug support code simply weren't present in the patch. But one can sympathise with the need to keep it there for the time being. Please just read around it.

Rik agreed that ignoring that section of the patch would work for him; and was relieved that the system call was, for the moment, not in use. Hans Reiser also responded to Andrew, saying that 'defunct' was not such an accurate description of the situation. He offered:

I would say unfinished and in need of a code review by me before anyone starts using it, instead of defunct. There is no good reason for it to be sent to Andrew as a patch file, and the guy responsible is on vacation. What it should be in as is an experimental do not touch config option turned off by default.

sys_reiser4 is needed for these purposes:

Now that the core reiser4 functionality is stable, the lead programmers and I can spare some time to review sys_reiser4 and the compression plugin (also not yet ready for prime time). This will take us 6-12 weeks I would guess, as Digeo is keeping us 50% busy with work that earns our paychecks at the moment, darpa is also keeping me busy with www.namesys.com/blackbox.html, and I expect there will be a few bugs found in the core code over the next few months also.

9. Gamma DRM Driver To Be Dropped From The Kernel

19 Aug 2004 (1 post) Archive Link: "gamma drm driver.."

People: Dave Airlie

Dave Airlie said of the Gamma DRM kernel driver:

After a bit of discussion on the dri lists, we have come to the decision that it is probably necessary to retire the above driver, no dri developer is currently using the above hardware and the driver is so different from the others it makes a lot of hacks in the drm needed...

If anyone does actually use this driver and hardware let us know or it'll be marked as BROKEN soon and then it will actually break :-)

10. Cleaning Up #include Statements In 2.6; Stability Not Necessarily Top Priority

19 Aug 2004 - 25 Aug 2004 (16 posts) Archive Link: "includes cleanup."

People: Dave Jones, William Lee Irwin III, Tim Schmielau

Dave Jones said:

I noticed that every file that could be built as a module was sucking in sched.h (and therefore, every other include file under the sun).

This patch

I've not done any measurements to see if this is noticable on a compile, as I'd expect it to be mostly in the noise anyway (though last time I did this in 2.5.early, it did shave off the best part of a minute off my worst-case-scenario build), but untangling the spaghetti of includes a little should at least mean gcc uses less memory during the build.
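The effect Dave described, where one include drags in many others, can be explored with a small script. The sketch below is a rough approximation: it follows `#include` lines textually, ignores preprocessor conditionals, and works on an in-memory mapping of file name to contents. The function names are hypothetical, and any file contents fed to it are the caller's own data, not the real kernel headers.

```python
import re

# Matches lines such as:  #include <linux/sched.h>   or   #include "foo.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def direct_includes(source_text):
    """Return the headers named by #include lines in one file."""
    return INCLUDE_RE.findall(source_text)


def transitive_includes(name, files):
    """Return every header reachable from `name`.

    `files` maps a file name to its text; headers missing from the
    mapping are recorded but not expanded further.
    """
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for header in direct_includes(files.get(current, "")):
            if header not in seen:
                seen.add(header)
                stack.append(header)
    return seen
```

Checking whether `"linux/sched.h"` appears in the result for a given source file is enough to flag candidates for the kind of cleanup discussed in this thread.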

William Lee Irwin III remarked, "sched.h is such an extreme garbage can header I wouldn't mind seeing the whole thing torn completely apart. Every little trimming is good. =)" Some days later he replied to himself:

I hereby declare open season on linux/sched.h!

In preparation for moving all user-related bits out of sched.h and coopting linux/user.h for this purpose, this patch converts all inclusions of linux/user.h to asm/user.h

Tim Schmielau got into it, posting a huge patch and replying:

OK, let's go! ;-)

Let's see how often we can kill it's include lines. To start from a clean base, I looked at vanilla 2.6.8.1 first before trying out your patches.

analysis was i386-only, my personal config builds, allyesconfig does not (neither does it with an unpatched kernel)

There was no actual discussion during the rest of the thread, but William posted several additional patches as well, all fairly invasive. At one point Tim remarked, "I've postponed my work in late 2.5 for 2.7, but with the new development model it seems we are asked to destabilize 2.6 instead ;-)"

11. Linux 2.6.8.1-mm3 Released; Successfully Boots On A 512-CPU Altix

20 Aug 2004 - 24 Aug 2004 (49 posts) Archive Link: "2.6.8.1-mm3"

Topics: Framebuffer, Hot-Plugging, Kernel Release Announcement, Version Control

People: Andrew Morton, Jesse Barnes, Nick Piggin, William Lee Irwin III

Andrew Morton announced Linux 2.6.8.1-mm3, saying:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.8.1/2.6.8.1-mm3/

There were various bug reports and other issues, and Jesse Barnes also remarked at one point, "Woo-hoo! This boots *without changes* on a 512p Altix! Now to re-run the profiles and try wli's new per-cpu profiling buffers." He replied to himself an hour later, "I applied wli's" [William Lee Irwin III's] "per-cpu profiling patch, added some tweaks that he and I discussed on irc and things look pretty good. We can now profile all 512 CPUs in the system w/o livelocking :)" He posted some performance stats, and various folks debated the merits.

12. Linux 2.6.8.1-mm4 Released; Andrew Describes Some Patch Submission Policy

22 Aug 2004 - 26 Aug 2004 (44 posts) Archive Link: "2.6.8.1-mm4"

Topics: Kernel Release Announcement, Kexec, Version Control

People: Andrew Morton, William Lee Irwin III

Andrew Morton announced Linux 2.6.8.1-mm4, saying:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.8.1/2.6.8.1-mm4/

In the course of various sub-discussions, William Lee Irwin III posted a patch and Andrew said:

I'd prefer it if you (and everyone else) could give a meaningful English-language Subject: to patches, please.

A well-chosen patch Subject: becomes a sort of globally-unique key by which the patch is tracked - I munge it into a patch filename and it propagates all the way into bitkeeper. It can be used for searching email folders, googling, inter-developer discussion, etc, etc.

Tim Bird leaped onto this one, patching Documentation/SubmittingPatches to quote Andrew's email.
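Andrew did not show how he munges a Subject: line into a patch filename, so the following Python sketch is only a guess at the general idea. The rules it applies (drop bracketed tags such as [PATCH], lowercase, collapse everything else into hyphens) are assumptions, though they do reproduce the sparc32 patch name listed in the 2.6.8.1-mm1 item of this issue.

```python
import re


def patch_filename(subject):
    """Turn an email Subject: into a filename usable as a tracking key."""
    # Drop leading bracketed tags such as "[PATCH]" or "[RFC]".
    subject = re.sub(r"^\s*(\[[^\]]*\]\s*)+", "", subject)
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", subject.lower()).strip("-")
    return slug + ".patch"
```

A vague subject collapses to an equally vague filename, which is the practical argument for the descriptive subjects Andrew asked for.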

13. rng-tools Updated

24 Aug 2004 (1 post) Archive Link: "rng-tools updated"

Topics: Random Number Generation

People: Jeff Garzik

Jeff Garzik said:

Just posted version 2 of rng-tools at http://sourceforge.net/projects/gkernel/

This release fixes a problem related to 2.6.x kernels.

rng-tools is currently for users of hardware random number generators (RNGs), and the included daemon rngd fill the kernel entropy pool from userspace with the results of the output.

Future directions include:

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.