Kernel Traffic #27 For 15 Jul 1999

By Zack Brown



My computer is crashing again, though not (yet) as bad as last time (Issue #11). BTW what ended up happening last time was, I bought a desktop and sent this laptop back for repairs. The motherboard got replaced and the machine was shipped back to me about a month and a half ago. Now I am in SF and my desktop is in NY, so all I have is this shitty laptop that is about to die. Currently, the system has already been partially hosed, but I don't know how bad. Some data has definitely been lost, and as I write this, KT is still not finished. I don't expect any long term KT or KC downtime though, because I'm at Linuxcare and there are a lot of computers here. :-)

Mailing List Stats For This Week

We looked at 1146 posts in 4613K.

There were 430 different contributors. 193 posted more than once. 178 posted last week too.


1. Treating Directories As Files

25 Jun 1999 - 7 Jul 1999 (66 posts) Archive Link: "[RFC] File flags handling - proposal for API."

Topics: BSD, FS: FAT, FS: ext2, Ioctls, Raw IO

People: Alexander Viro, Linus Torvalds, Chris Evans, Stephen C. Tweedie

Continuing the discussion of treating directories as files from Issue #25, Section #5 (20 Jun 1999: Treating Multiple Files As One), Alexander Viro proposed:

With all talks about mixed directory/symlink objects, etc. we *will* need a decent way to work with such attributes. I dearly hope that it will not be done via weird ioctl() as it's currently done for ext2 flags (for one thing because the object we are operating with may not be openable - e.g. symlink).

Notice that there is a difference between the attributes affecting the VFS behaviour (append-only, etc.) and ones the VFS (or even kernel) doesn't care of (no-dump, etc.). For the latter we probably want to pass the raw attributes to the fs and let it figure out what it wants. For the former the common representation is needed. So we want a way to deal with raw and generic attributes.

I think that the right way here is to add 6 (ouch) syscalls: {l,f,}chflags() for setting those attributes and {l,f,}new_stat() for reading them. The reason for latter 3 is simple - potential races.

Differences between lfoo, ffoo and foo are the same as in cases of chmod, chown, etc.

int chflags(char *name, int is_raw, int flags) would set the attributes - raw if the second argument is true, generic ones if it is false.

int new_stat(char *name, struct new_stat * buf) would fill the buf with extended variant of struct stat - additional fields being generic and raw attributes. This will be redundant, indeed - that would allow fs-specific programs to deal with raw attributes in redundant way, but at the same time spare normal programs from the need to know the representation of generic attributes on particular filesystems.

Modulo the distinction between raw and generic attributes this was tested on 4.4BSD - it works fine there.

I am ready to write the thing - I have the patch mostly done. It's non-intrusive, in sense that any filesystem that doesn't want to know about those animals will require no changes. If you are OK with the proposed API I'll do it. Comments?

Linus Torvalds replied, "I've long wanted to have 'flags' system calls, because I hate ioctl's with a passion. However, nobody was bothered enough by it to actually implement it.." but added that three syscalls, rather than six, would be better:

just make it do

    {l,f,}chflags({name,fd,name}, u32 *old, u32 *new);

and then you can read and write the flags with just one system call. I do not want to extend on stat() yet again.
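A minimal userland sketch may make the calling convention Linus describes more concrete. It uses a single call that returns the previous flags through `old` and installs new flags only if they are supplied. Everything in it (`struct model_inode`, `model_chflags()`) is hypothetical. It is neither kernel code nor whatever API was eventually merged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a filesystem object's attribute word. */
struct model_inode {
    uint32_t flags;
};

/*
 * Sketch of the {l,f,}chflags(..., u32 *old, u32 *new) shape:
 *  - if old != NULL, the flags in effect before the call are stored there;
 *  - if new_flags != NULL, they replace the current flags.
 * Either pointer may be NULL, so one call covers "read", "write" and
 * "swap". A real kernel version would hold the inode lock here so the
 * read and the write cannot interleave with another caller.
 */
int model_chflags(struct model_inode *ino, uint32_t *old,
                  const uint32_t *new_flags)
{
    assert(ino != NULL);
    uint32_t prev = ino->flags;
    if (new_flags != NULL)
        ino->flags = *new_flags;
    if (old != NULL)
        *old = prev;
    return 0;
}
```

The old value comes back from the same call that sets the new one. A program therefore never needs a separate read step before the write, and the window between the two disappears.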

(Several days later, after the thread had ended, Stephen C. Tweedie pointed out that he had in fact implemented something like this a while earlier and included it in his old raw IO patches.)

Meanwhile, Alexander replied to Linus about extensibility, "OK. But there is one but - we definitely have flags on ext2, ufs and <urgh> FAT ("System" == Immutable). I'm more than sure that other filesystems will join too - we are risking to run out of bits if we will go for plain union and we will get the whole lot of nasty bit-shuffling in bargain. I still think that distinction between generic and fs-specific deserves to be done in API. I.e. we are either passing raw attributes and then the result is fs-specific (and it's fs responsibility to recalculate generic attributes) or we are interested in generic attributes and then we don't want to mess with fs differences. <generic> + <ext2-specific> + <ufs-specific> + ... in one bitmap will lead to major PITA. <generic vs. raw> + {<generic attributes> | <attributes of whatever fs it sits on>} will give us decent bandwidth and make extensions *much* easier."

He went on to suggest, "Hmm... Methink I know what should be done here: chflags(name, level, old, new) where level being either FL_VFS or FL_EXT2 or FL_UFS, etc. IOW, the scheme similar to setsockopt(). Comments? That way we would get an additional safety - if the call is done for the object on wrong fs it will return -ENOPROTOOPT ;-) IMO it's cleaner and will reduce the future clutter."

Regarding extensibility, Linus replied:

That's why I had the new flags through a pointer also: not only can the pointer be NULL, but it implies that we can more easily extend the set of bits.

HOWEVER. I don't think we want all that many bits at all. My preferred suggestion is to go with

        u32 generic_bits;
        u32 fs_bits;

and NOTHING more. Otherwise we'll just encourage people to go crazy with the bits, and I do not want that.

(This is why I hate code that tries to be too generic and take everything into account - it's not the code that is bad, but it's the _implication_ of the code that I dislike).

In fact, maybe we should codify it with a structure:

        struct file_flags {
                unsigned int generic_flags;
                unsigned int fs_specific_flags;
        };

and just pass in the structure pointer.

And to Alexander's suggestion, Linus went on:

I would not be disappointed with that kind of approach either. But if so, do limit the flags to 32 bits (and then if somebody _really_ wants to go wild, he can just specify multiple "levels").

Oh, and if you do this, please reserve levels 0-255 or something like that for "generic" flags. I may not like excessive generic features, but if done, they should at least be done _right_.

Alexander said, "OK, I'll go for that variant." This ended the subthread.
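The variant Alexander settled on can be sketched the same way. This model is hypothetical throughout: the `MODEL_FL_*` numbers, `struct model_obj` and `model_chflags_level()` are illustrative assumptions. In it:

- the level picks which 32-bit word is read or written;
- levels 0-255 are reserved for generic flags, as Linus asked;
- asking for another filesystem's level fails with -ENOPROTOOPT, the error Alexander suggested.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical level numbers. 0-255 are reserved for generic flags;
   only level 0 is assigned in this sketch. */
#define MODEL_FL_VFS   0
#define MODEL_FL_EXT2  256
#define MODEL_FL_UFS   257

struct model_obj {
    int fs_level;          /* the fs-specific level this object's fs accepts */
    uint32_t generic;      /* flags the VFS understands (append-only, ...) */
    uint32_t fs_specific;  /* raw flags only the filesystem interprets */
};

/* chflags(name, level, old, new) in the style of setsockopt(): the level
   selects which 32-bit flag word is read and/or written. */
int model_chflags_level(struct model_obj *obj, int level,
                        uint32_t *old, const uint32_t *new_flags)
{
    uint32_t *word;

    assert(obj != NULL);
    if (level == MODEL_FL_VFS)
        word = &obj->generic;
    else if (level > 255 && level == obj->fs_level)
        word = &obj->fs_specific;
    else
        return -ENOPROTOOPT;   /* unassigned generic level, or wrong fs */

    if (old != NULL)
        *old = *word;
    if (new_flags != NULL)
        *word = *new_flags;
    return 0;
}
```

Because each level addresses its own 32-bit word, filesystems never compete for bits in a shared bitmap. This is the bit-shuffling problem Alexander wanted to avoid.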

(Meanwhile, Chris Evans said, "Alexander, if you are working in the area, could you remove the restriction that a block device can't be made immutable? I doubt anyone has any objections to the principle of an immutable block dev?" and Alexander replied, "Yes, and it goes for free - with the current system the thing doesn't work only because ioctls are naturally redirected to the device driver - ext2fs doesn't see them.")

2. Capabilities

1 Jul 1999 - 6 Jul 1999 (26 posts) Archive Link: "[security]: kernel ioctl()'s [3]"

Topics: Capabilities, FS

People: Linus Torvalds, Chris Evans, Theodore Y. Ts'o

Chris Evans thought he had found a logical flaw in the implementation of filesystem capabilities: users with CAP_LINUX_IMMUTABLE could change the 'immutable' flag on files they didn't own. He included a patch. Theodore Y. Ts'o liked it and recommended it be applied to 2.3.x, if not 2.2.x (since it was very low-risk, in his opinion). Chris said he'd shoot it over to Linus Torvalds, in that case.

3. linux-kernel Mailing List Errors

2 Jul 1999 - 6 Jul 1999 (13 posts) Archive Link: "Mailbox"

Topics: Mailing List Administration

People: Keith Owens

Fabian Frederick asked a question, but his posts were incorrectly attributed to Keith Owens, at least in one linux-kernel archive. Keith pointed this out, saying, "There were two mails from Fabian.Frederick that immediately followed one of my mails and on at least one l-k archive, all three are incorrectly assigned to me."

There was no reply on that point, although there was some discussion of Fabian's original question.

4. Bug In 2.2.10ac6 (and ac7)

2 Jul 1999 - 7 Jul 1999 (5 posts) Archive Link: "Help with Linux 2.2.10ac6"

People: Alan Cox

Gary L. Hennigan decided to try an -ac patch for the first time, and got configuration errors when running 'make xconfig' on 2.2.10ac6:

    drivers/char/ 170: can't handle dep_tristate condition
    make[1]: *** [] Error 1
    make[1]: Leaving directory `/usr/local/src/linux-2.2.10.ac6/scripts'
    make: *** [xconfig] Error 2

Alan Cox confirmed that this was a bug, but added that he had already uploaded ac7 with the same bug, a few minutes earlier.

5. kernel/resource.c Backwards Compatibility

3 Jul 1999 - 5 Jul 1999 (9 posts) Archive Link: "kernel/resource.c breakage"

Topics: Backward Compatibility

People: Mikael Pettersson, David Hinds, Jeff Garzik

Mikael Pettersson reported, "kernel/resource.c, in kernel 2.3.7 and later, is semantically incompatible with previous versions, and may break existing working code." He zoomed in with, "The main problem is release_region(). Previously it would discard all information about the region. Now it _keeps_ the region description (base+extent), but gives it a magic marker (name==NULL). The new check_region() and request_region() will allow an existing "magic" region to be reallocated in its entirety. Other allocations that partially interfere with a "magic" region will not succeed." He added that he had discovered this when ftape broke after an upgrade.

David Hinds, author of resource.c (among other things), said, "You are correct that the semantics of release_region() have subtly changed, and that in unusual cases, drivers will need to be tweaked to accommodate it. I had not thought that any drivers did the sort of thing that ftape apparently does. I think that fixing the drivers is the correct course of action, not effectively reverting the new semantics as you suggest."

Jeff Garzik also replied to the original post, "There was disagreement from several people about this... Several people thought that the resource allocation code should not leave info around after it has been released, resulting in the behavior you describe." David replied, "The "leaving info around" is going to happen because it's the whole point. To safely configure a new device, you've got to know what resources are used by other hardware, whether or not a driver happens to currently own those resources."
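The semantics Mikael describes can be modelled in a few lines. The following is a hypothetical userland sketch. The `model_*` names and the fixed-size table are illustrative, not the actual data structures in kernel/resource.c. In the model:

- a released region keeps its base and extent, but its name becomes NULL;
- requesting exactly the same range reclaims it;
- a request that only partially overlaps it fails.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the 2.3.7+ behaviour, not resource.c itself. */
struct model_region {
    unsigned long base, extent;
    const char *name;   /* NULL marks a released ("magic") region */
    int used;           /* slot in use at all */
};

#define MODEL_MAX_REGIONS 16
static struct model_region model_regions[MODEL_MAX_REGIONS];

static int overlaps(const struct model_region *r,
                    unsigned long base, unsigned long extent)
{
    return base < r->base + r->extent && r->base < base + extent;
}

/* 0 if [base, base+extent) may be requested, -1 otherwise. */
int model_check_region(unsigned long base, unsigned long extent)
{
    for (int i = 0; i < MODEL_MAX_REGIONS; i++) {
        const struct model_region *r = &model_regions[i];
        if (!r->used || !overlaps(r, base, extent))
            continue;
        if (r->name == NULL && r->base == base && r->extent == extent)
            continue;   /* a released region can be reclaimed whole... */
        return -1;      /* ...but busy or partially overlapped ones cannot */
    }
    return 0;
}

int model_request_region(unsigned long base, unsigned long extent,
                         const char *name)
{
    int free_slot = -1;

    assert(name != NULL);   /* NULL would look like a released region */
    if (model_check_region(base, extent) != 0)
        return -1;
    for (int i = 0; i < MODEL_MAX_REGIONS; i++) {
        struct model_region *r = &model_regions[i];
        if (r->used && r->name == NULL &&
            r->base == base && r->extent == extent) {
            r->name = name;   /* reuse the remembered description */
            return 0;
        }
        if (!r->used && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    model_regions[free_slot] = (struct model_region){ base, extent, name, 1 };
    return 0;
}

/* Modelled new semantics: keep base+extent, clear the name. */
void model_release_region(unsigned long base, unsigned long extent)
{
    for (int i = 0; i < MODEL_MAX_REGIONS; i++) {
        struct model_region *r = &model_regions[i];
        if (r->used && r->name != NULL &&
            r->base == base && r->extent == extent)
            r->name = NULL;
    }
}
```

Under the pre-2.3.7 behaviour, model_release_region() would instead clear `used` entirely, so a later request for only part of the old range would succeed. In this model it fails, which is presumably the kind of difference that caught ftape.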

6. QuickCam Driver

3 Jul 1999 - 6 Jul 1999 (2 posts) Archive Link: "Creative QuickCam"

People: Aaron Burt

Paul de Regt asked if anyone was working on a driver for Creative Labs' QuickCams, and Aaron Burt said there was one at

7. Tekram Bug, Hunt And Fix

4 Jul 1999 - 8 Jul 1999 (12 posts) Archive Link: "Oops with kernel 2.2.10-ac8"

Topics: Disks: SCSI, FS

People: Kurt Garloff, Anzolin Gianluca, Alan Cox

Anzolin Gianluca tried to load the Tekram DC390 SCSI controller module (tmscsim.o) under 2.2.10-ac8 and got an oops. Under 2.2.10-ac5 all was well. He posted the oops, and Kurt Garloff (author of tmscsim) said it was a bug in his module, triggered by Alan Cox's recent debugging code (which was originally added to try to track down the filesystem corruption some folks have been complaining about). Kurt said the problem seemed to be with the SRB (SCSI Request Block) queuing, and added, "Please note that I'm about to release 2.0e where I reworked the command queueing. I will install ac patches in order to see, whether I did recreate that bug. If you want to test it, please look out for 2.0d12 on my web page:"

Anzolin said he was willing to test patches, and reported a similar oops with the 2.0d12 driver. Kurt put up 2.0d13, adding, "Problem was that during the device scan, unneeded DCBs were removed, but a little bit later, some fields inside were updated. This is been taken care of, now," but Anzolin still saw the oops. He posted some information and another oops. Kurt felt he'd found the problem now, and put up 2.0d14. Anzolin replied, "Great !:) Now it works. I'm trying to burn a CD in order to see if everything goes well but now I can't get the oops any longer. Thanks for your help, you did a great job (a problem fixed in a day !! :)"

8. Dynamically Adding Syscalls, Revamping Driver System?

4 Jul 1999 - 5 Jul 1999 (4 posts) Archive Link: "Adding new syscalls via modules?"

Topics: Ioctls

People: Alex Buell, Nimrod Zimerman, Alexander Maryanchic, Stephen C. Tweedie

Alex Buell said:

I'm tired of rebooting the machine everytime I make a new kernel, so I thought why not implement some mechanism through the kernel module loader such that one could load a module to install a new syscall or a sysctl.

I'd like to know if there's sufficient interest in this, I think it would be brilliant for testing new sysctls/syscalls simply by loading a module and unloading afterwards.

Outline how it could work

  1. Add a syscall to the kernel that does two things a) 'inserts' a new syscall into the table of syscalls and returns the number where the syscall is inserted into the table b) 'removes' a syscall from the table of syscalls. It will only remove only syscalls that it has added (to prevent the case of people trying to be clever and removing syscalls such as vfork...)
  2. Within the given module containing the syscall, when the kernel calls the module's initialisation function, this function would then make the syscall to insert its own syscall into the kernel's syscall tables. The reverse would happen on unloading, the deinitialisation function would remove the syscall from the table.

How's that? That could be used with sysctls to add new stuff quite easily.
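Alex's two steps map onto a small table model. The sketch below is hypothetical throughout. `model_sys_call_table`, the split between built-in and dynamic slots, and the helper names are illustrative, and real syscall dispatch is architecture-specific. Insertion returns the slot number it chose. Removal refuses any slot the mechanism did not fill itself, which is how the proposal protects calls like vfork.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*model_syscall_fn)(long, long, long);

#define MODEL_NR_SYSCALLS 16
#define MODEL_NR_BUILTIN   8   /* slots 0..7 belong to the "kernel proper" */

static long builtin_zero(long a, long b, long c)
{
    (void)a; (void)b; (void)c;
    return 0;
}

/* Hypothetical table: slot 0 holds a built-in call, the rest start empty. */
static model_syscall_fn model_sys_call_table[MODEL_NR_SYSCALLS] = {
    [0] = builtin_zero,
};
/* Remember which slots were added at run time, so only those can go. */
static int model_added[MODEL_NR_SYSCALLS];

/* Step 1a: insert a call and report the number it got (-1 if full). */
int model_insert_syscall(model_syscall_fn fn)
{
    assert(fn != NULL);    /* NULL marks an empty slot */
    for (int nr = MODEL_NR_BUILTIN; nr < MODEL_NR_SYSCALLS; nr++) {
        if (model_sys_call_table[nr] == NULL) {
            model_sys_call_table[nr] = fn;
            model_added[nr] = 1;
            return nr;
        }
    }
    return -1;
}

/* Step 1b: remove a call, refusing anything this mechanism did not add. */
int model_remove_syscall(int nr)
{
    if (nr < 0 || nr >= MODEL_NR_SYSCALLS || !model_added[nr])
        return -1;
    model_sys_call_table[nr] = NULL;
    model_added[nr] = 0;
    return 0;
}

long model_dispatch(int nr, long a, long b, long c)
{
    if (nr < 0 || nr >= MODEL_NR_SYSCALLS || model_sys_call_table[nr] == NULL)
        return -ENOSYS;
    return model_sys_call_table[nr](a, b, c);
}

/* Hypothetical call a module's init function would register (step 2). */
long example_module_call(long a, long b, long c)
{
    return a + b + c;
}
```

Nothing in the sketch stops two modules from racing for the same slot. That is the synchronization concern Nimrod Zimerman raises in his reply.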

Nimrod Zimerman replied:

You can do this manually now, though not in a safe manner, by directly changing the sys_call_table array of system calls. Also, in a proposed patch by Alexander Maryanchick (dated around Jun 25), you can do this through what appears to be a cleaner way. (Obviously, you'll have to come up with some way of synchronizing between different modules, or things wouldn't be too friendly).

I've done such things in the past (and am actually currently working on something that messes with it too), with no problems. I've been running a system that has used several 'new' system calls, as well as 'overwritten' some existing system calls, as a production system for lengthly periods of time with no observable trouble.

Alexander Maryanchic replied that his "cleaner way" was not perfect and that what was really needed was a whole new driver system. In a new thread under the Subject: "[RFD] New driver system (was Adding new syscalls via modules)", he went on to say that the problem boiled down to a lack of extensibility, and proposed:

  1. All critical kernel structures must be isolated. They must not be accessed directly from other kernel parts.
  2. We must be ready to move the drivers to Ring1 and Ring2 when hardware (Merced?) will be acceptable quick for this.
  3. New drivers must have rights to register new services (syscalls?)
  4. New drivers must have an interface to be transparently intercepted.

There was not a lot of discussion, and key developers were unenthused. Stephen C. Tweedie pointed out that Alexander's #2 proposal would significantly impact performance, and that his #3 proposal was already possible (he said, "ioctls let them register arbitrary new auxilliary services.")

At one point, Alexander said, "As I see, my idea of new driver system is not very popular. Well, may be I'm wrong. But I hope you'll change you opinion in the future."

9. SIGIO Bug Fixed

4 Jul 1999 - 5 Jul 1999 (6 posts) Archive Link: "SIGIO broken in 2.2 on unix domain socket close"

People: Malcolm Beattie, Alexey Kuznetsov

Malcolm Beattie said, "Async I/O via SIGIO is broken in 2.2.5 when a Unix domain peer socket is released," and posted the C code for a test program that would work under 2.0.36 but not under 2.2.5 (he hadn't tested the later 2.2 kernels). Alexey Kuznetsov replied that this had been fixed in 2.2.8 and later.
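Malcolm's own test program is not reproduced here. As a rough stand-in, the following sketch asks for SIGIO on one end of a Unix-domain stream socketpair, closes the other end, and reports whether the signal arrived. The function names and the 1 ms polling loop are illustrative assumptions. Going by Alexey's reply, a kernel with the fix (2.2.8 or later) should report 1. The sketch relies on F_SETOWN and O_ASYNC, so it assumes a Linux-like system.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t sigio_seen;

static void on_sigio(int sig)
{
    (void)sig;
    sigio_seen = 1;
}

/* Returns 1 if SIGIO arrived after the peer socket was closed, 0 if it did
   not arrive within about wait_ms milliseconds, -1 if setup failed. */
int sigio_on_peer_close(int wait_ms)
{
    struct sigaction sa;
    int sv[2], fl;

    assert(wait_ms >= 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) == -1)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* Route SIGIO for sv[0] to this process and switch on async mode. */
    fl = fcntl(sv[0], F_GETFL);
    if (fl == -1 || fcntl(sv[0], F_SETOWN, getpid()) == -1 ||
        fcntl(sv[0], F_SETFL, fl | O_ASYNC) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    sigio_seen = 0;
    close(sv[1]);                                /* release the peer */
    for (int i = 0; i < wait_ms && !sigio_seen; i++) {
        struct timespec ts = { 0, 1000000L };    /* 1 ms */
        nanosleep(&ts, NULL);
    }
    close(sv[0]);
    return sigio_seen ? 1 : 0;
}
```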

10. Legacy Compatibility

4 Jul 1999 - 6 Jul 1999 (7 posts) Archive Link: "[PATCH] putting old-style lock handling back into 2.2.10"

Topics: Backward Compatibility, Disks: IDE, Executable File Format, FS: ReiserFS, FS: devfs, Microsoft

People: David Parsons, Steve Dodd

David Parsons posted a patch and said, "I noticed (when it made one of my test machines blow up and die) that 2.2 doesn't support the old-style F_EXLCK locking anymore. This little patch (against 2.2.10) puts that locking back, so that people using a.out gdbm programs won't have them blow up on them."

Steve Dodd replied, "Uhhuh. I can quite understand not wanting to upgrade from libc4 / a.out binaries, but I can't understand sticking with libc4 & a.out, and then chucking the 2.2 / 2.3 kernel on it. That's just weird. Why not stick with 1.2.13 or 2.0.37? If there are outstanding bugs in those versions they can be maintained by whoever is interested -- I assume you patch libc4, so how would this be different?" He added, "The theory is that after a major version or two we can drop the really old stuff. It's not doing harm /per se/, but it's nice to view the current kernel source as a reference implementation (reasonably) unencumbered by cruft."

In defense of plunking recent kernels on his ancient software, David replied:

Because 2.2

I don't feel that Linux should emulate the more unsavoury aspects of Microsoft Windows -- in that when you update the kernel you have to willy-nilly update everything else so that it would work. Of course 2.2.x is basically obsolete now that the 2.3 branch is active, but it will make an adequate placeholder until 2.4 is released.

11. x86 AGP Support

5 Jul 1999 (5 posts) Archive Link: "AGP Chipset Support"

Topics: PCI

People: Paul Sargent, Michael B. Trausch, Gerard Roudier

Paul Sargent asked, "I wondering if there is any support for AGP specific functions in any of the x86 motherboard chipsets. I'm thinking about any configuration that might be needed to enable AGP Sideband, 1x, 2x and possibly 4x transfers." Michael B. Trausch replied, "Hm... Well, I can say that I've run Linux on a few AGP machines, and can use vesafb on them. X is starting to work on support for them -- but just *why* would the kernel have to have AGP specific code? (Forgive me if that's a stupid question :)."

Gerard Roudier replied:

It is not a stupid question, in my opinion. Using AGP Fast Write transactions instead of PCI write transactions for the frame buffer could speed up it significantly, for example. This also may apply to X servers.

But, the primary goal of AGP has been to allow 3D engines to execute textures directly from main memory.

Note that the AGP specifications seem to be close to obsolete before having really been used. The AGP 4X allows up to 1 GB/s memory bandwitch but some recent boards have an internal memory speed that is greater than twice this value.

In my opinion, if something that allows a 64 bit data path (32 bit currently) and faster memory acces (currently up to 4x66Mhzx32bits) will not be proposed by Intel in the short term, the AGP will fall into the "oubliettes" of the I.T. very quickly.
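The figures Gerard quotes fit together as follows. This worked check uses the nominal 66 MHz clock he cites; the actual AGP base clock is slightly higher, at about 66.7 MHz.

```latex
\begin{aligned}
4 \times 66\,\text{MHz} \times 32\,\text{bit}
  &= 4 \times (66 \times 10^{6}\,\text{s}^{-1}) \times 4\,\text{B}
   = 1.056 \times 10^{9}\,\text{B/s} \approx 1\,\text{GB/s},\\
4 \times 66\,\text{MHz} \times 64\,\text{bit}
  &= 2 \times 1.056 \times 10^{9}\,\text{B/s} \approx 2.1\,\text{GB/s}.
\end{aligned}
```

Widening the path to 64 bits at the same 4x rate would roughly double the throughput. That is about the level of the "more than twice" internal memory speeds he mentions.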

Paul also explained himself to Michael:

That's not really a stupid question in the context of user space graphics drivers, but I'm playing with kernel level abstraction (much like fbcon or kgi) and one thing that interested me was getting AGP DMA to work + a few of the other AGP specific functions.

As I understand it this involves some tweeking or the host chipset to set up (although it looks like the BIOS should do this at start of day). Also there is a address translation table called the GART which can be used for scatter-gather type operations (although I'm informed Intel is not telling anybody about how to use this at the moment).

And finally I've heard that any memory allocated which is going to be read by AGP DMA has to be a special type.

You may notice that my above comments sound a little unsure of themselves. That's because all of my information comes from people writing drivers under Windows (NT or 95/8) so I don't know if these issues are really hardware issues or MS-isms because they've (for example) decided to allocate AGP memory out of a different pool.

So I was just looking for people who may have already touched on it.

12. i386 S3 (Trio64) Frame Buffer Driver

5 Jul 1999 - 6 Jul 1999 (2 posts) Archive Link: "Anybody doing i386 S3 (Trio64) frame buffer video device driver?"

Topics: Framebuffer

People: Jeff Garzik, Vedran Rodic

Vedran Rodic asked if anyone was working on an i386 S3 (Trio64) frame buffer video device driver, adding that he was interested in helping. Jeff Garzik said that the mailing list would be a good place to go for framebuffer questions. He added:

For the S3 Trio, I am combining S3triofb.c and cyberfb.c into a single, new driver. You can get the current development version at -- but be warned there are still many bugs to be fixed yet. You are of course more than welcome to help debug! :)

The driver is written so that, hopefully, it will take only a few lines to code to support the older S3 chips.

13. 'coma' Fix Missing From Latest Kernels

6 Jul 1999 - 8 Jul 1999 (15 posts) Archive Link: "Cyrix 'coma bug' fix was deleted from pre-2.3.10-4"

People: Steve Dodd, Alan Cox, Zoltan Boszormenyi

Zoltan Boszormenyi noticed that the Cyrix "coma" fix was missing from the latest unstable pre-releases, and wanted to know why. Steve Dodd didn't know the answer, but he noted that "Interestingly, 2.3.6 changed from the official work-around (setting the NO_LOCK bit, whatever that does) to something else, which I haven't managed to figure out as I don't have the Cyrix 6x86 data books kicking around." Alan Cox explained, "Because the fix in 2.3.10 is plain wrong, and because it can be done by set6x86 quite nicely anyway." Several folks suggested that the kernel should at least warn of the existence of the bug at bootup. Alan agreed and asked for patches.

14. '.config' Backward Compatibility

6 Jul 1999 (5 posts) Archive Link: "[PATCH] 2.2.10-pre4 IDE config problems"

Topics: Backward Compatibility, Disks: IDE

People: Andre Hedrick, Andrzej Krzysztofowicz

Andrzej Krzysztofowicz noticed that some old .config options were not being properly translated into the current format, under 2.3.10-pre4. He offered a patch to alter the various cases.

Andre Hedrick replied:

Yep, I know.

Since there are lots of other kernel changes that this tied togather, things are a lot ugly. IDE/BLOCK are in the process to part directories. Since "block" is now more than 85% IDE, there has been a decision to slipt the tree again.

I respect you changes and suggestions; however, I differ with the out come.

The moving of CONFIG_BLK_DEV_IDEDMA_PCI and CONFIG_IDEDMA_AUTO will most likely break other archs in the kernel. I spent five (5) days thing why is this needed. The only relfects the ifdefs in the code. This is a change that Linus started and I finished.

15. Adding Debugging To The Kernel

6 Jul 1999 (2 posts) Archive Link: "Ooops: 2.2.10-ac8 + ncr53c8xx driver"

Topics: Debugging

People: Douglas Gilbert, Alan Cox, Stephen C. Tweedie, Kurt Garloff, Dan Hollis

Stefan Frank got his first oops ever, under 2.2.10-ac8, and posted the oops, his .config and his dmesg output, as well as the versions of his kernel and libraries.

Douglas Gilbert replied:

2.2.10-ac8 was an experimental version from Alan Cox that has "cache poisoning" in it. This tripped up various bits of the kernel that used memory _after_ it was freed.

The oops you have reported is one such case in scsi.c scsi_unregister_device(). Kurt Garloff found it and posted a patch for it on linux-scsi. It will soon be fixed.

Dan Hollis asked if any other parts of the kernel would benefit from poisoning. Alan Cox replied, "I've talked to Stephen about playing with page poisoning when we free page cache pages. It would clobber performance but in the presence of strange errors might be instructive as a debugging aid" and Stephen C. Tweedie added, "We've found 4 or 5 problems in 2.2.10 using the slab poisoning stuff."
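A toy version shows why poisoning flushes out these bugs. Freed memory is overwritten so that stale reads see an obviously bogus value, and the pattern is checked on reuse to catch writes after free. The sketch makes these assumptions:

- the poison byte is 0x6b, the value later Linux slab debugging calls POISON_FREE (the -ac patch may have used something else);
- `struct model_request` and the `model_*` helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed poison byte (see lead-in); not necessarily what -ac8 used. */
#define MODEL_POISON_FREE 0x6b

/* Hypothetical object a driver might touch after freeing it. */
struct model_request {
    uint32_t refcount;
    uint32_t status;
};

/* "Free" an object by overwriting it with the poison pattern. In this model
   the memory is not handed back to malloc(), so later stray reads stay
   well-defined and simply see the pattern instead of plausible old data. */
void model_poison_free(void *obj, size_t size)
{
    assert(obj != NULL);
    memset(obj, MODEL_POISON_FREE, size);
}

/* On reuse, check that nobody wrote to the object while it was free.
   Returns the offset of the first damaged byte, or -1 if intact. */
long model_poison_check(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    for (size_t i = 0; i < size; i++)
        if (p[i] != MODEL_POISON_FREE)
            return (long)i;
    return -1;
}
```

A stale pointer read after the free yields 0x6b6b6b6b rather than a believable refcount. This tends to turn silent corruption into an immediate oops. Alan's point about cost applies here too: poisoning every freed page-cache page would clobber performance, which is why it is a debugging aid.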

16. Linux Port To IBM Mainframe ESA/390

7 Jul 1999 - 8 Jul 1999 (3 posts) Archive Link: "IBM Mainframe Support"

People: Linas Vepstas

Linas Vepstas gave a URL and announced, "I have patches to the Linux kernel that add support for the IBM mainframe ESA/390 series of computers (and clones e.g. Hitachi but I don't know that anyone has yet tried that). Although these are still quite raw & not exactly functional (understatement), I figured it might be a good time ask about incorporating these into the e.g. 2.3.x series of kernels."

17. Extensive Buffer Patch Submitted

8 Jul 1999 - 10 Jul 1999 (9 posts) Archive Link: "[patch] 2.2.10 buffer patch"

Topics: FS

People: Andrea Arcangeli

Andrea Arcangeli stated:

I propose this patch for inclusion into 2.2.11.

This patch should be well tested (I tested heavily also the loopback and ramdisk device also in loopback over loopback). It seems ready for production. (btw the ramdisk still need some minor fixes but it's a separate issue)

18. Linux Port To The Hitachi SuperH SH-4

8 Jul 1999 (2 posts) Archive Link: "Porting Linux to a new architecture"

Topics: Executable File Format, FS

People: Mitchell Blank Jr

James K Whiting wanted to port Linux to Hitachi SuperH SH-4 based computers, for controlling robots in his MIT lab. He was looking for some pointers to documentation on Linux porting. Mitchell Blank Jr replied:

First check out: These folks are organizing a port to the hitachi sh-3 (which I guess is similar(?)). They don't seem to be far yet, but they might have developed some insight into the problem.

The first thing to do is determine how your hardware will load the kernel. If your existing bootstrap system can deal with an ELF executable then you're in business. Otherwise you have to decide if you're going to (1) make a new bootstrap, (2) make a two-stage loader, or (3) hack together a post-processor that turns the compiled kernel into something that your hardware will like. Make sure your bootloader can deal with initrd filesystems.

You should make sure that your toolchain is in order. The GNU tools all support the Hitachi, IIRC, but make sure that they can produce something resembling linux-elf.

Test your bootstrap and toolchain by making a tiny ELF executable that just does some simple hardware thing (like writing a string to a serial port).

Now, the kernel. Make a new include/asm-* and arch/* directory. These aren't completely documented but there are plenty of examples - just look around. One thing you'll have to is determine how the userland will interface with the kernel - i.e. what a system call will look like, etc. Sometimes the hardware makes the choice real easy, sometimes there's a lot of flexibility. Anyway, choose carefully since it'll be hard to fix later.

As for initial stuff to write - just get a console device up (usually a serial port is easiest). Don't worry about disk or network yet - you can boot off an initrd filesystem. (For an initrd filesystem, just make an "/sbin/init" that does a few simple system calls and loops) The other nice advantage of not doing disk or network is that you don't have to worry about getting DMA right initially.

Once that is working, you can easily start work in multiple directions. The obvious one is filling out the list of device drivers. The other one is to start porting libc (which is a whole 'nother mailing list :-) Once you have libc working you can start compiling a full userland. Congratulations, you're done.







Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.