Kernel Traffic #26 For 8 Jul 1999

By Zack Brown

Mailing List Stats For This Week

We looked at 1323 posts in 5405K.

There were 500 different contributors. 213 posted more than once. 179 posted last week too.

1. Big File Clarification

23 Jun 1999 - 28 Jun 1999 (29 posts) Archive Link: "RFC: BSD system call revoke?"

Topics: BSD, Big File Support, SMP

People: Alan Cox, Alexander Viro, Matthew Kirkwood

Bernhard Kaindl announced that he'd started porting the BSD revoke() system call to Linux. Several people pointed out that Matthew Kirkwood had already done this and it hadn't gotten into 2.2; Alan Cox said, "I would suggest you feed it to Al Viro, as it and also large file arrays (expanding the fd arrays on an SMP box is kinda fun 8)) are very much related to his vfs threading work and file handle locking." Alexander replied, "gimme ;-) That's *the* right moment - I'm in the middle of ->files->fd cleanup right now. Alan, IIRC you've said that Linus has objections on the large file arrays patch in the current form. Could you summarize them?"

Alan replied:

The two were

  1. That he was worried it would break stuff (applications) and confuse users. It defaults to sane fd set sizes so it doesn't. Vendors ship this patch.
  2. There were various small cleanups it was wrongfully reverting - not to give wrong code, going back from nice size macros to doing the maths explicitly

2. Linus Okays Raw IO Patches

24 Jun 1999 - 28 Jun 1999 (13 posts) Archive Link: "direct (unbufferd) disk access"

Topics: Raw IO

People: Victor Khimenko, Stephen C. Tweedie, Matthew Wilcox, Linus Torvalds

Christian Hammers wanted to write to disk without caching the data in kernel memory, and Matthew Wilcox gave him a pointer to Stephen C. Tweedie's raw IO patches.

Victor Khimenko said, "Just DO NOT ask about problems with your kernel when such patches are installed. Linus has STRONG feeling AGAINST such patches (and AGAINST raw IO at all!) so if you'll have problem with them do not bother him."

Stephen piped up, "That turns out not to be the case... The current raw IO patches use a clean internal architecture to pass the user space IO to the block device layers, based on a design Linus and I sketched out."

But he added that reports from folks using those patches should still go to him rather than Linus Torvalds.

3. The Buffer Cache In Development Kernels

27 Jun 1999 - 29 Jun 1999 (6 posts) Archive Link: "Status of the buffer cache in 2.3.7+"

Topics: SMP

People: Steve Bergman, Linus Torvalds

Steve Bergman asked:

with the new unified page cache, is the buffer cache completely obsolete? There is still a number for it in top and vmstat (which comes from /proc, I suppose) which seems to increase practically without bound as block devices are read. I assume the number is bogus now.

Also, I am trying to understand how much difference this actually makes as far as memory usage is concerned. On a "typical system" (yeah, right ;-) how much duplication of cache was really occurring. And is this significant for low memory systems as well as large memory ones?

For the first question, Linus Torvalds replied:

The old buffer cache is still used for meta-data: the page cache only handles "real" file data.

We may at some point map meta-data too in the page cache (others have done so), but there really isn't all that much point to it. The buffer cache does exactly the right thing for meta-data.

For the second question, Linus added:

The cache duplication under "normal" load was often basically zero.

However, what is "normal" to some people is not normal to others. There are lots of real cases where the cache duplication was a major problem. In many cases you could never tell, but then occasionally there were really bad cases.

Also, the new code not only gets rid of the duplication, it also cleans up a lot of stuff that was rather ugly before. Shared writable mappings are squeaky clean - and they sure as h*ll did not use to be that way ;). So there are more "conceptual" reasons why the new code is just infinitely preferable to the old code.

The new code also scales much better under load and on SMP. The old code could probably have been made to scale too, but it was much easier with the page cache, as the page cache has a much clearer abstraction. Even so, it was a major project, and Ingo did a _lot_ of heavy lifting (on the kernel mailing list you mainly see the arguments about how it should be done, so don't get them wrong: Ingo has been doing some outstanding work to get it all working so cleanly).

4. Modular IPv4

28 Jun 1999 (6 posts) Archive Link: "modular ipv4"

Topics: Networking

People: Andi Kleen, Matti Aarnio

Arkadiusz Miśkiewicz asked if IPv4 would be modularized in 2.3.x or 2.4.x; Andi Kleen replied, "It is a rather hard problem, so it is unlikely to happen for 2.4 if the "2.4 in the fall" plan should work. First other more important things need to be fixed." But Matti Aarnio said, "Hard ?? I have done it now TWICE, it took me about a week every time." He gave a URL, and added, "Care to spend a moment to bring that sync to 2.3 series kernels ?"

Andi asked, "So did you fix the numerous races caused by timers?" and Matti replied:

Possibly not, but eventually I was able to kill all timers whose usage had ended before I unloaded the module. (Failures to do so announced themselves very soon after unloading the module.)

My primary mission still was to modload the IPv4 only when needed, and after that it isn't likely so important to unload it...

5. Bugfix For Magneto-Optical Drives

28 Jun 1999 (1 post) Archive Link: "Removable media bug FOUND !!"

Topics: FS

People: Giuliano Pochini

Giuliano Pochini posted a patch and announced, "Finally I've located the bug that caused me so mush truobleas with my magneto-optical drive. The bug is in genhd.c::amiga_partition(), that's why all works fine to 99% of people..."

6. Loading And Unloading The Framebuffer Module

28 Jun 1999 - 4 Jul 1999 (4 posts) Archive Link: "Framebuffer question."

Topics: Framebuffer

People: Geert Uytterhoeven, Jeff Garzik, Petr Vandrovec

Someone was trying to fix some Matroxfb problems; they pointed out that it didn't seem possible to unload the framebuffer module once it was loaded, and asked why it was possible to compile it as a module in that case. Jeff Garzik suggested contacting Petr Vandrovec, the Matroxfb maintainer, and getting his latest test version (which might solve the problems the original poster was having). Jeff also recommended checking out the Framebuffer mailing list under majordomo, as well as reading linux/REPORTING-BUGS before posting problems to linux-kernel.

Jeff also added that unloading the framebuffer module was not possible, but Geert Uytterhoeven disagreed, saying it was possible as long as no VCs were mapped to a frame buffer device.

The question of how to map VCs back to non-framebuffer devices went unanswered.

7. Filesystem Corruption Saga Continues

29 Jun 1999 - 30 Jun 1999 (6 posts) Archive Link: "Re: Massive e2fs corruption with 2.2.9/10?"

Topics: Debugging, FS

People: Linus Torvalds, Alan Cox

The filesystem corruption discussed last week in Issue #25, Section #3 (16 Jun 1999: FS Corruption With Later 2.2.x?) is still getting reports. As Alan Cox said in his diary for June 29, it looks like something is going on with the latest stable kernels, but it's really tough to find out what. This week there were more reports, but no solutions.

Later, under the Subject: "fs corruption with pre-2.3.9-5 + this little patch", someone said they assumed that nobody had yet tracked down the 2.2.x FS corruption, and Linus Torvalds replied:

That's correct. There seems to be _some_ correlation with the aic7xxx driver, but even that's so weak that it might just be that it just shows up a lot because it is so popular. There's certainly a stronger correlation with overclocking and overheating ;)

It should be noted that _all_ the 2.3.x corruption problems that have been discussed have simply been due to bugs in the new page cache and buffer cache code, and their interactions with low-level filesystems etc. That's simply because a lot of things changed, and we're apparently finally reaching a stable level now.

("stable" wrt 2.3.x does not mean bug-free in this case: I'm sure we'll be hunting down races for the next few months, but the fundamental problems seem to be sorting themselves out quite nicely).

8. wait_queue Changes Summarized

1 Jul 1999 (6 posts) Archive Link: "Any documentation anywhere on the new wait.h?"

Topics: Big O Notation, SMP

People: Ingo Molnar, Clemens Huebner, Hans Reiser

Hans Reiser noticed that the 2.3.x wait_queue had changed, and was looking for some docs on it. Ingo Molnar replied:

there have been four major changes/goals wrt. the waitqueues changes:

  1. waitqueue heads were separated from waitqueue entries, data-structure wise. Formerly the head was a pointer, which was not generic enough, see later.
  2. the waitqueue list has been changed to be a double-linked never-zero ringlist. This has obvious micro-speed and algorithmical scaling benefits, formerly remove_from_wait_queue() had to potentially traverse all the waitqueue to remove a single entry. Now it's all O(1).
  3. the generic datastructures enabled us to add per-waitqueue spinlocks which makes us scale better on SMP. Particularly __wake_up() tends to hold the waitqueue lock while doing other stuff (well, waking up processes), so this is a definit win. It was also easy and seemless due to the generic data structures. The spinlock architecture is atm. 'dual', which means that it can be switched between readwrite and 'simple' spinlocks via a define. The 'simple' version was benchmarked to perform better, that one will probably stick and the rw-version will be removed.
  4. all these changes enabled to implement the primary goal that triggered all these changes and cleanups: it was possible to add wake-one semantics for wakeup() in a clean way. (see the TASK_EXCLUSIVE stuff)

compiler_warning is there to make old code generate more warning messages when you old-style initialize waitqueues.

the debugging stuff will be removed before 2.4 - the frequency of waitqueue-related bugs is already very low. There is still some small benchmarking work to be done wrt. the wakeup order of exclusive tasks.
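
Ingo's points about the ring list, the per-queue lock, and wake-one semantics can be illustrated with a small user-space model. This is only a sketch in Python, not kernel code: the class and method names (WaitEntry, WaitQueueHead, add, remove, wake_up) are invented for illustration, a threading.Lock stands in for the per-waitqueue spinlock, and the wake policy (wake every non-exclusive waiter, stop after the first exclusive one) is an assumption about what the TASK_EXCLUSIVE work aims for. For simplicity the model also dequeues waiters as it wakes them.

```python
import threading


class WaitEntry:
    """One waiter. Its links always point somewhere, never at None."""

    def __init__(self, name, exclusive=False):
        self.name = name
        self.exclusive = exclusive  # hypothetical stand-in for TASK_EXCLUSIVE
        self.prev = self
        self.next = self


class WaitQueueHead:
    """Queue head, kept separate from entries; it is the ring's sentinel."""

    def __init__(self):
        self.lock = threading.Lock()  # per-queue lock (models a spinlock)
        self.prev = self
        self.next = self

    def _unlink(self, entry):
        # O(1): only the entry and its two neighbours are touched.
        entry.prev.next = entry.next
        entry.next.prev = entry.prev
        entry.prev = entry.next = entry

    def add(self, entry):
        """Link an entry at the tail of the ring in O(1)."""
        with self.lock:
            tail = self.prev
            entry.prev, entry.next = tail, self
            tail.next = entry
            self.prev = entry

    def remove(self, entry):
        """Remove an entry without walking the queue."""
        with self.lock:
            self._unlink(entry)

    def wake_up(self):
        """Wake all non-exclusive waiters, but at most one exclusive waiter."""
        woken = []
        with self.lock:
            node = self.next
            while node is not self:
                following = node.next
                self._unlink(node)
                woken.append(node.name)
                if node.exclusive:
                    break  # wake-one: leave later waiters asleep
                node = following
        return woken


if __name__ == "__main__":
    queue = WaitQueueHead()
    for waiter in (WaitEntry("reader"),
                   WaitEntry("worker-1", exclusive=True),
                   WaitEntry("worker-2", exclusive=True)):
        queue.add(waiter)
    print(queue.wake_up())  # first wakeup
    print(queue.wake_up())  # second wakeup
```

Because the head is a sentinel node in the ring, removal never needs a special case for the first or last entry, which is one way to read the "never-zero" property Ingo describes.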

Clemens Huebner posted a patch and replied, "Unfortunately the changes break sysv ipc. I submitted the attached fix, but it apparently got lost..."

There was no reply.

9. Filesystem Reorganization In Development Series

1 Jul 1999 - 4 Jul 1999 (2 posts) Archive Link: "FS reorg in 2.3.x"

Topics: FS: FAT

People: Victor Khimenko

Adam Schrotenboer noticed that the FAT FS was broken in the latest kernels and wouldn't compile. He was trying to fix some of the name changes and so on, but he was still getting linker errors. He added that he was not subscribed to linux-kernel.

Victor Khimenko replied, "If you think that just few functions was changed/renamed then you are not wrong. You are VERY wrong. Carefull review of FAT needed needed not "try to make it work with big hammer"." He added, "If you are not on list then you DEFINITELY must not try to fix it :-)) Yes, it's broken since filesystem in non-compileable state is by far better then filesystem with trash-full-your-disk feature built-in ..."

10. Module Packaging

1 Jul 1999 - 4 Jul 1999 (15 posts) Archive Link: "Standard for module delivery"

Topics: SMP

People: Stephen Williams, Matthew Wilcox, David Woodhouse, Steve Dodd, Theodore Y. Ts'o

Stephen Williams said:

After seeing the various rants about interface changes and compatibility and so on and so forth, I have an ongoing nit that seems should be taken care of.

My problem is that I (my employer included) support Linux drivers for our various boards. Hell, we start there and do NT later, when development is stabilized. I write module-only drivers (Linux and NT) and I see no reason to expect the linux source tar bundle to include my driver.

However, that leads me to the problem of how to properly and portably make binary and source rpms, not to mention makefiles.

It's not like we are trying to hide the source. On the contrary, we GPL the driver code and some of it has even been seen on this very list. However, I still want to make a binary RPM that a customer can download (or a src rpm) and have the customer reasonably expect to be able to install that driver module.

Even if binary rpms are banned, I can't seem to make a working *src* rpm with a driver because of the following problems:

  1. Compiler parameters have to be guessed. If there were an includeable make header file that sets kernel mode compile options that would be great. I support Intel and alpha ('cause that's what I have) and need to guess different compiler flags for each.
  2. Where do I install the module? The "make modules_install" of the kernel source uses one convention, the redhat kernels use another, I'm afraid to ask what the other distributions do. Can't we just pick a place and use that?

I think that these are very much kernel issues because it relates to how driver writers directly interface with the kernel internals. Even the open source products need a way to interface with the compilation environment of the kernel.

I'm willing under Linux to require a C compiler be installed, but I'm sure many developers would like to see a standard means to support external source modules. If I can make rpms that a linux user can install, I will be happy.

David Woodhouse suggested doing

        cd /usr/src/linux
        make SUBDIRS=$MYDIR modules

Stephen was impressed, but didn't see why the kernel sources had to be installed. He'd compiled successfully with only headers installed. But Steve Dodd pointed out that having the kernel sources was the only way to tell whether to compile for SMP, etc.

Matthew Wilcox mentioned that Theodore Y. Ts'o had written a paper on the subject for Linux Expo 99; pages 241-252 of the proceedings. Ted posted URLs to the PostScript version (which he said was better quality) and the original PDF version.

11. Linux 2.2.10ac6 Announcement And Problems

1 Jul 1999 - 2 Jul 1999 (3 posts) Archive Link: "Linux 2.2.10ac6"

Topics: FS, Kernel Release Announcement

People: Stephen C. Tweedie, Alan Cox, David S. Miller

Alan Cox announced 2.2.10ac6, but David S. Miller said that the drivers/char/Makefile change broke Sparc builds.

Under the Subject: "New 2.2.10-ac6 compile error", Richard A Nelson reported compiler errors, and David said he'd sent Alan the fix.

Under the Subject: "SLAB_POISON in patch-2.2.10ac6", Stephen C. Tweedie pointed out that the debugging code Alan had put in to help track down the elusive filesystem corruption some folks were seeing was not quite right. He posted a patch, and said, "Anyone seeing fs corruption on 2.2.10 is invited to drop this in to enable extra debugging. It basically causes all kmalloc()s to be filled with a predetermined pattern (all 0x5a), and kfree() to be filled with all 0xa5. If we see those patterns cropping up in the oops messages then it will be a sure sign of somebody trying to access freeded memory, and we stand a much better chance of tracking the culprit down."
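
The poisoning idea Stephen describes can be shown with a toy user-space model. This is a sketch in Python and not the kernel patch itself: only the two byte patterns (0x5a on allocation, 0xa5 on free) come from his message, while the PoisoningArena class, its bump-allocation strategy, and the classify() helper are assumptions made up for illustration.

```python
ALLOC_POISON = 0x5A  # written into freshly allocated memory (from the quote)
FREE_POISON = 0xA5   # written into freed memory (from the quote)


class PoisoningArena:
    """Toy bump allocator over a bytearray that poisons on alloc and free."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.top = 0
        self.live = {}  # offset -> length of each live allocation

    def alloc(self, length):
        """Hand out a region pre-filled with the allocation pattern."""
        if self.top + length > len(self.mem):
            raise MemoryError("arena exhausted")
        offset = self.top
        self.top += length
        self.mem[offset:offset + length] = bytes([ALLOC_POISON]) * length
        self.live[offset] = length
        return offset

    def free(self, offset):
        """Overwrite a region with the free pattern; KeyError on double free."""
        length = self.live.pop(offset)
        self.mem[offset:offset + length] = bytes([FREE_POISON]) * length

    def read_word(self, offset):
        """Read 4 bytes as a little-endian integer, like a stray pointer load."""
        return int.from_bytes(self.mem[offset:offset + 4], "little")


def classify(word):
    """Guess what a suspicious 32-bit value in a crash dump might indicate."""
    if word == 0x5A5A5A5A:
        return "uninitialized allocation"
    if word == 0xA5A5A5A5:
        return "use after free"
    return "unknown"


if __name__ == "__main__":
    arena = PoisoningArena(64)
    block = arena.alloc(16)
    arena.free(block)
    # A stale reference to the freed block now reads the free pattern.
    print(hex(arena.read_word(block)), classify(arena.read_word(block)))
```

The value of distinct patterns is that a bogus pointer or field seen in an oops immediately says which kind of mistake produced it: data read before initialization, or data read after it was released.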

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.