Kernel Traffic #30 For 5 Aug 1999

By Zack Brown

Mailing List Stats For This Week

We looked at 1051 posts in 4313K.

There were 379 different contributors. 157 posted more than once. 116 posted last week too.

1. FAT Inherently Broken; COMA Workaround Removed

19 Jul 1999 - 26 Jul 1999 (34 posts) Archive Link: "Linux 2.2.11pre2 proposed patch"

Topics: FS: FAT

People: Linus Torvalds, Alexander Viro, Zoltan Boszormenyi

Alan Cox put 2.2.11pre2 up on ftp://ftp.* and posted a changelog against 2.2.10. Linus Torvalds replied, "Looks good, except aic7xxx is wrong version ;) Tssk, tssk."

One of Alan's changes was "FAT now uses cluster numbering for inode info", which Alexander Viro took exception to. Alexander replied to the announcement:

It doesn't. It generates inumbers on the fly. Cluster numbering is unusable for that - truncate() *shouldn't* change inumber. FAT *has* no file invariants that would survive (a) rename, (b) truncate(), (c) write and (d) umount. Of all those umount give the least pain wrt races. New code guarantees constant inumbers for opened files.

The bottom line - inumbers on FAT will suck anyway. There is no inodes in normal sense. And inumbers changing after reboot are *much* better than exploitable races. On FAT usage of (old) inumbers for any backup stuff was broken - rename() would go unnoticed.

Another of Alan's changes was to remove the COMA workaround and recommend that people use set6x86 if they have that Cyrix CPU bug. Zoltan Boszormenyi said sarcastically that in that case, they might as well remove the f00f bugfix too. Alan defended the change, and a discussion followed about which fix had been enabled in which version and later swapped for which other fix.

2. TCP Vegas Patch Announced

20 Jul 1999 - 27 Jul 1999 (19 posts) Archive Link: "[PATCH] TCP Vegas implementation available"

Topics: Modems, Networking

People: Neal Cardwell, Rik van Riel

Neal Cardwell announced that his TCP Vegas patch (for network congestion control) against 2.2.10/2.3.10 was available. He added:

You may remember that 2.1 had a Linux implementation that was removed after 2.1.91, due to performance problems, i gather. Our implementation is unrelated to that 2.1.x implementation (except, obviously, that they're both Vegas implementations :-), and makes several critical improvements over that earlier Vegas implementation:

From looking at the older Vegas implementation, i'd guess that these fixes address many of the performance problems in that implementation. Other fixes and enhancements are described on the web page.

Rik van Riel asked if PPP over analog phone lines would benefit from Vegas, and Neal replied, "For scenarios where the performance of long TCP flows is suffering because traditional TCP congestion control is driving up the queues on the modem to the point of loss and suffering massive timeouts, Vegas should help by keeping queues shorter, keeping RTTs lower, and reducing loss."

3. PPP Driver Rewritten

24 Jul 1999 - 25 Jul 1999 (2 posts) Archive Link: "new PPP driver"

Topics: Networking

People: Paul Mackerras

Paul Mackerras announced:

I have rewritten the PPP driver, separating the generic part from the async serial part. The new code is IMHO a lot cleaner than the old ppp.c. There is now a ppp_generic.c, which implements the ppp network interface units and the /dev/ppp device, and ppp_async.c, which implements the PPP encapsulation and framing for async serial lines.

The patch is at

It should apply OK against either 2.2.10 or 2.3.10. You will need a new pppd, which you can get at

I did all this before I became aware of Daniel Marmier's post of a week or two ago. I'm looking at merging his changes and mine.

He replied to himself seven hours later, saying:

I should have mentioned that if you want to try my new PPP driver (with the new pppd), you will need to do

        mknod /dev/ppp c 108 0

If you use module autoloading and have PPP as a module, you will need to add the following to /etc/conf.modules:

        alias tty-ldisc-3 ppp_async
        alias char-major-108 ppp_generic

4. New Raw I/O Patches Announced

27 Jul 1999 - 28 Jul 1999 (4 posts) Archive Link: "New raw IO patches available for 2.2, 2.3"

Topics: Capabilities, Ioctls, Raw IO

People: Stephen C. Tweedie, Matthew Kirkwood

Stephen C. Tweedie posted a readme and gave a pointer to his raw IO patches, "for unbuffered, direct disk IO via standard Unix character mode raw devices." He added that the character major number had moved from 111 to 162, and probably wouldn't change again. The new patches "should work on most 2.2 kernels (2.2.9 upwards at least), and on 2.3.12-pre5. Due to the extensive file locking changes in 2.3 recently, it will not work on older 2.3 kernels."

Matthew Kirkwood wanted the /dev/raw RAW_SETBIND ioctl to require CAP_SYS_ADMIN, and Stephen didn't see the need for the check, but added that if Matthew really wanted it, he'd accept patches; Matthew gave him one against 2.3.7pre12.

Under the Subject: Updated raw IO diffs, Stephen announced a new version, explaining that there was a one-line bug in the 2.2.x patch.

5. Timer Patch Announced And Rejected

27 Jul 1999 - 30 Jul 1999 (19 posts) Archive Link: "PATCH: POSIX 1003.1b timer minor fixes"

Topics: POSIX, Real-Time

People: Robert H. de Vries, Linus Torvalds, Jakub Jelinek

Robert H. de Vries posted some minor fixes to the timer code, relative to 2.3.12-pre6; Jakub Jelinek added some changes for the SPARC, and in the course of discussion, Robert explained:

the implementation I have submitted is the bare bones super minimum POSIX implementation. Now everybody can add her own special clock/timer to the kernel. For instance if I would want to add an IRIG-B clock I just needed to write a device driver for the card and hook it into the system call infrastructure I have provided. It is not necessary to provide timer and clock functionality for each CLOCK_<type>. Some clocks cannot generate interrupts in order to be usable as a timer, but can only be used as a clock. In that case you just implement the clock_* functions.

Secondly I remember that not too long ago I heard someone ask for greater precision for the clock function gettimeofday. So someone is interested in nanosecond precision. So there it is, go ahead, and implement gettimeofday as clock_gettime(CLOCK_REALTIME, :-).

At some point, Linus Torvalds threw some cold water on the party. There had been criticism of the patch based on timing resolution and accuracy, and a larger discussion seemed about to erupt, when Linus cut it short with, "Note that the kernel patch got dropped due to other concerns (the non-real-time siginfo part of the patch made the task structure a lot larger, something I hadn't initially realized but rth set me straight on that. So the final 2.3.12 won't have it after all, and people should look into which parts (if any) of this really _has_ to be in the kernel."

6. Built-In Kernel Debugger v. 0.5 Announced

28 Jul 1999 (1 post) Archive Link: "[PATCH] Built-in Kernel Debugger version v0.5 is available for 2.2.10"

Topics: Debugging

Scott Lurndal gave a pointer to v. 0.5 of the built-in debugger for Linux (kdb) and listed the many new features and fixes.

7. 2.3.12pre POSIX Timer Code Space Issue

28 Jul 1999 - 29 Jul 1999 (7 posts) Archive Link: "[RFC] RT signals"

Topics: POSIX

People: Jakub Jelinek

Jakub Jelinek posted a patch, and said, "The implementation of POSIX timers which made it into 2.3.12pre adds a siginfo_t nrt_info[SIGRTMIN] array to task structure. That's 4K if I count well, which seems a little bit dangerous to me." Several people were deeply disturbed by this, and there was a discussion of how best to keep processes from having to pass so much data back and forth. The thread ended without a resolution.

8. 2.2 Winchip Support

30 Jul 1999 (1 post) Archive Link: "PATCH: Winchip support (2.2.10-ac12)"

People: Dave Jones

Dave Jones posted a patch against 2.2.10-ac12, to enable the use of the CMPXCHG8B instruction, and to report 3DNow! in /proc/cpuinfo for Winchip 2.

Under the Subject: PATCH: Winchip support (2.3.12), Dave posted the same patch against 2.3.12.

9. MM; Threading

30 Jul 1999 (1 post) Archive Link: "Re: active_mm"

Topics: Virtual Memory

People: David Mosberger-Tang, Linus Torvalds, David Mosberger

Linus Torvalds had a private conversation with David Mosberger-Tang, and Cc'd a lengthy explanation to linux-kernel. David had asked, "Is there a brief description someplace on how "mm" vs. "active_mm" in the task_struct are supposed to be used?"

Linus replied:

Basically, the new setup is:

To support all that, the "struct mm_struct" now has two counters: a "mm_users" counter that is how many "real address space users" there are, and a "mm_count" counter that is the number of "lazy" users (ie anonymous users) plus one if there are any real users.

Usually there is at least one real user, but it could be that the real user exited on another CPU while a lazy user was still active, so you do actually get cases where you have a address space that is _only_ used by lazy users. That is often a short-lived state, because once that thread gets scheduled away in favour of a real thread, the "zombie" mm gets released because "mm_users" becomes zero.

Also, a new rule is that _nobody_ ever has "init_mm" as a real MM any more. "init_mm" should be considered just a "lazy context when no other context is available", and in fact it is mainly used just at bootup when no real VM has yet been created. So code that used to check

        if (current->mm == &init_mm)

should generally just do

        if (!current->mm)

instead (which makes more sense anyway - the test is basically one of "do we have a user context", and is generally done by the page fault handler and things like that).

Anyway, I put a pre-patch-2.3.13-1 on just a moment ago, because it slightly changes the interfaces to accomodate the alpha (who would have thought it, but the alpha actually ends up having one of the ugliest context switch codes - unlike the other architectures where the MM and register state is separate, the alpha PALcode joins the two, and you need to switch both together).

David also asked what it meant for threads to be "more soft", since he'd seen references to that on linux-kernel. He asked if it was related to how segment registers were being handled on the x86. To that, Linus replied:

It's more a reaction to a (temporary and already removed) naming issue. Ingo got rid of the 1:1 mapping of "hard" thread structures on the x86 side, so now we only have one TSS per CPU, and all the kernel thread structures are completely independent of the CPU-imposed thread structure on x86.

While Ingo was working on it, he called the kernel thread structure a "soft_thread_struct", while the intel TSS structure was then called a "hard_thread_struct". That was purely temporary and never saw the light of day: the real names are "thread_struct" for the kernel thread structure, and "tss_struct" for the intel TSS.

10. CLONE_PPID Support In LinuxThreads

30 Jul 1999 (1 post) Archive Link: "CLONE_PPID support for LinuxThreads"

People: David Wragg, Tim Hockin

David Wragg announced, "I have modified LinuxThreads to support Tim Hockin's CLONE_PPID kernel patch. The aim is to make the cost of a pthread_create() approach the cost of a clone(). In existing LinuxThreads, pthread_create() is many times more expensive due to a context switch to the manager thread, which does the clone to actually create the thread, and another context switch back to the thread that called pthread_create(). With CLONE_PPID the thread calling pthread_create() can do the clone() directly. On my dual PPro-200 machine this reduces the cost of a pthread_create() from about 23000 cycles to about 11000 cycles." He gave URLs to his patch against glibc/linuxthreads-2.1.1, Tim's kernel patch, and the prebuilt libraries. He explained, "Once you have rebuilt glibc with the patch, the easiest way to test the changes is to place the libc and libpthread binaries into a directory and set LD_LIBRARY_PATH so that your existing LinuxThreads program will use them."

11. Imminent Feature Freeze

3 Aug 1999 - 10 Aug 1999 (44 posts) Archive Link: "Re: no driver change for 2.4?"

Topics: Feature Freeze, Networking

People: Linus Torvalds

In the course of discussion, Linus Torvalds said:

Feature freeze in about two weeks is the current plan.

In short, people who think they have major requirements had better get their act together. That means that if ISDN people actually want to try to get into a real release one of these years, they don't have all that much time to futz around any more.

And it does mean that if it's not in working order already, it probably won't make it into 2.4.

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.