Kernel Traffic

Kernel Traffic #187 For 6 Oct 2002

By Zack Brown

Mailing List Stats For This Week

We looked at 2924 posts in 15472K.

There were 648 different contributors. 367 posted more than once. 198 posted last week too.

1. New Module Code Preventing Module Unloading

17 Sep 2002 - 3 Oct 2002 (38 posts) Subject: "[PATCH] In-kernel module loader 1/7"

Topics: Hot-Plugging

People: Roman Zippel, Rusty Russell, Greg KH, Alan Cox


Rusty Russell announced that he had rewritten his in-kernel module loader code to be much less invasive than earlier versions. He said it still needed work, but was basically functional, and would not interfere with preemption or CPU hotplugging. However, Roman Zippel felt that Rusty's solution added a lot of complexity in order to solve a relatively simple problem. He suggested moving a large part of the code into user-space, and added:

I can only refer to my own patch again, which has most of the basic things needed to sanely get out of this mess:

  1. Allow module exit to fail. This gives modules far more control over module management and the generic module code can be simplified.
  2. The new module layout simplifies module loading, much more than relocating isn't necessary, but keeps backward compatibility as long as necessary. This means new modules can be loaded with old modutils and modules using the old interface can be kept working for a while.

Rusty argued that his own implementation was simple and beautiful, and made the kernel smaller than before. He said a user-space solution would be much more complex. Roman replied, "Compared to the complexity of the current insmod I can agree. On the other hand with my module layout, I could load a module with ld and a few lines of shell script (only the system calls are a bit tricky)." Rusty said, "Do that on sparc64, x86_64 or ppc64 and I'll be really impressed. And of course, good luck fitting libbfd into busybox!"

Part of Rusty's solution was to simplify things by making modules impossible to unload, while Roman felt that all modules should be unloadable, and should include an API to report on whether they were free to be unloaded at a given time. But Greg KH replied, "And with a LSM module, how can it answer that? There's no way, unless we count every time someone calls into our module. And if you do that, no one will even want to use your module, given the number of hooks, and the paths those hooks are on (the speed hit would be horrible.) I'm with Rusty, just don't let people unload modules, unless you are running a development kernel, and "obviously" know what you are doing." But Alan Cox replied, "So the LSM module always says no. Don't make other modules suffer."
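The disagreement above can be pictured with a toy model. The following Python sketch is not kernel code, and every name in it (`Module`, `AlwaysBusyModule`, `try_unload`, the module names) is hypothetical. It only illustrates the idea that module exit is allowed to fail, with each module deciding for itself, and that an LSM-style module could simply always refuse, as Alan suggested.

```python
class Module:
    """Toy stand-in for a loadable module (hypothetical, not a kernel API)."""

    def __init__(self, name):
        self.name = name
        self.users = 0  # number of active users of this module

    def can_unload(self):
        # A module is free to go only when nobody is using it.
        return self.users == 0


class AlwaysBusyModule(Module):
    """Models a module that always says no to unloading."""

    def can_unload(self):
        return False


def try_unload(loaded, module):
    """Remove `module` from the list `loaded` if its exit check allows it.

    Returns True on success and False if the module refused to exit.
    """
    if not module.can_unload():
        return False
    loaded.remove(module)
    return True


net = Module("toy_net")            # hypothetical module names
lsm = AlwaysBusyModule("toy_lsm")
loaded = [net, lsm]

net.users = 1
print(try_unload(loaded, net))     # False: still in use
net.users = 0
print(try_unload(loaded, net))     # True: idle, so exit succeeds
print(try_unload(loaded, lsm))     # False: this module always refuses
```

In this model the cost Greg describes (counting every call into the module) is only paid by modules that choose to track `users`; a module that cannot answer cheaply just stays loaded.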

Roman and Rusty went back and forth on the whole issue for a while, but Rusty had to go get married, and that was that.

2. New State Tracing System For The Kernel, Similar To LTT

18 Sep 2002 - 27 Sep 2002 (5 posts) Subject: "Release of LKST 1.3"

Topics: Ottawa Linux Symposium, SMP

People: Yumiko Sugita, Karim Yaghmour, Richard Moore, Robert Schwebel

Yumiko Sugita announced:

I'd like to announce publication of Linux Kernel State Tracer (LKST) 1.3, which is a tracer for Linux kernel.

LKST's main purpose is debugging, fault analysis and performance analysis of enterprise systems. For the purpose, LKST has these features,

  1. It is possible to change dynamically which events are recorded.

    Users can obtain information about the events which they concern only interesting events.

    And it reduces the overhead of components which is not related with a fault.

  2. It is possible to change each function invoked by each events. A default function invoked by events is just recording occuring of the events.

    But, if it is necessary, this function can be changed to another function.

    And LKST supports installing the function by using a kernel module. LKST also supports a maskset, which controls what kind of events should be recorded, can be changed dynamically. For example, LKST usually traces a few events for good performance, and when the kernel be in a particular status, LKST can change a maskset to get more detail information.

  3. It is possible to create new buffers and change to one of them. By changing to other buffer, Users can leave the information which they want.

LKST binaries, source code and documents are available in the following site, (now updating)

We prepared a mailing list written below in order to let users know update of LKST.

To subscribe, please refer following URL,

And if you have any comments, please send to the above list, or to another mailing list written below.

Robert Schwebel asked what the difference was between this project and the Linux Trace Toolkit. Yumiko replied:

Let me first explain the background of our development work.

We began development of the Linux Kernel State Tracer (LKST) in response to a domestic need to improve Reliability, Availability, and Serviceability (RAS) with respect to enterprise systems. The following requirements were applied to LKST:

As we had to achieve a short development time, we elected to develop LKST using our own methodology (based on know-how of tracer development that we carried out for other OS's) different from other known tools such as LTT.
# This is not to say that we developed all functions on our own.
#LKST at present connects with Kernel Hooks (GKHI) and LKCD.

Consequently, LKST, which is oriented to enterprise systems, has the following features different from those of LTT.
# These LKST features are also being enhanced at this time.

  1. Little overhead and good scalability when tracing on a large-scale SMP system
    • To make lock mechanism overhead as little as possible, we designed that the buffers are not shared among CPUs.
  2. Easy to extend/expand the function (User-based extendibility)
    • Without recompiling kernel, user can change/add/modify the kind of events and information to be recorded at anytime. For example, LKST usually traces very few events for the purpose of good performance. Once the kernel get into the particular status that user specified, LKST will trace and record more detail information.
  3. Preservation of trace information
    • Recovery of trace information collected at the time of a system crash in connection with LKCD.
    • Saving of specific event information during tracing. For example, switching to another buffer after the occurrence of a specific event enables the information on that event to be left in the previous buffer.
  4. Collection of even more kernel event information
    • Information on more than 50 kernel events can be collected for kernel debugging.

The demand for RAS functions in Linux should grow in the years to come. It is our hope that LKST becomes one means of implementing such functions.
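Two of the design points Yumiko describes, a maskset that can be changed while tracing and buffers that are not shared among CPUs, can be sketched in a few lines. This is a toy model in Python, not LKST's implementation or interface; the class `ToyTracer`, its methods, and the event names are all hypothetical.

```python
from collections import defaultdict


class ToyTracer:
    """Toy event tracer with a changeable maskset and one buffer per CPU."""

    def __init__(self, enabled_events):
        self.maskset = set(enabled_events)
        # One list per CPU id. Because no buffer is shared between CPUs,
        # a real implementation would not need a lock around each append.
        self.buffers = defaultdict(list)

    def set_maskset(self, enabled_events):
        """Replace the set of recorded events without restarting anything."""
        self.maskset = set(enabled_events)

    def record(self, cpu, event, data=None):
        """Store the event in this CPU's buffer if the maskset enables it."""
        if event not in self.maskset:
            return False
        self.buffers[cpu].append((event, data))
        return True


# Normally trace very little, for low overhead.
tracer = ToyTracer(enabled_events={"oops"})
tracer.record(cpu=0, event="context_switch")   # dropped: not in the maskset
tracer.record(cpu=0, event="oops")             # recorded

# Once the interesting condition has been seen, widen the maskset.
tracer.set_maskset({"oops", "context_switch", "syscall_entry"})
tracer.record(cpu=1, event="context_switch")   # recorded in CPU 1's buffer
```

Switching to a fresh buffer after a specific event, so that the old buffer preserves what led up to it, would be a small extension of the same structure.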

Karim Yaghmour had some comments. To the locking issues of item 1, he said, "Clearly this is not a problem for LTT since we don't use any form of locking whatsoever anymore. IBM's work on the lockless scheme has solved this problem and their current work on the per-CPU buffering solves the rest of the issue." To the usability issues of item 2, he said that the same features were available with LTT. To item 3, he said:

Connection with LKCD is really not a problem, but this points to the main purpose of the tool, which in the case of LKCD is kernel debugging. LTT isn't aimed as a kernel debugger, so although LKCD is on our to-do list, it's certainly not our priority.

As for handling multiple output streams (which LKCD can be one of them), we already have very detailed plans on how LTT is going to integrate this (as I've mentioned a number of times before on this list). However, before we go down this road we need to make sure that the core tracing functionality is lightweight and fits the general requirements set for kernel code. Once this core lightweight functionality is there, we can build a rich and solid feature set around it.

To item 4, Karim agreed that this did differentiate the two projects. He said:

this is where LTT and LKST cannot be compared. If LKST is a kernel debugging tool, as it has always been advertised, then any comparison of LKST should be made with the other tracing tools which are used for kernel debugging, such as the ones mentioned by Ingo and Andi earlier on this list.

LTT was built from the ground up to help users understand the dynamic behavior of the system. As such, it cannot be compared to any kernel debugging tool since it isn't one.

Finally, Karim remarked, "There was a RAS BoF at the OLS this year where tracing was intensively discussed. All the attendees agreed to unify their efforts around LTT. At this meeting, Richard Moore of IBM presented a tracing to-do list which we are using as a basic check list for our ongoing work. Instead of implementing yet another tracing system, I think that the LKST team would benefit much from contributing to LTT, which has already a proven track record and has been adopted by the community as much as the industry."

Several days later, Yumiko replied, "After future, we'll join community actively. We'll use LTT and want to concern LTT, so we'll join the discussion of you and other LTT developers about Linux RAS. We hope to co-operate you and other developers about Linux RAS."

3. Native POSIX Thread Library 0.1 Achieves 100,000 Concurrent Threads

19 Sep 2002 - 30 Sep 2002 (127 posts) Subject: "[ANNOUNCE] Native POSIX Thread Library 0.1"

Topics: POSIX

People: Ulrich Drepper, Linus Torvalds, Ingo Molnar

Ulrich Drepper announced:

We are pleased to announce the first publically available source release of a new POSIX thread library for Linux. As part of the continuous effort to improve Linux's capabilities as a client, server, and computing platform Red Hat sponsored the development of this completely new implementation of a POSIX thread library, called Native POSIX Thread Library, NPTL.

Unless major flaws in the design are found this code is intended to become the standard POSIX thread library on Linux system and it will be included in the GNU C library distribution.

The work visible here is the result of close collaboration of kernel and runtime developers. The collaboration proceeded by developing the kernel changes while writing the appropriate parts of the thread library. Whenever something couldn't be implemented optimally some interface was changed to eliminate the issue. The result is this thread library which is, unlike previous attempts, a very thin layer on top of the kernel. This helps to achieve a maximum of performance for a minimal price.

A white paper (still in its draft stage, though) describing the design is available at

It provides a larger number of details on the design and insight into the design process. At this point we want to repeat only a few important points:

The thread library is designed to be binary compatible with the old LinuxThreads implementation. This compatibility obviously has some limitations. In places where the LinuxThreads implementation diverged from the POSIX standard incompatibilities exist. Users of the old library have been warned from day one that this day will come and code which added work-arounds for the POSIX non-compliance better be prepared to remove that code. The visible changes of the library include:

The sources for the new library are for the time being available at

The current sources contain support only for IA-32 but this will change very quickly. The thread library is built as part of glibc so the complete set of glibc sources is available as well. The current snapshot for glibc 2.3 (or glibc 2.3 when released) is necessary. You can find it at

Final releases will be available on and its mirrors.

Building glibc with the new thread library is demanding on the compilation environment.

Once all these prerequisites are met compiling glibc should be easy. But there are some tests which will flunk. For good reasons we aren't officially releasing the code yet. The bugs are either in the TLS code which is not enabled in the standard glibc build, or obviously in the thread library itself. To run the tests for the thread library run

make subdirs=linuxthreads2 check

One word on the name 'linuxthreads2' of the directory. This is only a convenience thing so that the glibc configure scripts don't complain about missing thread support. It will be changed to reflect the real name of the library ASAP.

What can you expect?

This is a very early version of the code so the obvious answer is: some problems. The test suite for the new thread code should pass but beside that and some performance measurement tool we haven't run much code. Ideally we would get people to write many more of these small test programs which are included in the sources. Compiling big programs would mean not being able to locate problems easy. But I certainly won't object to people running and debugging bigger applications. Please report successes and failures to the mailing list.

People who are interested in contributing must be aware that for any non-trivial change we need an assignment of the code to the FSF. The process is unfortunately necessary in today's world.

People who are contaminated by having worked on proprietary thread library implementation should not participate in discussions on the mailing list unless they willfully disclose the information. Every bit of information is publically available from the mailing list archive.

Which brings us to the final point: the mailing list for *all* discussions related to this thread library implementation is

Go to

to subscribe, unsubscribe, or review the archive.

There was some confusion over whether the new code really did manage to achieve 100,000 concurrent threads. People couldn't believe their eyes. Even Linus Torvalds said at one point:

You didn't read the post carefully.

They started and waited for 100,000 threads.

They did not have them all running at the same time. I think the original post said something like "up to 50 at a time".

Basically, the benchmark was how _fast_ thread creation is, not how many you can run at the same time. 100k threads at once is crazy, but you can do it now on 64-bit architectures if you really want to.

But no, as Ingo Molnar corrected, the code really did manage to get 100,000 threads running all at once. He explained:

on the dual-P4 testbox i have started and stopped 100,000 *parallel* threads in less than 2 seconds. Ie. starting up 100,000 threads without any throttling, waiting for all of them to start up, then killing them all. It needs roughly 1 GB of RAM to do this test on the default x86 kernel, it need roughly 500 MB of RAM to do this test with the IRQ-stacks patch applied.

with 2.5.31 this test would have taken roughly 15 minutes, on the same box, provided the NMI watchdog is turned off.

with 100,000 threads started up and idling silently the system is completely usable - all the critical for_each_task loops have been fixed.
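The shape of the test Ingo describes (start all threads without throttling, wait until every one of them is running, then stop them all) can be sketched as follows. This is not the benchmark he ran; it is a small Python illustration using the standard `threading` module, and the function name `start_and_stop` and the thread count are assumptions chosen to keep the example cheap.

```python
import threading
import time


def start_and_stop(n):
    """Start n threads at once, wait for all to be running, then stop them.

    Returns (threads alive at the same moment, elapsed seconds).
    """
    started = threading.Semaphore(0)  # each worker signals once when it runs
    stop = threading.Event()          # set once to release every worker

    def worker():
        started.release()
        stop.wait()  # idle until told to exit

    threads = [threading.Thread(target=worker) for _ in range(n)]
    t0 = time.perf_counter()
    for t in threads:        # no throttling: start everything immediately
        t.start()
    for _ in range(n):       # wait until every worker has reported in
        started.acquire()
    alive = sum(t.is_alive() for t in threads)
    stop.set()               # stop them all
    for t in threads:
        t.join()
    return alive, time.perf_counter() - t0


alive, elapsed = start_and_stop(200)
print(f"{alive} threads were alive simultaneously, total {elapsed:.3f}s")
```

The distinction Linus and Ingo were arguing about is visible in `alive`: it counts threads that exist at the same moment, not threads created one after another.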

4. Linux 2.5.38 Released; Bug Prevents Compilation

21 Sep 2002 - 25 Sep 2002 (32 posts) Subject: "Linux 2.5.38"

Topics: FS: JFS, PCI, Power Management: ACPI

People: Linus Torvalds

Linus Torvalds announced 2.5.38 and said:

Trying to be a bit more timely about releases, especially since some people couldn't use 2.5.37 due to the X lockup that should hopefully be fixed (no idea _why_ that old bug only started to matter recently, the bug itself was several months old).

ia64 updates, a vm86 mode bug that bit XFree86 startup (and must have bitten dosemu too, but maybe people aren't using DOS much any more), PCI driver attach fixes, JFS, ACPI, net drivers etc.

Various folks posted small problems with this version, and one small-but-big problem that broke compilation on all platforms. Various fixes surfaced after a day or two.

5. Adeos Nanokernel Updated

22 Sep 2002 - 25 Sep 2002 (10 posts) Subject: "[PATCH] Adeos nanokernel for 2.5.38 1/2: no-arch code"

Topics: Microkernels: Adeos, Real-Time: RTAI, Real-Time: RTLinux, SMP, Virtual Memory

People: Pavel Machek, Karim Yaghmour, Jacob Gorm Hansen

Karim Yaghmour posted a patch to add the Adeos nanokernel to the Linux source tree. He referred to an earlier post for explanation. Pavel Machek asked, "Maybe adding Docs/adeos.txt is good idea... (sorry can't access web right now) -- so this is aimed at being free rtlinux replacement?" Karim promised to get Docs/adeos.txt going, and added:

I'm not sure "replacement" is the appropriate description for this. The scheme used by rtlinux and rtai is a master-slave scheme where Linux is a slave to the rt executive. Adeos makes the entire scheme obsolete by making all the OSes running on the same hardware clients of the same nanokernel, regardless of whether the client OSes provide hard RT or not. None of these OSes need to have a "other OS" task, as rtlinux and rtai clearly do. Rather, when an OS is done using the machine, it tells Adeos that it's done and Adeos returns control to whichever other OS is next in the interrupt pipeline.

To be honest, nothing in Adeos is "new". Adeos is implemented on classic early '90s nanokernel research. I've listed a number of nanokernel papers in the paper I wrote on Adeos. A complete list of nanokernel papers would probably have hundreds of entries. Some of these nanokernels even had OS schedulers (exokernel for instance). All Adeos implements is a scheme for sharing the interrupts among the various OSes using an interrupt pipeline.
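As a rough picture of the interrupt pipeline Karim describes, the Python sketch below lines up several "domains" (client OSes) and hands each interrupt to them in pipeline order, moving to the next domain when the current one is done. It is a toy model, not Adeos code: the names `Domain`, `Pipeline`, and `dispatch` are hypothetical, and because the text does not say whether a domain may stop an interrupt from propagating, the sketch simply offers every interrupt to every domain.

```python
class Domain:
    """Toy client OS that handles some interrupt numbers and ignores others."""

    def __init__(self, name, handled_irqs):
        self.name = name
        self.handled_irqs = set(handled_irqs)
        self.log = []  # interrupts this domain actually handled

    def on_interrupt(self, irq):
        if irq in self.handled_irqs:
            self.log.append(irq)
        # Returning means "this domain is done"; control goes back
        # to the pipeline, which moves on to the next domain.


class Pipeline:
    """Ordered list of domains; the head of the list sees interrupts first."""

    def __init__(self, domains):
        self.domains = list(domains)

    def dispatch(self, irq):
        """Offer irq to each domain in order; return the order visited."""
        visited = []
        for domain in self.domains:
            domain.on_interrupt(irq)
            visited.append(domain.name)
        return visited


rt = Domain("toy_rt_os", handled_irqs={5})        # placed first: lowest latency
linux = Domain("toy_linux", handled_irqs={1, 5})
pipeline = Pipeline([rt, linux])

print(pipeline.dispatch(5))   # ['toy_rt_os', 'toy_linux']
print(rt.log, linux.log)      # [5] [5]
```

Note that neither domain is the other's slave here: both are clients of the pipeline, which is the contrast Karim draws with the master-slave scheme.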

Jacob Gorm Hansen asked, "are you planning to add spaces & portals, like in Space or Pebble?" And Karim explained:

I'm not sure whether what we plan to offer actually fits Space's definition of spaces, but domains already exist and portals should be trivial to implement over what we already have. For details on what plan to offer in terms of spaces, take a look at the paper I wrote describing how to implement Linux SMP clusters:

Basically, Adeos would hand over RAM regions according to each OS instance's requests. In such a case, each kernel would have its own virtual memory and communication would be possible using "bridges", shared physical RAM regions. Many OSes can coexist in the same virtual address space, but the mechanisms for managing the virtual address space are not up to Adeos.

Close by, Pavel continued his inquiry, asking, "Are you actually able to use Adeos for something reasonable? You can't run two copies of linux, because they would fight over memory; right? Do you have something that can run alongside linux?" As far as running two concurrent invocations of Linux, Karim agreed that currently this was impossible. But he added, "I've already detailed how to do this in a paper I wrote last july on how to obtain Linux SMP clusters with as few modifications to the kernel as possible." He gave the same link as above. To the question of whether there was anything that could run alongside Linux, he went on:

Certainly. According to some reports it's already used in some commercial systems and, as today's RTAI announcement reads, it will be the basis for the next release of RTAI.

What we need now is ports to other architectures than the i386. This should be fairly simple for anyone familiar enough with the Linux interrupt layer for any other arch.

Pavel remarked, "Good luck pushing it through linus."

6. Status Of 2.5 IDE

23 Sep 2002 - 27 Sep 2002 (7 posts) Subject: "2.5.38: modular IDE broken"

Topics: Disks: IDE

People: Alan Cox, Bob Tracy

Bob Tracy reported some breakage with IDE when compiled as a module, and in the course of discussion Alan Cox replied, "Let me give a simple clear explanation here. I don't give a flying ***k about modular IDE until the IDE works. Cleaning up the modular IDE after it all works is relatively easy and gets easier the more IDE is cleaned up. Until then its not even on the radar unless someone else wants to do all the work for 2.4/2.5 and verify/test them." Bob said, "Understood. My position is simply that I noted something broken, and I reported it during the development cycle. Would you prefer that I had waited until after 2.5.X became 2.6?" Alan agreed that it was better to report such things than not, but added, "its just I've had lots of equally helpful reports."

7. AccessFS 0.5 Ported To Linux Security Module

24 Sep 2002 - 30 Sep 2002 (9 posts) Subject: "[PATCH] accessfs v0.5 ported to LSM - 1/2"

Topics: FS: accessfs, Networking

People: Olaf Dietsche, Greg KH

Olaf Dietsche announced:

Accessfs is a new file system to control access to system resources. For further information see the help text.


This part (1/2) adds a hook to LSM to enable control based on the port number.

The patch is attached below. It is also available at: <>

It applies to 2.5.3[5-8] as well.

I did minimal testing using uml 0.58-2.5.34.

Greg KH was very happy with this, and suggested providing a patch against the current Linux Security Module code snapshot, in which case Greg said he'd be happy to accept the patch into LSM. Olaf was happy to oblige, and soon posted an updated patch; and some folks discussed the implementation.

8. MMU-Less Patches Updated For 2.5.38

24 Sep 2002 - 26 Sep 2002 (12 posts) Subject: "[PATCH]: 2.5.38uc1 (MMU-less support)"

Topics: Framebuffer, Networking

People: Greg Ungerer

Greg Ungerer announced:

A new iteration of the uClinux MMU-less support patches. The all-in-one patch is at:

And new this time around I have broken this up into a number of smaller self-contained patches. Each is a nice logical unit (like a driver, or framebuffer, etc). This should greatly simplify any merging into the mainline code :-)

So the here they are:

. Motorola 5272 ethernet driver

. Motorola 68328 and ColdFire serial drivers

. MTD driver patches for uClinux supported platforms

. Motorola 68328 framebuffer

. uClinux FLAT file format exe loader

. MMU-less support

. Motorola embedded m68k/ColdFire architecture support
(support for 68328, 68360, 5206, 5206e, 5249, 5272, 5307, 5407)

9. USAGI Project Bringing IPv6 Closer To Spec

24 Sep 2002 - 26 Sep 2002 (2 posts) Subject: "[PATCH] IPv6: Don't Process ND Messages with Invalid Options"

Topics: Networking

People: YOSHIFUJI Hideaki, David S. Miller

YOSHIFUJI Hideaki posted a patch and explained:

My name is YOSHIFUJI Hideaki. I'm from USAGI Project. Our project is trying to improve IPv6 implementation in Linux, and we'd like to continue contributing our efforts. Please see <> for further information.


Linux happened to process invalid ND messages with invalid options such as

Specification says that such messages must be silently discarded. This patch parses/checks ND options before it changes state of neighbour / address etc. and ignores such messages.

Following patch is against linux-2.4.19.
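The approach described in the quote, checking all options first and only then touching neighbour state, can be sketched in Python. This is not the USAGI patch. The list of invalid options from the original post is not reproduced above, so the sketch assumes the case meant is the one RFC 2461 spells out: ND options are type/length/value records whose length is counted in units of 8 octets, and a length of zero is invalid. The function name `parse_nd_options` is hypothetical.

```python
def parse_nd_options(data: bytes):
    """Parse the options area of a Neighbour Discovery message.

    Returns a list of (type, value_bytes) tuples, or None if any option is
    malformed, in which case the caller should silently discard the message.
    """
    options = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < 2:
            return None                     # truncated option header
        opt_type = data[offset]
        length_units = data[offset + 1]     # length in units of 8 octets
        if length_units == 0:
            return None                     # zero length is invalid
        length = length_units * 8
        if offset + length > len(data):
            return None                     # option runs past end of message
        options.append((opt_type, data[offset + 2:offset + length]))
        offset += length
    return options


def handle_nd_message(options_area: bytes, update_state):
    """Validate every option before any state change is made."""
    options = parse_nd_options(options_area)
    if options is None:
        return False       # discard silently: no state was modified
    update_state(options)
    return True


mac = bytes.fromhex("001122334455")
good = bytes([1, 1]) + mac             # type 1, length 1 unit (8 octets)
bad = bytes([1, 0]) + mac              # zero length: invalid
print(parse_nd_options(good) == [(1, mac)])
print(parse_nd_options(bad))
```

Parsing into a list before calling `update_state` is what guarantees that a bad option late in the message cannot leave state half-updated.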

David S. Miller applied the patch, thanked YOSHIFUJI, and remarked, "Let us hope more patches like this one are coming :-)"

10. kksymoops Update For 2.5

25 Sep 2002 - 27 Sep 2002 (23 posts) Subject: "[ANNOUNCE] [patch] kksymoops, in-kernel symbolic oopser, 2.5.38-B0"

People: Ingo Molnar, J.A. Magallon, Linus Torvalds

Ingo Molnar announced:

the attached patch is the latest version of 'kksymoops' for the 2.5 kernel. Kksymoops is an in-kernel symbol resolver, which enables nifty things like:

He went on:

i believe it's all for the better, much of the above featureset is also based on distributors' daily experience of how users report crashes and how it can be made sense of post-mortem. Tester feedback is often a scarce resource for distributors, so improving the quality of individual reports is of high importance. Even here on lkml the quality of oops reporting is often surprisingly low, especially taking the many years of education into account.

the cost of the feature is an in-kernel copy of the symbol table - most testers will not care, and it's default-disabled in the .config. This patch has proven to be very useful in my daily kernel development activities, hopefully others will find this just as useful.
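What a symbol resolver does with its copy of the symbol table can be shown in miniature: given an address, find the nearest symbol that starts at or below it and report the offset. The Python sketch below is an illustration only. The table contents, the function name `resolve`, and the `name+offset` output format are assumptions and are not taken from the patch.

```python
import bisect

# Hypothetical symbol table: (start address, name), sorted by address.
SYMBOLS = [
    (0xC0100000, "toy_init"),
    (0xC0101000, "toy_read"),
    (0xC0102800, "toy_write"),
]


def resolve(addr, symbols=SYMBOLS):
    """Map an address to 'symbol+offset', or None if it precedes all symbols."""
    starts = [start for start, _ in symbols]
    # Index of the last symbol whose start address is <= addr.
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return f"{name}+{addr - start:#x}"


print(resolve(0xC0101234))   # toy_read+0x234
print(resolve(0xC0102800))   # toy_write+0x0
print(resolve(0xBFFFFFFF))   # None
```

The binary search keeps lookups cheap even for a large table, which matters when a whole stack trace has to be resolved while printing an oops.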

I've tested the patch on x86, building and oopsing works both with kksymoops enabled and disabled.

The line of credit for kksymoops goes like this: Arjan took Keith's original kallsyms work and extended it to the area of kernel oopsing and stack trace printing - this was the 2.4 kksymoops patch. Which i ported to 2.5 and added some minor fixes, which Kai improved significantly - essentially Kai rewrote much of the original patch - it's now a nice patch that fits into the 2.5 build system properly.

Linus Torvalds had some cosmetic objections to do with output formatting, and folks went back and forth on that for a while; and J.A. Magallon suggested the patch would be great to see in 2.4 as well.

11. Linux 2.4.20-pre8 Released; Announcement Policy

25 Sep 2002 - 27 Sep 2002 (4 posts) Subject: "Linux 2.4.20-pre8"

People: Marcelo Tosatti, Axel Siebenwirth, Linus Torvalds

Marcelo Tosatti announced 2.4.20-pre8 but gave only the ChangeLog, no summary. Axel Siebenwirth asked if Marcelo could give a brief summary of changes as Linus Torvalds did in his releases; and Christer Nilsson agreed. Marcelo thanked them for the feedback and replied, "Indeed I should stop being lazy on that. Will remind myself next time I release a kernel."

12. Benchmark Results For Recent 2.4 Kernels

26 Sep 2002 (1 post) Subject: "[BENCHMARK] 2.4.20-pre8 contest results"

People: Con Kolivas

Con Kolivas reported:

Here are the contest results for 2.4.20-pre8 compared to previous kernels.

Kernel                  Time            CPU             Ratio
2.4.19                  70.42           99%             1.00
2.4.20-pre7             70.37           99%             1.00
2.4.20-pre8             70.47           99%             1.00

Kernel                  Time            CPU             Ratio
2.4.19                  85.21           80%             1.21
2.4.20-pre7             86.02           80%             1.22
2.4.20-pre8             86.65           80%             1.23

Kernel                  Time            CPU             Ratio
2.4.19                  165.56          45%             2.35
2.4.20-pre7             195.32          38%             2.77
2.4.20-pre8             167.14          45%             2.37

Kernel                  Time            CPU             Ratio
2.4.19                  100.70          76%             1.43
2.4.20-pre7             102.83          75%             1.46
2.4.20-pre8             102.14          76%             1.45

There's no significant change since 2.4.19 which is what we'd expect.
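The Ratio column in the tables above is not explained in the text, but the numbers are consistent with each time being divided by the 2.4.19 time from the first table (70.42). The short Python check below reproduces the printed ratios under that assumption; the helper name `ratio` is hypothetical, and the table labels are left as "second", "third", and "fourth" because the original captions are missing.

```python
REFERENCE_TIME = 70.42  # 2.4.19 time from the first table (assumed baseline)


def ratio(time_s, reference=REFERENCE_TIME):
    """Time relative to the baseline, rounded to two decimals as printed."""
    return round(time_s / reference, 2)


# (kernel, time) rows from the second, third, and fourth tables.
tables = {
    "second": [("2.4.19", 85.21), ("2.4.20-pre7", 86.02), ("2.4.20-pre8", 86.65)],
    "third": [("2.4.19", 165.56), ("2.4.20-pre7", 195.32), ("2.4.20-pre8", 167.14)],
    "fourth": [("2.4.19", 100.70), ("2.4.20-pre7", 102.83), ("2.4.20-pre8", 102.14)],
}

for label, rows in tables.items():
    for kernel, time_s in rows:
        print(f"{label:7} {kernel:12} {time_s:7.2f}  {ratio(time_s):.2f}")
```

Read this way, the one figure that stands out is 2.4.20-pre7 in the third table (2.77 against 2.35 for 2.4.19), which is back to 2.37 in 2.4.20-pre8.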

13. IPv6 Refinement

27 Sep 2002 (7 posts) Subject: "[PATCH] IPv6: Refine IPv6 Address Validation Timer"

Topics: Networking

People: YOSHIFUJI Hideaki, David S. Miller, Alexey Kuznetsov

YOSHIFUJI Hideaki posted a patch against 2.4.19 and explained, "Current IPv6 address validation timer is rough and timing of address validation is not precise. This patch refines timing of address validation timer." David S. Miller and Alexey Kuznetsov had some slight problems with the implementation, and YOSHIFUJI posted corrections. Eventually David said, "I've applied the patch with the time_after() debugging check removed to both 2.4.x and 2.5.x"

14. procps 2.0.8 Released

27 Sep 2002 - 28 Sep 2002 (5 posts) Subject: "[ANNOUNCE] procps 2.0.8"

Topics: Clustering: Beowulf, Real-Time

People: Rik van Riel, Robert Love, Andrew Morton

Rik van Riel announced:

Procps is the package containing various system monitoring tools, like ps, top, vmstat, free, kill, sysctl, uptime and more. After a long period of inactivity procps maintenance is active again and suggestions, bugreports and patches are always welcome on the procps list.

The plan is to release a procps 2.1.0 around the time the 2.6.0 kernel comes out, with maybe one extra intermediary release between now and then. Various features and code cleanups are planned, the /proc changes in 2.5 are also sure to keep the procps maintainers busy...

You can download procps 2.0.8 from:

If you have feedback (or patches) for the procps team, feel free to mail us at:

NEWS for version 2.0.8 of procps

He replied to himself a few hours later with a warning about a trivial bug that had crept into the code. The VERSION string had been omitted, so anyone using the -V option would not see the current version number. He said he'd release 2.0.9 shortly.

15. JFS 1.0.23 Released

27 Sep 2002 (1 post) Subject: "[ANNOUNCE] Journaled File System (JFS) release 1.0.23"

Topics: FS: JFS, FS: NFS

People: Steve Best

Steve Best announced:

Release 1.0.23 of JFS was made available today.

Drop 61 on September 27, 2002 (jfs-2.4-1.0.23.tar.gz and jfsutils-1.0.23.tar.gz) includes fixes to the file system and utilities.

Utilities changes

File System changes

For more details about JFS, please see the patch instructions or changelog.jfs files.

16. Linux v2.5.39 Released

27 Sep 2002 - 28 Sep 2002 (5 posts) Subject: "Linux v2.5.39"

Topics: FS: JFS, FS: XFS, SMP, USB

People: Linus Torvalds, Jens Axboe

Linus Torvalds announced Linux v2.5.39, saying:

Changes all over the map.

The most noticeable one may well be the new and much improved elevator by Jens Axboe, this one makes a big difference at least to me.

And Andrew found a nasty SMP deadlock on the tasklist lock.

And Ingo's been busy again, fixing some more threading issues he found (including the much-talked-about futex thing).

Other stuff all over the map: USB, JFS, XFS, networking, debugging etc.

17. Cleanups For /dev/random

27 Sep 2002 (8 posts) Subject: "[PATCH 0/7] /dev/random cleanup"

Topics: Random Number Generation

People: Oliver Xymoron

Oliver Xymoron posted several patches, and explained:

The following patches against 2.5.39 clean up the RNG support substantially. Please pay special attention to the first patch, which fixes two major bugs in the reseeding logic. They can be easily demonstrated by running 'cat /dev/random | hexdump' on a quiescent system. When it blocks, lightly tapping the mouse generates a large stream of additional output, despite very little entropy being added.

The second and third patches introduce my fixes for the more theoretical issues and should address all the issues that have been raised.

The fourth and fifth make the pool and reseeding logic much more clear and create a new pool for /dev/urandom that avoids starving /dev/random readers.

Six and seven propagate the new API to the rest of the kernel and remove dead code.

In subsequent posts containing the individual patches, he explained each more thoroughly. For patch 1:

This fixes a bug where entropy transfer takes more from the primary pool than is there and credits the secondary with 1000 extra bits.

This also makes this code properly handle catastrophic reseeding by raising the wakeup threshold from 8 to 64.

You can test for both of these bugs by doing 'cat /dev/random | hexdump' and observing that the slightest tap of the mouse generates a large stream of output.

Consider the situation where the state of both pools is compromised and is known at time T1. If 8 bits of entropy appear in the primary pool, unblocking random_read, this function would transfer most of the primary pool to the secondary, then give a byte of data to the user at time T2. Given that byte and the known state at T1, the user can test the possible 256 input bits to the primary pool, generate the 256 possible outputs from the secondary, and reduce the possible known states at time T2 to a handful. This is dependent solely on the wakeup threshold and not on the transfer size. Raising the wakeup threshold to 64 means calculating 2^64 possible pool states, making state extension unreasonably hard.

The second clause of the xfer function was intended to handle this catastrophic reseeding, but given the weakness in the first clause, it added nothing.
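The state-extension argument can be tried out on a toy model. In the Python sketch below, SHA-1 from the standard `hashlib` module stands in for both the pool mixing and the output function; this is purely an assumption for illustration and is not how the kernel's pools work. The attacker knows the pool state at T1, sees one output byte at T2, and enumerates every possible 8-bit input.

```python
import hashlib


def mix(state: bytes, sample: bytes) -> bytes:
    """Toy mixing step: fold a new input sample into the pool state."""
    return hashlib.sha1(state + sample).digest()


def output(state: bytes, nbytes: int) -> bytes:
    """Toy output function: what a reader would be handed."""
    return hashlib.sha1(b"out" + state).digest()[:nbytes]


def candidate_states(known_state: bytes, observed: bytes, entropy_bits: int):
    """Try all 2**entropy_bits inputs; keep states that explain `observed`."""
    nbytes = (entropy_bits + 7) // 8
    matches = []
    for guess in range(2 ** entropy_bits):
        state = mix(known_state, guess.to_bytes(nbytes, "big"))
        if output(state, len(observed)) == observed:
            matches.append(state)
    return matches


known_at_t1 = bytes(20)                         # compromised state at T1
secret_input = 0xA7                             # 8 bits the attacker cannot see
true_state = mix(known_at_t1, bytes([secret_input]))
observed = output(true_state, 1)                # the single byte read at T2

survivors = candidate_states(known_at_t1, observed, entropy_bits=8)
print(len(survivors), "candidate states remain out of", 2 ** 8)
print("true state among them:", true_state in survivors)
print("guesses needed with a 64-bit threshold:", 2 ** 64)
```

With an 8-bit threshold the search is 256 hash evaluations and leaves only a few candidates, each further output byte narrowing them again. With 64 bits the same loop would need 18,446,744,073,709,551,616 iterations, which is the point of raising the wakeup threshold.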

For patch 2:

This makes irq and blkdev interrupts untrusted and allows adding a bit of entropy for a configurable percentage of untrusted samples, controlled by a new sysctl. This defaults to 0 for safety, but can be used on headless machines without a hardware RNG to continue to use /dev/random with some confidence.

This also smartens up and simplifies the batch entropy pool to allow unlimited amounts of untrusted mixing without blocking out trusted samples.

For patch 3:

This adds improved entropy estimation based on source timing granularity and a new API for registering entropy sources.

This also detects potential polling or back-to-back interrupt attacks that could be used to observe or force event timing. If a context switch doesn't occur between events, one of these two attacks might be occurring. We can rule out a polling attack by checking if the CPU is sleeping and we can rule out an interrupt flood if jiffies has changed since the last event.

This removes the improperly named "ln" function and replaces it with a call to the potentially arch-optimized fls. This also adjusts the entropy count appropriately taking into consideration the expected entropy in sections of a scale-invariant distribution (see "Benford's Law"). Thanks to Arend Bayer for additional help with this analysis.
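For readers unfamiliar with fls ("find last set"), it returns the position of the most significant set bit, counting from 1, and returns 0 for an input of 0. The sketch below models it in Python and uses it for a rough bit estimate from a timing delta. The `estimate_bits` helper and its cap of 11 bits are assumptions made for illustration, and the sketch does not attempt the scale-invariance adjustment the patch describes.

```python
def fls(x):
    """Position of the most significant set bit, 1-based; 0 if x is 0."""
    return x.bit_length()

def estimate_bits(delta, cap=11):
    """Rough entropy estimate for a timing delta (hypothetical helper).

    A delta whose top bit is at position n can take on the order of 2**(n-1)
    values, so n - 1 bits is credited, limited by an assumed cap.
    """
    return min(max(fls(abs(delta)) - 1, 0), cap)

for delta in (0, 1, 8, 1000, 10**9):
    print(delta, fls(delta), estimate_bits(delta))
```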

For patch 4:

For patch 5:

Stop /dev/urandom readers from starving /dev/random for entropy by creating a separate pool and not reseeding if doing so would prevent /dev/random from reseeding.

This factors pool reseeding out of normal entropy transfer. This allows different pools to have different policy on how to reseed.

This patch also makes random_read actually use the entropy count in the secondary pool rather than tracking off the primary.
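The reseeding policy described for patch 5 can be sketched as follows. This is a simplified model under stated assumptions: the `Pool` class, the `reseed` function, the `reserve` parameter, and the 64-bit reseed size are all hypothetical names and values chosen for illustration, not the patch's interfaces. The idea shown is that the /dev/urandom pool declines to reseed when doing so would leave the primary pool unable to reseed /dev/random.

```python
class Pool:
    """Toy pool that only tracks an entropy estimate in bits."""

    def __init__(self, name, entropy=0):
        self.name = name
        self.entropy = entropy

def reseed(primary, target, want, reserve=0):
    """Move `want` bits from primary to target, or nothing at all.

    The transfer is skipped if it would leave fewer than `reserve`
    bits in the primary pool.
    """
    if primary.entropy - want < reserve:
        return 0
    primary.entropy -= want
    target.entropy += want
    return want

RESEED_BITS = 64  # assumed reseed size for this example
primary = Pool("primary", 100)
blocking = Pool("blocking")        # feeds /dev/random in this model
nonblocking = Pool("nonblocking")  # feeds /dev/urandom in this model

# /dev/urandom must leave enough behind for one /dev/random reseed.
moved_urandom = reseed(primary, nonblocking, RESEED_BITS, reserve=RESEED_BITS)
# /dev/random has no such restriction.
moved_random = reseed(primary, blocking, RESEED_BITS)
print(moved_urandom, moved_random, primary.entropy)
```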

For patch 6: "This removes the old API and updates users to the new one. This also allows different input devices of the same class (e.g. mice) to have their entropy state tracked independently and removes hardwired source classes from the core."

For patch 7: "Remove long-unused MD5 code, unrolled SHA implementations, Linux 2.2 compatibility, and an unused structure."

18. Linux Kernel conf 0.7.1 Released

28 Sep 2002 (6 posts) Subject: "linux kernel conf 0.7"

People: Roman Zippel, Sam Ravnborg, Jeff Garzik, Kai Germaschewski

Roman Zippel announced:

At you can find the latest version of the new config system. Besides the usual archive there is also now a patch against a 2.5.39 kernel and finally some documentation. I also consider this patch my first release candidate, so please test it carefully; this release contains pretty much everything I want from the first release to be integrated into the kernel.

Other changes:

An issue (which was also mentioned by Jeff Garzik) is the help text format. Jeff likes to have an endhelp, where I think it's redundant. The parser currently checks the amount of indentation to find the end of the help text; this makes the help text quite easy to read and parse. If someone prefers an endhelp (or has an even better idea), please speak up now. If enough people complain, I have no problem changing it.
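Indentation-based termination of help text can be sketched roughly like this. It is a minimal illustration, not Roman's parser: the `read_help` function and the sample config lines are hypothetical, and tabs are counted as single characters for simplicity. The help body ends at the first non-blank line indented less than the first line of the help text, so no endhelp marker is needed.

```python
def read_help(lines, start):
    """Collect help text that follows the 'help' line at index `start`.

    Returns (help_lines, index_of_first_line_after_the_help_text).
    """
    body = []
    indent = None
    i = start + 1
    while i < len(lines):
        line = lines[i]
        if not line.strip():
            body.append("")  # blank lines never end the help text
            i += 1
            continue
        cur = len(line) - len(line.lstrip())
        if indent is None:
            indent = cur     # first help line sets the reference indentation
        elif cur < indent:
            break            # shallower indentation ends the help text
        body.append(line[indent:])
        i += 1
    while body and body[-1] == "":
        body.pop()           # drop trailing blank lines
    return body, i

config = [
    "config FOO",
    '    bool "foo support"',
    "    help",
    "      First line of help.",
    "",
    "      Second paragraph.",
    "config BAR",
]
text, nxt = read_help(config, 2)
print(text, config[nxt])
```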

After a report of a trivial bug, Roman put out 0.7.1, and after some testing Sam Ravnborg said it looked pretty good, and offered some implementation suggestions and feature requests.

19. RivaTV 0.8.1 Released

28 Sep 2002 (1 post) Subject: "[ANNOUNCEMENT] RivaTV 0.8.1"

People: Yuri van Oers

Yuri van Oers announced (and later gave a URL):

RivaTV version 0.8.1 has been released.

The RivaTV project is trying to produce Linux drivers for graphics boards with nVidia chips that have a video-in feature.

Changes in this release:

This release includes support for GeForce 4, a lot of new cards and several bugfixes. Also, tuner support has been improved and RivaTV now comes with the relevant BTTV modules to get your tuner going - simply and easily. Finally, the installation process tries to detect pitfalls preventing the use of RivaTV on your machine.

20. oprofile For 2.5.39 Released

28 Sep 2002 (9 posts) Subject: "[PATCH][RFC] oprofile for 2.5.39"

People: John Levon

John Levon announced:

Here is a new version of oprofile against 2.5.39. Thanks Andi, Christoph, and Alan for your comments. I think I should have fixed the things you mentioned.

As before, the full patch is available here : [100k]

with usage notes :

and more readable broken-out patches (but not applyable) :

Changes from last time :

21. EVMS 1.2.0 Released

30 Sep 2002 - 2 Oct 2002 (7 posts) Subject: "[ANNOUNCE] EVMS Release 1.2.0"

Topics: Disk Arrays: EVMS, FS: JFS, FS: XFS

People: Kevin Corry, Neil Brown

Kevin Corry announced:

The EVMS team is announcing the next stable release of the Enterprise Volume Management System, which will eventually become EVMS 2.0. Package 1.2.0 is now available for download at the project web site:

EVMS 1.2.0 has full support for the 2.4 kernel, and includes patches for most kernels up to 2.4.19. It also has nearly full support for the 2.5 kernel (only the OS/2 and S/390 plugins have not been ported yet), and includes a patch for kernels 2.5.38 and 2.5.39.

Please send any questions, problem reports or bugs to the EVMS mailing list:

v1.2.0 - 9/30/02
Engine Core
 - Enable limited rediscovery
   - Only issue rediscover commands on disks affected by current changes,
     instead of every disk in the system.
 - Clean up stop-data that is no longer needed.
 - Improve plug-in validation.
 - No longer include kernel header files. Copy appropriate definitions,
   structures, code, etc. to user-space header files.
 - More keyboard accelerator keys for most windows.
 - Allow selecting multiple objects to remove or destroy.
 - Allow expanding containers through context popup menu.
 - Allow creating regions and segments from freespace objects through
   context popup menu.
 - Usability enhancements and terminology sync-up with other UIs.
Text-Mode UI (ncurses)
 - Support for commit status and progress indicators.
 - Add convert-to-compatibility-volume action to Volumes view.
 - Display an error on the status line if setting an option value failed.
 - Bug fixes
   - Segfault when attempting to select an item from an empty selection list.
   - Pressing "Enter" in an option panel when required options have no values.
   - Scrolling in available objects list.
   - Having to press spacebar twice when editing a string field.
Command Line
 - Parameter substitution
   - Commands can access parameters passed into the CLI when it was started.
XFS Filesystem Interface Module
 - New FSIM with mkfs, fsck, external log, and online expand support.
 - Online expand support. Requires JFS 1.0.21.
AIX Plugin
 - Create, delete and expand AIX containers.
 - Create, delete and expand AIX regions.
 - Correctly write COW table sectors on S/390.
MD Plugin
 - 2.5 kernel plugin has been rewritten based on Neil Brown's 2.5 MD code.

22. Linux 2.4.20-pre9 Released

3 Oct 2002 (1 post) Subject: "Linux 2.4.20-pre9"

Topics: FS: JFS, FS: ext3, USB

People: Marcelo Tosatti

Marcelo Tosatti announced 2.4.20-pre9, and said:

The Athlon problems introduced in pre3 should be gone now.

pre9 has some JFS/ext3 fixes, USB fixes and several network drivers fixes.

There are still some pending issues to be solved for 2.4.20 which I hope get worked out on the next -pre's...








Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.