Kernel Traffic
Latest | Archives | People | Topics

Kernel Traffic #256 For 2 Apr 2004

By Zack Brown

Table Of Contents

Mailing List Stats For This Week

We looked at 1601 posts in 7819K.

There were 508 different contributors. 241 posted more than once. 224 posted last week too.

The top posters of the week were:

1. Status Of KGDB Support In 2.6

4 Feb 2004 - 4 Mar 2004 (52 posts) Archive Link: "kgdb support in vanilla 2.6.2"

Topics: Version Control

People: Pavel Machek, Tom Rini, Paul Mundt, Andrew Morton, Linus Torvalds

Pavel Machek noticed that there was some kgdb support in Linus Torvalds' kernel tree, at least for certain architectures. He asked, "That's great, can we get i386 kgdb, too? Or at least amd64 kgdb ;-). [Or was it a mistake? It seems unlikely that kgdb could enter Linus tree without major flamewar...]" Tom Rini replied, "there has been PPC32 KGDB support in for ages. OTOH, I'm quite happy that SH kgdb support came in (mental note made to talk to Henry about the KGDB merging stuffs)." Paul Mundt said:

The SH kgdb work is a combination of effort by Henry Bell and Jeremy Siegel, (ST and MV both had their own versions, Jeremy did the sync work between the two) neither of which have touched it since mid 2.4 or so when it was first merged into the LinuxSH tree.

Getting the SH kgdb stuff updated is on my TODO list, I'd definitely be interested in getting this stuff in sync with Amit's work as well. Any pointers?

Tom replied,
What Amit has is at What I've done on top of this is at bk:// and

Andrew Morton replied to Pavel's initial post, saying:

Lots of architectures have had in-kernel kgdb support for a long time. Just none of the three which I use :(

I wouldn't support inclusion of i386 kgdb until it has had a lot of cleanup, possible de-featuritisification and some thought has been applied to splitting it into arch and generic bits. It's quite a lot of work.

A bunch of folks started talking about what work still remained to do before kgdb could be solidly supported in the official kernel. There were also indications that some companies might be willing to pay to have the work done. At one point Andrew remarked, "there's a lot of interest in this and I of course am fully supportive."

2. Proposal: A Layered Kernel

24 Feb 2004 - 9 Mar 2004 (47 posts) Archive Link: "A Layered Kernel: Proposal"

Topics: BSD, Backward Compatibility, Disks: IDE, Executable File Format, FS: sysfs, Microkernels, Networking, Sound

People: Grigor Gatchev, Rik van Riel, Timothy Miller, Mike Fedyk, Alexander Viro

Grigor Gatchev said:

A Layered Kernel


The idea is to structure logically the kernel into two (or more) layers, with explicitly defined interfaces between them.

Not a Microkernel

Some of my friends decided that this is actually a (partial) microkernel idea. Not so:

A microkernel may be (and often is) logically single-layered. A many-layered kernel may be compiled as a big kernel.


Both models have advantages. Single layer provides for better integration, and apparently saving some work. Multi-layer provides for better security, abstraction, code, and eventually quality.

Traditionally, Unixes use a layered approach - its assets have proven to give more than the single-block. So, a layered kernel may be more natural to a Unix-rooted system, and may give better results.


Currently, it seems reasonable to form two layers: Resources and Personality.

The Resources layer is the natural place for the low-level stuff - device drivers, basic scheduler, basic memory management and protection, etc.

The Personality layer is the place for the Linux API and ABI, etc.

Some parts, eg. filesystems, TCP/IP stack etc, may bear a discussion over their exact place. If needed, an intermediate layer may be defined.

Each layer draws resources from the lower one, adds functionality on its top, and possibly changes or hides some of its functionalities, class-like. It runs the upper layer in encapsulation, eg. protected mode, and works on top of the lower layer (Resources - over the physical hardware). Possibly a lower layer can run simultaneously two or more upper layers, in a multi-tasking way.

If a layer is a separate file, it can be called like a program. If it is part of a "big kernel", it can be called directly, or through exported hooks. Other models are possible, too.

A layer may be emulated by a layer interface emulator. For example, you may run a Resources emulator as a program, and a standard Personality over it, achieving "kernel over kernel". Configuring the emulator, you pass to the child kernel, and its users and software, a virtual machine of your choice.
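The layering described above can be illustrated with a small sketch. It is not taken from the thread: the names `ResourcesInterface`, `HardwareResources`, `EmulatedResources`, `Personality` and the single `alloc_pages` call are all hypothetical, chosen only to show an upper layer that sees nothing but the inter-layer interface, and an emulator that hands it a configured share of the real resources.

```python
from abc import ABC, abstractmethod


class ResourcesInterface(ABC):
    """Hypothetical inter-layer interface: what a Personality may ask for."""

    @abstractmethod
    def alloc_pages(self, count):
        """Try to reserve `count` pages; return how many were granted."""


class HardwareResources(ResourcesInterface):
    """Stands in for a Resources layer sitting on the physical machine."""

    def __init__(self, total_pages):
        self.free_pages = total_pages

    def alloc_pages(self, count):
        granted = min(count, self.free_pages)
        self.free_pages -= granted
        return granted


class EmulatedResources(ResourcesInterface):
    """Layer interface emulator: passes on only a configured quota."""

    def __init__(self, parent, quota):
        self.parent = parent
        self.quota = quota

    def alloc_pages(self, count):
        granted = self.parent.alloc_pages(min(count, self.quota))
        self.quota -= granted
        return granted


class Personality:
    """Upper layer: bound to the interface, not to a concrete lower layer."""

    def __init__(self, resources):
        self.resources = resources

    def request_memory(self, pages):
        return self.resources.alloc_pages(pages)
```

With this arrangement a "guest" `Personality` built on `EmulatedResources(hw, 10)` can never obtain more than ten pages, while a "host" `Personality` built directly on `hw` is limited only by the hardware. Both run the same upper-layer code.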


Improved source: A well defined inter-layer interface separates logically the kernel source into more easily manageable parts. It makes the testing easier. A simple and logical lower layer interface makes learning the base and writing the code easier; a simple and logical upper layer interface enforces and helps clarity in design and implementation. This may attract more developers, ease the work of the current ones, and increase the kernel quality while decreasing the writing efforts. The earlier this happens, the better for us all.

Anti-malware protection: Sources of potentially dangerous content can be filtered between the kernel layers by hooked or compiled-in modules. As with most other advantages, this is achievable in a non-layered kernel too, but is more natural in a layered one. Also, propagation of malware between layers is more difficult.

Security: A layer, eg. Personality, if properly written, is eventually a sandbox. Most exploits that would otherwise allow a user to gain superuser access, will give them control over only this layer, not over the entire machine. More layers will have to be broken.

Sandboxing: A layer interface emulator of a lower, eg. Resources layer can pass a configurable "virtual machine" to an upper, eg. Personality layer. You may run a user or software inside it, passing them any resources, real or emulated, or any part of your resources. All advantages of a sandbox apply.

User nesting: The traditional Unix user management model has two levels: superuser (root) and subusers (ordinary users). Subusers cannot create and administrate their subusers, install system-level resources, etc. Running, however, a subuser in their own virtual machine and Personality layer as its root, will allow tree-like management of users and resources usage/access. (Imagine a much enhanced chroot.)

Platforming: It is much easier to write only a Personality layer than an entire kernel, esp. if you have a layer interface open standard as a base. Respectively, it's easier to write only a Resources layer, adding new hardware to the "Supported by Linux" list. This will help increase both the supported hardware and the supported platforms. Also, you may thus run any platform on any hardware, or many platforms concurrently on the same hardware.

Heterogeneous distributed resources usage: Under this security model, networks of possibly different hardware may share and redistribute resources, giving to the users resource pools. Pools may be dynamical, thus redistributing the resources flexibly. This mechanism is potentially very powerful, and is inherently consistent with the open source spirit of cooperativity and freedom.


More work to start: Initially the model change will take more effort. (Most will be spent on clarifying and improving the code; most of the layered model requirements are actually code quality requirements. Without the new model, the bad design/code correction can be left for a later stage.)

Performance: Badly designed and/or implemented inter-layer interfaces may slow down the kernel speed, decreasing the system performance.

Compatibility: Even if the new model is 100% compatible with old upper-level software, in practice it will surface better the advantages of some newer stuff, eg. sysfs, and will increase the stress on them, thus obsoleting sooner some old software. With the old model, the change may be delayed further.

Other issues:

Need for it: Without any consideration for a layered model, the kernel source is already structured in a way convenient for going with it; in a sense, most of the work is already done. Which shows that the need for a layered model is inherent to the kernel, even if not explicitly defined and noticed. This way we just follow the system logic.

Authority: If not approved by the top developers, the restructuring may fork the kernel development. As a result, the layered model may not be able to gain sufficient resources to survive. The standard model will probably survive, but may also suffer a loss of developers.

The right moment: The best moment for starting the change is when a new kernel is declared stable version, and the tree for the next development version is to be formed. This doesn't happen often, and the change will be more difficult in another time. (That is why this proposal is made now.)

Why all this?

Like the mutations, radical ideas are most often bad. Banning them as a principle, however, may eventually doom Linux to a dinosaur fate. (And we tend to do it - lately big improvements come to Linux only from outside. The ELF format, sysfs, NUMA support... However, BSD, Solaris and IRIX seem to fade; soon we will have only Windows to rely on for new kernel-level things.)

The advantages of the idea above seem to outweigh the disadvantages; even its disadvantages have strong positive traits. Is it possible that a discussion may prove it good?

Carlos Silva thought this would be a great thing for 2.7; and Rik van Riel also said:

Sounds like a reasonable description of how Linux already works, with the exception of badly written code that has layering violations.

I'm all for cleaning up the badly written code so it fits in better with the rest of the kernel ;)

Grigor replied, "Unhappily, cleaning up would not be enough. A separation of the kernel layers, to the extent that one may be able to use them independently, and to plug modules between them (having the appropriate access) may be better." And Rik replied:

Some parts of the kernel (eg. the VFS or the device driver layers) can already do that, while others still have layering violations.

I suspect that the least destabilising way of moving to a more modular model would be to gradually clean up the layering violations in the rest of the code, until things are modular.

Yes, I know it's a lot of work ...

At this point the discussion skewed into a debate over whether elegant code should win out over practical code, in other words whether any sort of overarching structure (like layering) could possibly take account of all real-world needs. But at some point Grigor added to his previous proposal:

Here it is. Get your axe ready! :-)


Driver Model Types: A Short Description.

(Note: This is NOT a complete description of a layer, according to the kernel layered model I dared to offer. It concerns only the hardware drivers in a kernel.)

Direct binding models:

In these models, kernel layers that use drivers bind to their functions more or less directly. (The degree of directness and the specific methods depend much on the specific implementation.) This is as opposed to the indirect binding models, where driver is expected to provide first a description what it can do, and binding is done after that, depending on the description provided.

Chaotic Model

This is not a specific model, but rather a lack of any model that is designed in advance. Self-made OS most often start with it, and add a model on a later stage, when more drivers appear.


The model itself requires no design efforts at all.

No fixed sets of functions to conform to. Every coder is free to implement whatever they like.

Unlimited upgradeability - a new super-hardware driver is not bound by a lower common denominator.

Gives theoretically the best performance possible, as no driver is bound to conform to anything but to the specific hardware abilities.


Upper layers can rely on nothing with it. As more drivers for similar devices (eg. sound cards) are added, upper layers must check the present drivers for every single function - which actually amounts to implementing a built-in driver model. (In a place where it does not belong, and therefore in a rather clumsy way.)


Good for homebrew OS-alikes, and for specific hardware that is not subject to differences, eg. some mainframe that may have only one type of NIC, VDC etc. Otherwise, practically unusable - the lack of driver systematics severely limits the kernel internal flexibility. Often upgraded with functions that identify for each driver what it is capable of, or requiring some (typically low) common denominator.

Common Denominator Model

With it, hardware drivers are separated in groups - eg. NIC drivers, sound drivers, IDE drivers. Within a group, all drivers export the same set of functions.

This set sometimes covers only the minimal set of functionalities, shared by all hardware in the group - in this case it acts as a smallest common denominator. Other possibility is a largest common denominator - to include functions for all functionalities possible for the group, and if the specific hardware doesn't support them directly, to either emulate them, or to signal for an invalid function. Intermediate denominator levels are possible, too.

The larger the common denominator, and the less emulation ("bad function" signal instead), the closer the model goes to the chaotic model.


It requires little model design (esp. the smallest common denominator types), and as little driver design as possible. (You may create an excellent design, of course, but you are not required to.) You can often re-use most of the design of the other drivers in the group.

It practically doesn't require a plan, or coordination. The coder just tries either to give the functionality that is logical (if this is the first driver in a new group), or tries to give the same functionality that the other drivers in the group give.

Coupling the driver to the upper levels that use it is very simple and easy. You practically don't need to check what driver actually is down there. You know what it can offer, no matter the hardware, and don't need to check what the denominator level actually is, unlike the chaotic model.

It encapsulates well the hardware groups, and fixes them to a certain level of development. This decreases the frequency of the knowledge refresh for the programmers, and to some extent the need for upper levels rewrite.


The common denominator denies to the upper level the exact access to underlying hardware functionality, and thus decreases the performance. With hardware that is below the denominator line, you risk getting a lot of emulation, which you potentially could avoid to a large degree on the upper level (it is often better informed what exactly is desired). With hardware above the denominator line, you may be denied access to built-in, hardware-accelerated higher level functions, that would increase performance and save you doing everything in your code.

Once the denominator level is fixed, it is hard to move without seriously impairing the backwards compatibility. The hardware, however, advances, and offers built-in upper-level functions and new abilities. Thus, this model quickly obsoletes its denominator levels (read: performance and usability).

The larger the common denominator, the more design work the model requires. (And the quicker it obsoletes, given the need to keep with the front.)


This model is the opposite of the chaotic model. It is canned and predictable, but non-flexible and with generally bad performance. Model upgrades are often needed (and done more rarely, at the expense of losing efficiency), and often carry major rewrites of other code with them.
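A minimal sketch of the common denominator idea follows. It is an illustration only, not code from the post: the `SoundDriver` group, its three functions and the two card classes are hypothetical. `play` plays the role of the smallest common denominator, while `mix` and `set_3d_position` show the two options named above for a larger denominator, emulation and an "invalid function" signal.

```python
class SoundDriver:
    """Hypothetical function set shared by every driver in the sound group."""

    def play(self, samples):
        # Smallest common denominator: every card must provide this.
        raise NotImplementedError

    def mix(self, a, b):
        # Larger-denominator extra: emulated in software when the
        # hardware cannot do it.
        return [x + y for x, y in zip(a, b)]

    def set_3d_position(self, x, y, z):
        # Larger-denominator extra without emulation: the driver
        # signals an invalid function instead.
        raise NotImplementedError("function not supported by this hardware")


class BasicCard(SoundDriver):
    def play(self, samples):
        return len(samples)  # pretend the samples were sent to the device


class FancyCard(BasicCard):
    def mix(self, a, b):
        # Stand-in for a hardware mixer; same result as the emulation.
        return [x + y for x, y in zip(a, b)]

    def set_3d_position(self, x, y, z):
        return (x, y, z)


def play_mixed(driver, a, b):
    """Upper layer: the same calls work for any driver in the group."""
    return driver.play(driver.mix(a, b))
```

The upper-layer function `play_mixed` never asks which card is present, which is the replaceability the model buys. The cost is visible too: it cannot tell whether `mix` ran in hardware or was emulated.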


These two models are the opposites of the scale. They are rarely, if ever, used in clear form. Most often, a driver model will combine them to some extent, falling somewhere in the middle.

The simplest combination is defining a (typically low) common denominator, and going chaotic above. While it theoretically provides both full access to the hardware abilities and something granted to base on, the granted is little, and the full access is determinable like with the chaotic model, in a complex way.

This combination also has some advantages:

Where more flexibility and performance is needed, you may go closer to the chaotic model. And where more replaceability and predictability is needed, you may go closer to the CD model. The result will be a driver model that gives more assets where they are really needed, and also has more negatives, but in an area where they aren't that important.

If the optimum for a specific element, eg. driver group, shifts, you may always make the shift obvious. Then, moving the model balance for this element will be more readily accepted by all affected by it.

Another way to combine the models is to break the big denominator levels into multiple sublevels, and to provide a way to describe the driver's sublevel, turning this model into indirect binding type.

All this group of models, however, has a big drawback: really good replaceability is provided only very close to the common denominator end of the scale, where flexibility, performance, upgradeability and usability already tend to suffer. Skillful tuning may postpone the negatives to a degree, but not forever. Attempts to solve this problem are made by developing driver models with indirect binding.

Indirect binding models:

With this model, drivers are expected to provide first a description what they can do, and what they cannot. Then, the code that uses the driver binds to it, using the description.

Most of these models take the many assets of the chaotic model as a base, and try to add the good replaceability and function set predictability of the common denominator model.

Class-like model

In it, the sets of functions that drivers offer are organized in a class-like manner. Every class has a defined set of functions. Classes create a hierarchy, like the classes of OOP languages. (Drivers do not necessarily have to be written in an OO language, or to be accessed only from such one.) A class typically implements all functions found in its predecessor, and adds more (but, unlike OOP classes, rarely reuses predecessor code).

Classes and their sets of functions are pre-defined, but the overall model is extendable without changing what is present. When a new type of device appears, or a new device offers functionality above the current classes appropriate for it, a new class may be defined. The description of the class is created, approved and registered (earlier stages may be made by a driver writer, later - by a central body), and is made available to the concerned.

Every driver has a mandatory set of functions that report the driver class identification. Using them, an upper layer can quickly define what functionality is present. After this, the upper layer binds to the driver much like in the direct binding models.


If properly implemented, gives practically the same performance as the chaotic model. Additional checking is performed only once, when the driver is loaded. Class defining may be fine-grained enough to allow for practically exact covering of the hardware functionality.

The upgradeability and usability of the specific drivers are practically the same as those of the chaotic model. And the model global extendability and upgradeability, if properly designed, are practically limitless.

If properly designed, gives nearly the same replaceability as the CD model. (The things to check are more, but much less than with the chaotic model. What you will find in each of them is usually well documented. And the check procedure is standard and simple.)


The model itself requires more design and maintenance work than the direct binding models (except the larger CD models). (Actually, the amount of maintenance work is the same as with any CD model, but the work comes before the need for it is felt by everybody.)


This is probably the best of all driver models I have examined more carefully. Unhappily, most implementations I have seen are rather clumsy, to say the least.
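The class-like model can be sketched as follows, under stated assumptions: the registry `DRIVER_CLASSES`, the class identifiers `nic.basic` and `nic.checksum`, and the `driver_class` reporting function are hypothetical names invented for this example. The sketch shows a class hierarchy in which each class adds functions to its predecessor, and a one-time check at load after which the upper layer binds directly.

```python
# Hypothetical class registry: each entry names its predecessor class
# and the functions it adds on top of that predecessor.
DRIVER_CLASSES = {
    "nic.basic": (None, {"send", "recv"}),
    "nic.checksum": ("nic.basic", {"hw_checksum"}),
}


def functions_of(class_id):
    """Collect every function guaranteed by a class and its predecessors."""
    funcs = set()
    while class_id is not None:
        parent, added = DRIVER_CLASSES[class_id]
        funcs |= added
        class_id = parent
    return funcs


class ChecksumNic:
    """Example driver; driver_class() is the mandatory reporting function."""

    def driver_class(self):
        return "nic.checksum"

    def send(self, frame):
        return len(frame)

    def recv(self):
        return b""

    def hw_checksum(self, frame):
        return sum(frame) & 0xFFFF


def bind(driver):
    """One-time check at load; afterwards calls go straight to the driver."""
    return {name: getattr(driver, name)
            for name in functions_of(driver.driver_class())}
```

After `ops = bind(ChecksumNic())`, the upper layer calls `ops["hw_checksum"](frame)` with no further checking. Adding a new class to the registry does not change what existing classes guarantee.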

Function map model

This model is actually a largest common denominator model, extended with the ability to provide a map of the implemented functions. In the simplest case, the map is a bitspace, where every bit marks whether its function is implemented. In other cases, the map is a space of accesses (eg. function pointers).


In some architectures and platforms, this is a very convenient way to describe a function array.

The model is simple, and therefore easy to use.


The model has all disadvantages of a LCD model.


The advantages of the model are relatively little, while the disadvantages are big. For this reason, it is used mostly as an addition to another model - eg. to the class-like model.
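The simplest form of the function map, a bitspace with one bit per function, might look like the sketch below. The `NicFunc` flags and the `SimpleNic` driver are hypothetical and serve only to show the lookup.

```python
from enum import IntFlag


class NicFunc(IntFlag):
    """Hypothetical bit assignment: one bit per function in the group."""
    SEND = 1
    RECV = 2
    HW_CHECKSUM = 4
    SCATTER_GATHER = 8


class SimpleNic:
    # The driver advertises which functions it actually implements.
    function_map = NicFunc.SEND | NicFunc.RECV


def supports(driver, func):
    """Upper layer: test one bit of the driver's function map."""
    return bool(driver.function_map & func)
```

Here `supports(SimpleNic, NicFunc.SEND)` is true and `supports(SimpleNic, NicFunc.HW_CHECKSUM)` is false. The second form mentioned above, a map of accesses, would replace the bits with a table of function pointers in which unimplemented entries are empty.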

Global discussion:

The models list provided here is rather global. This is intentional: while designing, one must clarify one level at a time, much like with coding.

The list also is incomplete. For example, I never had the time to look properly for ideas into the OS/2 SOM, and it is said to work very well, and provide excellent performance. Of interest might be also more details of the QNX driver model. Someone with in-depth knowledge of these might be able to enhance this list.

In the course of discussion, Mike Fedyk, Theodore Y. T'so, Alexander Viro and others felt that Grigor's proposal was not specific enough. They said it was just hand-waving, with nothing concrete behind it. Grigor objected that it was important to do a thorough design before coding. At some point Timothy Miller said:

As one of the people who has been told "show me the code" before, let me try to help you understand what the kernel developers are asking of you.

First of all, they are NOT asking you to do the bottom-up approach that you seem to think they're asking for. They're not asking you to show them code which was not the result of careful design. No. Indeed, they all agree with you that careful planning is always a good idea, in fact critical.

Rather, what they are asking you to do is to create the complete top-down design _yourself_ and then show it to them. If you do a complete design that is well-thought-out and complete, then code (ie. function prototypes) will naturally and easily fall out from that. Present your design and the resultant code for evaluation.

Only then can kernel developers give you meaningful feedback. You'll notice that the major arguments aren't about your design but rather about there being a lack of anything to critique. If you want feedback, you must produce something which CAN be critiqued.

Follow the scientific method:

  1. Construct a hypothesis (the document you have already written plus more detail).
  2. Develop a means to test your hypothesis (write the code that your design implies).
  3. Test your hypothesis (present your code and design for criticism).
  4. If your hypothesis is proven wrong (someone has a valid criticism), adjust the hypothesis and then goto step (2).

Perhaps you have not done this because you feel that your "high level" design (which you have presented) is not complete. The problem is that, based on what you have presented, no one can help you complete it. Therefore, the thing to do is to complete it yourself, right or wrong. Only when you have actually done something which is wrong can you actually go about doing things correctly. Actually wrong is better than hypothetically correct.

Then, you may be thinking that this will result in more work, because you'll create a design and write some code just to find out that it needs to be rewritten. But this would be poor reasoning. It would be extremely unrealistic to think that you could create a design a priori that was good and correct, before you've ever done anything to test its implications.

Most likely, you would go through several iterations of your spec and the implied code before it's acceptable to anyone, regardless of how good it is to begin with. Just think about how many iterations Con and Nick have gone through for their interactivity schedulers; they've had countless good ideas, but only experimentation and user criticism could tell them what really worked and what didn't. And these are just the schedulers -- you're talking about the architecture of the whole kernel!

Mike agreed with this, but there was no further discussion.

3. New kpatchup Kernel Patching Script Version 0.02

2 Mar 2004 - 4 Mar 2004 (8 posts) Archive Link: "[ANNOUNCE] kpatchup 0.02 kernel patching script"

Topics: Version Control

People: Matt Mackall, Zwane Mwaikambo, Dave Hansen, Rusty Russell

Matt Mackall said:

This is the first release of kpatchup, a script for managing switching between kernel releases via patches with some smarts:

Currently it knows about 2.4, 2.4-pre, 2.6, 2.6-pre, 2.6-bk, 2.6-mm, and 2.6-tiny.

Example usage:

 $ head Makefile
 $ kpatchup 2.6-mm
 2.6.2-rc2 -> 2.6.4-rc1-mm1
 Applying patch-2.6.2-rc2.bz2 -R
 Applying patch-2.6.2.bz2
 Applying patch-2.6.3.bz2
 Downloading patch-2.6.4-rc1.bz2...
 Applying patch-2.6.4-rc1.bz2
 Downloading 2.6.4-rc1-mm1.bz2...
 Applying 2.6.4-rc1-mm1.bz2
 $ head Makefile
 NAME=Feisty Dunnart
 $ kpatchup -q 2.6.3-rc1
 $ head Makefile
 NAME=Feisty Dunnart
 $ kpatchup -s 2.6-bk
 $ kpatchup -u 2.4-pre

This is an alpha release for people to experiment with. Feedback and patches encouraged. Grab your copy today at:

Zwane Mwaikambo was very excited by this, saying, "Oh i definitely owe you one now, this is replacing the ugly shell script i had before, i'm mostly using this now to download and patch up trees before cvs import'ing them." Rusty Russell was also happy to see this, and offered his own scripts in case they had anything worth merging. Dave Hansen also liked Matt's script, but said:

it doesn't look like it properly handles empty directories. I tried this command, this morning, and it blew up. I think it's because this directory is empty because of last night's 2.6.4-rc2 release. I don't grok python very well but is the "return p[-1]" there just to cause a fault like this? Would it be better if it just returned a "no version of that patch right now" message and exited nicely?

[dave@nighthawk linux-2.6]$ kpatchup-0.02 2.6-bk
Traceback (most recent call last):
  File "/home/dave/bin/kpatchup-0.02", line 283, in ?
    b = find_ver(args[0])
  File "/home/dave/bin/kpatchup-0.02", line 240, in find_ver
    return v[0](os.path.dirname(v[1]), v[2])
  File "/home/dave/bin/kpatchup-0.02", line 147, in latest_dir
    return p[-1]
IndexError: list index out of range

I think your script, combined with Rusty's latest-kernel-version could make me a very happy person.

They debugged for a bit, and the thread ended.
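The behaviour Dave asked for, a clean message instead of an `IndexError` when the listing is empty, could be sketched as below. This is not the fix that went into kpatchup: the function names `latest` and `newest_or_exit` and the message text are hypothetical, and the listing is assumed to be an already sorted Python list.

```python
import sys


def latest(entries):
    """Return the last entry of a sorted listing, or None if it is empty."""
    if not entries:
        return None  # indexing entries[-1] here is what raised IndexError
    return entries[-1]


def newest_or_exit(entries, tree):
    """Exit with a readable message when no patch is available."""
    newest = latest(entries)
    if newest is None:
        sys.exit("no version of %s available right now" % tree)
    return newest
```

`sys.exit` with a string argument prints the message to standard error and exits with status 1, so the user sees one line rather than a traceback.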

4. Linux 2.6.4-rc1-mm2 Released

2 Mar 2004 - 10 Mar 2004 (17 posts) Archive Link: "2.6.4-rc1-mm2"

Topics: FS: NFS, Kernel Release Announcement, Virtual Memory

People: Andrew Morton

Andrew Morton announced 2.6.4-rc1-mm2, saying:

5. Long-Time LBD Overflow Bug Caught

3 Mar 2004 - 4 Mar 2004 (3 posts) Archive Link: "[PATCH] LBD fix for 2.6.3"

People: Eric Sandeen, Andrew Morton

Eric Sandeen said, "A couple xfs users stumbled upon this problem while trying to use 2.6 + CONFIG_LBD on ia32 boxes - mkfs.xfs followed by xfs_repair was failing. At first we thought it was a raid/md problem, but it's more generic than that. There's a problem in __block_write_full_page()." He posted a patch to adjust the data types of a couple of variables, so they wouldn't overflow; and Andrew Morton said:

egads. That bug has been there from day one. CONFIG_LBD cannot possibly work correctly due to this error.


Actually, there are more instances of this bug in buffer.c. This should fix them up, and also let's be clearer about discriminating between block numbers, pagecache indices and offsets-within-pages.

He posted a patch with some remaining fixes, and Eric Sandeen offered a patch for one that Andrew apparently missed.
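The general class of bug can be illustrated with a small simulation. The details are assumptions, not taken from the patches: the sketch supposes 4 KiB pages, 512-byte blocks, and a block number computed from a page index in a 32-bit type, as `unsigned long` is on ia32. Python integers do not overflow, so the 32-bit wraparound is modelled with an explicit mask.

```python
PAGE_SHIFT = 12                    # assumed: 4 KiB pages
BLKBITS = 9                        # assumed: 512-byte blocks
MASK_32 = 0xFFFFFFFF               # width of unsigned long on ia32
MASK_64 = 0xFFFFFFFFFFFFFFFF       # width of a 64-bit block number


def first_block_narrow(page_index):
    """Shift carried out in a 32-bit type: the high bits are lost."""
    return (page_index << (PAGE_SHIFT - BLKBITS)) & MASK_32


def first_block_wide(page_index):
    """Shift carried out in a 64-bit type: the block number survives."""
    return (page_index << (PAGE_SHIFT - BLKBITS)) & MASK_64


# Page index of the page that starts exactly at the 2 TiB mark.
PAGE_AT_2TIB = 0x20000000
```

For small devices the two functions agree. For `PAGE_AT_2TIB` the narrow version wraps around to block 0 while the wide version gives block 2^32, which is why writes past that point would land in the wrong place until the variables were widened.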

6. Linux 2.6.4-rc2 Released

3 Mar 2004 - 9 Mar 2004 (5 posts) Archive Link: "Linux 2.6.4-rc2"

Topics: FS: XFS, Hot-Plugging, Kernel Build System, Kernel Release Announcement, PCI

People: Linus Torvalds, Sam Ravnborg, Jeff Garzik

Linus Torvalds announced 2.6.4-rc2, saying:

Here's mainly ARM, XFS, PCI hotplug and firewire updates. And some parport cleanups and fixes from Al.

And a fairly small merge from Andrew (s390 and random stuff).

Lukasz Trabinski reported a compilation error, and Jeff Garzik suggested running 'make oldconfig' to clear it up. Sam Ravnborg replied, "If this cured it I would like to know. Because kbuild should run "make silentoldconfig" if needed. Timestamps of all KConfig files are checked etc."

7. KGDB Documentation

4 Mar 2004 (1 post) Archive Link: "Added KGDB documentation"

Topics: Modems

People: Amit S. Kale

Amit S. Kale posted some KGDB documentation:


kgdb is a source level debugger for linux kernel. It is used along with gdb to debug a linux kernel. Kernel developers can debug a kernel similar to application programs with use of kgdb. It makes it possible to place breakpoints in kernel code, step through the code and observe variables.

Two machines are required for using kgdb. One of these machines is a development machine and the other is a test machine. The machines are connected through a serial line, a null-modem cable which connects their serial ports. The kernel to be debugged runs on the test machine. gdb runs on the development machine. The serial line is used by gdb to communicate to the kernel being debugged.

This version of kgdb is a lite version. It is available on the i386 platform and uses a serial line for communicating with gdb. Full kgdb, containing more features and supporting more architectures, is available along with plenty of documentation at

Compiling a kernel:

Enable Kernel hacking -> Kernel Debugging -> KGDB: kernel debugging with remote gdb

Only generic serial port (8250) is supported in the lite version. Configure 8250 options.

Booting the kernel:

Kernel command line option "kgdbwait" makes kgdb wait for gdb connection during booting of a kernel. If you have configured simple serial port, the port number and speed can be overridden on command line by using option "kgdb8250=portnumber,speed", where port numbers are 0-3 for COM1 to COM4 respectively and supported speeds are 9600, 19200, 38400, 57600, 115200. Example: kgdbwait kgdb8250=0,115200
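The option format described above can be checked with a small helper before a boot line is handed to a test machine. The helper is a hypothetical convenience, not part of kgdb: only the `kgdb8250=portnumber,speed` syntax, the 0-3 port range and the list of speeds come from the documentation.

```python
SUPPORTED_SPEEDS = (9600, 19200, 38400, 57600, 115200)


def parse_kgdb8250(option):
    """Validate 'kgdb8250=portnumber,speed'; return (COM name, speed)."""
    key, sep, value = option.partition("=")
    if key != "kgdb8250" or not sep:
        raise ValueError("not a kgdb8250 option: %r" % option)

    port_text, sep, speed_text = value.partition(",")
    if not sep:
        raise ValueError("expected portnumber,speed in %r" % option)

    # int() raises ValueError itself on non-numeric input.
    port = int(port_text)
    speed = int(speed_text)

    if port not in range(4):
        raise ValueError("port number must be 0-3, got %d" % port)
    if speed not in SUPPORTED_SPEEDS:
        raise ValueError("unsupported speed %d" % speed)

    # Port numbers 0-3 correspond to COM1 to COM4.
    return "COM%d" % (port + 1), speed
```

For the documented example, `parse_kgdb8250("kgdb8250=0,115200")` returns `("COM1", 115200)`. A port number of 4 or a speed of 1200 raises `ValueError`.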

Connecting gdb:

If you have used "kgdbwait", kgdb prints a message "Waiting for connection from remote gdb..." on the console and waits for connection from gdb. At this point you connect gdb to kgdb. Example:

   % gdb ./vmlinux
   (gdb) set remotebaud 115200
   (gdb) target remote /dev/ttyS0

Once connected, you can debug a kernel the way you would debug an application program.

8. Linux 2.4.26-pre2

6 Mar 2004 - 7 Mar 2004 (11 posts) Archive Link: "Linux 2.4.26-pre2"

Topics: FS: XFS

People: Marcelo Tosatti

Marcelo Tosatti announced kernel 2.4.26-pre2, saying, "Here goes -pre2 -- it contains networking updates, network drivers updates, an XFS update, amongst others."

9. Status Of Highmem Support On Non-Highmem Machines Under 2.6

7 Mar 2004 (4 posts) Archive Link: "Highmem emulation for 2.6?"

Topics: Big Memory Support, SMP

People: Michael Frank, Pavel Machek, Marc-Christian Petersen

Pavel Machek asked if anyone had highmem emulation for the 2.6 kernel; Michael Frank and Marc-Christian Petersen offered patches for this. Michael said of his own patch, "It was in -mm until Andrew dropped it due to it causing problems on SMP, NUMA and with ramdisk. Ramdisk expects to be at the end of lowmem zone so it won't work in its current implementation with this patch when memory is shifted into highmem zone."

10. Struggling To Get KGDB Into The Main Kernel Tree

8 Mar 2004 - 10 Mar 2004 (36 posts) Archive Link: "kgdb for mainline kernel: core-lite [patch 1/3]"

Topics: Networking

People: Amit S. Kale, Andrew Morton

Amit S. Kale said:

Here is kgdb for mainline kernel in three patches. This is a lite version of kgdb available from I believe that all of us agree on this lite kgdb.

It supports basic debugging of i386 architecture and debugging over a serial line. Contents of these patches are as follows:

[1] core-lite.patch: architecture independent code
[2] i386-lite.patch: i386 architecture dependent code
[3] 8250.patch: support for generic serial driver

Andrew Morton thanked him for working on this, and asked what exactly made the patch 'lite', i.e. what features had been left out for submission. Amit said:

Here are features that are present only in full kgdb:

  1. Thread support (aka info threads)
  2. console messages through gdb
  3. Automatic loading of modules in gdb
  4. Support for x86_64
  5. Support for powerpc
  6. kgdb over ethernet [This isn't ready in the full version as well at this point of time]

It turned out that Andrew wanted most of those features back in the patch; but according to Amit, there was some question of whether the patch could remain as clean as it was if those features were reinserted. In particular, Info Threads seemed to be a must-have for Andrew, while Amit felt this would be a particularly ugly feature to add back into the patch. But Amit said he'd look into this further, since Andrew wanted it so badly. Various folks descended into the code, but nothing conclusive came out of it.

11. Update For powernow-k8-acpi Driver

9 Mar 2004 - 10 Mar 2004 (4 posts) Archive Link: "powernow-k8 updates"

People: Pavel Machek

Pavel Machek said:

This adds powernow-k8-acpi driver, which works on more machines than powernow-k8, but depends on acpi. I'd like to get this into 2.6.5... Does it look okay?

Also if you have problems with your eMachines cpufreq, apply this and switch to -acpi driver. It should fix it for you.

12. Emulex Goes Open Source

9 Mar 2004 - 10 Mar 2004 (9 posts) Archive Link: "[Announce] Emulex LightPulse Device Driver"

Topics: PCI

People: James Smart, Stefan Smietanowski, Jeff Garzik, James Bottomley, Pete Zaitcev

James Smart said:

Emulex is embarking on an effort to open source the driver for its LightPulse Fibre Channel Adapter family. This effort will migrate Emulex's current code base to a driver centric to the Linux 2.6 kernel, with the goal to eventually gain inclusion in the base Linux kernel.

A new project has been created on SourceForge to host this effort - see . Further information, such as the latest FAQ, can be found on the project site.

We realize that this will be a significant effort for Emulex. We welcome any feedback that the community can provide us.

Stefan Smietanowski remarked:

I wish to just tell you that I think you're doing the Right Thing(TM).

There are people that don't buy hardware for which the source isn't either available or included in the standard kernel, even if there are more patches or newer driver versions external to the main tree.

Good work and good luck!

Jeff Garzik also congratulated Emulex for making the move to Open Source, adding:

I'm only part way through a review of the driver, but I felt there is a rather large and important issue that needs addressing... "wrappers." These are a common tool for many hardware vendors, which allow one to more easily port a kernel driver across operating systems. Unfortunately, these sorts of abstractions continually lead to bugs. In particular, the areas of locking, memory management, and PCI bus interaction are often most negatively affected.

In particular, here is an example of such a bug:

void
elx_sli_lock(elxHBA_t * phba, unsigned long *iflag)
{
        unsigned long flag;
        LINUX_HBA_t *lhba;

        flag = 0;
        lhba = (LINUX_HBA_t *) phba->pHbaOSEnv;
        spin_lock_irqsave(&lhba->slilock.elx_lock, flag);
        *iflag = flag;
}

It is not portable for code to return the value stored in the 'flags' argument of spin_lock_irqsave. The usage _must_ be inlined. This fails on, e.g., sparc64's register windows.

But this bug is only an example that serves to highlight the importance of directly using Linux API functions throughout your code. It may sound redundant, but "Linux code should look like Linux code." This emphasis on style may sound trivial, but it's important for review-ability, long term maintenance, and as we see here, bug prevention.

It may not be immediately apparent, but elimination of these wrappers also increases performance. Many of the Linux API functions are inlined in key areas, intentionally, to improve performance. By further wrapping these functions in non-inline functions of your own, you eliminate several compiler optimization opportunities. In the case of spinlocks (above), you violate the API.

So I would like to see a slow unwinding, and elimination, of several of the wrappers in prod_linux.c.

  1. elx_kmem_alloc, elx_kmem_free: directly use kmalloc(size, GFP_KERNEL/ATOMIC) in the driver code.
  2. eliminate all *_init_lock, *_lock, and *_unlock wrappers, and directly call the Linux spinlock primitives throughout your code.
  3. strongly consider eliminating elx_read_pci_cmd, elx_read_pci, and simply calling the Linux PCI API directly from the lpfc driver code.
  4. eliminate elx_sli_write_pci_cmd hook, elx_write_pci_cmd wrapper, and directly call the Linux PCI API in the code.
  5. eliminate elx_remap_pci_mem, elx_unmap_pci_mem
  6. fix unacceptably long delay in elx_sli_brdreset(). udelay() and mdelay() functions are meant for very small delays, since they do not reschedule. Delays such as

            if (skip_post) {
            } else {

    should be converted to timers or (if in kernel thread context) schedule_timeout().

  7. eliminate elx_sli_pcimem_bcopy(,,sizeof(uint32_t)) in favor of "u32 foo = readl()"
  8. replace code such as
           ((SLI2_SLIM_t *) phba->slim2p.virt)->un.slim.pcb.hgpAddrHigh =
               (uint32_t) (psli->sliinit.elx_sli_read_pci) (phba, PCI_BAR_1_REGISTER);
           Laddr = (psli->sliinit.elx_sli_read_pci) (phba, PCI_BAR_0_REGISTER);
           Laddr &= ~0x4;

    with calls to pci_resource_start() and/or pci_resource_len()

  9. call pci_set_master() when you wish to enable PCI busmastering. This will set the busmaster bit in PCI_COMMAND for you, as well as set up the PCI latency timer.
  10. call pci_dma_sync functions rather than elx_pci_dma_sync()

That should get you started ;-)

James Bottomley also congratulated Emulex, and replied to Jeff's post, saying:

Actually, it would be my interpretation of the FAQ that most of this work is already intended (although Jeff gave specific instances rather than the generalities in the FAQ).

There were many more places than this in the driver that caused me to go "good grief no". However, it probably makes more sense if you work down your todo list and come back for a review when you're nearing the end of it. That way you don't get boat loads of comments about things you were planning to fix anyway.

James Smart replied:

First, thanks to those that have actually taken a look at the FAQ and source already. Do not believe your time was in vain - we will use every comment we receive.

We know there are a lot of issues we need to address. We echo many of the same sentiments. We had hoped the FAQ would explain where we are and where we are heading, so that people can judiciously choose when to evaluate the code base. That said, we will welcome any comments, at any time, at any detail level. I would hope that, even while the code base is changing, we receive comments. There are constructs in the driver that are likely not going to change, such as the logging facility. How contentious is this? What about the IP interfaces? And so on. Anything we receive, especially on the larger concepts in the driver, only helps us understand what's ahead.

Our plans are to complete most of the work list on the FAQ by early April. We'll try to make weekly drops on SourceForge, with each snapshot containing a log of the changes. Once the code base matures, we will ping the lists again, asking for feedback.

Elsewhere, Pete Zaitcev also responded to Jeff's long critique, saying, "I agree completely that Emulex code is infested with wrappers so much that it's harmful. However, the particular example you selected you interpret wrong." He went on:

Flag problem on sparc is fixed by Keith Wesolowsky for 2.6.3-rcX, and it never existed on sparc64, which keeps CWP in a separate register.

Why it took years to resolve is that the experience showed that there is no legitimate reason to pass flags as arguments. Every damn time it was done, the author was being stupid. Keith resolved it primarily because it was an unorthogonality in sparc implementation.

Jeff replied:

You would never know there were so many sparc people, until I post something incorrect about it. <grin>

I stand corrected. As someone mentioned in private, it's actually a shame that was fixed, since that's one less argument that can be used against such wrappers ;-)
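
Whatever the portability status on sparc, the direct-call style Jeff asked for in his second list item is easy to picture. The following is only a user-space sketch, not Emulex's or the kernel's code: spinlock_t, spin_lock_irqsave() and spin_unlock_irqrestore() are mocked here with simplified stand-ins so the example compiles outside the kernel, and struct demo_hba and demo_queue_cmd() are hypothetical names. The point it shows is that the flags variable is a local in the same function that both takes and releases the lock, so it is never handed back through a wrapper.

```c
#include <assert.h>

/*
 * User-space stand-ins so the sketch compiles outside the kernel.
 * In a real driver these come from <linux/spinlock.h>; the mock only
 * records a pretend interrupt state and a "locked" marker.
 */
typedef struct { int locked; } spinlock_t;
static unsigned long mock_irq_enabled = 1;  /* pretend CPU interrupt state */

#define spin_lock_irqsave(lock, flags)          \
    do {                                        \
        (flags) = mock_irq_enabled;             \
        mock_irq_enabled = 0;                   \
        assert(!(lock)->locked);                \
        (lock)->locked = 1;                     \
    } while (0)

#define spin_unlock_irqrestore(lock, flags)     \
    do {                                        \
        (lock)->locked = 0;                     \
        mock_irq_enabled = (flags);             \
    } while (0)

/* Hypothetical per-adapter state, loosely modelled on the quoted code. */
struct demo_hba {
    spinlock_t sli_lock;
    unsigned int pending_cmds;
};

/* Lock, update and unlock in one function: 'flags' never leaves this frame. */
static void demo_queue_cmd(struct demo_hba *hba)
{
    unsigned long flags;

    spin_lock_irqsave(&hba->sli_lock, flags);
    hba->pending_cmds++;
    spin_unlock_irqrestore(&hba->sli_lock, flags);
}
```

Compared with the elx_sli_lock() wrapper quoted earlier, there is no out-parameter carrying the saved flags to a different function, which is the property both Jeff and Pete were arguing for.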

13. Linux 2.6.4-rc3 Released

9 Mar 2004 (1 post) Archive Link: "Linux 2.6.4-rc3"

People: Linus Torvalds

Linus Torvalds announced, "Hmm. Nothing earth-shaking here, most of the changes end up being minor code cleanups and fixes for things like memory leaks in some error handling paths etc."

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.