Kernel Traffic #131 For 3 Sep 2001

By Zack Brown

linux-kernel FAQ (http://www.tux.org/lkml/) | subscribe to linux-kernel (http://www.tux.org/lkml/#s3-1) | linux-kernel Archives (http://www.uwsg.indiana.edu/hypermail/linux/kernel/index.html) | kernelnotes.org (http://www.kernelnotes.org/) | LxR Kernel Source Browser (http://lxr.linux.no/) | All Kernels (http://www.memalpha.cx/Linux/Kernel/) | Kernel Ports (http://perso.wanadoo.es/xose/linux/linux_ports.html) | Kernel Docs (http://jungla.dit.upm.es/~jmseyas/linux/kernel/hackers-docs.html) | Gary's Encyclopedia: Linux Kernel (http://members.aa.net/~swear/pedia/kernel.html) | #kernelnewbies (http://kernelnewbies.org/)

Table Of Contents

Introduction

Well, I'm back from vacation. It was fun, but too short. By the time I adjusted to the nine-hour time difference, it was almost time to go home. Oh well. This issue I tried to capture things from the three previous weeks, but there was a lot of stuff.

On the good news front: Cedric Duval has been maintaining a Mutt patch that allows broken threads to be fixed (or, if you prefer, fixed threads to be broken). I'd been using an old version of this feature by David Welton for a long time, when Cedric took it under his wing, ported it up to the latest Mutt snapshots, and rounded out the features. The patch can be found at http://cedricduval.free.fr/mutt/, and I heartily recommend it. Please send feedback (mailto:cedricduval@free.fr?cc=zbrown@tumblerings.org) about the patch to Cedric.

Mailing List Stats For This Week

We looked at 3770 posts in 15525K.

There were 967 different contributors. 500 posted more than once. 267 posted last week too.

The top posters of the week were:

 

1. New Graphical Bootloader Under Development
6 Aug 2001 - 15 Aug 2001 (18 posts) Archive Link: "[ANNOUNCE] Gujin graphical bootloader 0.4"
Topics: Assembly, Compression, Executable File Format, Framebuffer, Power Management: ACPI, SMP
People: Etienne Lorrain, H. Peter Anvin, Eric W. Biederman, Alan Cox, Matthias Andree

Etienne Lorrain announced:

I released the 0.4 version of the GPL Gujin bootloader.

Its homepage (with screenshots) is at:
http://gujin.sourceforge.net/
Its documentation (FAQ, HowTo) is at:
http://sourceforge.net/docman/display_doc.php?docid=1989&group_id=15465
All other information and source/precompiled downloads at:
http://sourceforge.net/projects/gujin/

In short, this bootloader:

  1. Is for PCs with at least an 80386 and a VGA, VESA1 or VESA2+ graphic card.
  2. Loads Linux kernel images (zImage and bzImage) without the help of LILO, and short-circuits all real-mode code in the kernel to start at the first protected-mode instruction of the kernel. You can use it to load kernels and initrds of unlimited size, and easily select the text/graphic mode for running the kernel.
  3. Is nearly completely written in C using GCC, so it is easy to modify/improve. It uses e2fsprogs-1.22 to understand E2FS filesystems and a completely rewritten zlib to decompress the kernel. FAT12/FAT16 support is working.
  4. Stays in real mode for the complete boot process, so it should be compatible with all PCs.
  5. Can be installed as a bootloader (only on a floppy right now) or as a DOS/Win boot.exe.
  6. 80% of the code is already written, so I just need to write the last 80% to have the 1.0 version -:) It is IMHO already a good rescue floppy disk or a good single-floppy system loader (minimum configuration around 20 Kb).

Matthias Andree asked if there were any way to pass parameters to the kernel, and Etienne replied:

Right now, it is only possible at the installation of Gujin on the floppy or at the boot.exe generation, by using the "--cmdline=" option of "./instboot" - it will append the parameters to every kernel booted. So you will need to recompile from the source.

It is still not possible to have different options for different kernels.

Have a look at "./instboot --help". The maximum size of this string is 64 bytes (compile time: see boot.h, struct gujin_param_str->extra_cmdline).

The target is to have a setup screen where this field can be modified, but before a more generic menu system is needed, and also a way to select the keyboard language.

Elsewhere, H. Peter Anvin pointed out that the feature to "short-circuit all real-mode code in the kernel to start at the first protected mode instruction of the kernel", "is a very bad idea. The kernel entry point is in real mode for a reason: it means that the kernel doesn't have to rely on the boot loader to provide the services it needs from real mode before entering protected mode once and for all. The interface to the real mode entry point is narrow and stable, the protected mode entrypoint is a kernel internal and doesn't have an interface that is guaranteed to be stable -- again, for good reason." Etienne replied:

It is time to explain why I did it this way - it is far from being the simplest solution.

First, I need to say that the current Linux interface is really stable: if you define "BOOT_KERNEL_BEFORE_2_0" in vmlinuz.[ch], Gujin will be able to boot pre-1.0 Linux versions. Also, Gujin has always booted kernels this way.

Considering the kernel's interface to the bootloader, I find it, IMHO, a bit complex: you not only have to give parameters in memory (at 0x9000:0000, or at %esi on 2.4.1+) but also provide callbacks (BIOS calls) for quite a lot of things.

Alan Cox described it (incompletely) in his message in June, subject "draft BIOS use document for the kernel" at: http://marc.theaimsgroup.com/?l=linux-kernel&m=99322719013363&w=2

You also have to put in memory one or two compressed files (kernel and initrd) without any way to check whether those hundreds of kilobytes have been correctly read and are not corrupted. There will be no way - if an error is detected by the legacy loader at the decompression stage - to return to the bootloader saying: fatal error, please select another kernel.

These two files also have to be at fixed linear addresses in real mode - and if you have a memory manager (himem.sys) loaded, these addresses may not be free. Usually you will find the smartdrv (disk cache) data at the bottom of the himem memory. It is then impossible to load a file at a random memory address and stay in real mode to do further processing. In this case, Gujin just mallocs the memory (using himem.sys), loads and decompresses the file (checking its CRC32), and only then disables interrupts, switches to protected mode, copies the file to its intended linear address and jumps to the kernel code.

Then, with the "old" method of loading the kernel, you have the video selection menu. Well, that is a rather complex thing to do in a bootloader, and its interface is complex (not to speak of the user interface), considering that you have already written everywhere in memory without knowing whether (for instance) a video driver was loaded. All the video selection stuff has to be handled before selecting a kernel to load, IMO.

I could probably find other problems, but I do not want to engage in destructive criticism.

What do I propose? Let's put it simply:

  • The kernel file is a simple binary image to load at 0x100000, so you get it from the linux ELF file by just doing:
    objcopy -O binary -R .note -R .comment -S /usr/src/linux/vmlinux kernel
    gzip -9 kernel
    There is absolutely no limit on its size.
  • The initrd (if present) is a gzip of a filesystem image.
  • All information needed by the kernel is set in memory. It is clearly described in a C structure (vmlinuz.h::struct linux_param). This structure also contains (for reference) old fields which are no longer used, or fields which were used only by previous bootloaders. There is _no_ BIOS callback.
  • The hardware is set up correctly, for instance for the current video mode - if you are in graphic 8 BPP, the 256-color palette is set up with reasonable values independent of the video card. Basically, hardware is immediately usable by the kernel.
  • There is a compatibility mode to load vmlinuz/zImage/bzImage.

    Note that all this is coded and working (I hope so). I know that with this solution the bootloader and the kernel are more tightly linked, but I know also that the bootloader has to be a lot more intelligent (considering the number of related messages on the Linux lists). It should not try to load a Pentium-compiled kernel on a 486. It should set up the video card (so be ready to get "invalid instruction" exceptions while executing BIOS calls). It should not try to run a corrupted kernel. It should not crash if a kernel/initrd file has been moved or two hard disks have been exchanged. It should display clear messages on what is wrong.

    It is up to the bootloader to crash if a BIOS call modifies a register which is documented as constant (if it did not take care)

  • and in DEBUG mode send some information to a serial/parallel port. When a new buggy BIOS appears, it is up to the bootloader to be upgraded, not the Linux kernel.

In short, it is real and complete software which handles all/most of the (buggy) BIOSes in their natural environment: the i386 real mode. If there are a few fields to add to "struct linux_param" (historically quite unusual), then Gujin will be upgraded. Anyway, it is GPL - and written in C, so there are a lot of people around able to change it, unlike assembly code.

It is not as clean as described because of the APM / ACPI / PCI configuration BIOS32 calls; but those are not changed by the Gujin bootloader. Note also that the SMP tables are passed in memory.

One last thing I have to add (for people who have read up to here) is that, having removed the "header" of the vmlinuz file, I lack the only important parameter in this header: the root device (rdev command). Right now, I am guessing it in a lot of configurations, but that is not a perfect solution. I think one of the simplest ways is to add a "root=/dev/hd**" in the described comment field of the GZIP header; that is still not coded.

In the same area, there is no way to know whether a kernel will be able to boot in graphic mode (support of VESA framebuffer and which BPPs are available). If vesafb is not compiled in and you start the kernel in graphics mode, the kernel boots but the display looks like a crash... Right now the dirty workaround is the write-protect bit of the vmlinuz file - not a long-term solution.

Close by in the thread, Eric W. Biederman added:

There are good reasons to use the kernel 32-bit entry point. In particular I routinely run linux on systems with exactly 10 16-bit instructions. On one of them I don't even have normal memory between 640KB and 1MB. The only real parameter the kernel needs from the BIOS is memory size, and I could probably steal code from memtest86 so it wouldn't even need that.

However, I do understand the instabilities and the deficits of the current 32-bit entry point, and why having a standard 16-bit entry point is a good thing. Which is why in 2.5 (if it ever starts) I intend to do the work required so we can have a standard cross-platform native-mode entry point to the kernel.

To keep linux portable we should never assume that on a given platform there is a specific kind of BIOS. Alpha-linux at least is nasty because of this. x86 linux is very nice because all you need to do on platforms that don't support the classic BIOS interface is to drop the 16-bit header. That is definitely a structure worth keeping.

Currently I have a stable easy to use structure that isn't even linux specific with just a few more details on how to encode parameters to work out. The structure is the ELF format for static executables, with a specific implementation of how parameters will be passed to it from the bootloader, before the bootloader goes away. In particular how to specify things like onboard ISA devices so we don't even have to assume what is or is not present on a motherboard for those devices that don't support probing and there is a firmware interface for finding them. The interesting case there is not so much how to encode the device but instead on how to represent the location of devices, and the connections between devices.

Being able to describe how an interrupt goes from a pci slot to an irq router to a legacy interrupt controller to a local apic and to the cpu, and simultaneously goes from the pci slot to an ioapic to a local apic to the cpu - and saying that pci slot is behind a PCI-X<->pci bridge - is an interesting question. Especially in data structures that have very few special cases. We are close in the kernel with struct resource and struct pci_device, but we haven't gone all of the way yet.

So no, I don't think it is right to blast someone for using the 32-bit entry point, while at the same time I do agree that right now it is a very questionable thing to do.

H. Peter replied:

this wide interface is a pain to keep stable, and having bootloaders call it directly is a genuinely bad idea. It will lock us into an interface, or cause major breakage, when we have to do necessary revving of this interface.

Instead, the proper time to deal with this is at kernel link time. The PC-BIOS stuff should go in, say arch/i386/pcbios, and you then can have other platforms (say, for example, arch/i386/linuxbios) which has its own setup code. You then link a kernel image which has the appropriate code for the platform you're running on, and you're set.

He added:

I haven't blasted anyone; I said it is a bad idea. You're now encoding a ton of assumptions about what the kernel needs in each and every bootloader, which is bound to cause a major headache not too long down the road. For example, the stuff you describe above may very well be obsolete in 2 years with HyperTransport, Infiniband and 3GIO on the very near horizon. Now you have to suffer dealing with lots and lots of compatibility logic to make things work, which may not be possible, or we're going to have frequent breakage.

I do not believe this is a good idea. This kind of information belongs in the kernel image, although it should be abstracted out within the kernel tree.

Etienne reiterated that the interface had not changed in a long time, and that it would be very difficult to change in the future, since fewer and fewer people knew i8086 real-mode assembly. He described a better way of doing things but said he had no time to do it. He added, "Moreover, going from a simple solution (loading the binary image of an ELF file) to a complex one (as described) to solve problems which may appear in the future is not my way of thinking: it is already complex enough to do simple software." Eric replied that he was all set to start modifying that very portion of code that Etienne felt was so stable. There was a bit more discussion of technical issues.

 

2. Scalable Scheduling Patch And #ifdef Discussion
8 Aug 2001 - 14 Aug 2001 (20 posts) Archive Link: "[RFC][PATCH] Scalable Scheduling"
Topics: Ottawa Linux Symposium, Real-Time, SMP
People: Mike Kravetz, Linus Torvalds, Daniel Phillips, Hubertus Franke

Mike Kravetz said:

I have been working on scheduler scalability. Specifically, the concern is running Linux on bigger machines (higher CPU count, SMP only for now).

I am aware of most of the objections to making scheduler changes. However, I believe the patch below addresses a number of these objections.

This patch implements a multi-queue (one runqueue per CPU) scheduler. Unlike most other multi-queue schedulers that rely on complicated load-balancing schemes, this scheduler attempts to make global scheduling decisions and emulate the behavior of the current SMP scheduler.

Performance at the 'low end' (low CPU and thread count) is comparable to that of the current scheduler. As the number of CPUs or threads is increased, performance is much improved over the current scheduler. For a more detailed description as well as benchmark results, please see: http://lse.sourceforge.net/scheduling/ (OLS paper section).

I would like to get some input as to whether this is an appropriate direction to take in addressing scalability limits with the current scheduler. The general consensus is that the default scheduler in the kernel should work well for most cases. In my opinion, the attached scheduler implementation accomplishes this by scaling with the number of CPUs in the system.

Linus Torvalds said he'd never apply the patch, since it used #ifdefs. Mike and Hubertus Franke replied that they'd rework the code to not use them, and Hubertus added that they were also interested in whether the overall approach would be acceptable. Linus replied:

I think what the code itself tried to do looked reasonable, but it was so distracting to read the patch that I can't make any really intelligent comments about it.

The only thing that looked really ugly was that real-time runqueue thing. Does it _really_ have to be done that way?

He added that he hadn't actually run the patch, so he had no idea what the performance was like. He asked, "I assume you've done lmbench runs across a wide variety (i.e. UP to SMP) of machines with and without this?" And Mike replied, "Yes we have, we'll provide those numbers with the updated patch. One challenge will be maintaining the same level of performance for UP as in the current code. The current code has #ifdefs to separate some of the UP/SMP code paths and we will try to eliminate these." Daniel Phillips put in at this point:

Does it help if I clarify what Linus was suggesting? Instead of:

#ifdef CONFIG_SMP
.. use nr_running() ..
#else
.. use nr_running ..
#endif

write:

inline int nr_running(void)
{
#ifdef CONFIG_SMP
int i = 0, tot=nt_running(REALTIME_RQ);
while (i < smp_num_cpus) {
tot += nt_running(cpu_logical_map(i++));
}
return(tot);
#else
return nr_running;
#endif
}

Then see if you can make the #ifdef's go away from that too. (If that's too hard, well, at least the #ifdef's are now reduced.)

And Linus added:

Even more preferably, just have (in a header file)

#ifdef CONFIG_SMP
inline int nr_running(void)
{
...
}
.. other SMP cases ..
#else
#define nr_running() (__nr_running)
.. other UP cases ..
#endif

if you just cannot make an efficient function that just works for both.

No, we don't adhere to this everywhere. But we should (and largely _do_) try to.

Having the #ifdef's outside the code tends to have two advantages:

  • it makes the code much more readable, and doesn't split things up.
  • you have to choose your abstraction interfaces more carefully, which in turn tends to make for better code.

Abstraction is nice - _especially_ when you have a compiler that sees through the abstraction and can generate code as if it wasn't there.

 

3. Testing Recent 2.4 Kernel Performance
9 Aug 2001 - 10 Aug 2001 (6 posts) Archive Link: "Some dbench 32 results for 2.4.8-pre8, 2.4.7-ac10, and 2.4.7"
Topics: Disks: IDE, FS: ReiserFS
People: Steven Cole, Linus Torvalds

Steven Cole reported, "I ran dbench 32 for 2.4.8-pre8, 2.4.7-ac10, and 2.4.7. Each set of three runs were performed right after a boot, running vmstat, and time ./dbench 32 with no pauses in between. The hardware is 384 MB, 450 P3, UP, IDE disk with ReiserFS on all partitions. The tests were done from a transparent Konsole and KDE2." He posted some verbose results, and a summary:

Run #1 Throughput 5.88569 MB/sec 2.4.8-pre8
Run #2 Throughput 5.95613 MB/sec 2.4.8-pre8
Run #3 Throughput 5.8547 MB/sec 2.4.8-pre8
 
Run #4 Throughput 7.84171 MB/sec 2.4.7-ac10
Run #5 Throughput 7.68447 MB/sec 2.4.7-ac10
Run #6 Throughput 7.85119 MB/sec 2.4.7-ac10
 
Run #7 Throughput 10.2184 MB/sec 2.4.7
Run #8 Throughput 10.0105 MB/sec 2.4.7
Run #9 Throughput 10.0215 MB/sec 2.4.7

Linus Torvalds replied:

Note that dbench performs best when no writeback actually takes place: the whole benchmark is completely optimizable. As such, the best numbers for dbench tend to be with (a) kflushd stopped, and (b) the dirty threshold set high.

Do the numbers change if you do something like

killall -STOP kupdated
echo 80 64 64 256 500 6000 90 > /proc/sys/vm/bdflush

to make it less eager to write stuff out? (That just stops the every-five-second flush, and makes the dirty balancing numbers be 80/90% instead of the default 30/60%)

In particular, the dirty balancing worked really badly before, and was just fixed. I suspect that the bdflush numbers were tuned with the badly-working case, and they might be a bit too aggressive for dbench these days..

Steven had a good night's sleep, and replied:

I re-ran dbench 32 with the settings above. Here are the results: (Same conditions as in the previous tests otherwise). Now, off to my day job.

Run #1 Throughput 12.7943 MB/sec 2.4.8-pre8
Run #2 Throughput 12.667 MB/sec 2.4.8-pre8
Run #3 Throughput 12.7091 MB/sec 2.4.8-pre8
 
Run #4 Throughput 13.7765 MB/sec 2.4.7
Run #5 Throughput 13.9632 MB/sec 2.4.7
Run #6 Throughput 13.9318 MB/sec 2.4.7

Linus replied:

Good, looks more like it

Now, the problem with dbench is that no way in hell should you optimize for dbench in general, because it is a sucky kind of benchmark.

For example, waiting until the last possible minute for writeouts is definitely the best setting for dbench, but it's a pretty horrible setting for usability.

I suspect that for optimal dbench performance we'll always have to let the system admin do the above kind of horrible tweaking stuff, but at the same time I personally absolutely detest the need for tweaks in general, and I would like the default behaviour to be reasonable.

Killing kupdated, for example, is not really "reasonable". But I also suspect that now that dirty balancing works sanely, the "start writeout at 30% full" is a bit early too.

So instead of the 30/60% split (the first number is "when do we start writing things out", and the second number is "when do we start actively waiting for it"), a 50/75% setup might be more reasonable for regular loads, while making dbench at least a bit happier.

Are you (or others) willing to play around with the numbers a bit and look at both dbench performance and at interactive feel?

In general

echo x 64 64 256 500 3000 y > /proc/sys/vm/bdflush

will set the "start writeout" to 'x'%, and the "start synchronous wait" to 'y'% (and restart kupdated with "killall -CONT kupdated"). It would be interesting to hear where the sweet spot is.

Steven replied (testing 2.4.8-pre8):

I ran dbench 32 twenty-eight times, with the results below. The units are MB/sec throughput as reported by dbench. The rows are the first number, "start write out"; the columns are the second number, "start synchronous wait". For example, the lower left hand entry was obtained after echo 30 64 64 256 500 6000 60 > /proc/sys/vm/bdflush

       y=60   y=70   y=80   y=90
x=90  8.693  8.690  8.853  8.875
x=80  8.822  8.724  8.703  8.910
x=70  8.558  8.557  8.785  8.420
x=60  8.194  8.141  8.006  7.847
x=50  6.662  6.738  6.904  6.892
x=40  6.347  6.252  6.380  6.274
x=30  5.687  5.802  6.209  5.898

I don't have a good quantifiable set of results for "interactive feel", other than it's never very good under heavy load. I'll try to get some usable results for interactive feel while running dbench again this weekend, looking for that elusive sweet spot.

 

4. Timpanogas Frees Sources To Netware Tools
10 Aug 2001 - 17 Aug 2001 (5 posts) Archive Link: "timpanogas.org - get it while you can."
Topics: Microsoft
People: Jeff V. Merkey, Pavel Machek

Jeff V. Merkey said:

If anyone wants any of the netware file system stuff for NT or Linux, you'd best get it while the getting's good. I'll still be around working on Linux tape changers, etc. and SCI stuff, but the NetWare stuff is coming down Saturday and the FTP servers that host any NetWare or Novell specific technologies will be taken off line at that time. We will be filing litigation against Novell Sep 1, 2001 in an attempt to get the injunction off the defunct company and ourselves that's been around our necks for the past four years.

The site will still be up, but much changed. Downloads are at www.timpanogas.org and ftp.timpanogas.org. If you want anything, now's the time to get it. SCI code will still be maintained on this site.

When the new website goes up, folks may get a bit of a surprise since we've been running a plant genetics lab here alongside our software business. You will get a look at the results of our genetic manipulation of probiscidea and mesembrecanthea. We have isolated a cure for arthritis (not treatment, but a permanent cure) from plants from the Native American seedbanks and have been performing gene splicing and polyploid induction with several of these plants. Linux work on SCI will be continuing and in other areas.

Pavel Machek suggested GPLing the netwarefs tools, or no one would be able to use the source. Jeff replied, "I will cross mount the manos server to the FTP server this evening. This server has all of the source code for the entire TRG archive. I'll put all of it up this evening. It will be mounted off of /archive in the ftp home area. It also contains all of the Microsoft source code for all of our W2K file systems work. If it's helpful to anyone, they are welcome to it." Pavel reiterated, "Putting it off /archive is not enough, it does not make it GPLed. Those tools that are useful should be marked with GPL so it is *legal* to use them. (You might also consider creating a sourceforge.net project and putting your stuff there ;-)" Jeff replied, "It's all GPL, even the MS code. SourceForge is a great idea. I'll look into it."

 

5. Kernel Build 2.5, Release 1.1 Is Available
11 Aug 2001 - 18 Aug 2001 (24 posts) Archive Link: "Announce: Kernel Build for 2.5, Release 1.1 is available."
Topics: Kernel Build System
People: Keith Owens, Russell King, Philip Blundell

Keith Owens announced:

Release 1.1 of kernel build for kernel 2.5 (kbuild 2.5) is available. http://sourceforge.net/projects/kbuild/, Package kbuild-2.5, download release 1.1.

http://marc.theaimsgroup.com/?l=linux-kernel&m=99725412902968&w=2 contains information about the base release.

Changes from Release 1.

  • Upgrade to kernel 2.4.8. Nice to see how simple the DRM Makefile is now.
  • Correct a race when building the global makefile in parallel: not all objects were being recognised as targets, so they were not considered candidates for recompile.
  • Replace hand coded rules with side_effect().
  • Document kbuild targets and C to assembler conversions. As always, Documentation/kbuild/kbuild-2.5.txt is your friend.
  • Remove the assembler() command. kbuild now works out if the source is .c or .S, no need for human intervention.
  • If you explicitly make foo.i or foo.s then kbuild automatically generates the required rules with the same flags as the corresponding .o file. Useful for debugging pre-processor or assembler problems, especially when gcc -save-temps does not work with multiple directories.
  • Standard generation of .s from .c files, where a .s file is required according to make. This includes tracking the dependencies of the .c file.

That last change lets me solve a long-standing problem with kbuild 2.4. Every architecture has assembler that requires offsets of fields within C structures or the mapping of C names to numbers. Assembler cannot include the C definitions, so we need a mapping from C constructs to assembler numbers. Every architecture has handled this problem in a different way; none of the methods are 100% accurate or dependable.

i386 hard codes the offsets into the assembler code and hopes that the structure definitions never change.

Alpha uses a C program that generates the text for the assembler. This does not work in a cross compile environment because it assumes that HOSTCC == CC.

Cris generates a chunk of assembler from C then uses .include instead of #include, with some fancy conditional selection.

Mips, parisc, ppc, sparc generate assembler then extract and reformat lines from the assembler. This works in both local and cross compile mode and is getting close to the correct way of doing it. But it still has problems, see below.

IA64 in 2.4 is particularly loathsome. It uses different methods in native and cross compile modes, when the cross compile version would do for both. It ships a copy of the generated asm/offsets.h which is totally unreliable because the real offsets.h depends on the user's .config. To add insult to injury, offsets.h is included in processor.h and ptrace.h on ia64 which means that it pollutes almost every C file.

None of the above methods handle dependency checking at all. PPC makes an attempt but it is manually defined and is incomplete, no other arch even makes an attempt. All architectures assume that the user always runs make dep after any config changes that affect the assembler offsets. If the user forgets to run make dep and the assembler and C values do not match - oops.

kbuild 2.5 has a solution which works in all modes, is standard across all architectures and automatically tracks dependency changes. No more room for human error.

There were a few scattered comments and a few patches and upgrades; at one point Russell King mentioned, "I'm sorry, the ARM version of GCC does not support %c0 in a working state. The way we generate the offsets on ARM is here to stay for the next few years until GCC 3 has stabilised well enough for use with the kernel, and the ARM architecture specifically. Please don't rely on %c0 working." Philip Blundell replied, "I should think it can be made to work in 2.95.4. Did you try the patch I sent you a few months ago?" and Russell said, "No - I've not had a need to rebuild gcc yet, and the patch is low priority since the kernel has to build with the compilers that people already have, not the bleeding edge. Sorry, I don't do gcc, as I've explained before."

 

6. Decisions On Stability Of 2.4
11 Aug 2001 - 17 Aug 2001 (37 posts) Archive Link: "Hang problem on Tyan K7 Thunder resolved -- SB Live! heads-up"
Topics: SMP, Sound: ALSA
People: Jeffrey Ingber, Alan Cox, Manuel McLure, Linus Torvalds

In the course of discussion, Jeffrey Ingber reported, "I noticed that the EMU10K1 driver was updated in 2.4.8 so I tried it. I had a lockup four times during audio playback, so I switched back to ALSA." Alan Cox replied, "The in kernel one seemed fine. The 2.4.8 update one is definitely broken on SMP boxes." Manuel McLure said, "I'm getting 2.4.8 Oopsen that seem to be in emu10k1 code on UP," and Alan said:

Yep. So far the new driver that Linus took from a non-maintainer breaks

  • SMP
  • Some mixers
  • Uniprocessor with some cards
  • Surround sound (spews noise on cards)

so I think Linus should do the only sane thing - back it out. I'm backing it out of -ac. Of my three boxes, one spews noise, one locks up smp and one works.

Linus Torvalds replied, "The problem with backing it out is that apparently nobody has tried to really maintain it for a year, and if it gets backed out nobody will even bother to try to fix it. So I'll let it be for a while, at least." Alan remarked, "I thought this was a stable kernel tree not 2.5 ?" Linus replied:

Well, considering that the _old_ driver is also not stable and doesn't work on all machines, we're really screwed whichever way we turn.

If the old driver was a known working one, this would be a no-brainer. As it is, the old driver doesn't work for people _either_ - but they probably aren't piping up, because the old driver has been broken forever.

So we have a situation where the new driver works better on some machines, and the old driver works better on others. The old driver will obviously never get fixed (we've given it several years now), so the old driver is _known_ to be terminally broken. The new driver is a question mark in that regard.

So I'd rather give the new driver a chance, and see if people can get it fixed. For example, the oops that people have reported _seems_ to be due to initializing the tasklet before actually having initialized all the data structures the tasklet depends on. It may well be that moving the two "tasklet_init()"s down two lines would fix it.

There was no reply.

 

7. 2.4.9 And A Vacation
16 Aug 2001 (1 post) Archive Link: "Off for a week, linux-2.4.9..."
Topics: Big Memory Support, FS: FAT, FS: NFS, Kernel Release Announcement
People: Linus TorvaldsTrond MyklebustJens AxboeMark HemmentDavid GibsonJes Sorensen

Linus Torvalds announced Linux 2.4.9, saying:

I'm off to Finland for a week+, and will not be reading email or checking the newsgroups during that time. I've put up a 2.4.9 kernel on ftp.kernel.org, and would suggest that people try it out and discuss it on the mailing lists, but NOT email me. I'll be interested to hear about problems when I return, but I don't have a big hankering to have thousands of messages waiting for me.

Also, I've been getting a _lot_ of patches, and if yours didn't show up it's because I got too many. Never fear, there's always tomorrow. Except in this case it's "in a week or two".

He appended the changelog:

  • David Miller: sparc updates, FAT fs fixes, btaudio build fix
  • David Gibson: Orinoco driver update
  • Kevin Fleming: more disks the HPT controller doesn't like
  • David Miller: "min()/max()" cleanups. Understands signs and sizes.
  • Jens Axboe: CD updates
  • Trond Myklebust: save away NFS credentials in inode, so that mmap can writeout.
  • Mark Hemment: HIGHMEM ops cleanups
  • Jes Sorensen: use "unsigned long" for flags in various drivers

There was no reply.

 

8. New Kernel Hacker Attempts 0.01
16 Aug 2001 (4 posts) Archive Link: "installing .01"
Topics: Assembly, Virtual Memory
People: Linus TorvaldsTristanDavid A. FrantzAlbert D. CahalanNicholas Knight

Tristan Sloughter asked Linus Torvalds privately about running kernel 0.01; Linus replied privately:

Well, historically in order to be able to install the 0.01 kernel, you had to have a full minix-386 distribution - that was the only way to make the filesystem, and to copy the necessary files over etc.

And quite frankly, it's been ten years for me too, so I don't remember all the details.

You probably don't have access to minix-386 anyway, so I suspect that what you _should_ do is try to use a newer version of Linux to bootstrap with. You need a really old compiler to compile Linux-0.01 with - I think it needs gcc-1.40 or similar. It simply won't compile with newer compilers.

All in all, yes, it can be done. I haven't done it personally in ten years, and never hosted from Linux, though. I suspect your best option would be to try to get a discussion going on how to do all this on the kernel mailing list or similar, feel free to quote this email as a starting point.

Tristan did so, saying, "I just recently joined this mailing list because I need help. And Linus suggested I join the kernel mailing list, and quote his email to me as a starting point. I want to install the Linux kernel 0.01 on my 386 machine, and I'm lost on how to do it. I am 16 and have only worked with Linux for the last two years. I see running 0.01 on my 386 as a learning experience into the depths of the machine, and how to use assembly and work on OSs." David A. Frantz replied:

kernel 0.01 is well before my Linux time, so I can't help with that at all. What I can suggest is that you get a version of Red Hat up and running on your PC. Then find a virtual machine system that will allow you to install another version of Linux that you can run in a VM session. Use this image as your learning environment. All that having been said, do understand that I've never done this, so others may have comments that are more relevant.

My thoughts are these:

First, the kernel is constantly being updated. Your best bet to get relevant help with anything is to be working on current sources.

Second, many of the tools that are shipped with Linux are vastly improved in their later revisions. Using these tools should result in an overall better experience.

Third, the quality of the kernel is constantly improving. This improved kernel should serve as a solid base to learn about operating systems on.

Albert D. Cahalan also replied to Tristan:

Go up a few versions, to 0.02 maybe, if you have any hopes of running a compiler on this system. You will need at least 4 MB of RAM for compiling, since the early kernels didn't support swap.

Minix-386 is a hacked up Minix. Minix is an educational OS for the 8088 that was just recently made free. There once was a collection of patches that would add 386 feature support. So you could get Minix running, patch it, then build Linux.

Try this:

The Minix filesystem can be 64 MB at most. Make a partition for that, one for swap, and one for a newer Linux distribution. This pre-Slackware thing would be a good choice: http://www.ibiblio.org/pub/Linux/distributions/MCC/2.0+/

Um, you really don't want to be building gcc on a 386. You might have to build intermediate versions, since the old gcc might not compile with the most recent gcc. Perhaps you have or can borrow a modern machine for builds.

Finding the extras, like libc and a shell, will be hard.

You might need a boot loader called "shoelace". You should be able to run this from LILO.

Nicholas Knight took exception to Albert's remark that Minix-386 was a "hacked up" Minix. He gave a pointer to the Minix 2.0.2 page (http://www.cs.vu.nl/pub/minix/2.0.2/). The home page is at http://www.cs.vu.nl/pub/minix/. There was no reply.

 

9. Qlogic/FC Firmware Licensing Issues
21 Aug 2001 - 22 Aug 2001 (60 posts) Archive Link: "Qlogic/FC firmware"
Topics: BSD: FreeBSD, BSD: NetBSD, BSD: OpenBSD
People: David S. MillerJes SorensenAlan CoxMatthew Jacob

An irate David S. Miller noticed that the Qlogic/FC firmware had been removed from the Linux sources. He said:

Who removed it from the 2.4.x driver recently, and why?

I've been playing around, accidentally corrupting my firmware a few times, and had to grab the firmware back from older trees to make my qlogic,FC card usable again.

Removing the firmware makes no sense, if the firmware was incorrect for some reason, simply correct it.

Jes Sorensen replied, "Alan did after I pointed out to him that it was incompatible with the GPL (BSD license with advertisement clause). Really hard to fix unless you get QLogic to change the license for you." Alan Cox added, "Also the firmware we were including was seriously out of date, was a release candidate (not a certified release) and took up tons of ram." Later on, David said, "If the firmware was out of date, update it to a known "Qlogic stamp of approval" version." Alan replied, "That requires sorting licensing out with Qlogic. I've talked to them usefully about other stuff so I'll pursue it for a separate firmware loader module." Jes remarked:

getting firmware out of them tends to be up and down.

However I just looked through the QLogic v4.27 provided driver from their web site and it does in fact include firmware with a GPL license.

Dave, if you want to play with this and stick it into the qlogicfc.c driver then you will at least have something that sorta works for now (modulo all the other problems with that driver).

http://www.qlogic.com/bbs-html/csg_web/adapter_pages/driver_pages/22xx/22linux.html

They do have a stupid 'read and agree' license in front of that page if you go in via the official qlogic.com door, however if the code inside is GPL then I assume it's GPL.

A couple posts later, Matthew Jacob said:

A few pieces of information and comments:

The "BSD" copyright in the f/w I have from QLogic is there only because Theo de Raadt threatened to pull QLogic from the 2.7 OpenBSD release. I'd been pleading with them for over a year to do *something*. I should have done a BSD license w/o the advert clause, but, oh, well.....

To understand what shape you're in, it's really best to load firmware into the card's SRAM. That way you *know* it understands feature foo (like SCCLUN, for example). QLogic produces so many different flavors of firmware of the same nominal revision that it's hard to deduce the viability or safety of some firmware.

That said, it *is* possible in a non-BIOS environment to pull firmware out of flash rom, I believe- you have to do some hand strobing of registers that normally only the firmware touches, but I think it *is* technically possible to pull the contents of flash out into system memory, figure out where the 'resident' f/w image is in this goop and then download *that* into SRAM and restart the sequencer. Still- this leaves you in the same position as before.

IMO, the best thing is to do an ispfw load module that goes away after you've loaded SRAM. I've started down this path in FreeBSD (there's a separate ispfw module)- which will only be really useful if module memory ever gets reclaimed in FreeBSD :-). This is of particular importance for my driver, because in supporting 1020 (SBus (yes for NetBSD/OpenBSD, not yet for Linux) && PCI), 1080/1280 Ultra2, 12160 Ultra3, 2100 FC, 2200FC and 2300FC (next week)- you end up with an absurd amount of f/w images.

I'm sure there's something similar that can be done for Linux.

Alan- you say you can talk to QLogic- good. I've been finding it harder and harder to talk to them. If you can get them to put out 2100 and 2200 and 2300 f/w with GPL for Linux- great.

Later Matthew double checked and found that the code really was GPLed.

 

10. Status Of Revision Control For The Kernel
23 Aug 2001 - 24 Aug 2001 (12 posts) Archive Link: "source control?"
Topics: BitKeeper, Version Control
People: Andrew GroverNicholas KnightLarry McVoyAlan CoxGerard RoudierCort DuganLinus Torvalds

Andrew Grover asked, "Is Linux development ever going to use source control? This was talked about at the Kernel Summit, and I haven't heard anything about it since." Nicholas Knight replied, "The kernel has source control, its name is Linus Torvalds, CVS with a brain. Whether or not the mainstream kernel will ever go pure CVS or the like is really up to Linus, and so far I've not seen much indication that he's going to do it, at least not before he decides to retire from steering the kernel's development." Larry McVoy also replied to Andrew:

There are two features that Linus wants in BitKeeper that we haven't finished yet. One is the removal of the revision control files from the working tree and the other is the ability to break the tree up into logical units (we call 'em filesets) so that you can more easily pick and choose which patches you want in your tree.

Linus pinged me about these a while back, we've made some progress on them but they aren't done yet. When they are done we'll let Linus take it for a whirl and see what he thinks.

In the meantime, the PPC people maintain a pure Linux tree in BK, you can see it at http://ppc.bkbits.net and Ted Ts'o has recently done a nice import of the various kernel versions complete with Linus' change logs. I need to work with the PPC guys and Ted to get to one tree; it's not an easy issue because the PPC people have a lot of changes in their tree but Ted's tree was done more nicely, he did some extra work to preserve timestamps and comments.

And to avoid yet-another-BK-flamewar, I'm not saying Linus will or will not use BitKeeper, all I'm saying is that we're making changes he wants and then he'll see if it is good enough for him. I will say that he has eased slightly off of his original position of "I'll use BitKeeper when it is the best" because I asked him if that meant what I think both he and I would mean, i.e., "it is not physically possible for it to be better" as opposed to "it's better than all the other crap out there". I think we agreed we have to be well past #2 but not necessarily to #1 (which is a good thing, at the rate we're going we'll hit the best sometime this century but that's as close as I want to call it :-)

There was no reply to this, but elsewhere, Alan Cox said about the possibility of Linux getting source control, "It does. Or at least many of the development teams do. That doesn't mean a general CVS is a good idea. CVS makes it all too easy for other people to push crap into your tree." Gerard Roudier asked, "What other people? You can only allow trusted people to commit, and backing out crap is quite easy." Alan replied, "This is the model we use. The trusted people list is Linus Torvalds." Gerard replied, "You just pointed out the problem. Linus being the only trusted committer for more than 100 MB of source base as he was for less than 1 MB 10 years ago. And our single committer got some other loads as he has a job, children, a boss, a mother-in-law :), maybe pets, etc..." He added, "The fact that Linux has great success does not mean that Linus is right on the way he wants the kernel maintenance to proceed. It just means that he hasn't been too wrong on this point, in my opinion. :-)"

Elsewhere, Cort Dugan said, "There is great benefit to making it very hard for people to get changes into a tree. It forces people to ask "Is this really worth all the effort?" several times. It's a great filter."

 

11. Oops In 3c59x Driver Under Recent -ac Kernels
24 Aug 2001 - 28 Aug 2001 (5 posts) Archive Link: "oops in 3c59x driver"
People: Wichert AkkermanAlan Cox

Wichert Akkerman reported, "After switching my laptop to a 2.4 kernel I've had it die occasionally and I finally managed to get an oops out of it today (not running X makes that a lot simpler :)." He posted a decoded oops, and added, "This oops was made using 2.4.7ac11 (with freeswan 1.91 patch included but which is not used). I get the same problem on 2.4.8ac5 and all other 2.4 releases from the last few weeks as well." Alan Cox replied:

Beautiful trace. You took an IRQ during PnPBIOS call and your machine exploded. Do me a favour -

Change the semaphore in drivers/pnp/pnp_bios.c to a spinlock_irqsave and __cli/spin_unlock_irqrestore. See if the crashes then go away.
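Alan's suggestion amounts to replacing a sleeping lock with one that also disables interrupts, so that no IRQ can arrive in the middle of a PnP BIOS call. A kernel-style pseudocode sketch of the before/after (hypothetical symbol names, not compilable standalone):

```
/* Before (sketch): a semaphore can sleep and does not mask IRQs,
 * so an interrupt can fire mid-BIOS-call and crash the machine. */
down(&pnp_bios_sem);
status = call_pnp_bios(...);
up(&pnp_bios_sem);

/* After (sketch): a spinlock taken with interrupts disabled keeps
 * IRQ handlers out for the duration of the BIOS call. */
spin_lock_irqsave(&pnp_bios_lock, flags);
status = call_pnp_bios(...);
spin_unlock_irqrestore(&pnp_bios_lock, flags);
```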

Wichert replied that this seemed to have fixed it; there were a few more comments, and the thread ended.

 

12. 2.4.10-pre1 Is Available
27 Aug 2001 - 28 Aug 2001 (12 posts) Archive Link: "patch-2.4.10-pre1"
Topics: Big Memory Support, Disks: IDE, FS: UMSDOS, FS: ramfs, Kernel Release Announcement, USB
People: Linus TorvaldsJan KaraBrad HardsRogier WolffDaniel PhillipsDavid GibsonVojtech PavlikJeff HartmannRalf BaechleBen LaHaiseKeith Owens

Linus Torvalds announced 2.4.10-pre1, saying:

Ok, I'm back from Finland, and there's a 2.4.10-pre1 update on kernel.org. Changelog appended..

The most noticeable one (under the right loads) is probably the one-liner by Daniel that avoids some bad behaviour when swapping.

He posted a changelog:

  • Jeff Hartmann: DRM AGP/alpha cleanups
  • Ben LaHaise: highmem user pagecopy/clear optimization
  • Vojtech Pavlik: VIA IDE driver update
  • Herbert Xu: make cramfs work with HIGHMEM pages
  • David Fennell: awe32 ram size detection improvement
  • Istvan Varadi: umsdos EMD filename bug fix
  • Keith Owens: make min/max work for pointers too
  • Jan Kara: quota initialization fix
  • Brad Hards: Kaweth USB driver update (enable, and fix endianness)
  • Ralf Baechle: MIPS updates
  • David Gibson: airport driver update
  • Rogier Wolff: firestream ATM driver multi-phy support
  • Daniel Phillips: swap read page referenced set - avoid swap thrashing

 

 

 

 

 

 

We Hope You Enjoy Kernel Traffic
 

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License, version 2.0.