Kernel Traffic #113 For 30 Mar 2001

By Zack Brown

linux-kernel FAQ (http://www.tux.org/lkml/) | subscribe to linux-kernel (http://www.tux.org/lkml/#s3-1) | linux-kernel Archives (http://www.uwsg.indiana.edu/hypermail/linux/kernel/index.html) | kernelnotes.org (http://www.kernelnotes.org/) | LxR Kernel Source Browser (http://lxr.linux.no/) | All Kernels (http://www.memalpha.cx/Linux/Kernel/) | Kernel Ports (http://perso.wanadoo.es/xose/linux/linux_ports.html) | Kernel Docs (http://jungla.dit.upm.es/~jmseyas/linux/kernel/hackers-docs.html) | Gary's Encyclopedia: Linux Kernel (http://members.aa.net/~swear/pedia/kernel.html) | #kernelnewbies (http://kernelnewbies.org/)

Mailing List Stats For This Week

We looked at 904 posts in 3862K.

There were 364 different contributors. 166 posted more than once. 128 posted last week too.

1. Status Of DC-315U SCSI Driver

11 Mar 2001 - 19 Mar 2001 (7 posts) Archive Link: "About DC-315U scsi driver"

Topics: Disks: SCSI

People: Kurt Garloff, Linus Torvalds

Someone asked about the DC-315U SCSI driver, which seemed not to have been updated since the version for kernel 2.4.0-test9-pre7, from December 2000. Kurt Garloff, the patch maintainer, replied that he'd been getting some strange bug reports, very difficult to track down or even reproduce. He explained that currently he'd added a lot of debugging code, and that until the problems were solved, he really couldn't submit the driver to Linus Torvalds. He added that the lack of good public docs was also a hindrance.

2. Compiler Recommendations

11 Mar 2001 - 21 Mar 2001 (6 posts) Archive Link: "make: *** [vmlinux] Error 1"

People: J.A. Magallon, Alan Cox, Krzysztof Halasa

Someone got errors compiling 2.4.2, and J.A. Magallon suggested, "If you are using pgcc, try getting a real less-buggy compiler, like egcs1.1.2 or gcc-2.95 (even 2.96 will work)." Krzysztof Halasa reported problems with gcc 2.96, and Alan Cox replied, "2.96-69 is needed. 2.96-74 for DAC960 (packing assumptions changed in gcc cvs)." It turned out that the original poster had been using pgcc, and they said at one point that they'd switch back to gcc.

3. Status Of DPT Driver

13 Mar 2001 - 22 Mar 2001 (8 posts) Archive Link: "DPT Driver Status"

Topics: Disk Arrays: RAID, I2O

People: Marko Kreen, Omar Kilani, Jacek Lipkowski

Dalton Calford had a DPT/Adaptec DPT RAID V century card, and couldn't find a driver for the 2.2.18 kernel. He'd looked everywhere, and even contacted Deanna Bonds at Adaptec, with no luck. The most recent version he could find was six months old. Marko Kreen confirmed the problem and the experience with Adaptec, saying, "When I last contacted them, couple of months ago, through I-dunno-how-many-middle[wo]men they assured that 'driver is in development' and 'soon we make a release'..." He added:

I have ported the 1.14 version of the driver to 2.2.18. Basically converted their idea of patching with cp to normal diff and dropped all reverse changes.

http://www.l-t.ee/marko/linux/dpt114-2.2.18p22.diff.gz

It was for pre22 but applies cleanly to final 2.2.18. The other soft was most current in valinux site:

http://ftp.valinux.com/pub/mirrors/dpt/

Omar Kilani also confirmed the trials and tribulations of Dalton and Marko, and added:

I too once felt your pain. Searched far and wide, etc. But then I stumbled upon ftp://ftp.suse.com/pub/people/mantel/next/ Which has patches for everything you could ever want, all integrated, if you choose them to be. Anyway, inside those .tgz's was version 2.0 of the DPT I2O drivers. I've separated them from the .tgz, and stuck them up here:

Kernel 2.2.18:
http://aurore.net/source/dpt_i2o-2.0-2.2.18.gz

Kernel 2.4.2
http://aurore.net/source/dpt_i2o-2.0-2.4.2.gz

Try 'em! :-) Not sure how they compare to Marko's version. I exchanged my ASR2100S for a Mylex AcceleRAID 170 - because DAC960 support is so much better ;-) and I loved reading through the DAC960 sources - so clean and easy to understand!

And Jacek Lipkowski added:

i also have a patched linux-2.4.2-ac20 tree for my own use at ftp://acid.ch.pw.edu.pl/pub/sq5bpf/linux-2.4.2-ac20+dpt.tar.gz that supports dpt smartraid V (i found a patch for 2.4.0-pre6 and hand patched it in). it seems to work with my ISP2150 (didn't crash yet :), after compiling with egcs 1.1.2 (some people warned about using anything else than gcc 2.7.2.3).

the only caveat is that if you set the ramsize 49152, root flags to 0 etc so it loads a floppy after a prompt, the dpt controller (and eepro100 that was also compiled in) gets initialised _after_ the root floppy is loaded. i'm not sure if this is a bug with the dpt patch or with the original kernel (will check tomorrow).

4. Status Of RAID HOWTOs

14 Mar 2001 - 19 Mar 2001 (5 posts) Archive Link: "State of RAID (and the infamous FastTrak100 card)"

Topics: Disk Arrays: RAID

People: Jakob Ostergaard

As part of making a separate point, Phil Edwards mentioned that the RAID-related HOWTOs at http://linuxdoc.org seemed out of date. Jakob Ostergaard replied:

Ok, I get the feeling it may be the Software-RAID howto you're referring to here... Let me explain why it's not updated.

Fact is, I haven't updated the document because 99% of what it says is still the perfect truth.

Software-RAID in 2.2 is buggy and requires patching to go to the so-called alpha versions (which the HOWTO explains are not alpha-quality but actually quite usable).

However, 2.4 is out and doesn't need patching, and I should probably update the howto to reflect that. But still, most of what's in the HOWTO is as correct as it can be.

5. Status Of Kernel Preemption

14 Mar 2001 - 23 Mar 2001 (26 posts) Archive Link: "[PATCH for 2.5] preemptible kernel"

Topics: Hot-Plugging, Real-Time, SMP

People: Nigel Gamble, Rusty Russell, Pavel Machek

Nigel Gamble announced his latest attempt to make the kernel preemptible. He alerted folks to the fact that this was definitely 2.5 material, but he wanted to share his work anyway, so he posted the patch. He pointed out that the patch alone could not guarantee low latency, because preemption could not happen while anything held a spinlock. So the patch needed cooperation from user-space, and, Nigel said, would enable better and better preemption as the kernel locking mechanisms became finer grained. As for performance, he said, "I think this patch has a negligible effect on throughput. In fact, I got better average results from running 'dbench 16' on a 750MHz PIII with 128MB with kernel preemption turned on (~30MB/s) than on the plain 2.4.2 kernel (~26MB/s)."

Pavel Machek was very impressed by this result. Several folks offered their criticism of the patch, including Rusty Russell, who said:

I can see three problems with this approach, only one of which is serious.

The first is code which is already SMP unsafe is now a problem for everyone, not just the 0.1% of SMP machines. I consider this a good thing for 2.5 though.

The second is that there are "manual" locking schemes which are used in several places in the kernel which rely on non-preemptability; de-facto spinlocks if you will. I consider all these uses flawed: (1) they are often subtly broken anyway, (2) they make reading those parts of the code much harder, and (3) they break when things like this are done.

The third is that preemptivity conflicts with the naive quiescent-period approach proposed for module unloading in 2.5, and useful for several other things (eg. hotplugging CPUs). This method relies on knowing that when a schedule() has occurred on every CPU, we know no one is holding certain references. The simplest example is a singly linked list: you can traverse without a lock as long as you don't sleep, and then someone can unlink a node, and wait for a schedule on every other CPU before freeing it. The non-SMP case is a noop. See synchronize_kernel() below.

This, too, is soluble, but it means that synchronize_kernel() must guarantee that each task which was running or preempted in kernel space when it was called, has been non-preemptively scheduled before synchronize_kernel() can exit. Icky.

Rusty's third point sparked a bit of technical discussion, but no one felt the problems were really intractable.
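
The quiescent-period idea Rusty describes can be illustrated with a toy, single-threaded model. This is only a sketch under loud assumptions: NCPUS, model_schedule(), unlink_after(), grace_begin() and grace_elapsed() are hypothetical names invented here, and nothing below is the kernel's synchronize_kernel() or any code from the thread. The model counts voluntary schedules per CPU; an unlinked node may be freed only once every CPU has scheduled at least once since the unlink. With a preemptible kernel, an involuntary preemption would not count as such a schedule, which is the complication Rusty raises.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS 4                           /* hypothetical CPU count for the model */

static unsigned long sched_count[NCPUS];  /* voluntary schedules seen per CPU */

struct node {
    int value;
    struct node *next;
};

/* Snapshot of the per-CPU counters taken when a grace period starts. */
struct grace {
    unsigned long snap[NCPUS];
};

/* Model one CPU passing through a voluntary schedule (a quiescent point). */
void model_schedule(int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    sched_count[cpu]++;
}

/*
 * Unlink the node that follows prev and return it. The victim's own next
 * pointer is left untouched, so a lockless reader that is still standing
 * on the victim can keep walking the list.
 */
struct node *unlink_after(struct node *prev)
{
    struct node *victim = prev->next;

    if (victim != NULL)
        prev->next = victim->next;
    return victim;
}

/* Record the current counters; the grace period starts now. */
void grace_begin(struct grace *g)
{
    int cpu;

    for (cpu = 0; cpu < NCPUS; cpu++)
        g->snap[cpu] = sched_count[cpu];
}

/* Return 1 once every CPU has scheduled at least once since grace_begin(). */
int grace_elapsed(const struct grace *g)
{
    int cpu;

    for (cpu = 0; cpu < NCPUS; cpu++)
        if (sched_count[cpu] == g->snap[cpu])
            return 0;   /* this CPU may still hold a reference */
    return 1;
}
```

In this model a writer would call unlink_after(), then grace_begin(), and free the node only after grace_elapsed() returns 1.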

6. Amount Of Swap To Use In 2.4

15 Mar 2001 - 17 Mar 2001 (21 posts) Archive Link: "Is swap == 2 * RAM a permanent thing?"

People: Mike A. Harris, Rik van Riel, Christophe Barbe

Mike A. Harris asked, "Is the fact that we're supposed to use double the RAM size as swap a permanent thing or a temporary annoyance that will get tweaked/fixed in the future at some point during 2.4.x perhaps? What are the technical reasons behind this change?" Rik van Riel explained:

The reason is that the Linux 2.4 kernel no longer reclaims swap space on swapin (2.2 reclaimed swap space on write access, which led to fragmented swap space in lots of workloads).

This means that a lot of memory ends up "duplicated" in RAM and in swap.

I plan on doing some code to reclaim swap space when we run out, but Linus doesn't seem to like that idea very much. His argument (when you're OOM, you should just fail instead of limp along) makes a lot of sense, however, and the reclaiming of swap space isn't really high on my TODO list ...

OTOH, for people who have swap < RAM and use it just as a small overflow area, Linus' argument falls short, so I guess some time in the future we will have code to reclaim swap space when needed.

Christophe Barbe asked what Rik meant by 'reclaiming swap space', and Rik explained, "When we swap something in from swap, it is in effect "duplicated" in memory and swap. Freeing the swap space of these duplicates will mean we have, effectively, more swap space."
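
As a rough illustration of why duplication eats into usable swap, here is a deliberately simplified model. It is an assumption made for this example, not something stated in the thread: if dup units of data sit both in RAM and in swap, the amount of distinct data that fits is RAM plus swap minus dup. The function name distinct_capacity() is hypothetical.

```c
#include <assert.h>

/*
 * Simplified model (an assumption, not kernel code): how much distinct
 * data fits when `dup` units are held both in RAM and in swap.
 * All three arguments use the same unit, for example megabytes.
 */
unsigned long distinct_capacity(unsigned long ram, unsigned long swap,
                                unsigned long dup)
{
    unsigned long max_dup = ram < swap ? ram : swap;

    /* Data cannot be duplicated beyond the smaller of the two areas. */
    assert(dup <= max_dup);
    return ram + swap - dup;
}
```

Under this model, a 128MB machine with 128MB of swap and everything in RAM also duplicated in swap holds only 128MB of distinct data, so the swap adds nothing. With 256MB of swap, the same worst case still leaves 256MB, which is one way to read the "swap == 2 * RAM" advice.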

7. Per-User Private Directories

19 Mar 2001 (2 posts) Archive Link: "Per user private directories - trfs"

People: Amit S. Kale, Folkert van Heusden

Amit S. Kale announced:

Translators for providing per user private directories and restricting visibility of files and directories using the translation filesystem are available now at http://trfs.sourceforge.net/

Per user private directories:

Files created in a per user private directory are not visible to users other than the owner of the files. Per user view enables users to use shared directories as if they were private. Using a peruser view for a shared directory like /tmp allows users to have their own copy of the directory. It also helps reduce contention for directories like /var/spool/mail that undergo a large number of file creations and removals.

Restricted visibility of files and directories: Owner of a file can make it invisible to group (of the file) or others by restricting its visibility. A directory listing by a user shows only those files which are visible to the user. Invisible files cannot be accessed even by using a stat system call.

Folkert van Heusden liked the concept, but would have implemented it differently. He offered up an alternative proposal, but there was no discussion, and the thread ended.

8. User Space Web Server Accelerator Support

19 Mar 2001 - 23 Mar 2001 (10 posts) Archive Link: "user space web server accelerator support"

Topics: Networking, Web Servers

People: Fabio Riccardi, Zach Brown, Erik Mouw, David S. Miller

Fabio Riccardi announced and requested:

I've been working for a while on a user-space web server accelerator (as opposed to a kernel space accelerator, like TUX). So far I've had very promising results and I can achieve performance (spec) figures comparable to those of TUX.

Although my implementation is entirely sitting in user space, I need some cooperation from the kernel for efficiently forwarding network connections from the accelerator to the full-fledged Apache server.

I've made a little kernel hack (mostly lifted out of the TUX and khttpd code) to forward a live socket connection from an application to another. I'd like to clean this up such that my users don't have to muck with their kernel to get my accelerator to work.

Would it be a major heresy to ask for a new system call?

If so I could still hide my stuff in a kernel module and snatch an unused kernel call for my private use (such as the one allotted for tux). The problem with this is that the kernel only exposes the "right" symbols to the modules if either khttp or ipv6 are compiled as modules.

How could this be fixed?

David S. Miller suggested simply passing the file descriptors to Apache over a UNIX domain socket. Fabio replied that, as far as he knew, FDs were private to a given process. But David said that UNIX sockets allowed "file descriptor passing", which lets one process pass an FD to another. Fabio asked for docs, and Erik Mouw referred him to W. Richard Stevens, "Advanced Programming in the UNIX Environment", chapter 15.3. Zach Brown also added, "There are some patches in the apache source rpms in http://www.zabbo.net/phhttpd/ that shows how apache can connect to another daemon and get its incoming connections sockets from it."
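
For readers unfamiliar with the technique, file descriptor passing over a UNIX domain socket is done with an SCM_RIGHTS control message attached to sendmsg(). The following is a minimal sketch, not Fabio's routine or the phhttpd patches: the helper names send_fd() and recv_fd() are hypothetical, and error handling is kept to a minimum.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_control {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Pass fd to the peer of UNIX domain socket sock. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 'F';    /* one byte of ordinary data carries the message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_control ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;           /* payload is file descriptors */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock. Returns the new fd, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_control ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;          /* a new descriptor referring to the same open file */
}
```

An accelerator would call send_fd() with an accepted connection, and the receiving server would call recv_fd() and then treat the result like any socket it had accepted itself. Each handoff costs a sendmsg()/recvmsg() pair on top of the original accept().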

A couple days later, Fabio said he'd implemented an FD-passing routine, but that now his benchmarks had slowed to a crawl. Zach confirmed that FD-passing would be very slow, and (having forgotten the beginning of the thread) suggested using Tux. Fabio replied that his whole point was to avoid an in-kernel solution.

End Of Thread (tm).

9. Big Slowdown In 2.4.2

20 Mar 2001 - 25 Mar 2001 (35 posts) Archive Link: "Linux 2.4.2 fails to merge mmap areas, 700% slowdown."

Topics: Version Control

People: Serge Orlov, Linus Torvalds, David S. Miller, Kevin Buhr

Serge Orlov reported, "I upgraded one of our computers happily running 2.2.13 kernel to 2.4.2. Everything was OK, but compilation time of our C++ project greatly increased (1.4 times slower). I investigated the issue and found that g++ spends 7 times more time in kernel." He posted some debugging information, and Linus Torvalds replied, "Cool. Somebody actually found a real case. I'll fix the mmap case asap. It's not hard, I just waited to see if it ever actually triggers. Something like g++ certainly counts as major." He asked if Serge would test out patches, and several folks volunteered. Kevin Buhr was a bit confused about what could be triggering the problem for Serge, and David S. Miller replied, "It is the garbage collector scheme used for memory allocation in gcc >=2.96 that triggers the bad cases seen by Serge."

At one point Linus asked for folks to test 2.4.3-pre7 and see if it made any difference. Kevin Buhr replied:

Under 2.4.3-pre7, I get the following disappointing numbers:

        CVS gcc 3.0:    Debian gcc 2.95.3:    RedHat gcc 2.96:
real    16m10.660s      7m58.874s             10m36.368s
user    15m27.900s      7m23.090s             10m0.290s
sys     0m48.400s       0m40.350s             0m40.790s
maps:   <20 lines       ~10 lines             ~10 lines

A huge win for 2.96 and absolutely no benefit whatsoever for 3.0, even though it obviously had a 10-fold effect on maps counts. On the positive side, there was no performance *hit* either.

As a blind "have not looked at relevant kernel source" guess, this looks like a hash scaling problem to me: the hash size works great for ~300 maps and falls apart in a major way at ~3000 maps, presumably when we get multiple hits per hash bin and start walking 10-member lists.

How this translates into a course of action---some combination of keeping your patch, enlarging the hash, and performance tweaking the list-walking---I'm not sure.

Linus discounted the 3.0 results, saying it probably had nothing to do with the mmap size. He said, "The 40 seconds of system time you see is probably mostly something else. It's not as if gcc _only_ does mmap's." Linus and Kevin went back and forth on it a bit, with no real resolution, although Kevin did feel that only the most extreme cases would benefit from the patch.
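
The "maps" figures in Kevin's table are presumably line counts from /proc/<pid>/maps, where each line is one mapping. A small Linux-only sketch of how such a count could be taken follows. The helper names count_maps() and maps_added_by() are hypothetical, and the program is only meant to show the measurement, not to reproduce the gcc allocation pattern.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the lines of /proc/self/maps, one per mapping. -1 on failure. */
long count_maps(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    long lines = 0;
    int c;

    if (f == NULL)
        return -1;
    while ((c = getc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}

/*
 * Create n one-page anonymous mappings with separate mmap() calls and
 * report how many lines that added to /proc/self/maps. The mappings are
 * deliberately leaked; this is a measurement sketch only.
 */
long maps_added_by(int n)
{
    long page = sysconf(_SC_PAGESIZE);
    long before = count_maps();
    int i;

    assert(n >= 0);
    if (page <= 0 || before < 0)
        return -1;
    for (i = 0; i < n; i++) {
        void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return -1;
    }
    return count_maps() - before;
}
```

When the kernel merges adjacent compatible mappings, maps_added_by(64) should come back far below 64. A result near 64 would indicate the kind of unmerged growth the thread is about.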

10. Fix For Loopback Problems In 2.4

20 Mar 2001 (4 posts) Archive Link: "Hang when using loop device"

People: Ville Herva, Jens Axboe

Someone reported occasional system hangs during large operations on loopback devices under 2.4.1; 2.2.x worked fine. Ville Herva replied, "Jens Axboe fixed this. The fix is merged in 2.4.2ac20 and 2.4.3pre6. The fix will be in 2.4.3. Please search the mailing list archive before asking..."

11. Provoking System Failure

21 Mar 2001 (3 posts) Archive Link: "How to provoke kernel panic"

People: Keith Owens, Richard B. Johnson

Oliver Antwerpen wanted to test some applications' awareness of system crashes, and asked how to provoke a kernel panic. Keith Owens suggested:

Create fs/example-module.c

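#include <linux/config.h>
#include <linux/kernel.h>
#include <linux/module.h>

/* Module entry point: runs at insmod time and panics immediately. */
int init_module(void)
{
        printk("module loading\n");
        panic("test panic\n");
        return 0;       /* not reached: panic() does not return */
}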

Add "obj-m += example-module.o" to fs/Makefile. make modules, insmod fs/example-module.o and watch the bits fly.

But Richard B. Johnson suggested simply, as root, "cp /dev/zero /dev/mem" to get a true crash. He added, "This will demonstrate that most 'crash detector' programs are worthless (including some watchdog timers)."

12. Asymmetric Multiprocessor Support

21 Mar 2001 - 23 Mar 2001 (10 posts) Archive Link: "SMP on assym. x86"

Topics: SMP

People: Linus Torvalds, Kurt Garloff

Kurt Garloff tried upgrading one of the processors on his dual-processor system, and found that 2.4.2 couldn't really handle it very well. He hacked around for awhile and fixed a bunch of the various problems, and posted his results. Linus Torvalds replied:

This is not really a configuration Linux supports. You can hack it to work in many cases, but I'm generally not inclined to make this an issue for me because:

So I'm perfectly happy with you fixing it on your machine, but right now I have no incentives to make this a "real" option for a standard kernel.

I retain the right to change my mind, as always. Le Linus e mobile.

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.