Kernel Traffic #136 For 8 Oct 2001

By Zack Brown




For those of you interested in understanding the current situation in the Middle East, I'd like to recommend some stuff I've been reading lately. The first is a short, general history, The Arabs In History by Bernard Lewis. Without overwhelming the reader with too much detail or analysis, it does a very good (and enjoyable) job of summarizing the social and political history of the Middle East. I chose to read it first so that I'd be able to follow some of the more profound analysis that's been written about this stuff, such as the work of Edward Said (pronounced "Sah-eed"), a Palestinian intellectual. I'm currently reading The Question Of Palestine by him. It's an incredible book, put together with such care for the subject. I recommend it very highly. To get a sense of his style, see his essay on the September 11 attacks.

Over the past few weeks I've heard many opinions about why the attacks occurred, from the unlikely to the ludicrous. One very friendly person told me that Arab people have no sense of humor as a result of their harsh desert life, and that it is "our" obligation as decent Christian folk to go there and convert the Arabs to Christianity, so that they can share in our blessings and learn how to laugh more.

I don't consider myself to have a much deeper understanding of the situation than that nice person. My misconceptions may feel more familiar to me, and hence, more justified, but they are still almost entirely uninformed. For me, learning about the history behind these recent attacks is crucial to my forming a coherent opinion about them. What do intelligent Palestinian and other Arab people think about what is going on now? This is what I want to understand.

Mailing List Stats For This Week

We looked at 1422 posts in 6050K.

There were 529 different contributors. 229 posted more than once. 192 posted last week too.


1. Naming Core Dumps

25 Sep 2001 - 28 Sep 2001 (8 posts) Archive Link: "[PATCH] core file naming option"

Topics: BSD, FS: sysfs

People: Eli Carter, Bill Davidsen, Alan Cox, Don Dugger

Eli Carter announced a patch to allow core dumps to be given semi-unique names. As he put it, "when the sky is falling, it's nice to have more places for it to land." His patch was against 2.2, but he said he was willing to port it up to 2.4 if there was interest. Bill Davidsen replied:

While you're adding this feature, and it seems others are adding similar things, it is *highly* desirable to allow the build to put all the dumps in one place if desired (my first thought is /var/core) so that if you get a lot you won't run the system out of disk.

The directory name could be set in /proc/sys/coredir (or somesuch) with an initial value of "." of course.

Other than that I like the idea, although process "name" could get a lot of clashes on threads, and pid gets reused. There may be a better idea, but most of mine are cumbersome. This would really simplify certain kinds of dump analysis.

Eli replied that he didn't have time to implement all those features, though he could see why they'd be useful. Elsewhere, Padraig Brady suggested using 'core.PID' instead of 'core.processname' as Eli had it. He even suggested allowing 'core.PID' for each thread of a process, and Alan Cox replied, "The -ac tree and latest -linus can use core.pid for each thread already." Eli checked this out, but found that the 2.2 series still did not have the feature. He asked if there were plans to port it back from 2.4, and Don Dugger said, "Having the 2.2.x series create `core.pid' is like a 2 line change to `fs/binfmt_elf.c', just increase the size of the array that holds the file name and `sprintf' the pid into it. I've got a patch for the 2.2.x series that dumps core for all threads and puts them in `' files." Eli remarked, "Well, when I asked Alan about it, he said "Doing it in 2.2 is incredibly hard for internal locking reasons"... I'm not ready to tackle that. If you have a patch that does it correctly, submit it to Alan. I'd still like to have the option I submitted as part of 2.2 *shrug*... it's just bringing back functionality that was in earlier versions of Linux as a compile option. (From what I understand, this is similar to something done in the BSDs at some time in the past... but being a young whippersnapper, I don't really know.)"

2. Benchmarks And Bug Reports In The New 2.4 VM

26 Sep 2001 - 28 Sep 2001 (24 posts) Archive Link: "VM in 2.4.10(+tweaks) vs. 2.4.9-ac14/15(+stuff)"

Topics: Virtual Memory

People: Craig Kulesa, Andrea Arcangeli, Rik van Riel, Linus Torvalds

Craig Kulesa reported:

As requested, here are a number of tests of the latest VM patches. Tests are described in a previous post, archived here:


2.4.10 performance is great compared to 2.4.[7-9], but these tests still seem to point out some room for improvement in the 2.4.10 VM tree. 2.4.10 and 2.4.10(+00_vm-tweaks-1) performed similarly. The vm-tweaks patch improved the swap smoothness, but the number of pages swapped out didn't change measurably, nor did the large number of swap-ins. Clogging the system with dirty pages via 'dd' still causes XMMS to skip badly.

Let's push the aging/list-order code more by driving the system a bit harder in step d), namely adding mozilla to the common user application test. We will also stream mp3 audio throughout the entire test.

2.4.10

48 sec StarOffice load time
28 sec 2560x2560 GIMP image rotation
82400 KB swapped out, 92148 KB swapped back in

2.4.9-ac14 + aging
33 sec StarOffice load time
25 sec GIMP image rotation
30072 KB swapped out, 22252 KB swapped back in

2.4.9-ac15 + aging + launder
33 sec StarOffice load time
24 sec GIMP image rotation
57556 KB swapped out, 25900 KB swapped back in

'vmstat 1' sessions for these three cases are available at:

2.4.10+ is clearly working a LOT harder to keep dentry and inode caches in memory, and is swapping out harder to compensate. The ac14/ac15 trees free those caches more freely, and don't page application working sets out so readily.

Let's test this statement by not pre-filling the inode and dentry caches with 'slocate' and performing the same test:

2.4.10

26 sec StarOffice load time
24 sec GIMP image rotation
48332 KB swapped out, 33521 KB swapped back in

2.4.9-ac14 + aging
32 sec StarOffice load time
26 sec GIMP image rotation
37392 KB swapped out, 11952 KB swapped back in

2.4.9-ac15 + aging + launder
32 sec StarOffice load time
22 sec GIMP image rotation
23884 KB swapped out, 10828 KB swapped back in

2.4.10 does much better this time; in particular the StarOffice loading that was so plagued by swapouts, pressured by dentry/inode caching last time, went smoothly. But there's still more paging than with 2.4.9-ac1[4-5].

Let's try one more aging/list-order experiment. Instead of creating a 2560x2560 GIMP image first, then loading StarOffice and many other applications after (to start swapping, and cause GIMP pages to be candidates for reaping) -- this time let's load StarOffice first and then create the GIMP image. This should keep the GIMP image at a 'younger' age and presumably shouldn't page back into memory (rotation should be faster). StarOffice may swap itself entirely out however.

2.4.10

25 sec StarOffice load time
29 sec GIMP image rotation
64427 KB swapped out, 77422 KB swapped back in

2.4.9-ac14 + aging
30 sec StarOffice load time
24 sec GIMP image rotation
22147 KB swapped out, 8922 KB swapped back in

2.4.9-ac15 + aging + launder
31 sec StarOffice load time
21 sec GIMP image rotation
17204 KB swapped out, 8224 KB swapped back in

The 2.4.10 behavior surprised me. The GIMP pages are younger in memory, yet the rotation was slowed by swapin & swapout activity -- slower than before. Plus more StarOffice pages were swapped out, so it had to be paged back in order to close the application. I'm puzzled. The ac14/ac15 behavior was closer to what I expected; the GIMP pages were young and unswapped, only the earliest StarOffice pages had to be recalled.

These are samples of rather 'ordinary' loads which 2.4.10 needs some work handling; the ac15 tree is doing a better job with this particular set right now (ac15 tree also doesn't skip XMMS with the creation of lots of dirty pages via 'dd'). But all three kernels tested kept the user interface relatively responsive, which is an improvement over previous 2.4 releases. Very cool.

A note on page_launder(). ac14 has the smoothest swapping, with small chunks laundered at a time. ac14+aging and ac15+aging+launder both swap out huge (10-20 MB) chunks at a time. Admittedly, the user interface is responsive and XMMS doesn't skip a beat, but most of the 60 MB of actual swapout in the first test in ac15+stuff came from only THREE lines of 'vmstat 1' output. Otherwise there was no swapout activity.

In response to the statement that 2.4.10 was swapping harder to compensate for keeping dentry and inode caches in memory, Andrea Arcangeli replied, "2.4.10 is swapping out more also because I don't keep track of which pages are just uptodate on the swap space. This will be fixed as soon as I teach get_swap_page to collect away from the swapcache mapped exclusive swap pages." Rik van Riel suggested that try_to_swap_out() would be an easier place to do it, and Andrea replied, "Of course that's a possibility but then we'd have to duplicate it in all other get_swap_page callers, see? And I think it much better fits hidden in get_swap_page: the semantics of get_swap_page() are "give to the caller a newly allocated swap entry". So IMHO it is its own business to discard our "optimizations" to generate a free swap entry in case all swap was just allocated."

At this point, Robert Macaulay reported a lockup with the new VM. After some traces and hunting around, Andrea felt the bug was not with his code but with the NOHIGHIO logic. He posted a patch to fix it, but Linus Torvalds found races. He posted a patch of his own to illustrate his idea. Andrea agreed it was a fix, though less fine-grained than his own. Rik remarked at this point, "I'd consider that a feature. Undocumented subtle stuff tends to break within 6 months, sometimes even due to changes made by the same person who did the original subtle trick." There was a bit more discussion, and the thread petered out.

3. Some VM Benchmarks

27 Sep 2001 - 29 Sep 2001 (10 posts) Archive Link: "[BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15"

Topics: Disks: SCSI, Virtual Memory

People: Robert Cohen, Robert Love

Robert Cohen announced:

Given the recent flurry of changes in the Linux kernel VM subsystems I decided to do a bit of benchmarking. The benchmark is a test of file server performance. I originally did this test about a year ago with fairly dismal results, so I thought I'd see how much things had improved.

The good news, things have improved. The bad news, they're still not good.

The test consists of a Linux server acting as a file server (using netatalk) and 5 Macintosh clients. The clients each write a 30 Meg file and read it back. Each client repeats this 10 times. Total amount of IO in the test: 1.5 Gigs written, 1.5 Gigs read.

The tests were done with the following kernels

2.4.10: stock 2.4.10 kernel
2.4.10-aa1: 2.4.10 with Andrea's aa1 patch, including his vm-tweaks-1
2.4.10-p: 2.4.10 with Robert Love's preempt patch
2.4.9-ac15: Alan's latest
2.4.9-ac15-al: 2.4.9-ac15 with Rik's Aging+Launder patch

2.4.9-ac15 didn't fare too well, but Rik's patch resolved these problems, so I will leave 2.4.9-ac15 out of the discussion.

The hardware was a UP P-II 266 with 256 Megs of memory using SCSI disks on an Adaptec wide controller. The clients and server were all connected to a 100 Mbit switch. The hardware is nothing special, but the disks and LAN are all capable of pushing 10 MB/s of bandwidth.

In the test, the clients are each accessing 30 Meg files. With 5 clients, that's a file working set of 150 Megs of disk space being accessed. With 256 Megs of memory, all the files can fit in memory. I don't consider this to be a realistic test of file server behaviour, since if all the files on your file server fit in memory, you bought too much memory :-).

So for all the tests, the file server memory was limited to 128 Megs via LILO except for a baseline test with 256 Megs.

The features of a file server that I consider important are, obviously, file serving throughput, but also fairness, in that all clients should get an equal share of the bandwidth. So for the tests, I report the time at which the last client finishes the run, which indicates total throughput, and the time at which the first client finishes, which ideally should be not too much before the last.

Summary of the results

In the baseline test with 256 Megs of memory, all the kernels performed flawlessly. Close to 10 MB/s of throughput was achieved, evenly spread between the clients.

In the real test with 128 Megs of memory, things didn't go as well. All the kernels performed similarly, but none were satisfactory. The problem I saw was that all the clients would start out getting fairly bad throughput of only a few MB/sec total amongst all the machines. This is accompanied by heavy seeking of the disk (based on the sound). Then one of the clients would "get in the groove". The good client gets the full 10 MB/s of bandwidth and the rest are completely starved. The good client zooms through to the finish with the rest of the clients only just started. Once the good client finishes, the disks seek madly for a while with poor throughput until another client "gets in the groove". Once you are down to 2 or 3 clients left, things settle down, because the files all fit in memory again.

Overall, the total throughput is not that bad, but the fact that it achieves this by starving clients to let one client at a time proceed is completely unacceptable for a file server.

Note: this is not an accurate benchmark, in that the run times are not highly repeatable. This means it can't be used for fine-tuning kernels. But at the moment, I am not concerned with fine tuning but with a huge gaping hole in Linux file serving performance. And it's probably true that the non-repeatability indicates a problem in itself. With a well-tuned kernel, results should be much more repeatable.

Detailed result

Here are the timing runs for each kernel. Times are minutes:seconds. I did two runs for each. But none of the vmstat output shows any obvious problems. None of the kernels used much swap. And I didn't see any problems with daemons like kswapd chewing time.

Baseline run with 256 Megs

Run 1     First finished 4:05       Last finished: 4:18

Notes: this indicates best case performance

2.4.10

Run 1     First finished 2:15       Last finished: 5:36
Run 2     First finished 1:41       Last finished: 6:36

2.4.10-aa1

Run 1     First finished 3:38       Last finished: 8:40
Run 2     First finished 1:35       Last finished: 7:07

Notes: slightly worse than straight 2.4.10

2.4.10-p

Run 1     First finished 1:39       Last finished: 8:33
Run 2     First finished 1:46       Last finished: 6:10

Notes: no better than 2.4.10, of course the preempt kernel is not advertised as a server OS but since the problems observed are primarily fairness problems, I hoped it might help.

2.4.9-ac15-al

Run 1     First finished 2:00       Last finished: 5:30
Run 2     First finished 1:45       Last finished: 5:07

Notes: this has slightly better behaviour than 2.4.10 in that 2 clients tend to "get in the groove" at a time and finish early and then another 2 etc.


In the baseline test with 256 Megs, since all the files fit in page cache, there is no reading at all. Only writing. The VM seems to handle this flawlessly.

In the 128 Meg tests, reads start happening as well as writes, since things get flushed out of the page cache. The VM doesn't cope with this as well. The symptom of heavy seeking with poor throughput that is seen in this test I associate with poor elevator performance. If the elevator doesn't group requests enough, you get disk behaviour like "small read, seek, small read, seek" instead of grouping things into large reads or multiple reads between seeks.

The problem where one client gets all the bandwidth has to be some kind of livelock. Normally I might suspect that the locked-out processes have been swapped out, but in this case no swap is being used. I suppose their process pages could have been flushed to make space for page cache pages, but this would show up as an increased page cache size in vmstat, which doesn't seem to be the case.

Ironically, I believe this is associated with the elevator sorting requests too aggressively. All the file data for the processes that are locked out must be flushed out of the page cache, and the locked-out processes can't get enough reads scheduled to make any progress. Disk operations are coming in for the "good" process fast enough to keep the disk busy; these are sorted to the top by the elevator since they are near the current head position, and no one else gets to make any progress.

It has been suggested that the problems might be specific to netatalk. However, I have been unable to find anything that would indicate that netatalk is doing anything odd. Stracing the file server processes shows that they are just doing 8k reads and writes. The files are not opened O_SYNC, and the file server processes aren't doing any fsync calls. This is supported by the fact that performance is fine with 256 Megs of memory.

I have been unable to find any non-networked test that demonstrates the same problems. Tests such as 5 simultaneous bonnie runs, or a tiotest with 5 threads, that are superficially doing the same things don't see the same problems.

What I believe is the cause: since we have 5 clients fighting for network bandwidth, the packets from each client are coming in interleaved, so the granularity of operations that the server does is very fine. In a local test such as 5 bonnies, each process gets a full time slice accessing its file before the next file is accessed, which leads to a much coarser granularity. So I suppose a modified version of tiotest that does a sched_yield after each read or write might see the same problems. But I haven't tested this theory.

4. When Coders Crack: Status Of 0.01

27 Sep 2001 - 1 Oct 2001 (13 posts) Archive Link: "[PATCH] Linux 0.01 disk lockup"

Topics: Networking

People: Mikulas Patocka, Arnaldo Carvalho de Melo, Paul Gortmaker, Aaron Tiensivu, Richard Gooch, Rob Landley, Alan Cox, Linus Torvalds

Mikulas Patocka reported from a state of deep psychosis:

Linux 0.01 has a bug in disk request sorting - when interrupt happens while sorting is active, the interrupt routine won't clear do_hd - thus the disk will stay locked up forever.

Function add_request also lacks memory barriers - the compiler could reorder writes to the variable sorting and writes to the request queue - producing race conditions. Because gcc 1.40 does not have __asm__("":::"memory"), I had to use a dummy function call as a memory barrier.

Mikulas' unfortunate affliction appears to have been contagious. Arnaldo Carvalho de Melo said, "Fantastic! who is the maintainer for the 0.x kernel series these days? I thought that 2.0 was Dave W., 2.2 was Alan, 2.4 Linus, so now we have to find people for 1.2 and finally get 1.2.14 released, man, how I wanted one with the dynamic PPP code in back in those days... 8)"

Paul Gortmaker replied:

Well, IIRC, Alan and DaveM were essentially 1.2.x maintainers with various -ac and "ISS" patches (bonus points if you can remember what ISS stood for). I've probably got some of those 1.2.13 patches around somewhere...

As for 1.0.9, at one point some years ago I had updated it (cf. linux-lite) to compile with "new" gcc-2.7.2 when RAM was major $$$ - I don't imagine it has been touched since.

$ ls -l date
-rwxr-xr-x   1 root     root        13624 Sep  4  1992 date
$ ldd ./date
        /lib/ (4.0)
$ ./date
Thu Sep 27 15:58:19 EDT 2001

Wheee! :) Now if I just had a Decwriter for a serial console...

Aaron Tiensivu replied regarding the "ISS" challenge, "Internet Shit Slinger if I remember right. :) There was also 1.2.14-LMP, the Linux Maintenance Project.. it even had a mailing list."

Fortunately, Richard Gooch managed to maintain an immunity, remarking soberly, "Er, why bother to fix bugs in such an ancient kernel, rather than upgrading to a more modern kernel (like 0.98:-)? It's like finding a bug in 2.3.30 and fixing it rather than grabbing 2.4.10 and seeing if the problem persists." Mikulas' reply was, "Well - why not? The disk interrupt locking algorithm in 0.01 is beautiful (except for the bug - but it can be fixed). It's something you don't see in 2.4.10 with __cli, __sti, __save_flags, __restore_flags everywhere. So why not post a bug report and a patch for the 10th anniversary of Linux?"

Elsewhere, Linus Torvalds offered to let Mikulas be the official maintainer of the 0.01.xx series, but Mikulas, in a moment of sanity, turned him down. He said, "It would be cool to have linux-0.01 distribution. I started to use linux in 2.0 times, so I'm probably not the right person to maintain it. I don't even know where to get programs for it and I doubt it would work on my 4G disk." Rob Landley replied:

You might want to read the mailing list entries from 1991 and early 1992:

I've put together a summary of some of the more interesting early posts from 1991 and early 1992 for the computer history book I'm writing...

Alan Cox added:

The late 1993->95 list is archived at

I don't currently know if the rest of 1992->late 1993 exists

And Rob added:

The kclug archives claim to run until the second week of october 1993. I just haven't sorted through them that far yet.

Please don't beat on them TOO hard, I think they're on an ISDN line or something...

(The reason for limiting the search to mid 1992 is that by then they were already up to 0.95, so 0.01 was no longer relevant...)

5. Status Of NFS And TCP In 2.4

27 Sep 2001 - 3 Oct 2001 (7 posts) Archive Link: "status of nfs and tcp with 2.4"

Topics: FS: NFS, Networking

People: James D Strandboge, Trond Myklebust

James D Strandboge asked, "What is the status of tcp and nfs with the 2.4 kernel? The sourceforge site (regarding this) has not changed for a while and the NFS FAQ at sourceforge simply states: "nfsv3 over tcp does not work - the code for 2.4.x is as yet to be merged". What progress is being made toward this end?" Trond Myklebust replied, "None: AFAIK nobody has yet written any code that works for the server. The client works though..." James asked how involved it would be to write the TCP code since the UDP code had already been written. Trond replied:

The biggest problem is to prevent the TCP server hogging all the threads when a client gets congested.

With the UDP code, we use non-blocking I/O and simply drop all replies that don't get through. For TCP, dropping replies is not acceptable, as the client will only resend requests every ~60 seconds. Currently, the code therefore uses blocking I/O, which means that if the socket blocks, you run out of free nfsd threads...

There are 2 possible strategies:

  1. Allocate 1 thread per TCP connection
  2. Use non-blocking I/O, but allow TCP connections to defer sending the reply until the socket is available (and allow the thread to service other requests while the socket is busy).

I started work on (2) last autumn, but I haven't had time to get much done since then. It's on my list of priorities for 2.5.x though, so if nobody else wants to get their hands dirty I will get back to it...

James asked for Trond's patch for his work on the second strategy, and Trond gave a link, explaining, "It's a patch against linux-2.4.0-test6 and is basically at the 'toy' stage. Definitely nowhere near ready for release. IIRC though it did actually run fairly reliably." They had a brief technical discussion to clarify some of the ideas, and the thread ended.

6. Work Still Being Done On The Old VM In The -ac Tree

27 Sep 2001 - 1 Oct 2001 (20 posts) Archive Link: "2.4.9-ac16 good perfomer?"

Topics: Virtual Memory

People: Thomas Hood, Rik van Riel, Bill Davidsen

Thomas Hood reported, "Either 2.4.9-ac16 has much improved VM performance over previous 2.4 kernels (under moderate load, at least), or someone sneaked in to my apartment last night and upgraded my machine while I was asleep. I'm leaning toward the latter explanation." Rik van Riel replied, "Now that the -ac VM was stable for a few weeks, I thought it might be time to sneak in some big performance changes, finally. They seem to work ;)" Bill Davidsen reported some slowness on low-memory machines, and there was a bit of technical discussion to try to root it out.

7. More Developer Backlash For Invasive 2.4 Changes

28 Sep 2001 - 4 Oct 2001 (16 posts) Archive Link: "kernel changes"

Topics: Networking, Virtual Memory

People: Pavel Zaitsev, Alan Cox, Andrew Ebling, Dan Maas

Pavel Zaitsev complained, "I wonder whether there will be a reversion to the old process, where the kernel source is solidified before starting a development branch. Sysadmins where I work are uneasy about moving to Linux from Solaris because of erratic changes to the kernel, such as the replacement of the hardware monitoring code and the rewriting of network card drivers - all of which broke some software or other, or simply did not work. I myself update the kernel as time permits, usually every 0.0.2+. I have spent nearly two days tracing a problem with my network that ended up being in the source code of a driver that had been radically changed for the "better". Now I don't trust the 2.4 line kernel to work *at all*." Alan Cox replied, "You certainly aren't the only one. 2.4.10 both really alarms me and doesn't survive overnight on my test box." Andrew Ebling replied, "It would be better to be premature starting 2.5.x than to risk damaging the reputation of Linux by pretending an unproven kernel is stable. Let's start 2.5.x and let Alan sort 2.4.x out."

At one point in the discussion, Dan Maas said:

I consider it extremely embarrassing that fundamental drivers like aic7xxx, emu10k1, tulip, etc. are breaking regularly in the mainline kernels. I haven't had any trouble with things like this in Windows for several years now... Sure the Windows drivers are probably a few percent slower, but as Nathan Myers once wrote, "It is meaningless to compare the efficiency of a running system against one which might have done some operations faster if it had not crashed."

I think we all owe major thanks to Alan Cox, who does his best to keep the house in order amidst the chaos of kernel development (kudos to Mr. Cox for holding on to Rik's VM design long enough for it to stabilize!)

8. Status Of ext3 And VM In -ac Kernels

28 Sep 2001 - 29 Sep 2001 (11 posts) Archive Link: "Linux 2.4.9-ac17"

Topics: FS: ext3, Virtual Memory

People: Alan Cox, Hristo Grigorov, John Jasen, Mike Fedyk

Alan Cox announced 2.4.9-ac17, and Mike Fedyk asked approximately when ext3 would be merged. Alan replied, "When the ext3 folk ask me to merge it." Hristo Grigorov said, "What about if we, the regular users, ask you to merge it?" There was no reply to this, but in the past Alan has been very clear on the idea that he will only merge in patches when the developers feel they're ready. The point perhaps being that at any given moment the code is in a nontrivial state, and the developers are in the best position to know which patch to submit. Simply dumping the latest version into the -ac kernel could cause many bugs.

Elsewhere, John Jasen asked, "You (-ac) and linus (2.4.10 and eventual .11-pre) are split on the VM from Rik and Andrea, correct?" And Mike Fedyk asked, "which VM will be in 2.4.10-ac1?" Alan replied, "Rik's vm and also none of the horrible page cache/block device hackery that belongs in 2.5. I actually use the trees I release and I want to keep my machines working."

9. Success And Problems With New VM

29 Sep 2001 (6 posts) Archive Link: "2.4.10 VM, active cache pages, and OOM"

Topics: Executable File Format, Virtual Memory

People: Tobias Ringstrom, Linus Torvalds, Andrea Arcangeli

Tobias Ringstrom reported good success in general with the new VM. He said, "the 2.4.10 VM works great for my desktop and home server, much better than previous versions. I have not tried Alan's kernels." However, he'd managed to isolate a small program that could cause the system to become very unresponsive, run out of memory, and start killing processes. He said:

it is illustrated by the following very simple program:

#include <unistd.h>

int main()
{
        char buf[512];
        while (read(0, buf, sizeof(buf)) == sizeof(buf))
                ;
        return 0;
}

The program should be reading a block device, but a big file probably does the trick as well.

./a.out < /dev/hde1

When the program is running, all cached pages pop up in the active list, and when the memory is full of active pages, the computer starts to page out stuff, becomes VERY unresponsive, and after half a minute or so it goes OOM and starts killing processes. There are lots and lots of free swap at this time. I also get a bunch of 0-order allocation failures in the log.

Linus Torvalds replied, "if you get into a situation with _many_ more active pages than inactive, the plain 2.4.10 VM doesn't age the active list nearly fast enough." He added that the problem had been fixed in Andrea Arcangeli's latest patches. They discussed some other ideas in that area, and Tobias said he'd try out Andrea's patches.

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.