Table Of Contents
|1.||24 Oct 2001 - 2 Nov 2001||(42 posts)||Memory Debugging Tool|
|2.||31 Oct 2001 - 5 Nov 2001||(34 posts)||Linus And Alan Outline Their Future Plans (Wow!)|
|3.||31 Oct 2001 - 3 Nov 2001||(17 posts)||Andrea's VM Code Performs Better Than Rik's|
|4.||1 Nov 2001 - 2 Nov 2001||(5 posts)||Solaris Making Use Of Linux|
|5.||1 Nov 2001 - 2 Nov 2001||(5 posts)||Comparing The 2.2 And 2.4 Virtual Memory Subsystems|
|6.||2 Nov 2001 - 8 Nov 2001||(8 posts)||Bootmem For 2.5|
|7.||3 Nov 2001 - 4 Nov 2001||(4 posts)||Status Of Matrox G550 Framebuffer Support|
|8.||3 Nov 2001 - 7 Nov 2001||(16 posts)||Faster File Creation And Deletion For ext2|
|9.||3 Nov 2001 - 5 Nov 2001||(9 posts)||Regression Testing|
|10.||6 Nov 2001 - 7 Nov 2001||(6 posts)||Status Of ext3|
Mailing List Stats For This Week
We looked at 1672 posts in 7953K.
There were 603 different contributors. 266 posted more than once. 193 posted last week too.
1. Memory Debugging Tool
24 Oct 2001 - 2 Nov 2001 (42 posts) Archive Link: "xmm2 - monitor Linux MM active/inactive lists graphically"
Topics: Virtual Memory
People: Zlatko Calusic, Andrea Arcangeli, Linus Torvalds
Zlatko Calusic announced a new version of xmm2 (http://linux.inet.hr/), a tool to monitor active/inactive lists in the MM code. He added:
As Linus' MM lost inactive dirty/clean lists in favour of just one inactive list, the application needed to be modified to support that.
You can still continue to use the older one for kernels <= 2.4.9 and/or Alan's (-ac) kernels, which continued to use older Rik's VM system.
There was a long discussion about an apparent problem Zlatko was seeing with the VM in Linus Torvalds' tree. Linus, Andrea Arcangeli (author of the code) and others piled on the problem, but it turned out that Zlatko's system used the wrong hdparm parameters. There was no discussion of xmm2.
2. Linus And Alan Outline Their Future Plans (Wow!)
31 Oct 2001 - 5 Nov 2001 (34 posts) Archive Link: "2.4.14-pre6"
Topics: FS: ext3, Kernel Release Announcement, OOM Killer, Virtual Memory
People: Linus Torvalds, Michael Peddemors, Alan Cox, Marcelo Tosatti, David Weinehall
Linus Torvalds announced Linux 2.4.14-pre6, remarking, "Incredibly, I didn't get a _single_ bugreport about the fact that I had forgotten to change the version number in pre5. Usually that's everybody's favourite bug.. Is everybody asleep on the lists?" He also said:
The MM has calmed down, but the OOM killer didn't use to work. Now it does, with heuristics that are so incredibly simple that it's almost embarrassing.
And I dare anybody to break those OOM heuristics - either by not triggering when they should, or by triggering too early. You'll get an honourable mention if you can break them and tell me how ("Honourable mention"? Yeah, I'm cheap. What else is new?)
In fact, I'd _really_ like to know of any VM loads that show bad behaviour. If you have a pet peeve about the VM, now is the time to speak up. Because otherwise I think I'm done.
Michael Peddemors suggested, "Let's let this testing cycle go a little longer before making any changes.. Let developers catch up.." Linus replied:
My not-so-cunning plan is actually to try to figure out the big problems now, then release a reasonable 2.4.14, and then just stop for a while, refusing to take new features.
Then, 2.4.15 would be the point where I start 2.5.x, and where Alan gets to do whatever he wants to do with 2.4.x. Including, of course, just reverting all my and Andrea's VM changes ;)
I'm personally convinced that my tree does the right thing VM-wise, but Alan _will_ be the maintainer, and I'm not going to butt in on his decisions. The last thing I want to be is a micromanaging pointy-haired boss.
(2.5.x will obviously use the new VM regardless, and I actually believe that the new VM simply is better. I think that Alan will see the light eventually, but at the same time I clearly admit that Alan was right on a stability front for the last month or two ;)
Several folks asked for ext3 to be included in 2.4.14, but Michael Peddemors said, "As much as I would like to ext3 get in, NOT IN THIS RELEASE please... Don't put anything else in, until what we got works.. Hit him up on 2.4.15 :)"
In a completely different forum, Alan Cox wrote in his Advogato Diary (http://advogato.org/article/370.html):
People will have been wondering about the 2.4 stable kernel progression. Various bizarre rumours in Byte seem to have generated a lot of discussion and rumour. Now that the people concerned are all agreed it's time to put the entire roadmap out and make it clear.
Linus will be releasing a 2.4.14 and probably a 2.4.15 finishing off the VM stability work and other rough corners. At that point the 2.5 kernel tree will be opened. There is a lot of stuff queued for 2.5. It isn't going to be possible or sensible to throw it all into 2.5.0. One of the tasks is to put changes together in the right order.
Marcelo Tosatti will be the head maintainer over the 2.4 stable kernel tree. This is not the giant change it may seem from the outside. The stable kernel management was and is a group effort. Marcelo and many others have been active in 2.2 and 2.4 stabilisation work. I'll be helping Marcelo with advice when he asks it, and working on feeding him the 2.4 relevant bits of the -ac tree.
I will not be disappearing from the scene, although I might be a little less visible at times. There are various kernel projects I will be working on as well as spending more time concentrating on Red Hat customer related needs. I'm hopeful that spending more time closer to customers will help provide more insight into where 2.5 needs to be going.
David Weinehall did a great job on 2.0.39 when he took over 2.0 from me. I'm very confident that Marcelo will do a great job on 2.4.
(Thanks go to Christophe Barbé for the Advogato link.)
3. Andrea's VM Code Performs Better Than Rik's
31 Oct 2001 - 3 Nov 2001 (17 posts) Archive Link: "graphical swap comparison of aa and rik vm"
Topics: Virtual Memory
People: Ed Sweetman, Rik van Riel, Andrea Arcangeli
Ed Sweetman (known to the list as 'safemode') reported:
In an earlier post i mentioned a way of locking up my vm easily and repeatedly but that has since been fixed in one way or another. I reran the test and took vmstat 1 's of both runnings on a 2.4.14-pre6-preempt kernel and a 2.4.13-ac5-preempt kernel. I began both vmstat's at the same time (about 4 seconds before running each). What i did was run kghostview on a postscript file located here http://safemode.homeip.net/test.ps. It is 224K. kmail was loaded previously in both trials so kdeinit was already loaded as were all libs. After kghostview became responsive, i waited a few seconds (again about 5) and then exited the app.
No other interaction or running programs were present while doing this. I have 771580 KB of ram and 290740 KB of swap.
Now to explain the graphs. The blue is AA's vm. The red is Rik's vm. Rik's vm finished in 66 seconds. AA's vm finished in 52 seconds. Both start at 0 swap usage. Both from clean boots.
Here is the graph http://safemode.homeip.net/vm_swapcomparison.png. It's about 4.6K.
When you look at the graph it goes like this. The left side is 0 seconds, the right side is 66 seconds. bottom is 0KB, top is 290740KB.
I'll leave the actual interpretation of the data of both the graph and raw data up to those who actually know the code.
Needless to say that while running the test on either box, the entire computer became unresponsive multiple times for extended lengths of time. No OOM was generated on either run.
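For readers who want to repeat this kind of comparison, here is a minimal Python sketch of how a `vmstat 1` log could be reduced to the swap and system-CPU figures Ed graphs. It assumes the log contains a header row naming the columns `swpd`, `si`, `so`, and `sy`; the function names and the sample lines are invented for illustration and are not Ed's data.

```python
def parse_vmstat(text):
    """Parse `vmstat 1` output into a list of dicts keyed by column name."""
    rows, columns = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "swpd" in fields:              # header row naming the columns
            columns = fields
            continue
        if columns is None or len(fields) != len(columns):
            continue                      # banner line or truncated sample
        if not all(f.isdigit() for f in fields):
            continue                      # anything that is not pure counters
        rows.append({name: int(value) for name, value in zip(columns, fields)})
    return rows


def summarize(rows):
    """Peak swap use (KB), KB swapped in and out, and mean system CPU %."""
    return {
        "seconds": len(rows),             # one sample per second with `vmstat 1`
        "peak_swpd_kb": max(r["swpd"] for r in rows),
        "swapped_in_kb": sum(r["si"] for r in rows),
        "swapped_out_kb": sum(r["so"] for r in rows),
        "mean_sys_cpu": sum(r["sy"] for r in rows) / len(rows),
    }


# Invented sample for illustration only, NOT measurements from the thread.
SAMPLE = """\
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0 500000  20000 100000   0   0     0     0  100   200  10   5  85
 2  1  0  40000  10000  20000 100000   0 40000   0 40000  300   400  20  60  20
 1  1  0  60000   8000  20000 100000 500 20000 500 20000  300   400  30  40  30
"""

if __name__ == "__main__":
    print(summarize(parse_vmstat(SAMPLE)))
```

Rows whose field count does not match the header are skipped rather than guessed at, so a log with missing trailing columns yields fewer samples instead of wrong numbers.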
To explain the better performance of Linus' kernel, Rik van Riel (author of Alan's VM) said, "I think this is because in safemode's test, the swap space gets exhausted. My VM works better when there is lots of swap space available but degrades in the (rare) case where swap space is exhausted. Testing corner cases always gives interesting results ;)" But Ed replied:
I think the answer of why AA's kernel beat rik's has nothing to do with how much swap rik is using or how much swap is being swapped back in. It has to do with how rik decides what to swap. Apparently the algorithm used by rik to play with memory is taking seriously too much cpu and it leaves little for the actual process to work. Thus AA's less cpu intensive code allows the program to actually run and despite making errors in what to swap-out, the process finishes well before Rik's more intelligent code.
Unfortunately, the trailing columns in my aa vmstat somehow got lost during the paste from terminal buffer to file. This means i'm going to have to redo it all in order to get an accurate measurement to compare system cpu time to the rik vm. But for now i think the rik vm system graph is sufficient. And there are some numbers from the AA vmstat and those alone show a much lower cpu usage than in rik's. MUCH.
I made an overlay of Rik's system ( kernel ) cpu usage on top of the so and si graphs to illustrate this. Bottom being 0% top being 100% usage.
Here we see that after every major write out, there is major kernel cpu usage. This is serious usage, and this is the reason why rik's VM loses the race even though it swapped out and in the right things the first try more often than AA's.
Of course after each major write out in Rik's vm there is a minor read in. These happen to be directly under the cpu spikes so this could be the cause of the cpu usage, perhaps determining where the page is? I don't know enough about what's going on in the code to figure out if the VM does something after writing out that could be using all that cpu or if whenever it needs to read in. Although now that i look at it i'm tending to lean towards some bad code dealing with swap -> ram.
This is truly where the simple vm design conquers the complex. Less cpu being used by the kernel means more by the program, and sometimes the time gained by not using a lot of cpu greatly outweighs the time lost by having to correct mistakes with deciding what gets swapped in and out.
Maybe i'm wrong as to the cause of the kernel cpu usage, but from the numbers i do have from AA's vmstat, they are much higher in Rik's vm than in Andrea's. That and the fact that Rik's vm seems to be doing the right thing whereas Andrea's is having to fix mistakes yet Rik's loses seems to tell you that i'm not wrong in thinking that it's the vm's cpu usage that is the culprit.
Rik replied:
Note that this is likely to be a side effect of running completely out of swap, because that means many of the "obvious candidates" of what to swap out cannot be swapped out, meaning we have to scan more pages until we find something which already has swap backing.
Before you draw conclusions like the one above, please test again with more swap.
Ed replied that he'd do another test after work, but that there was no denying that the code written by Rik used more swap to do the same thing than the code in Linus' kernel (written by Andrea Arcangeli). Rik replied:
Uhhh ... this is nothing but a classical speed/size tradeoff.
The fact that under my VM swap space stays reserved for the program on swapin means that if the page isn't dirtied, we can just drop it without having to write it to disk again.
In situations where there is enough swap available, this should be a win (and it has traditionally been a big win).
Andrea's VM always frees swap space on swapin, so even if the process doesn't write to its memory at all, the data still needs to be written out to disk again.
Only in the one corner-case where my VM runs out of swap space and Andrea's VM doesn't yet run out of swap you'll find situations where the tactic used by Andrea's VM has its advantages, but I consider this to be a rare situation.
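The tradeoff Rik describes can be illustrated with a toy model. The Python sketch below is not kernel code: the `SwapModel` class, its methods, and the four-page workload are all hypothetical, and it only counts disk writes and swap slots under the two swapin policies described in the thread (keep the slot reserved, or free it).

```python
class SwapModel:
    """Toy model of swap-slot handling; every name here is illustrative."""

    def __init__(self, swap_slots, keep_slot_on_swapin):
        self.free_slots = swap_slots
        self.keep = keep_slot_on_swapin
        self.in_ram = set()      # pages currently resident
        self.has_slot = set()    # pages that own a swap slot
        self.dirty = set()       # resident pages modified since last write-out
        self.writes = 0          # pages written to swap so far

    def touch(self, page, write=False):
        """Access a page, swapping it in if it is not resident."""
        if page not in self.in_ram:
            self.in_ram.add(page)
            if page in self.has_slot and not self.keep:
                self.has_slot.discard(page)   # "free on swapin" policy
                self.free_slots += 1
        if write:
            self.dirty.add(page)

    def evict(self, page):
        """Try to push a resident page out of RAM; return True on success."""
        if page in self.has_slot and page not in self.dirty:
            self.in_ram.discard(page)         # clean copy on disk: just drop it
            return True
        if page not in self.has_slot:
            if self.free_slots == 0:
                return False                  # swap exhausted, page stays in RAM
            self.free_slots -= 1
            self.has_slot.add(page)
        self.writes += 1                      # page has to be written to disk
        self.dirty.discard(page)
        self.in_ram.discard(page)
        return True


def run(keep, swap_slots=4):
    """Swap 4 pages out, read them back in, swap them out again."""
    vm = SwapModel(swap_slots, keep)
    for p in range(4):
        vm.touch(p, write=True)
    for p in range(4):
        vm.evict(p)
    for p in range(4):
        vm.touch(p)                           # read-only swapin
    swap_in_use = swap_slots - vm.free_slots
    for p in range(4):
        vm.evict(p)
    return vm.writes, swap_in_use


def run_exhausted(keep):
    """As above, but then count how many brand-new pages can still be evicted."""
    vm = SwapModel(4, keep)
    for p in range(4):
        vm.touch(p, write=True)
    for p in range(4):
        vm.evict(p)
    for p in range(4):
        vm.touch(p)
    for p in range(4, 8):
        vm.touch(p, write=True)
    return sum(vm.evict(p) for p in range(4, 8))


if __name__ == "__main__":
    print("keep slots:", run(True), "new pages evicted:", run_exhausted(True))
    print("free slots:", run(False), "new pages evicted:", run_exhausted(False))
```

In this model, keeping the slot halves the number of writes when pages are only read after swapin, at the cost of holding all four slots while the pages sit in RAM. When swap is full, that same reservation leaves no room to evict new pages, which is the corner case Rik concedes.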
Ed added more swap and redid his tests, and found that Linus' kernel still performed better, in fact much better. End Of Thread (tm)
4. Solaris Making Use Of Linux
1 Nov 2001 - 2 Nov 2001 (5 posts) Archive Link: "Code from ~2.4.4 going into Solaris 9 Alpha?"
Topics: BSD, Samba
People: Mike Fedyk, Danek Duvall, Chris Ricker
Mike Fedyk noticed, on a graph of the history of UNIX (http://perso.wanadoo.fr/levenez/unix/history.html), a line going from Linux to Solaris 9 Alpha. He asked, "Does anyone know what code they copied, and if they're now making solaris GPL compatible?" Danek Duvall speculated, "That might simply be the inclusion of various "freeware" packages -- shells, gzip, apache, samba, and so forth, not necessarily kernel code. All of those packages come with full source as well, so they should be compliant with the GPL if that's how they happen to be licensed."
Mike had been flamed off-list, and replied to Danek, "I didn't mean to start a flame thread (like someone accused me of doing), it just looked interesting to me, and I don't remember anything in kernel traffic (which I was reading at the time) or on lkml (which I have been reading more recently)... so I figured someone here would know more." Chris Ricker replied in technical terms:
Solaris 9/ia32 includes software called lxrun (actually slip-streamed during Solaris 8, as Sun is so fond of doing for some brain-dead reason) which implements the Linux/ia32 ABI on Solaris/ia32. It's much like the Linux compatibility layer all the *BSDs have these days.
Solaris 9 on both Intel and Sparc also implements more of the Linux (really primarily GNU glibc) APIs. The idea is that Linux apps are now just a recompile away from running on Solaris (assuming they're sane and don't have 32-bit / 64-bit or endian issues to be sorted out), with no portage necessary....
I'd imagine these two features are what the line reflects. No code theft has taken place, and Solaris is definitely not GPL'ed.
5. Comparing The 2.2 And 2.4 Virtual Memory Subsystems
1 Nov 2001 - 2 Nov 2001 (5 posts) Archive Link: "Linux 2.2 and 2.4 VM systems analysed"
Topics: Virtual Memory
People: Derek Glidden, Rik van Riel
Derek Glidden reported:
I've been following the 2.4 VM issues since the early 2.4-pre days. As a "power user" and someone who uses Linux at work, the kernel's stability is of great interest to me. Finally, I got sick of trying to interpret the data from various sources on how well the 2.4 VM systems perform overall and in comparison with each other and other systems. So I ran my own tests against 2.4.12-ac6, 2.4.13, and 2.2.19 and wrote up the results:
"An analysis of three Linux kernel VM systems"
The conclusion in a nutshell is that yes, the 2.4 kernel VM systems still have a few quirks to work out, but overall they are so significantly better than the 2.2 VM that there really is no comparison.
However, this "significantly better" conclusion is for certain high-stress situations where the 2.2 VM apparently fails entirely, while 2.4 chugs along with barely a notice.
For overall end-user experience, 2.2 still "feels" better overall with better interactive responsiveness under a varying set of loads even though 2.4 really is faster at doing the actual work.
Rik van Riel really liked the document, and was very happy that Derek had taken the time and put in the work to do it. There was no other real discussion.
6. Bootmem For 2.5
2 Nov 2001 - 8 Nov 2001 (8 posts) Archive Link: "[RFC] bootmem for 2.5"
People: William Lee Irwin III, Robert Love, Tony Luck
William Lee Irwin III announced:
A number of people have expressed a wish to replace the bitmap-based bootmem allocator with one that tracks ranges explicitly. I have written such a replacement in order to deal with some of the situations I have encountered.
The following patch features space usage proportional only to the number of distinct fragments of memory, tracking available memory at address granularity up until the point of initializing per-page data structures, and the use of segment trees in order to support efficient searches on those rare machines where this is an issue. According to testing, this patch appears to save somewhere between 8KB and 2MB on i386 PC's versus the bitmap-based bootmem allocator.
The following patch has been tested on i386 PC's, IA64 Lions, and IBM IA64 NUMA hardware with sparse memory, and debugged without the help of logic analyzers or in-target probes. I would like to thank the testers of #kernelnewbies (reltuk and asalib) and my co-workers for their help in making this work, and Tony Luck and Jack Steiner for their assistance in profiling the existing bootmem.
I am now especially interested in feedback regarding its design, and also the results of wider testing.
Robert Love was very impressed, and replied, "The patch is without problem on 2.4.13-ac7. Free memory increased by about 100K: free and dmesg both confirm 384292k vs 384196k. This is a P3-733 on an i815 with 384MB. Very nice." He also added, "Note that the patch and UP-APIC do not get along. Some quick debugging with William found the cause. APIC does indeed touch bootmem. The above is thus obviously with CONFIG_X86_UP_APIC unset." William was thrilled that Robert had tested the patch, and promised to investigate the problem. A couple of days later he posted to the list again, saying he'd managed to reproduce the bug, and included a new patch. Robert tried the patch, reporting, "No problem on any system -- no difference, in fact, except the gain in total system memory. Most importantly, however, the new design is quite nice. :>"
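To make the idea of tracking ranges explicitly more concrete, here is a minimal Python sketch of a range-based allocator. It is not William's patch: it keeps free memory in a plain sorted list rather than the segment trees he describes, and the class name, method names, and addresses are all hypothetical. What it does share with the description above is that its bookkeeping grows with the number of free fragments, and that ranges are tracked at byte-address granularity rather than per page.

```python
import bisect


class RangeAllocator:
    """Free memory kept as a sorted list of disjoint [start, end) ranges."""

    def __init__(self):
        self.free = []

    def release(self, start, size):
        """Mark [start, start + size) free, merging with touching neighbours."""
        end = start + size
        i = bisect.bisect_left(self.free, (start, start))
        if i > 0 and self.free[i - 1][1] >= start:    # merge with predecessor
            i -= 1
            start = self.free[i][0]
            end = max(end, self.free[i][1])
            del self.free[i]
        while i < len(self.free) and self.free[i][0] <= end:   # and successors
            end = max(end, self.free[i][1])
            del self.free[i]
        self.free.insert(i, (start, end))

    def reserve(self, start, size):
        """Remove [start, start + size) from the free ranges."""
        end = start + size
        kept = []
        for s, e in self.free:
            if e <= start or s >= end:
                kept.append((s, e))           # untouched by the reservation
                continue
            if s < start:
                kept.append((s, start))       # piece left below the reservation
            if e > end:
                kept.append((end, e))         # piece left above the reservation
        self.free = kept

    def alloc(self, size, align=1):
        """First-fit allocation; returns an address, or None if nothing fits."""
        for s, e in self.free:
            addr = -(-s // align) * align     # round s up to the alignment
            if addr + size <= e:
                self.reserve(addr, size)
                return addr
        return None


if __name__ == "__main__":
    a = RangeAllocator()
    a.release(0x1000, 0x9F000)        # hypothetical low memory
    a.release(0x100000, 0x7F00000)    # hypothetical memory above 1MB
    a.reserve(0x100000, 0x200000)     # hypothetical kernel image
    print(hex(a.alloc(100, align=64)), a.free)
```

A linear first-fit scan is adequate for a handful of fragments; a tree structure like the one William mentions matters only when the number of fragments is large.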
7. Status Of Matrox G550 Framebuffer Support
3 Nov 2001 - 4 Nov 2001 (4 posts) Archive Link: "Support for Matrox G550 framebuffer?"
People: Petr Vandrovec, Alan Cox, Dave Jones
Jordan Breeding asked if there was or would be support for the Matrox G550 framebuffer. Dave Jones replied that Petr Vandrovec had sent a pretty good patch into the linuxfb-devel mailing list a few weeks before; Petr also replied to Jordan, saying, "I sent patches to Alan on Friday. I do not know whether he'll apply them or not. But for using G550 you must download matroxset from ftp://platan.vc.cvut.cz/pub/linux/matrox-latest, as if you are connecting VGA monitor to card, you are on 90% using secondary output." And Alan Cox confirmed, "They are in my working tree and will be in the next -ac." End Of Thread (tm)
8. Faster File Creation And Deletion For ext2
3 Nov 2001 - 7 Nov 2001 (16 posts) Archive Link: "Ext2 directory index, updated"
Topics: Big Memory Support, FS: ext2
People: Daniel Phillips, Christian Laursen
Daniel Phillips announced a new version of his patch to allow faster file creation and deletion. A performance graph (http://people.nl.linux.org/~phillips/htree/performance.png) is also available. He warned that the code should only be used on test machines, and said:
This update mainly fixes a bug, a one-in-a-million occurrence on an untested code path. This bug resulted in rm -rf deleting all files but one from a million-file directory. I believe that's the last untested code path, and otherwise it's been very stable.
I didn't expect highmem to work properly, and it didn't. It's on my to-do list, but for now highmem has to be off or you will oops on boot.
I elaborated the dx_show_buckets debug output to dump the full index tree instead of just one level. This function now serves as a capsule summary of the index tree structure, and as you can see, it's simple.
I've done quite a bit more testing, including stress testing on a real machine and I find that everything works quite comfortably up to about 2 million files, turning in an average time of about 50 microseconds/create and 300 microseconds/delete (1 GHz PIII). In the 4 million file range things go pear-shaped, which I believe is not due to the index patch, but to rd. The runs do complete, but require exponentially more time, with cpu 98% idle and block throughput in the 300/second range. I'll look into that more later.
I did run into some bad mm behavior on 2.4.13. The icache seems to be too severely throttled, resulting in delete performance being less than it should be. I also find I am rarely able to create a million file test run on uml (2.4.13) without oom-ing. In my experience, such problems are not due to uml, but to the kernel's memory manager. These issues may have been addressed in recent pre-patch kernels, but it seems there is still some room for improvement in mm stability.
The patch is available at:
Christian Laursen tried the patch, having never examined earlier versions, and reported:
I must say, that the first impression is very good indeed.
I took a real world directory (my linux-kernel MH folder containing roughly 115000 files) and did a 'du -s' on it.
Without the patch it took a little more than 20 minutes to complete.
With the patch, it took less than 20 seconds. (And that was inside uml)
But he added, "However, when I accidentally killed the uml, it left me with an unclean filesystem which fsck refuses to touch because it has unsupported features. Even the latest version does this." He asked if there was a patch to fix this, and Daniel replied:
Ted Ts'o volunteered to do that but I failed to support him with proper documentation so it hasn't been done yet.
However, it's very easy to get around this, just comment out the part of the patch that sets the incompat flag. Then the indexed directories will magically turn back into normal directories the next time you write to them (it would be very good to give this feature a real-life test :-)
9. Regression Testing
3 Nov 2001 - 5 Nov 2001 (9 posts) Archive Link: "Regression testing of 2.4.x before release?"
People: Ted Deppner, Dan Kegel, Alan Cox, Linus Torvalds
Dan Kegel felt sure that Alan Cox stress-tested his kernels more than Linus Torvalds tested his. He suggested Linus adopt Alan's stress-tests before putting out releases. Ted Deppner replied, "It would be a better idea if everyone (including you and me) stress test those pre and final kernels." He added in a later post, "Linus and others have said in the past though, that YOUR usage is the testing they want... So it's best if you install the kernel and use it normally, whatever you'd use a kernel to do." At one point Dan said:
I'm not saying Linus should do the testing.
It's good that Linus is asking others to test with cerberus, as he did in http://marc.theaimsgroup.com/?l=linux-kernel&m=100451768023436&w=2
It would be even better if Linus came out and stated that he would refuse to call a kernel final if there is an outstanding report of it failing an agreed-upon set of stress tests.
And it would be *even better* if http://osdl.org/stp/ were used to do stress testing in a nice, automated way on 1, 4, 8, and 16-cpu machines on release candidates.
Almost none of this requires any work by Linus. All Linus has to do is say "The 2.4.x kernels will pass stress tests before release", and recruit someone to run his kernels through OSDL's STP in a timely manner.
10. Status Of ext3
6 Nov 2001 - 7 Nov 2001 (6 posts) Archive Link: "ext3-0.9.15 against linux-2.4.14"
Topics: Access Control Lists, Extended Attributes, FS: ext3, SMP
People: Andrew Morton, Steven N. Hirsch
Andrew Morton announced:
Download details and documentation are at
Changes since ext3-0.9.13 (which was against linux-2.4.13):
For a long time, the ext3 patch has used a semaphore in the core kernel to prevent concurrent pagein and truncate of the same file. This was to prevent a race wherein the paging-in task would wake up after the truncate and would instantiate a page in the process's page tables which had attached buffers. This leads to a BUG() if the swapout code tries to swap the page out.
This semaphore has been removed. The swapout code has been altered to simply detect and ignore these pages.
This is an incredibly obscure and hard-to-hit situation. The testcase which used to trigger it can no longer do so. So if anyone sees the message "try_to_swap_out: page has buffers!", please shout out.
There are no plans to remove this semaphore from -ac kernels, unless Alan wants it that way.
Steven N. Hirsch reported that he'd been seeing thousands of occurrences of Andrew's "hard-to-hit" problem, but just hadn't realized it was a problem with ext3. Andrew asked for more details about Steven's system and setup, but there was not much more discussion.
Sharon And Joy
Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.