Kernel Traffic #21 For 3 Jun 1999

By Zack Brown

Introduction

This week I'd like to introduce you to the Kernel Cousins (../index.html) project, a conglomeration of KT-type newsletters covering other Open Source projects. Since the whole KT/KC project is itself Open Source (GPLed), there are some interesting directions this can go.

Linuxcare (http://www.linuxcare.com) developed the Kernel Cousins idea, and is hiring me to help implement it. Once I get fully connected with them, you should expect a number of important Kernel Cousins to be added, covering key projects like Gnome, KDE, Apache, and others.

Right now there's only one Kernel Cousin out there (KC debian-hurd (../KC/debian-hurd/index.html), done by me). Hopefully soon, that and the other Linuxcare KCs will be joined by independent contributors, and Kernel Cousins will become a true window into the entire Open Source development world.

In regular KT news, the quotes pages are now (finally!) fully up to date, and the new Linus Torvalds interview at FatBrain.Com is in the interviews page.

Mailing List Stats For This Week

We looked at 632 posts in 2791K.

There were 324 different contributors. 115 posted more than once. 133 posted last week too.

1. Which Distribution Does Linus Use?

14 May 1999 - 30 May 1999 (27 posts) Archive Link: "CD-RW"

Topics: BSD: FreeBSD, POSIX

People: Linus Torvalds, Alan Cox, David Luyer, Alex Buell, Mike A. Harris, Horst von Brand

For all the Linophiles out there:

Someone asked about hardware support, and was also curious which distribution Linus Torvalds actually used. Mike A. Harris seemed to remember it was RedHat, probably highly customized. Linus explained, "Actually, I use a pretty much out-of-the-box SuSE install at home, and a RedHat install at work. Basically the only thing I ever upgrade is my kernel (surprise, surprise), and the editor I use (microemacs - not included in any of the distributions, but obviously the best editor out there (tm))."

Alan Cox pointed out, "uemacs is unfortunately under a license that stops CD vendors including it - great pity," and David Luyer did some research, saying, "The latest Unix uemacs I've seen is 3.10e, which is used quite extensively by people UWA (everyone but me, it seems). I've seen references to later versions around (eg, in the pico source I seem to recall; pico is based on microemacs 3.10 but I think I saw references to bits from later versions in there)."

A few days later, David said, "I actually did find v4.00 of MicroEmacs for all operating systems, the developer has taken a usenet-developed-not-explicitly-licensed program and turned it semi-commercial (charge for non-personal use). And it has some useful additions (multi-level undo for example). The source is particularly poorly set up (if you want all the features, you have to use the FreeBSD makefile and set options like GCC and UNIX when it says to only set one of them, that kind of thing), but it seems to work (as far as I can tell as a non-EMACS-user, and as far as the MicroEmacs users here have told me so far)." He added a pointer to the sources (http://members.xoom.com/uemacs/nojavascript.html).

Linus replied that he was using his own customized version (ftp://ftp.kernel.org/pub/software/editors/uemacs/), which he'd make available on ftp.kernel.org. He added, "I do _not_ intend to claim that this is a particularly good version of uemacs - it's just the one I use, and it has seen some development that makes it better than some other versions I have seen. Caveat emptor."

Alex Buell posted a patch to bring Linus' version more into POSIX compliance. Horst von Brand offered a simpler modification to accomplish the same thing, but Alan found a security hole. Thus a new code fork is born.

2. XFS Going Open Source

20 May 1999 - 29 May 1999 (5 posts) Archive Link: "[OT] SGI to OpenSource XFS"

Topics: BSD, Extended Attributes, FS: ReiserFS, FS: XFS, FS: ext2, FS: ext3, FS: smbfs, Networking, Patents, Virtual Memory

People: Larry McVoy, Stephen C. Tweedie, Pavel Machek, Steve Lord, Matthew Kirkwood, H. Peter Anvin, Theodore Y. Ts'o, Alan Cox, Dan Koren, Jim Mostek, Jeff V. Merkey, Paul Jakma, Stefan Monnier, David Luyer, Stephen Tweedie

Paul Jakma gave a pointer to a News.Com article (http://www.news.com/News/Item/0,4,36807,00.html?tag=st.cn.1fd1.newstkr.ne) that discussed SGI's recent decision to release XFS as Open Source. Stefan Monnier was thrilled, and asked how XFS performed relative to other filesystems. Larry McVoy said:

I know a lot about XFS performance. Unfortunately, it's hard to split out what parts are XFS and what parts are IRIX infrastructure.

Some parts of XFS are amazing (actually, it isn't XFS that's so fast, it's XLV - the volume manager underneath - XFS does its part by getting out of the way and letting XLV set up all the DMAs in parallel). The I/O bandwidth that you can get out of XFS/XLV is limited only by the hardware. When I was at SGI they demoed XFS doing 7GByte/second and there is no reason why that number couldn't be 7TByte/second.

The journalling is nice - it's nowhere near as fast as ext2 but it is safe, you can turn off the machine the middle of an untar and things are in a sane state when you reboot. I strongly suspect that Stephen's journalling work will be lighter weight.

XFS is extent based so you could have a 10TB file that was made up of a small number of extents, very nice.

I suspect that what will happen is that we'll get XFS, take a while to understand it and then migrate the ideas that we want into ext3 or whatever Stephen is calling his thing. For a lot of stuff, XFS is overkill and it comes at a non-zero cost.
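
To illustrate what "extent based" means in Larry's description, here is a minimal sketch in C. It is not XFS code: the struct extent layout and the extent_lookup() helper are hypothetical names invented for this example. The idea is only that each record maps a contiguous run of file blocks to a contiguous run of disk blocks, so a very large file can be described by a handful of records rather than one pointer per block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent record (not the XFS on-disk format): `length`
 * file blocks starting at `file_block` are stored contiguously on
 * disk starting at `disk_block`. */
struct extent {
    uint64_t file_block;
    uint64_t disk_block;
    uint64_t length;
};

/* Translate a file block number into a disk block number by binary
 * search over a map sorted by file_block. Returns 0 on success, or
 * -1 if the block lies in a hole or beyond the last extent. */
int extent_lookup(const struct extent *map, size_t count,
                  uint64_t file_block, uint64_t *disk_block)
{
    size_t lo = 0, hi = count;

    assert(map != NULL || count == 0);
    assert(disk_block != NULL);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct extent *e = &map[mid];

        if (file_block < e->file_block) {
            hi = mid;                       /* target is in an earlier extent */
        } else if (file_block - e->file_block >= e->length) {
            lo = mid + 1;                   /* target is past this extent */
        } else {
            *disk_block = e->disk_block + (file_block - e->file_block);
            return 0;
        }
    }
    return -1;
}
```

With a map of three extents, a lookup costs at most two comparisons regardless of how many blocks each extent covers, which is why a multi-terabyte file with few extents stays cheap to describe.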

Under the Subject: [was: ext2 question] XFS opensourced!, there was some concern that XFS might not be GPLed. There was also the question of whether Stephen C. Tweedie's recent ext3 work might become unnecessary. Stephen replied, "Yes, _if_ it comes out as GPL and if it gets integrated well into Linux. XFS is extremely scalable. It will be interesting to see how it compares to ext2/ext3 for smaller filesystems: ext2 is very lightweight indeed as filesystems go."

Under the Subject: Re: SGI's XFS DONATED AS OPEN SOURCE!!!!!!!!!, an actual SGI XFS developer was heard from. Pavel Machek (who is not the XFS developer) said, "I would say hooray at the time someone makes it working with Linux. Having filesystem is nice, but integrating XFS into linux may be well more work than writing journaling filesystem from scratch. (Or maybe SGI is going to do work for us?) There are some non-trivial issues buffer cache." XFS developer Steve Lord replied, "Hmmm, I guess that wasn't made clear - we are working on getting XFS functioning in Linux ourselves, but some help would probably be appreciated. The thing we have to get done first is to produce an unencumbered version of the source - i.e. one which does not include code copyrighted by people other than SGI. After that hopefully other people can jump in."

But the main discussion took place under the Subject: XFS and journalling filesystems. Edward Thomas asked whether, assuming XFS would be GPLed, it should become the "official" replacement for ext2. A big thread followed. Matthew Kirkwood gave his opinion:

Of course not. We (well I) haven't seen a single line of code yet. (The same is true of SCT's journalling stuff, of course :)

The BSD and Linux VFSes are pretty different and incorporating XFS with Linux will take time to do and stabilise.

ext2 has proven itself to provide good performance in most cases. With the journalling stuff, and the possibility of btree- and extent-ifying it, we should see that performance remain, and scale to much larger numbers. With that done, we'll have a fast, 64-bit (on suitable platforms), optionally journalled filesystem.

ext2 was designed for Linux, so it "fits" very well. Its developers know Linux about as well as anyone. In contrast, filesystems which weren't designed for Linux (or, alternatively, that Linux wasn't designed for) like fat, nfs, smbfs, often won't sit quite right with Linux's view of the filesystem.

There's also the issue that other people's code doesn't seem to last too long in Linux. Look at the ports of the BSD network stack, or their IP firewalling code. We have our own code now, and it's at least as good.

SGI has made a great and very forward-thinking move and I for one applaud and thank them for it. Perhaps in a year or so, we'll know how ext2, xfs, reiserfs and maybe others (WAFL, please :) compete. I suspect that each will have an obvious core competency or home ground. Good - choice is great. I just think that talk about dumping one of Linux's core components (and greatest strengths) on the strength of a press release is a little premature.

Elsewhere, H. Peter Anvin also explained, "It doesn't make sense to "adopt a defacto replacement" until it is IN the kernel and WORKS. Expect that conversion of the kernel interface from IRIX to Linux to require at least some amount of pain. Linux standardization should be technology driven. Furthermore, ext2 will have to be supported more-or-less indefinitely because of the huge installed base."

Elsewhere, Serena Del Bianco gave a pointer to an interview with SGI Strategic Technologist Hank Shiffman (http://www.linuxworld.com/linuxworld/lw-1999-05/lw-05-sgi.html) .

Elsewhere, Theodore Y. Ts'o said:

I've talked to the SGI folks, and there are a couple of things to keep in mind as you read their press release. First of all, they have just made the decision to release it under an Open Source(tm) license; they have not yet committed to using the GPL. If they use an Open Source license (such as one similar to the Apple or Mozilla Public License) which is not compatible with the GPL, then putting it into the kernel sources would be a problem; it might have to be distributed separately from the kernel.

Secondly, they still need to review the XFS code they intend to contribute for copyright and patent encumbrances from other companies, and to actually port it to Linux. So it will be a while before we even have a chance to look at the contributed code to evaluate it, and even longer before it will be ready for prime-time use in the Linux kernel.

Finally, I'm told that when XFS was first introduced into Irix, significant changes needed to be made to its VM layer to support XFS. If any changes are required, they will have to be clean enough so that Linus will approve them. If they are horribly ugly and grotesque, we all know that Linus will turn them down flat-out.

(And there are certain features of XFS, such as the features that allow Irix to tell the disk controller to send disk blocks directly to the ethernet controller, which then slaps on the TCP header and calculates the TCP checksum without the disk data ever hitting memory, which I'm pretty confident we won't be supporting in Linux any time soon. It's a cool idea conceptually, the implementation and maintenance headaches it causes are generally acknowledged to be not worth it. It's a pain even if you control all of the hardware, such as SGI did. I don't even want to think about what it would be like to try supporting this on generic PC hardware.)

So the bottom line is that I would be very, very, surprised if XFS for Linux took less than a year (or more) to become a reality. That being said, XFS is a very nice filesystem, and the white papers that I've read about it shows that it's something which SGI was very generous to donate and which no doubt will be great for Linux. However, there's some work that needs to be done first before it will become a reality. If there are implementation issues, either because SGI doesn't release it under a GPL-compatible license, or because their implementation requires VM changes which Linus refuses to accept, one option may be to look at the XFS filesystem format and do our own clean, from-scratch implementation which takes the ideas from XFS and is XFS-format compatible. There are many different options available to us.

Also, we can't discount the possibility that as a result of SGI deciding to Open Source XFS, Compaq might not decide to do the same thing with their advfs, which is also a truly wonderful filesystem which currently ships with their Digital Unix OS. If that happens, we will be in the happy position of having two very well-designed filesystems to evaluate and choose from.

So my personal perspective is that XFS is a very promising filesystem for us, but it's not ready yet. In the short- to medium- term, both Stephen Tweedie and I will be working on improvements to ext2fs so that people who need solutions sooner than when XFS will be available will have something they can use. In the long term, XFS is undoubtedly a very interesting prospect to consider.

Larry McVoy agreed, and they had a technical discussion about the inner workings of XFS.

Elsewhere, Alan Cox said, "XFS is 50,000 odd lines of mainframe class filing system code. Its unlikely to be the ideal fs for a small appliance or a desktop at home even if it kicks butt as a server fs." But Dan Koren replied, "Quite the contrary. The fewer disk spindles on a system, the greater the performance gains from XFS' very sophisticated i/o scheduling. In addition, XFS code is layered neatly enough that unwanted features/options can be left out if one so wants," and Jim Mostek said to Alan, "More like 100,000 lines+. But, I'm not sure what will wind up in Linux. There are two directory formats (old/new) and only one should go into Linux. This should save about 10K. Other stuff is on the side like the extended attributes and they really don't impact the main code." He asked why lines of code was so important, and Alan replied that it caused binary bloat; but some other folks argued that Linux allowed you to pick and choose what got compiled in.

Elsewhere, Jeff V. Merkey from Timpanogas Research Group replied to Edward's original post, claiming SGI was probably just reacting to Timpanogas' GPLing their FENRIS filesystem. He said, "You should wait to see just how serious they really are about this, and how much of it they are really going to give you. Another Unix File system (yawn yawn yawn) with journalling (which means it will be **SLOW**). I would vote for the ext3 project to continue." He added, "We are bringing Linux Novell's installed base of 8,000,000 NetWare Servers as a potential target market for Linux to penetrate. How many Irix nodes are there? 40,000 maybe?" David Luyer felt the XFS code could be very useful, and that Linux should support as many journalling and log-based filesystems as possible. He didn't see a reason to choose one.

Jeff later clarified, "I just didn't like seeing some folks go belly up and start killing their internal projects (like ext3) just because XFS shows up on the scene."

There followed a technical discussion of the ups and downs of XFS and IRIX, and various subthreads are still quite active, getting over 10 posts per day.

3. Klogd Acts Up

23 May 1999 - 30 May 1999 (6 posts) Archive Link: "[2.2.9] klogd is using 99% cpu and networkperf. is strange"

People: Pavel Machek, Arjan van de Ven, Andi Kleen

Arjan van de Ven found that klogd was using 99% cpu under 2.2.9 on his system, causing (as you might expect) a big performance drop. Andi Kleen thought it must be writing something to the system log, but Arjan said no, the logs weren't being written. Then Andi asked for an strace, but Arjan reported that klogd wasn't making any system calls. Finally, Pavel Machek said, "Klogd just does this sometimes. Get newer version." End Of Thread (tm).

4. API Changes From 2.0 to 2.2

23 May 1999 - 29 May 1999 (5 posts) Archive Link: "functions from 2.0 replaced by?"

Topics: FS: ROMFS

People: Jan Kara, Richard Gooch

Kit Peters found that verify_area(), memcpy_tofs(), and memcpy_fromfs() were not available on his system (kernel 2.2.7, glibc 2.1.1pre2), and asked what had replaced them. Jan Kara replied, "Currently there are functions copy_to_user() and copy_from_user(). They do all that was done by above mentioned functions.. The mentioned functions don't exist any more..." Sam Roberts searched for hours, and finally found Richard Gooch's page, http://www.atnf.csiro.au/~rgooch/linux/docs/index.html, which (among other things) listed the API changes from 2.0 to 2.2 and from 2.2 to 2.3. Richard pointed out that his page was listed in the linux-kernel FAQ, but Sam said that the FAQ itself was hard to find, and wasn't linked from any obvious places.
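
For readers porting code, the change Jan describes can be sketched roughly as follows. This is a user-space illustration only, not kernel code: mock_copy_from_user() is a stand-in written for this example (a NULL source plays the role of a bad user address), and struct request and handle_request() are hypothetical. The one property borrowed from the real copy_from_user() is its return convention: the number of bytes that could not be copied, with zero meaning success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* 2.0 style, shown for comparison only (not compiled here):
 *
 *     err = verify_area(VERIFY_READ, user_ptr, sizeof(req));
 *     if (err)
 *         return err;
 *     memcpy_fromfs(&req, user_ptr, sizeof(req));
 */

/* User-space stand-in for copy_from_user(): returns the number of
 * bytes NOT copied. A NULL source simulates a faulting address. */
static unsigned long mock_copy_from_user(void *to, const void *from,
                                         unsigned long n)
{
    if (from == NULL)
        return n;
    memcpy(to, from, n);
    return 0;
}

/* Hypothetical argument block a driver might receive from user space. */
struct request {
    int cmd;
    int arg;
};

/* 2.2 style: one call both validates and copies; any shortfall is
 * reported to the caller as -EFAULT. */
int handle_request(const void *user_ptr, struct request *out)
{
    assert(out != NULL);

    if (mock_copy_from_user(out, user_ptr, sizeof(*out)) != 0)
        return -EFAULT;
    return 0;
}
```

The practical difference is that the separate verify step disappears, so the check cannot be forgotten or drift out of sync with the size of the copy.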

5. Conflicting Development On The Page Cache

22 May 1999 - 24 May 1999 (5 posts) Archive Link: "[PATCHES]"

Topics: Executable File Format, FS: NFS, FS: ext2, SMP

People: Eric W. Biederman, Linus Torvalds, Ingo Molnar, Stephen Tweedie

Eric W. Biederman posted several uuencoded patches to improve the basic mechanisms of the page cache, intended for 2.3.4. He listed the functionality of each patch:

  1. Allow reuse of page->buffers if you aren't the buffer cache
  2. Allow old old a.out binaries to run even if we can't mmap them properly because their data isn't page aligned.
  3. Muck with page offset.
  4. Allow registration and unregistration for functions needed by swap off. This allows a modular filesystem to reside in swap...
  5. Large file support, basically this removes unused bits from all of the relevant interfaces. I also begin to handle PAGE_CACHE_SIZE != PAGE_SIZE
  6. Introduction of struct vm_store, and associated cleanups. In particular get_inode_page. vm_store is a variation on the inode struct which is lighter weight. vm_store separates out the vm layer from the vfs layer more, making things like the swap_cache easier to build, and cleaner. This is potentially very useful and the cost is low.
  7. Actual patch for dirty buffers in the page cache.
  8. I'm fairly well satisfied except for generic_file_write. Which I haven't touched. It looks like I need 2 variations on generic_file_write at the moment.
  9. Misc things I use, Included simply for reference.

Linus Torvalds replied:

I have three worries:

So would you mind just sending the patches in plaintext, one by one, to avoid at least one of my worries (and as a reference to other people: this is basically how I always prefer patches).

The other worries I'll see about later. The short descriptions sound fine, although I still want to look at the vm_store part closer..

Eric was a bit disturbed that Ingo Molnar had been working along similar lines without the two of them coordinating with each other. He asked for a pointer to Ingo's work, and Ingo replied:

i'm mainly working on these two areas:

He gave some implementation details, and added, "the current state of the patch is that it's working and brings a nice performance jump on SMP boxes on disk-benchmarks and is stable even under heavy stress-testing. Also (naturally) dirty buffers show up only once, in the page cache. I've broken some things though (swapping and NFS side-effects are yet untested), i'm currently working on cleaning the impact on these things up." He went on, "i didnt know about you working on this until Stephen Tweedie told me, then i quickly looked at archives and (maybe wrongly?) thought that while our work does collide patch-wise but is quite orthogonal conceptually. I've tried to sync activities with others working in this area (Andrea for example). I completely overlooked that you are working on the block-cache side as well."

Under the Subject: [PATCH] cache large files in the page cache, Eric posted a big patch, and announced, "since Ingo has been working on the page cache as well, I'm stopping here. Any changes up to this point are straight forward to resolve, and this patch is the really challenging one to port from kernel to kernel." There followed some corrections and discussion about the patch.

6. linux-kernel Mail Delays

24 May 1999 - 30 May 1999 (6 posts) Archive Link: "l-k silent?"

Topics: Mailing List Administration

André Dahlqvist noticed a big time lag on linux-kernel, and there was some discussion about it. The problem has come up before, and will undoubtedly come up again, especially given the tremendous volume of linux-kernel.

7. Profiling Locks For Speed Enhancements

23 May 1999 - 29 May 1999 (5 posts) Archive Link: "kernel_lock() profiling results"

People: Stephen C. Tweedie, Manfred Spraul, David S. Miller

Manfred Spraul wrote a patch to profile kernel locks, and gave a URL pointing to his findings (http://www.colorfullife.com/manfreds/kernel_lock/).

While compiling the kernel, he found that nearly 60% of the locks were released after less than 1024 CPU cycles, though a few locks were kept for a very long time. E.g., sys_bdflush kept locks for more than 10 milliseconds (>0.01 seconds).

While serving up web pages with apache, he found that only 17% of the locks needed less than 1024 CPU cycles, but 55% needed less than 2048 CPU cycles.
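
Figures like these come from sorting each lock hold time into a bucket and reporting cumulative percentages. The following C sketch shows one way such a histogram could be kept; it is not Manfred's patch, and the lock_profile names, the bucket count, and the power-of-two bucket boundaries starting at 1024 cycles are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: bucket 0 counts holds below 1024 cycles, bucket i
 * counts holds in [1024 << (i-1), 1024 << i), and the last bucket
 * catches everything longer. */
#define NBUCKETS 8

struct lock_profile {
    uint64_t bucket[NBUCKETS];
    uint64_t total;
};

/* Record one lock hold time, measured in CPU cycles. */
void lock_profile_record(struct lock_profile *p, uint64_t cycles)
{
    size_t i = 0;

    assert(p != NULL);

    /* Advance to the first bucket whose upper bound exceeds the sample. */
    while (i < NBUCKETS - 1 && cycles >= ((uint64_t)1024 << i))
        i++;
    p->bucket[i]++;
    p->total++;
}

/* Percentage (rounded down) of samples that fell into the first
 * `upto` buckets, e.g. upto = 1 for "below 1024 cycles" and
 * upto = 2 for "below 2048 cycles". */
unsigned lock_profile_percent_below(const struct lock_profile *p, size_t upto)
{
    uint64_t sum = 0;
    size_t i;

    assert(p != NULL && upto <= NBUCKETS);

    if (p->total == 0)
        return 0;
    for (i = 0; i < upto; i++)
        sum += p->bucket[i];
    return (unsigned)(sum * 100 / p->total);
}
```

In a real profiler the cycle counts would come from a hardware timestamp read at lock acquire and release; here they are simply passed in, which keeps the bucketing logic testable on its own.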

He suggested some changes to take advantage of these statistics, and Stephen C. Tweedie gave a pointer to ftp://ftp.uk.linux.org/pub/linux/sct/performance, a patch he and David S. Miller had written to improve lock handling. There was some technical discussion about implementation, which continued under the Subject: Re: [patch] releasing kernel lock during copy_from/to_user.

8. Ipchains Firewalling Code Patched For Memory Leak

25 May 1999 - 30 May 1999 (3 posts) Archive Link: "[PATCH] Memory leak in ipchains"

People: Peter Tirsek, David S. Miller, Paul Rusty Russell

Peter Tirsek found and fixed a bug, saying, "I've recently had a machine crash due to an apparent lack of memory. The nature of this problem led us to look for a memory leak in the kernel, and we found a bug in the ipchains firewalling code. This doesn't affect normal operation of most sites, but does cause the kernel to allocate one 100-byte buffer[1] (128-byte slab?) that is never freed again, every time a rule is deleted using IP_FW_DELETE." Paul Rusty Russell was impressed, and asked David S. Miller to include the patch in Paul's other 2.2 ip_fw.c patch (he also added that patches should be CCed to the maintainer so as not to get lost in the linux-kernel swamp). David acknowledged the patch and applied it.
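
The class of bug Peter describes, a side buffer that is not released when its owning rule is deleted, can be illustrated with a small user-space C sketch. This is not the ipchains code and does not reproduce the actual fix: struct rule, its scratch buffer, and the allocation counter are hypothetical, chosen only to show how one missed free per delete adds up.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs;   /* outstanding allocations, for demonstration only */

static void *counted_alloc(size_t n)
{
    void *p = malloc(n);
    if (p)
        live_allocs++;
    return p;
}

static void counted_free(void *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

/* Hypothetical firewall rule owning a separately allocated buffer. */
struct rule {
    struct rule *next;
    int id;
    void *scratch;
};

int rule_add(struct rule **head, int id)
{
    struct rule *r;

    assert(head != NULL);

    r = counted_alloc(sizeof(*r));
    if (!r)
        return -1;
    r->scratch = counted_alloc(100);
    if (!r->scratch) {
        counted_free(r);
        return -1;
    }
    r->id = id;
    r->next = *head;
    *head = r;
    return 0;
}

int rule_delete(struct rule **head, int id)
{
    struct rule **pp;

    assert(head != NULL);

    for (pp = head; *pp; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            struct rule *victim = *pp;

            *pp = victim->next;
            /* Omitting the next line would leak one 100-byte
             * buffer on every delete. */
            counted_free(victim->scratch);
            counted_free(victim);
            return 0;
        }
    }
    return -1;   /* no such rule */
}

int rule_live_allocs(void)
{
    return live_allocs;
}
```

A counter like rule_live_allocs() returning to zero after all rules are deleted is a cheap way to catch this kind of leak in a test, long before a machine runs out of memory.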

9. Linus Announces Pre-2.3.4-1

25 May 1999 - 31 May 1999 (10 posts) Archive Link: "pre-2.3.4.."

Topics: SMP

People: David S. Miller, Dominik Kubla, Linus Torvalds, Horst von Brand

Linus Torvalds announced the latest pre-patch, which had a rough cut of the new scalable network code, as well as newer versions of ISDN and PPC. There was a bit of criticism. Horst von Brand found an SMP bug (for which David S. Miller posted a simple patch), and Dominik Kubla found that the networking layer's new locking functions hadn't been implemented for most architectures.

Sharon And Joy
 

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.