Kernel Traffic #289 For 3 Jan 

By Zack Brown

Mailing List Stats For This Week

We looked at 2130 posts in 11667K.

There were 493 different contributors. 275 posted more than once. 161 posted last week too.

1. Status Of Acceptance Of FUSE User-Space Filesystem Into Official Kernel

15 Nov 2004 - 3 Dec 2004 (119 posts) Subject: "[PATCH] [Request for inclusion] Filesystem in Userspace"

Topics: FS: sysfs, Microkernels, Virtual Memory

People: Linus Torvalds, Miklos Szeredi, Pavel Machek, Greg KH

Miklos Szeredi asked if the FUSE userspace filesystem could be added to the main kernel sources, but Linus Torvalds replied:

Quite frankly I think it's too messy.

I'd like FUSE a whole lot more if it _only_ did the general page cache reading, but it seems to do a whole lot more, most of it broken.

In other words, I think it's fundamentally wrong to not have a special "fuse_file_read". If it isn't just "generic_file_read()" (possibly together with a re-validation callback but even that is very debatable indeed) there's something wrong with it imho.

The code looks like it was started before the page cache was all done, and nobody ever cleaned it up to use the full VFS power - or for some suspect reason decided that they wanted to support insane filesystems.

Together with removing the 2.4.x code and sending a real patch that has the cleanups, and maybe I'd reconsider.

Miklos replied that the latest patch already addressed these concerns, and that the 2.4 code had already been removed. He acknowledged that the patch contained some unnecessary code, but said it had been included for performance reasons, not out of a lack of maintainership.

A bunch of folks piled onto the patch, with comments and criticisms. Greg KH in particular helped clarify some issues involving /dev, /proc, and SysFS.

Elsewhere, Pavel Machek asked what the advantages of FUSE were over CODA. Miklos said that the two were really quite different, and he and Pavel launched into a back-end comparison. Linus followed the discussion silently for a while, and in the course of it made some interesting comments on general issues. One, on the patch submission process:

from a merging standpoint, simple really _is_ better. Even if you really really want to use exotic features like "direct IO" and writable mappings some day, let's just put it this way: it's a lot easier to merge something that has no questions about strange cases, and then _later_ add in the strange cases, than it is to merge it all on day #1.

I'm a sucker. Ask anybody. I'll accept the exact same patch that I rejected earlier if you just do it the right way. I'm convinced that some people actually do it on purpose just for the amusement value ("Look, he did it _again_. What a doofus!")

Elsewhere, Linus compared the idea of a user-space filesystem to the idea of a microkernel, in that they both attempted to de-integrate the operations of the various parts of the system. He said:

there is a _reason_ why microkernels suck. This is an example of how things are _not_ "independent". The filesystems depend on the VM, and the VM depends on the filesystem. You can't just split them up as if they were two separate things (or rather: you _can_ split them up, but they still very much need to know about each other in very intimate ways).

So what do you do? You limit shared dirty pages (inefficient memory use), or you disallow certain behaviours, or you add tons of new interfaces to expose essentially the same "every thing that can allocate and is on the write-out path takes a GFP flag".

User-space filesystems are hard to get right. I'd claim that they are almost impossible, unless you limit them somehow (shared writable mappings are the nastiest part - if you don't have those, you can reasonably limit your problems by limiting the number of dirty pages you accept through normal "write()" calls).
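
The last point, capping the number of dirty pages accepted through normal write() calls, can be illustrated with a toy model. This is only a sketch under stated assumptions, not FUSE or kernel code: the class name `DirtyPageThrottle`, its parameters, and the "flush the lowest-numbered page first" policy are all hypothetical. It shows why the cap is enforceable for write(): every modification passes through one call, where the writer can be made to wait for writeback. Shared writable mappings have no such choke point, which is why Linus singles them out as the nastiest part.

```python
class DirtyPageThrottle:
    """Toy model: a cap on dirty pages accepted through write()."""

    def __init__(self, max_dirty_pages, page_size=4096):
        if max_dirty_pages < 1:
            raise ValueError("max_dirty_pages must be at least 1")
        self.max_dirty_pages = max_dirty_pages
        self.page_size = page_size
        self.dirty = set()    # indexes of pages modified but not yet written back
        self.writebacks = 0   # number of pages this model has flushed

    def _flush_one(self):
        # Arbitrary policy for the sketch: write back the lowest-numbered page.
        page = min(self.dirty)
        self.dirty.remove(page)
        self.writebacks += 1

    def write(self, offset, length):
        """Dirty every page covered by [offset, offset + length)."""
        if length <= 0:
            return
        first = offset // self.page_size
        last = (offset + length - 1) // self.page_size
        for page in range(first, last + 1):
            if page in self.dirty:
                continue  # already dirty, no new memory is pinned
            # The writer is throttled here: it may not dirty another page
            # until enough earlier pages have been written back.
            while len(self.dirty) >= self.max_dirty_pages:
                self._flush_one()
            self.dirty.add(page)
```

With a cap of two pages, a three-page write forces one writeback before the third page is accepted, so the dirty set never grows past the limit.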

2. Status Of Software Suspend

24 Nov 2004 - 3 Dec 2004 (249 posts) Subject: "Suspend 2 merge"

Topics: Software Suspend

People: Christoph Hellwig, Nigel Cunningham, Pavel Machek

Nigel Cunningham submitted 51 separate patches to merge Suspend 2 properly. Pavel Machek went back and forth with him on a bunch of them, and it seemed that some of Nigel's code completely subverted the swsusp (Software Suspend) code that Pavel had been working on. Christoph Hellwig called him on each of these, saying things like, "Make sure swsusp and swsusp2 export the same interface. Preferably the old one, but if it absolutely doesn't fit your needs submit a patch to switch the old code to the new interface first." For the most part, this usually cantankerous subject was discussed nonviolently.

Judging from the reactions of most of the folks working in the same area (including Nigel), it's unlikely that all of Nigel's patches will be accepted unchanged. Christoph in particular seemed to indicate that massive changes would be needed to clean up problems that he said had existed in the code for a long time (at one point he accused Nigel of resubmitting, unchanged, work that had already been rejected, and Nigel affirmed this was in some cases true). By the same token, enough folks had enough interesting comments to make that it does seem likely Nigel's work will be accepted eventually, in one form or another. Even Pavel agreed that Suspend 2 should replace the existing swsusp code he'd written.

In fact, Pavel and Christoph both suggested that the proper way to submit the patches would be to incrementally transform swsusp into Suspend 2, though Nigel argued, "I'm purposely not doing that. The reason is that suspend2 isn't a bunch of incremental changes to swsusp. It has been redesigned from the ground up and I'd have to pull swsusp to pieces and put it back together to do the same things." In the same post, he also said, "let Pavel and others get to the point where they're ready to say "Okay, we're satisfied that suspend2 does everything swsusp does and more and better." Then we can remove swsusp. This is the plan that was discussed with Pavel and Andrew ages ago." The others did insist on an incremental conversion, however, though Pavel did remark, "Okay, at this point I'll understand when you'll put my picture as a texture to some doom3 monster and shoot me thousand times... Lot of work went into suspend2, but in the meantime lot of work went into swsusp1, too..."

3. Status Of Class-Based Kernel Resource Management

29 Nov 2004 - 5 Dec 2004 (14 posts) Subject: "[PATCH] CKRM: 0/10 Class Based Kernel Resource Management"

Topics: Version Control

People: Gerrit Huizenga, Andrew Morton, Marc E. Fiuczynski

Gerrit Huizenga of IBM said, "The following ten patches add the core of CKRM (Class Based Resource Management) to Linux. Current patches are against 2.6.10-rc2. This set of patches is essentially a cleaned up version of what is known on the ckrm-tech@lists.sourcerforge.net as the E16 code base. As compared to E16, the patch breakout has been reorganized for easier application to mainline with a number of stylistic cleanups more in line with mainline kernel code." Andrew Morton asked, "How useful is this code at present? What are its limitations? And what is the plan for future enhancements?" And Gerrit replied:

This set of code alone allows for creation of classes which include per-class resource accounting (including delay accounting), basic task management for memory, CPU and disk IO, limited socket & listener queue management for networking, and the related rules based infrastructure.

So, in short, it is a useful set of code to work with to demonstrate real utility with CKRM. However, this submission is not as full featured as is being used by those on the ckrm-tech list, such as the PlanetLab work. There are also things in SLES9 that are more featureful than this set although those will be worked into here in time.

It does not have the full memory management and scheduler support that other versions do and I'm not yet convinced that those are ready to submit. Future enhancements will start with the cleanups as recommended by lkml so far (thanks all ;-) followed by more work on the scheduler and memory management side in the short term. There are also ways to hook in additional resource controllers for any exhaustible resource, e.g. file handles. setrlimit style resources, etc.

Most of the next level of changes will build on these and are based on work currently in progress on the ckrm-tech list. However, this is a stripped down set of code which is believed to be stable (tested on IA32, x86-64, PPC64) with a variety of config options using both standard regression suites (e.g. LTP, kernbench, the ckrm tests, etc.).

Marc E. Fiuczynski also put in:

I integrated CKRM with the kernel used by PlanetLab (www.planet-lab.org), and I believe we (PlanetLab) are the first to use CKRM in a production setting. Our kernel is deployed on roughly 100 machines worldwide and we intend to upgrade all of our machines (roughly 400) over the next few weeks. Our kernel uses linux-vservers to create rather thin "virtual machines" (for the lack of a better name), but uses CKRM to provide for performance isolation between each vserver. The integration between CKRM and vservers was easy!

PlanetLab is used by tons of researchers. The software of each research is placed into a vserver, and each PlanetLab machine typically has anywhere from 20-40 actively running vservers running at a constant load of roughly 20. Some of the services running on PlanetLab have been discussed on Slashdot.

Gerrit mentioned that PlanetLab uses a more featureful version of CKRM. This is true. For each vserver we create a corresponding CKRM class, and then use the rule-based classification engine (RBCE) to automatically classify vserver processes to the appropriate CKRM class. We are itching to deploy the CKRM memory controller and IO controller, but unfortunately those have not been ready for prime time. For now, we've only deployed a variant of CKRM's cpu scheduler. We currently do not leverage the hierarchical support provided by CKRM, but envision a use for it in the future.

Unlike the posted CKRM patchset, the CPU, IO, and Memory controller make more invasive modifications to various kernel subsystems. I suspect that the CPU and IO controllers can be completely modularized into the pluggable CPU and IO framework that Con and Jens posted earlier, if that's the direction that mainline is heading. The CKRM memory controller makes a few choice modifications to mm/vmscan.c, which I suspect will rouse a fair amount of discussion on LKML when the day arrives.
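
The arrangement Marc describes, one CKRM class per vserver with a rule-based engine sorting processes into those classes, can be pictured roughly as follows. This is a minimal sketch and not the RBCE interface: the `Process` type, the `class_<name>` naming scheme, the `default` fallback class, and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass

DEFAULT_CLASS = "default"  # hypothetical fallback for unmatched processes


@dataclass(frozen=True)
class Process:
    pid: int
    vserver: str  # hypothetical: name of the vserver the process runs in


def build_rules(vservers):
    """Create one rule per vserver, mapping it to its own resource class."""
    return {name: f"class_{name}" for name in vservers}


def classify(process, rules):
    """Return the class for a process, or the default class if no rule matches."""
    return rules.get(process.vserver, DEFAULT_CLASS)


def group_by_class(processes, rules):
    """Collect pids per class, as a per-class accounting step would need."""
    groups = {}
    for proc in processes:
        groups.setdefault(classify(proc, rules), []).append(proc.pid)
    return groups
```

Once processes are grouped this way, a resource controller only has to account and enforce limits per class rather than per process, which is what gives each vserver its performance isolation.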

Andrew was glad to see some real-world examples, but said:

A concern which I have about the CKRM implementation is that the patches which have been sent out appear to be simply the "core" of CKRM, plus minimally-intrusive hooks. I have the impression that this core will not be terribly useful to real-world users and that follow-on patches will be required to add more functionality and to wire up more instrumentation and control points.

I would not like to be in a situation where we merge the "core" patch, but the as-yet-unseen follow-on patches which make CKRM useful and complete end up creating a big unmaintainable mess. We end up not wanting to go forwards and being unable to go backwards.

IOW: I think we need to see a reasonably-close-to-final implementation of CKRM before we can take it much further.

Gerrit replied:

Understood. We do have a more complete set of patches floating around, although most are ported to an existing distro rather than set for current mainline adoption. But if we can get general consensus on the patches (once I finish the current round of cleanup and testing), we do have work in memory management, IO scheduling, and even CPU scheduling (the latter being the most debatable for mainline acceptance given the rate of scheduler replacements in recent past) that are being used today.

We can dump the current, raw distro patches or the rest of the e16 patch set from ckrm-tech on you although I believe they will need some significant review/modification to be mainline acceptable yet. One big problem is that these changes are somewhat hard to maintain as distinct from mainline and yet remain relatively current. There are several developers working in distinct areas and each area moves at its own pace. Hence, I'd like to get to a more stable -mm compatible core, and build up from there. As we see that the entire set approaches stability/utility, we can push from the core up through the working set of resource controllers.

If getting you a set of patches for general concept review as based on a current distro would help, just say the word. However, getting those up to current mainline, integrated with each other and fully tested (while holding their development stable long enough to do that) is the requirement, well, that will take us a fair bit longer.

Part of the goal of this posting was to start to stabilize a core and improve on it, rather than try to deliver an entire project as a moderately large set of changes as a fait accompli. And, we are more than willing to continue to tweak and tune this to be generally useful to a wider audience, even though we have a set that works well for some groups needing better workload management.

So, Andrew, can you clarify how much we need to put in your hands, how well tested it needs to be and how clean and current the entire set needs to be before this is ready for -mm testing?

Andrew replied:

Well we can toss stuff into -mm any old time really. Doing it too early will cause rather a lot of difficulty and churn at both ends - working against -mm can be an extra burden at times.

I'd say that it would be best to wait until the code is, in your opinion, in a Linus-mergeable form. Then after one lkml review round and any subsequent rework we should be in good shape.

4. Performance Problems With kernel.org

1 Dec 2004 - 2 Dec 2004 (3 posts) Subject: "kernel.org has severe performance problems"

People: H. Peter Anvin

Continuing from Issue #287, Section #18 (24 Nov 2004: kernel.org Hardware Troubles), H. Peter Anvin said:

Just to let you know; kernel.org has been suffering for performance problems lately, but in the last few days, for reasons we're not really clear about, the performance problems seem to have spread to the upload procedure. Thus, unfortunately, you might see long times between uploading something and when things appear on the main repository.

We are already in late-stage discussions with sponsors about new hardware, so bear with us.

5. Deprecating Broken And Duplicated Drivers

2 Dec 2004 (2 posts) Subject: "[PATCH/RFC] deprecate some drivers"

Topics: Networking

People: Jeff Garzik, Jim Nelson

Jeff Garzik said:

I'm looking to eliminate some horribly broken/dup drivers. Since 2.6 is an ongoing matter, I want a 'flashing-red warning sign' that drivers will soon be disappearing, rather than just killing the driver and listening for the screams.

IPhase driver is broken+abandoned, and xirtulip is broken+duplicate+abandoned, and are two prime candidates for my preference of handling this matter: CONFIG_DEPRECATED.

Jim Nelson suggested, "Please add digiboard to your list - duplicate+abandoned."

6. Linux 2.6.9-ac14 Released

3 Dec 2004 - 5 Dec 2004 (7 posts) Subject: "Linux 2.6.9-ac13"

Topics: Kernel Release Announcement

People: Alan Cox, Arjan van de Ven

Alan Cox announced Linux 2.6.9-ac13, saying:

This -ac is a little different. It's still an experimental -ac to test the accumulated patches it would be nice to have in -ac but which might break something and seemed too risky. As such please test it but in general wait for the next -ac before planning to update production systems.

Arjan van de Ven is now building RPMS of the kernel and those can be found in the RPM subdirectory and should be yum-able. Expect the RPMS to lag the diff a little as the RPM builds and tests do take time.

The it8212 still doesn't default to DMA on - that is on the TODO list. The HPT366 rework project is also not ready (its gone back to the drawing board until the current panic is over if you are a volunteer and wondered what is up).

ftp://ftp.kernel.org/pub/linux/kernel/people/alan/linux-2.6/2.6.9/

There was some confusion about the patch, at first because it seemed Alan had not updated the version number in the Makefile, and then because it seemed he had uploaded the wrong version entirely. Alan released a 2.6.9-ac14 quickly with the intended patch.

7. Linux 2.6.10-rc3 Released; Some Dangers Remain

3 Dec 2004 - 7 Dec 2004 (26 posts) Subject: "Linux 2.6.10-rc3"

Topics: Framebuffer, I2C, Kernel Release Announcement, Power Management: ACPI

People: Linus Torvalds

Linus Torvalds announced Linux 2.6.10-rc3, saying:

Ok, it's out there in all the normal places, and here's the shortlog for the thing.

Mostly a lot of small fixes, although the MIPS update is pretty sizeable simply because it's been a while.

ACPI updates and a new i2c driver, mtd, arm, uml updates.. fbdev and sparse fixes. And a lot of other small things better just described by the changelogs.

Please do test this - and don't send me anything but bug-fixes. Let's aim for a real 2.6.10 before xmas (or hanukkah, or whatever your favourite holiday happens to be).

Several folks reported filesystem corruption with this kernel, but no immediate explanation was found. Several other folks said that an oops they'd experienced with earlier kernels was still present in this one.

8. Proposal For A Userspace Architecture Portability Library

4 Dec 2004 - 6 Dec 2004 (13 posts) Subject: "Proposal for a userspace "architecture portability" library"

Topics: BSD, Klibc

People: Paul Mackerras, Robert Love, H. Peter Anvin

Paul Mackerras said:

Some of our kernel headers implement generally useful abstractions across all of the architectures we support. I would like to make an "architecture portability" library, based on the kernel headers but as a separate project from the kernel, and intended for use in userspace.

The headers that I want to base this on are:

There are some others that may also be useful: cache.h, checksum.h, io.h, xor.h.

Now, clearly I can do this under the GPL. However, I think it would be more useful to have the library under the LGPL, which requires either getting the permission of the authors of the kernel files, or rewriting them from scratch.

Linus (and other kernel copyright holders) - would you be willing to relicense such of the above files that have your copyright under the LGPL for this purpose?

I'm looking for volunteers to help with porting and testing on various architectures. I can do x86, ppc and ppc64, and I know sparc{,64} and m68k assembler, but for the rest I'll need help.

My hope is that distributions will be able to use this to replace some of the headers in /usr/include/asm, and thus reduce the desire for applications to include kernel headers.

Several folks loved this idea. Robert Love gave his permission to relicense his own kernel contributions, and said:

I think that this is an _awesome_ idea. Might want to check out what overlap there is with existing glibc interfaces. For example, I presume that glibc implements at least some of the atomic operations (but I also think having a full suite of atomic operations available is useful).

Some of the stuff, like semaphores, isn't really going to port very well to user-space. At least not directly, I would not think.

But on numerous occasions I have wanted the kernel's barriers, atomic operations, bitwise operations, or some of the compiler things we implement (likely, unlikely, fixes) in user-space.

H. Peter Anvin also offered to pitch in, though he would have preferred a BSD license, so he could add the result to klibc.

Sharon And Joy
 

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.