Kernel Traffic

Kernel Traffic #310 For 4 Jun 2005

By Zack Brown


Mailing List Stats For This Week

We looked at 1457 posts in 9MB.

There were 539 different contributors. 208 posted more than once. The average length of each message was 107 lines.

The top posters of the week were:
82 posts in 583KB by Adrian Bunk
46 posts in 209KB by Andrew Morton
34 posts in 120KB by Alan Cox
31 posts in 142KB by Daniel Phillips
28 posts in 201KB by Jeff Dike

The top subjects of the week were:
62 posts in 263KB for "mercurial 0.4b vs git patchbomb benchmark"
46 posts in 407KB for "[patch 1b/7] dlm: core locking"
44 posts in 177KB for "mercurial 0.3 vs git benchmarks"
33 posts in 160KB for "[patch 0/7] dlm: overview"
24 posts in 93KB for "zimage on 2.6?"

These stats generated by mboxstats version 2.8

1. Some Comparison Of git And Mercurial; A Backward Glance At BitKeeper

25 Apr 2005 - 4 May 2005 (107 posts) Archive Link: "Mercurial 0.3 vs git benchmarks"

Topics: Compression, Real-Time, Version Control

People: Matt Mackall, Linus Torvalds, Mike Taht

Matt Mackall said:

This is to announce an updated version of Mercurial. Mercurial is a scalable, fast, distributed SCM that works in a model similar to BK and Monotone. It has functional clone/branch and pull/merge support and a working first pass implementation of network pull. It's also extremely small and hackable: it's about 1000 lines of code.

Here are the results of checking in the first 12 releases of Linux 2.6 into empty repositories for Mercurial v0.3 (hg) and git-pasky-0.7. This is on my 512M Pentium M laptop. Times are in seconds.

                 user         system       real        du -sh
ver    files   hg    git    hg    git    hg    git    hg   git

2.6.0  15007 19.949 35.526 3.171 2.264 25.138 87.994 145M   89M
2.6.1    998  5.906  4.018 0.573 0.464 10.267  5.937 146M   99M
2.6.2   2370  9.696 13.051 0.752 0.652 12.970 15.167 150M  117M
2.6.3   1906 10.528 11.509 0.816 0.639 18.406 14.318 152M  135M
2.6.4   3185 11.140  7.380 0.997 0.731 15.265 12.412 156M  158M
2.6.5   2261 10.961  6.939 0.843 0.640 20.564  8.522 158M  177M
2.6.6   2642 11.803 10.043 0.870 0.678 22.360 11.515 162M  197M
2.6.7   3772 18.411 15.243 1.189 0.915 32.397 21.498 165M  227M
2.6.8   4604 20.922 16.054 1.406 1.041 39.622 25.056 172M  262M
2.6.9   4712 19.306 12.145 1.421 1.102 35.663 24.958 179M  297M
2.6.10  5384 23.022 18.154 1.393 1.182 40.947 32.085 186M  338M
2.6.11  5662 27.211 19.138 1.791 1.253 42.605 31.902 193M  379M

tar of .hg/   108175360
tar of .git/  209385920

Full-tree change status (no changes):
hg:  real 0.799s  user 0.607s  sys 0.167s
git: real 0.124s  user 0.051s  sys 0.051s

Check-out time (2.6.0):
hg:  real 34.084s  user 4.069s  sys 2.024s
git: real 30.487s  user 2.393s  sys 1.007s

Full-tree working dir diff (2.6.0 base with 2.6.1 in working dir):
hg:  real 4.920s  user 4.629s  sys 0.260s
git: real 3.531s  user 1.869s  sys 0.862s
(this needed an update-cache --refresh on top of git commit, which took
another: real 2m52.764s  user 2.833s  sys 1.008s)

Merge from 2.6.0 to 2.6.1:
hg:  real 15.507s  user 6.175s  sys 0.442s
git: haven't quite figured this one out yet

Some notes:

Despite the above, it compares pretty well to git in speed and is quite a bit better in terms of storage space. By reducing the zlib compression level, it could probably win across the board.

The size numbers will get dramatically more unbalanced with more history - a conversion of the history in BK to git is expected to take over 3G, which Mercurial may actually take less space due to storing compressed binary forward-only deltas.

While disk may be cheap, network bandwidth is not. Given that the common case usage of git will be to do network pulls, it will find most of its speed wasted on waiting for the network. Mercurial will almost certainly win here for typical developer usage as it can do efficient delta communication (though it currently doesn't attempt any pipelining so suffers a bit in round trips).
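To make Matt's bandwidth point concrete, here is a minimal sketch in Python (the language Mercurial is written in). It assumes a toy line-based delta of (start, end, replacement lines) triples built with the standard difflib module, and uses repr() as a crude serialization; it illustrates the idea only and is not Mercurial's actual delta or wire format.

```python
import difflib
import zlib

def line_delta(old, new):
    """Toy forward delta: (start, end, lines) means 'replace old[start:end] with lines'."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def apply_delta(old, delta):
    """Rebuild the new version from the old one plus the delta."""
    out, pos = [], 0
    for start, end, lines in delta:
        out.extend(old[pos:start])  # unchanged run before this edit
        out.extend(lines)           # replacement lines (empty for a pure deletion)
        pos = end
    out.extend(old[pos:])
    return out

def wire_size(payload: bytes, level: int = 6) -> int:
    """Bytes sent if the payload is zlib-compressed before transfer."""
    return len(zlib.compress(payload, level))

old = [f"line {i}\n" for i in range(5000)]
new = list(old)
new[2500] = "changed\n"

delta = line_delta(old, new)
assert apply_delta(old, delta) == new
# repr() is only a stand-in serialization for comparing sizes.
print("full text:", wire_size("".join(new).encode()), "bytes")
print("delta:    ", wire_size(repr(delta).encode()), "bytes")
```

For a one-line change to a 5,000-line file, the compressed delta is far smaller than the compressed full text, which is the gap a delta-aware pull can exploit.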

More discussion about Mercurial's design can be found here:

Linus Torvalds replied:

That time in checking things in is worrisome.

"git" is basically linear in the size of the patch, which is what I want, since most patches I work with are a couple of files at most. The patches you are checking in are huge - I never actually work with a change that is as big as a whole release. I work with changes that are five files or something.

"hg" seems to basically slow down the more patches you have applied. It's hard to tell from the limited test set, but look at "user" time. It seems to increase from 6 seconds to 27 seconds.

To make an interesting benchmark, try applying the first 200 patches in the current git kernel archive. Can you do them three per second? THAT is the thing you should optimize for, not checking in huge changes.

If you're checking in a change to 1000+ files, you're doing something wrong.

Mike Taht pointed out, "One difference is probably - mercurial appears to be using zlib's *default* compression of 6.... using zlib compression of 9 really impacts git..." Linus replied:

I agree that it will hurt for big changes, but since I really do believe that most changes are just a couple of files, I don't believe it matters for those.

I forget what the exact numbers were, but I did some timings on plain "gzip", and it basically said that doing gzip on a medium-sized file was not that different for -6 and -9. Why? Because most of the overhead was elsewhere ;)

Oh, well, I just re-created some numbers. This wasn't exactly what I did last time I tested it, but it's conceptually the same thing:

        torvalds@ppc970:~> time gzip -9 < v2.6/linux/kernel/sched.c > /dev/null
        real    0m0.018s
        user    0m0.018s
        sys     0m0.000s

        torvalds@ppc970:~> time gzip -6 < v2.6/linux/kernel/sched.c > /dev/null
        real    0m0.015s
        user    0m0.013s
        sys     0m0.001s

ie there's a 0.003 second difference, which is certainly noticeable, and would be hugely noticeable if you did a lot of these. But in my world-view (which is what git is optimized for), the common case is that you usually end up compressing maybe five-ten files, so the _compression_ overhead is not that huge compared to all the other stuff.

But yes, testing git on big changes will test exactly the things that git isn't optimized for. I think git will normally hold up pretty well (ie it will still beat anything that isn't designed for speed, and will be comparable to things that _are_), but it's not what I'm interested in optimizing for.

That said - these days we can trivially change over to a "zlib -6" compression, and nothing should ever notice. So if somebody wants to test it, it should be fairly easy to just compare side-by-side: the results should be identical.

The easiest test-case is Andrew's 198-patch patch-bomb on linux-kernel a few weeks ago: they all apply cleanly to 2.6.12-rc2 (in order), and you can use my "dotest" script to automate the test..
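Linus's remark that the results should be identical across compression levels follows from how git names objects. The Python sketch below assumes the object format git uses today, in which the SHA-1 covers the uncompressed `blob <size>\0` header plus content and only the stored copy is zlib-compressed. Python's zlib module documents Z_DEFAULT_COMPRESSION as currently equivalent to level 6, the level Mike Taht mentions.

```python
import hashlib
import zlib

def object_name(kind: str, body: bytes) -> str:
    # The SHA-1 covers the *uncompressed* header and body, so it does not
    # depend on the zlib level used for storage.
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def stored_bytes(kind: str, body: bytes, level: int) -> bytes:
    # What lands on disk: header + body, deflated at the chosen level.
    header = f"{kind} {len(body)}\0".encode()
    return zlib.compress(header + body, level)

body = b"int main(void) { return 0; }\n" * 200
for level in (6, 9, zlib.Z_DEFAULT_COMPRESSION):
    blob = stored_bytes("blob", body, level)
    print(level, object_name("blob", body), len(blob))
```

Only the compressed byte counts can differ between levels; the object name printed on each line is the same.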

An hour later he continued:

Oh, well. That was so trivial that I just did it:


        torvalds@ppc970:~/git-speed-1> ./script
        Removing old tree
        Creating new tree
        Initializing db
        defaulting to local storage area
        Doing sync
        Initial add

        real    0m37.526s
        user    0m33.317s
        sys     0m3.816s
        Initial commit
        Committing initial tree 0bba044c4ce775e45a88a51686b5d9f90697ea9d

        real    0m0.329s
        user    0m0.152s
        sys     0m0.176s

        real    0m50.408s
        user    0m18.933s
        sys     0m25.432s


        torvalds@ppc970:~/git-speed-1> ./script
        Removing old tree
        Creating new tree
        Initializing db
        defaulting to local storage area
        Doing sync
        Initial add

        real    0m19.755s
        user    0m15.719s
        sys     0m3.756s
        Initial commit
        Committing initial tree 0bba044c4ce775e45a88a51686b5d9f90697ea9d

        real    0m0.337s
        user    0m0.139s
        sys     0m0.197s

        real    0m50.465s
        user    0m18.304s
        sys     0m25.567s

ie the "initial add" is almost twice as fast (because it spends most of the time compressing _all_ the files), but the difference in applying 198 patches is not noticeable at all (because the costs are all elsewhere).

That's 198 patches in less than a minute even with the highest compression. That rocks.

And don't try to make me explain why the patchbomb has any IO time at all, it should all have fit in the cache, but I think the writeback logic kicked in. Anyway, I tried it several times, and the real-time ends up fluctuating between 50-56 seconds, but the user/sys times are very stable, and end up being pretty much the same regardless of compression level.

Here's the script, in case anybody cares:

        echo Removing old tree
        rm -rf linux-2.6.12-rc2
        echo Creating new tree
        zcat < ~/v2.6/linux-2.6.12-rc2.tar.gz | tar xvf - > log
        echo Initializing db
        ( cd linux-2.6.12-rc2 ; init-db )
        echo Doing sync
        echo Initial add
        time sh -c 'cd linux-2.6.12-rc2 && cat ../l | xargs update-cache --add --' >> log
        echo Initial commit
        time sh -c 'cd linux-2.6.12-rc2 && echo Initial commit | commit-tree $(write-tree) > .git/HEAD' >> log
        echo Patchbomb
        time sh -c 'cd linux-2.6.12-rc2 ; dotest ~/andrews-first-patchbomb' >> log

and since the timing results were pretty much what I expected, I don't think this changes _my_ opinion on anything. Yes, you can speed up commits with Z_DEFAULT_COMPRESSION, but it's _not_ that big of a deal for my kind of model where you commit often, and commits are small.

It all boils down to:

I mean, if it took 2 _hours_ to do the initial commit, I'd think it matters. But when we're talking about less than a minute to create the initial commit of a whole kernel archive, does it really make any difference?

After all, it's something you do _once_, and never again (unless you script it to do performance testing ;)

Anyway guys, feel free to test this on other machines. I bet there are lots of subtle performance differences between different filesystems and CPU architectures.. But the only hard numbers I have show that -9 isn't that expensive.
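The transcripts above can be checked against Linus's earlier "three per second" challenge with a little arithmetic. This Python sketch only re-derives rates from the wall-clock figures he posted, plus the 56-second worst case he mentions; it measures nothing itself.

```python
PATCHES = 198
TARGET_PER_SEC = 3.0  # Linus's "three per second" challenge

# Wall-clock seconds copied from the two transcripts above.
first_run = {"initial_add": 37.526, "patchbomb": 50.408}
second_run = {"initial_add": 19.755, "patchbomb": 50.465}

def patches_per_second(seconds: float) -> float:
    return PATCHES / seconds

# Linus saw real time fluctuate between 50 and 56 s; check the slow end too.
worst_case_rate = patches_per_second(56.0)
add_speedup = first_run["initial_add"] / second_run["initial_add"]

print(f"patchbomb: {patches_per_second(first_run['patchbomb']):.2f} and "
      f"{patches_per_second(second_run['patchbomb']):.2f} patches/s")
print(f"worst observed: {worst_case_rate:.2f} patches/s")
print(f"initial add speedup: {add_speedup:.2f}x")
```

Even the slowest run he reports works out to roughly 3.5 patches per second, and the initial-add ratio of about 1.9 matches his "almost twice as fast".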

In the course of discussion, Linus and Matt came to consider BitKeeper's methods of doing things. Linus remarked:

I didn't want to do anything that even smelled of BK. Of course, part of my reason for that is that I didn't feel comfortable with a delta model at all (I wouldn't know where to start, and I hate how they always end up having different rules for "delta"ble and "non-delta"ble objects).

But another was that exactly since I've been using BK for so long, I wanted to make sure that my model just emulated the way I've been _using_ BK, rather than any BK technical details.

Matt also confirmed that "I've never used BK, but I got the impression that it was all SCCS under the covers, which means adding stuff and reconstructing random versions is expensive (just as it is in CVS). The split between index and data in Mercurial is intended to address that."
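Matt's point about the index/data split can be illustrated with a toy in Python. Everything here is an assumption for illustration: chunks are JSON, a full snapshot is stored every four revisions, and deltas are difflib-based; Mercurial's real revlog format differs. The idea shown is that the index lets a reader jump straight to a snapshot and read only the delta chain it needs, rather than scanning an interleaved SCCS-style file.

```python
import difflib
import json
from dataclasses import dataclass, field

@dataclass
class IndexEntry:
    offset: int  # start of this revision's chunk in the data store
    length: int  # chunk size in bytes
    base: int    # revision holding the full snapshot this delta chain starts from

def make_delta(old, new):
    """Forward delta as [start, end, replacement_lines] triples (toy format)."""
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [[i1, i2, new[j1:j2]]
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def apply_delta(old, delta):
    out, pos = [], 0
    for start, end, repl in delta:
        out += old[pos:start] + repl
        pos = end
    return out + old[pos:]

@dataclass
class RevLog:
    snapshot_every: int = 4                             # assumption: full text every N revs
    index: list = field(default_factory=list)           # small metadata, cheap to scan
    data: bytearray = field(default_factory=bytearray)  # append-only chunk store

    def _read(self, rev):
        e = self.index[rev]
        return json.loads(self.data[e.offset:e.offset + e.length])

    def add(self, lines):
        rev = len(self.index)
        if rev % self.snapshot_every == 0:
            chunk, base = {"full": lines}, rev
        else:
            chunk = {"delta": make_delta(self.text(rev - 1), lines)}
            base = self.index[rev - 1].base
        raw = json.dumps(chunk).encode()
        self.index.append(IndexEntry(len(self.data), len(raw), base))
        self.data += raw
        return rev

    def text(self, rev):
        # Only chunks from the snapshot up to `rev` are read, located via the index.
        base = self.index[rev].base
        lines = self._read(base)["full"]
        for r in range(base + 1, rev + 1):
            lines = apply_delta(lines, self._read(r)["delta"])
        return lines
```

With ten revisions, reconstructing revision 9 reads revision 8's snapshot and a single delta; tuning snapshot_every trades disk space against chain length.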

2. Review Of Patch Submissions For Stable Release

27 Apr 2005 - 30 Apr 2005 (47 posts) Archive Link: "[00/07] -stable review"

Topics: Digital Video Broadcasting, Disks: SCSI, FS: NFS, FS: sysfs, SMP, User-Mode Linux

People: Greg KH, Alan Cox, Chris Wright

Greg KH said:

This is the start of the stable review cycle for the release. There are 7 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let us know. If anyone is a maintainer of the proper subsystem, and wants to add a signed-off-by: line to the patch, please respond with it.

These patches are sent out with a number of different people on the Bcc: line. If you wish to be a reviewer, please email to add your name to the list. If you want to be off the reviewer list, also email us.

Responses should be made by Friday, Apr 29 17:00 UTC. Anything received after that time, might be too late.

One patch gave modular NFSd a syscall interface, usable by User-Mode Linux. Alan Cox felt this was not really a critical bug, since anyone who wanted it could just compile NFSd directly into the kernel. Chris Wright felt this made sense, and suggested dropping that patch; and Greg dropped it.

Another patch would fix a system lockup with some bt8xx-based DVB cards, when loading the bttv driver. There were no objections to this one.

Another patch fixed some SysFS files to be read-only, since trying to write to them could produce undefined results. There were no objections to this either, although some refinements were offered.

Another patch fixed partition guessing; however, since there didn't seem to be anyone who had actually experienced problems with this, there was some doubt as to whether it should go into the .8 release or not.

Another patch fixed a reproducible SMP crash. There was no objection to this.

Another patch attempted to fix SCSI tape security, but Alan remarked, "This patch is just wrong on so many different levels its hard to know where to begin." However, after some discussion, he modified his objections, to say, "Its the wrong answer long term I suspect but its definitely a good answer for now."

3. Fixing SysFS File Ownership For Tpm Driver

27 Apr 2005 - 29 Apr 2005 (6 posts) Archive Link: "[PATCH 10 of 12] Fix Tpm driver -- sysfs owernship changes"

Topics: FS: sysfs

People: Kylene Hall, Greg KH

Kylene Hall said that for the current Tpm driver, "all sysfs files end up owned by the base driver module rather than the module that actually owns the device this is a problem if the module is unloaded and the file is open. This patch fixes all that." Greg KH had some technical suggestions and concerns, but seemed generally favorable to the patch.

4. New Broadband Processor Architecture Within The PPC64 Architecture Tree

27 Apr 2005 - 28 Apr 2005 (15 posts) Archive Link: "[PATCH 0/4] ppc64: Introduce BPA platform"

Topics: Ioctls, POSIX

People: Arnd Bergmann

Arnd Bergmann said:

This series of patches add support for a fifth platform type in the ppc64 architecture tree. The Broadband Processor Architecture (BPA) is currently used in a single machine from IBM, with others likely to be added at a later point.

I already sent preparation patches before, these need to be applied on top of them. The first three patches add the actual platform code, which should be usable for any BPA compatible implementation.

The final patch introduces a new file system to make use of the SPUs inside the processors. This patch is still in a prototype stage and not intended for merging yet.

Regarding this last, Arnd posted the final patch, saying:

This is an early version of the SPU file system, which is used to run code on the Synergistic Processing Units of the Broadband Engine.

The file system provides a name space similar to posix shared memory or message queues. Users that have write permissions on the file system can create directories in the spufs root.

Every directory represents an SPU context, which is currently mapped to a physical SPU, but that is going to change to a virtualization scheme in future updates.

An SPU context directory contains a predefined set of files used for manipulating the state of the logical SPU. Users can change permissions on those files, but not actually add or remove files without removing the complete directory.

The current set of files is:

Other files are planned but currently are not implemented or not functional.

5. Attempting To Reorganize XFS Compile-Time Configuration Options

27 Apr 2005 - 1 May 2005 (10 posts) Archive Link: "[PATCH] fs/Kconfig: more consistent configuration of XFS"

Topics: FS: JFS, FS: ReiserFS, FS: XFS, FS: ext3

People: Nguyen Anh Quynh, Christoph Hellwig

Nguyen Anh Quynh said:

At the moment, the configuration interface of Filesystem is not very consistent:

Here is the patch to fix the problem: it moves XFS configuration from fs/xfs/Kconfig to fs/Kconfig, makes it to do all the configuration in the same screen (by removing "menu" directive), and removes the unnecessary fs/xfs/Kconfig.

Christoph Hellwig replied, "The screen bits is fine, btu please keep fs/xfs/Kconfig. It make maintaince a lot a easier for us XFS people." Nguyen said:

I dont see why we should keep a file in kernel tree without using it (since the patch removes "source xfs/Kconfig). Anyway, here is another patch that doesnt remove fs/xfs/Kconfig.

Also note that this patch (and the last one, too) moves "config XFS_EXPOR" to the bottom, so the menu intems aligns better and consistently with others (like what Reiserfs, JFS,... are doing)

But Christoph replied that not only should the file itself remain, but the usage of it should remain as well. Nguyen said, "OK, here is another patch. It is up to Andrew to pick the approriate. But I still prefer the first patch, which provides both consistency in interface and configuration."

6. Strong Discord Among Top IDE Developers

28 Apr 2005 - 29 Apr 2005 (9 posts) Archive Link: "Multiple functionality breakages in 2.6.12rc3 IDE layer"

Topics: Disks: IDE, Disks: SCSI, Ioctls

People: Alan Cox, Bill Davidsen, Bartlomiej Zolnierkiewicz

Alan Cox said:

Ages ago we added an ide_default driver to clean up all the corner cases like spurious IRQs for a device with no matching driver (eg ide-cd and no CD driver) as well as ioctls and file access.

2.6.12rc removes it. Unfortunately it also means that if your only IDE interface is one you hand configure you can no longer run Linux. It also changes other aspects of behaviour although they don't look problematic for most users. You can no longer

without having a device specific driver loaded matching the media - and that only works if its already detected the device correctly.

I don't have the tools at the moment to generate spurious IRQ's for devices with no driver loaded but it does look like the code may well then crash. From the way the changes were done it appears the current IDE maintainers never appreciated that ide_default existed for far more than just cleaning up ide-proc but also to handle IRQ's, opening of empty slots, ioctls and power management ?

The ability to specify the IDE ports on the command line as needed for some Sony laptop installs have also become "obsolete" over time. They still appear to work but spew a warning that the user will soon be screwed.
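The role Alan describes for ide_default is a fallback: when no media-specific driver claims a device, something still has to answer its interrupts, opens and ioctls. Below is a minimal Python sketch of that dispatch pattern; the Driver interface, the choice to claim stray IRQs, and the ENXIO/ENOTTY error codes are assumptions for illustration, not the kernel's IDE code.

```python
import errno
from dataclasses import dataclass
from typing import Callable

@dataclass
class Driver:
    name: str
    irq: Callable[[int], bool]        # returns True if the interrupt was handled
    open: Callable[[int], int]        # returns 0 or a negative errno
    ioctl: Callable[[int, int], int]  # (unit, cmd) -> 0 or a negative errno

# Hypothetical fallback used when no media-specific driver is bound to a unit.
DEFAULT_DRIVER = Driver(
    name="ide-default",
    irq=lambda unit: True,                  # claim stray IRQs so they are not left unhandled
    open=lambda unit: -errno.ENXIO,         # empty slot or unknown media: refuse cleanly
    ioctl=lambda unit, cmd: -errno.ENOTTY,  # no driver-specific ioctls available
)

class IdeBus:
    def __init__(self):
        self.bound = {}  # unit number -> Driver

    def driver_for(self, unit: int) -> Driver:
        return self.bound.get(unit, DEFAULT_DRIVER)

    def dispatch_irq(self, unit: int) -> bool:
        return self.driver_for(unit).irq(unit)

    def open(self, unit: int) -> int:
        return self.driver_for(unit).open(unit)

    def ioctl(self, unit: int, cmd: int) -> int:
        return self.driver_for(unit).ioctl(unit, cmd)
```

Without such a fallback, every entry point would need its own "no driver bound" check, which is the class of corner case Alan is worried about.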

Bill Davidsen said, "I missed the discussion of why it was felt that the users would no longer want to be able to do these things, or the new way to do it." Alan replied, "I'm assuming it may be accidental rather than detailed planning. Also its taken this long to notice so its clearly not that critical to everyone. Seems to be reasonably sane to fix too." Bill replied, "I was being a bit sarcastic about the "missed the discussion" bit, but I'm pretty sure ripping out the capability was deliberate. Hopefully it's now going to be evaluated, and then fixed. One thing Linux doesn't seem to do well is recover failed drives at boot time, it always seems to take a bunch of fiddling or even a boot from live CD and hand recover." He added, "Thanks for jumping into this, with ATAPI storage down to about $1100(US)/TB it's getting really hard to justify SCSI and real hot swap hardware."

Elsewhere, Bartlomiej Zolnierkiewicz also replied to Alan, saying, "Maybe you should mail current maintainer before spreading FUD?" He added, "No functionality was removed AFAIK, see the patches. I spend quite a bit of time making sure that nothing breaks up (I missed one special case but somebody already posted patch to LKML fixing it). These patches were posted at least two times to both linux-ide and linux-kernel, they were in -mm for ages - were you hiding under the rock?" He said there had been several discussions already; and added, "Alan, seriously, what is your problem?" Alan replied that his problem was that "The fact that the IDE layer appears to be getting worse not better, which given the starting point is a remarkable achievement." He added that he had been busy "doing an MBA thesis, a job, learning a second language and trying to beat sense into our politicians. Now I come back to look at the ide layer ready for a 2.6.12 merge and its all a bit messy. The open code was clean and is now duplicated. Copies of subtly different per driver gendisk/disk layer open routines have appeared that should be shared. The default driver handling has been removed and half the options for obscure systems have been marked obsolete in some Gnome like purge of functionality that might scare small children." He added, "If you need details you shouldn't be maintaining that code."

Bartlomiej said, "Give details or quit whining." The two continued flaming each other, and Bartlomiej ended it with, "Feel free to fork so you'll be wasting yours time only and not mine."

7. Attempting To Unify Semaphore Implementations For Maintainability

28 Apr 2005 - 30 Apr 2005 (17 posts) Archive Link: "[RFC] unify semaphore implementations"

Topics: Assembly

People: Benjamin LaHaise, James Bottomley, David S. Miller, Russell King, Trond Myklebust, Paul Mackerras

Benjamin LaHaise said:

Please review the following series of patches for unifying the semaphore implementation across all architectures (not posted as they're about 350K), as they have only been tested on x86-64. The code generated is functionally identical to the earlier i386 variant, but since gcc has no way of taking condition codes as results, there are two additional instructions inserted from the use of generic atomic operations. All told the >6000 lines of code deleted makes for a much easier job for subsequent patches changing semaphore functionality.
Introduce linux/semaphore.h. Convert all users of asm/semaphore.h over to linux/semaphore.h.
Move i386 rwlock helper functions out of semaphore.c and into their own file rwlock.c.
Replace all semaphore implementations with a single implementation derrived from the i386 code using atomic operations. Tested on x86-64, compiled on i386 and ia64.
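Benjamin's patches were not posted inline, so the following is only a Python sketch of the general pattern he describes: a counting semaphore whose fast path is a single atomic add on a counter, falling back to a sleeping slow path under contention. Python has no lock-free atomics, so AtomicInt is a lock-backed stand-in, and the wakeup-token slow path is an assumption rather than the i386 code.

```python
import threading

class AtomicInt:
    """Lock-backed stand-in for atomic_t (Python has no lock-free atomics)."""
    def __init__(self, value: int):
        self._value = value
        self._lock = threading.Lock()

    def add_return(self, delta: int) -> int:
        with self._lock:
            self._value += delta
            return self._value

    def read(self) -> int:
        return self._value

class Semaphore:
    def __init__(self, count: int = 1):
        self.count = AtomicInt(count)  # a negative value counts the waiters
        self._cond = threading.Condition()
        self._wakeups = 0              # up() calls owed to sleeping waiters

    def down(self) -> None:
        if self.count.add_return(-1) >= 0:
            return                     # fast path: uncontended acquire
        with self._cond:               # slow path: sleep until an up() hands over
            while self._wakeups == 0:
                self._cond.wait()
            self._wakeups -= 1

    def up(self) -> None:
        if self.count.add_return(1) > 0:
            return                     # fast path: nobody was waiting
        with self._cond:               # slow path: wake exactly one waiter
            self._wakeups += 1
            self._cond.notify()
```

The fast paths touch only the counter; the condition variable is used only in the slow path, the part Paul Mackerras later suggests could be unified on its own.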

James Bottomley replied:

It's all very well for platforms that have efficient atomic operations. However, on parisc we have no such luxury (the processor has no atomic operations, so we have to fiddle them in the kernel using locks), so it looks like you're making our semaphore operations less efficient.

Could you come up with a less monolithic way to share this so that we can still do a spinlock semaphore implementation instead of an atomic op based one?

Benjamin replied, "As I read the code, it doesn't make a difference: parisc will take a spin lock within the atomic operation and then release it, which makes the old fast path for the semaphores and the new fast path pretty much equivalent (they both take and release one spinlock). The only extra cost is the address computation for the spinlock. If there is contention for the atomic spinlocks, then parisc can increase the number of buckets in their hashed spinlocks." David S. Miller replied, "I think parisc should be allowed to choose their implementation of semaphores. Look, if you change semaphores in some way it will be their problem to keep their parisc version in sync. Or you could provide both a spinlocked and an atomic op based implementation of generic semaphores, as we do for rwsem already."

Elsewhere, Russell King saw no point to Benjamin's patches at all. He said, "What happened to efficiency and performance? It is my understanding that the inline part of the semaphore implementation was one of the critical areas - critical enough to warrant coding it in assembly for some people." Trond Myklebust explained, "It started from a desire to extend the existing implementations to support new features such as asynchronous notification. Currently that sort of thing is impossible unless your developer-super-powers include the ability to herd 24 different subsystem maintainers into working together on a solution. In other words, the main drive is the desire to make it maintainable." Paul Mackerras said, "Well, maybe the slow paths could be unified somewhat, and then these extra features could be added in the slow paths. I would support that. I certainly don't support replacing the current optimized fast-path implementations with a lowest-common-denominator thing like Ben was proposing."

8. New timeofday Core Subsystem

29 Apr 2005 - 3 May 2005 (15 posts) Archive Link: "[RFC][PATCH (1/4)] new timeofday core subsystem (v A4)"

Topics: FS: sysfs, POSIX

People: John Stultz, Nishanth Aravamudan, Darren Hart, Matt Mackall

John Stultz said:

This patch implements the architecture independent portion of the time of day subsystem. For a brief description on the rework, see here: (Many thanks to the LWN team for that clear writeup!)

Mostly this version is just a cleanup of the last release. One neat feature is the new sysfs interface which allows you to manually override the selected timesource while the system is running.

Included below is timeofday.c (which includes all the time of day management and accessor functions), ntp.c (which includes the ntp scaling calculation code, leapsecond processing, and ntp kernel state machine code), timesource.c (for timesource specific management functions), interface definition .h files, the example jiffies timesource (lowest common denominator time source, mainly for use as example code) and minimal hooks into arch independent code.

The patch does not function without minimal architecture specific hooks (i386, x86-64, ppc32, ppc64, ia64 and s390 examples to follow), and it should be able to be applied to a tree without affecting the code.

New in this version:

Items still on the TODO list:

Nishanth Aravamudan replied:

I have been working closely with John to re-work the soft-timer subsytem to use the new timeofday() subsystem. The following patch attempts to being this process. I would greatly appreciate any comments.

Some design points:

  1. The patch is small but does a lot.

    1. Renames timer_jiffies to last_timer_time (now that we are not jiffies-based).
    2. Converts the soft-timer time-vector's/bucket's entries to timerinterval (a new unit) width, instead of jiffy width.
    3. Defines timerintervals to be the current time as reported by the new timeofday-subsystem shifted down by 20 bits and masked to only grab the lower 32 bits. This effectively emulates a 32-bit millisecond value.
    4. Uses do_monotonic_clock() (converted to timerintervals) as the basis for addition and expiration instead of jiffies.
    5. Adds some new helper functions for dealing with nanosecond values.
  2. Currently, the patch is dependent upon John's timeofday core rework. For arches that will not have the new timeofday (or for which the rework is still in progress), I can emulate the existing system with a separate patch. The goal of this patch, though, is just to show how easy the new system can be implemented and the benefits.
  3. The reason for the re-work?: Many people complain about all of the adding of 1 jiffy here or there to fix bugs. This new systems is fundamentally human-time oriented and deals with those issues correctly.

The code is reasonably well commented, but does expect readers to understand the current system to some degree.
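The timerinterval definition in point 3 is easy to check numerically. The Python sketch below relies only on the shift-and-mask description; the wraparound-safe comparison borrows the kernel's jiffies time_after() idiom, and whether Nishanth's patch compares timerintervals that way is an assumption.

```python
TIMER_INTERVAL_BITS = 20  # the shift given in point 3 above
MASK32 = 0xFFFFFFFF

def ns_to_timerintervals(ns: int, bits: int = TIMER_INTERVAL_BITS) -> int:
    """Monotonic nanoseconds -> 32-bit timerinterval count (shift, then mask)."""
    return (ns >> bits) & MASK32

def interval_ns(bits: int = TIMER_INTERVAL_BITS) -> int:
    """Length of one timerinterval in nanoseconds."""
    return 1 << bits

def time_after(a: int, b: int) -> bool:
    """Wraparound-safe 'a is later than b' for 32-bit counters (jiffies-style)."""
    return ((b - a) & MASK32) >= 0x80000000
```

With a 20-bit shift one unit is 2^20 = 1,048,576 ns, about 1.05 ms, which is why the result only "effectively" emulates milliseconds; the 32-bit value wraps after 2^52 ns, roughly 52 days of uptime. Changing TIMER_INTERVAL_BITS scales both figures by powers of two.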

And Darren Hart said:

Also working closely with John and Nish, I have been taking advantage of the new human-time soft-timer subsystem and the NO_IDLE_HZ code to dynamically schedule interrupts as needed. The idea is to have interrupt source drivers (PIT, Local APIC, HPET, ppc decrementers, etc) similar to the time sources in John's timeofday patches.

Because the resolution of the soft-timer sybsystem is configurable via TIMER_INTERVAL_BITS, and the timeofday code is now free of the periodic system tick, we can move the soft-timers to a dynamically scheduled interrupt system. We can achieve both sub-millisecond timer resolution and NO_IDLE_HZ simply by adjusting TIMER_INTERVAL_BITS and scheduling the next timer interrupt appropriately whenever a soft-timer is added or removed.

In general at the end of set_timer_nsecs(), we see when the next timer is due to expire and pass that value (in absolute nanoseconds) to schedule_next_timer_interrupt(). Each interrupt source driver is then free to reprogram the hard-timer to the "best" interval. For something like the local APIC, that may be exactly when the next timer needs to go off. For the PIT, it may do nothing at all and just fire periodically.

I have a prototype using the PIT, which just demonstrates that the system will still run this way. Obviously other timers will perform much better since the PIT is so slow to program.

I feel that this is a clean approach to two soft-timer issues: resolution and NO_IDLE_HZ. It integrates well with the patches from John and Nish and is a direct approach to these issues, rather than an attempt to add support on top of a jiffies based soft-timer subsystem.
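Darren's flow, in which adding or removing a soft-timer recomputes the earliest expiry and hands it to the interrupt-source driver, can be sketched in Python as follows. set_timer_nsecs() and schedule_next_timer_interrupt() are named in his description, but the signatures here, the TimerQueue class, the two source classes and the use of None for "nothing pending" are assumptions.

```python
class OneShotSource:
    """Like a local APIC in one-shot mode: fire exactly at the next expiry."""
    def __init__(self):
        self.programmed_for = None

    def schedule_next_timer_interrupt(self, expires_ns):
        self.programmed_for = expires_ns

class PeriodicSource:
    """Like the PIT: too slow to reprogram, so it ignores the hint and keeps ticking."""
    def __init__(self, period_ns: int):
        self.period_ns = period_ns

    def schedule_next_timer_interrupt(self, expires_ns):
        pass

class TimerQueue:
    def __init__(self, source):
        self.source = source
        self.timers = {}  # timer name -> absolute expiry in nanoseconds

    def _reprogram(self):
        # None means no soft-timer is pending, so the CPU may stay idle (NO_IDLE_HZ).
        nxt = min(self.timers.values()) if self.timers else None
        self.source.schedule_next_timer_interrupt(nxt)

    def set_timer_nsecs(self, name, expires_ns):
        self.timers[name] = expires_ns
        self._reprogram()

    def del_timer(self, name):
        self.timers.pop(name, None)
        self._reprogram()
```

The queue never decides how the hardware is programmed; each source interprets the hint according to its own cost of reprogramming.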

9. Linux Released

29 Apr 2005 (4 posts) Archive Link: "Linux"

Topics: FS: sysfs, I2C, SMP, USB

People: Greg KH, Lee Revell, Chris Wright, David S. Miller, Johannes, Jean Delvare, Alexander Nyberg

Greg KH announced Linux, saying:

As the -stable patch review cycle is now over, I've released the kernel in the normal places. Due to some disagreement over some of the patches in the review cycle, I've dropped a number of them.

The diffstat and short summary of the fixes are below.

I'll also be replying to this message with a copy of the patch between and, as it is small enough to do so.

And a personal thanks to OSU for letting me bore them by doing this in their meeting.

 Makefile                                 |    4 ++--
 arch/sparc/kernel/ptrace.c               |   12 ------------
 arch/sparc64/kernel/ptrace.c             |   19 -------------------
 arch/sparc64/kernel/signal32.c           |    5 ++++-
 arch/sparc64/kernel/systbls.S            |    2 +-
 arch/um/include/sysdep-i386/syscalls.h   |   12 ++++++------
 arch/um/include/sysdep-x86_64/syscalls.h |    5 -----
 arch/um/kernel/sys_call_table.c          |   11 ++++-------
 drivers/i2c/chips/it87.c                 |    2 +-
 drivers/i2c/chips/via686a.c              |    2 +-
 drivers/media/video/bttv-cards.c         |    2 --
 fs/partitions/msdos.c                    |    5 +++++
 security/keys/key.c                      |    3 ++-
 13 files changed, 26 insertions(+), 58 deletions(-)

Summary of changes from v2.6.11.7 to v2.6.11.8

Alexander Nyberg:

David S. Miller:

Greg Kroah-Hartman:

Jean Delvare:

Johannes Stezenbach:

Paolo 'Blaisorblade' Giarrusso:

Lee Revell asked, "Why didn't the fix for losing the keyboard when unplugging a USB audio device go in? That was a serious bug that bit many, many users." Chris Wright replied, "They came in while we were already in the review process. They'll have to be queued for next review cycle."

10. Documentation For realtime-preempt Patchset

29 Apr 2005 - 30 Apr 2005 (7 posts) Archive Link: "Updated realtime-preempt documentation"

Topics: Real-Time

People: Michael J. Cohen, Lee Revell, John Cooper, Jonathan Corbet, Ingo Molnar

Michael J. Cohen said, "I've been following Ingo Molnar and friends' lovely realtime-preempt patchset. I'm curious, though, the only piece of documentation I've found is which is, indeed, quite old. Is anyone planning on updating this in the near future or is it in too much flux? Should I try and make heads or tails of the code first?" Lee Revell replied, "I think it's changing too fast for anyone to have bothered to document it yet. But, you could make some pretty good documentation based on the LKML discussions of RT preemption, especially Ingo's posts." Lee added later, "Right now there's this: It's slightly outdated and focuses only on using the RT kernel for low latency audio with JACK, but I think it's the best user level doc so far." Close by, John Cooper remarked, "I'd cobbled together documentation for internal use here though it lacks an "overall concepts" wrapper. I should be revisiting this in a week or so and will look into making it generally available." And also close by, Jonathan Corbet said:

Don't know if it's what you're after, but I've written some on the realtime preemption patches:

11. Clarifying I2C Driver Dependencies

29 Apr 2005 - 30 Apr 2005 (4 posts) Archive Link: "tighten i2c dependancies"

Topics: I2C

People: Dave JonesGeert UytterhoevenChrister WeinigelChristoph Hellwig

Dave Jones said that a lot of I2C drivers "show up on pretty much every arch regardless of whether they make sense." He posted a patch that "adds a bunch of additional dependancies tying down platform specific drivers that are unlikely to be used on other archs." Christoph Hellwig, Geert Uytterhoeven, and Christer Weinigel helped track down various dependencies.

12. Removing BitKeeper Documentation From The Kernel

1 May 2005 - 2 May 2005 (9 posts) Archive Link: "[2.6 patch] remove BK documentation"

Topics: Version Control

People: Jeff GarzikBill DavidsenAdrian Bunk

Adrian Bunk pointed out that there was no longer any reason for the kernel source tree to document BitKeeper usage, and posted a patch to remove those docs from the tree. Jeff Garzik, the author of the bulk of the documentation in question, resented not being CCed on the patch removing it, but still approved of the patch. Adrian said he would have CCed Jeff, if Jeff had been listed in the files. Jeff pointed out that Adrian should have consulted the revision history in that case. Jeff said, "Files you wish to remove were obviously written by -somebody-. When removing things, make a serious effort to contact the author."

Close by, Bill Davidsen said:

This seems like a good place to thank Adrian for his cleaning fetish, which makes the kernel code and docs far less confusing, and Jeff, who put in most of the effort in documenting bk.

Documentation authors really should mention themselves in the introduction, docs aren't sexy and don't get your name in the news, but they are a vital part of making Linux usable.

13. JFSutils Version 1.1.8 Released

3 May 2005 (1 post) Archive Link: "[ANNOUNCE] jfsutils-1.1.8"

Topics: FS: JFS

People: Dave Kleikamp

Dave Kleikamp said:

Release 1.1.8 of jfsutils was made available today.

This release includes the following changes to the utilities:

For more details about JFS, please see our website:

14. yaird 0.0.6 Released

3 May 2005 (1 post) Archive Link: "[ANNOUNCE] yaird 0.0.6, a mkinitrd based on hotplug concepts"

Topics: Disk Arrays: LVM, FS: NFS, FS: devfs, Hot-Plugging

People: Erik van Konijnenburg

Erik van Konijnenburg said:

Version 0.0.6 of yaird is now available at:

Yaird is a proof-of-concept Perl rewrite of mkinitrd. It aims to reliably identify the necessary modules by using the same algorithms as hotplug, and comes with a template system to tune the tool for different distributions and experiment with different image layouts. It requires a 2.6 kernel with hotplug. There is a paper discussing it at:
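Hotplug of that era found a driver for a PCI device by matching the device's IDs against the `modules.pcimap` table that depmod generates, where `0xffffffff` acts as a wildcard and the device class is compared after applying `class_mask`. The Python sketch below illustrates that kind of lookup under those assumptions. It is not yaird's code (yaird is written in Perl), and the module names, IDs, and table rows are hypothetical.

```python
PCI_ANY_ID = 0xFFFFFFFF  # wildcard value used in modules.pcimap entries

# Hypothetical table in modules.pcimap layout; module names, IDs and rows are made up.
SAMPLE_PCIMAP = """\
# pci module     vendor     device     subvendor  subdevice  class      class_mask driver_data
netdrv_example   0x00001111 0x00002222 0xffffffff 0xffffffff 0x00000000 0x00000000 0x0
sata_example     0xffffffff 0xffffffff 0xffffffff 0xffffffff 0x00010601 0x00ffffff 0x0
"""


def parse_pcimap(text):
    """Return (module, [vendor, device, subvendor, subdevice, class, class_mask])."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        module, *fields = line.split()
        entries.append((module, [int(f, 16) for f in fields[:6]]))  # driver_data unused
    return entries


def _id_matches(pattern, value):
    """An ID field matches if it is the wildcard or equals the device's value."""
    return pattern == PCI_ANY_ID or pattern == value


def modules_for_device(entries, vendor, device, subvendor, subdevice, dev_class):
    """List every module whose table entry claims the device, in table order."""
    found = []
    for module, (v, d, sv, sd, cls, cls_mask) in entries:
        ids_ok = all(_id_matches(p, x) for p, x in
                     ((v, vendor), (d, device), (sv, subvendor), (sd, subdevice)))
        # The class test masks the device's 24-bit class before comparing.
        if ids_ok and (dev_class & cls_mask) == cls and module not in found:
            found.append(module)
    return found


table = parse_pcimap(SAMPLE_PCIMAP)
nic = modules_for_device(table, 0x1111, 0x2222, 0x1111, 0x0001, 0x020000)
sata = modules_for_device(table, 0x1234, 0x5678, 0x1234, 0x0000, 0x010601)
print(nic, sata)
```

The first entry matches by exact vendor and device ID with a zero class mask, while the second ignores IDs entirely and matches on the masked class code; both styles of entry can coexist in one table, which is why a lookup has to check every row.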

Summary of user visible changes for version 0.0.6

On top of the todo list are now:

15. Mercurial Version 0.4c Released

3 May 2005 - 5 May 2005 (4 posts) Archive Link: "Mercurial v0.4c"

People: Matt MackallJeff Garzik

Matt Mackall said:

A new version of Mercurial is available at:

This version is officially self-hosting, now that I've added the final planned changes to the metadata. To pull the repo, do:

hg init
hg merge

This version fixes numerous reported bugs, adds a "verify" command to check the repository integrity, transaction handling, and some minor speed improvements.

Jeff Garzik asked, "Can you make it do HTTP 1.1 pipelining?" Matt replied:

Yes, a zsync-like protocol ought to be doable. But you'll still potentially be doing 16k requests to pull something the size of the kernel, which isn't very friendly to a web server. So I'm working on a stand-alone or possibly CGI-based replacement.

My goal is to do something like this:

client: I last saw change N from you
server: W, X, Y, and Z are newer here
client: Send me X, Y, and Z relative to N
server: Here you go, deltas from N to X to Y to Z, sorted by file

So not only can we be efficient in number of round trips and data transferred, we can reduce seeks by applying all per-file changes together. We can also usually avoid decompress/recompress and patch/diff because both ends will end up storing the same delta.
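The exchange Matt describes can be modeled in a few lines. This is a minimal Python sketch under stated assumptions: a linear history, and hypothetical names (`Server`, `Delta`, `pull`) standing in for whatever wire format he ends up with. It is not Mercurial's implementation; it only shows the two properties he is after, namely that the number of round trips does not grow with the number of changesets, and that the deltas arrive grouped per file.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    """Hypothetical record: the change that changeset `rev` made to file `path`."""
    rev: str
    path: str
    data: str


class Server:
    """Toy server holding a linear history; not Mercurial's real code."""

    def __init__(self, history, deltas):
        self.history = history  # changeset ids, oldest first
        self.deltas = deltas    # changeset id -> list of Delta

    def newer_than(self, last_seen):
        """Reply to "I last saw change N": every changeset after N."""
        return self.history[self.history.index(last_seen) + 1:]

    def send(self, wanted):
        """Reply to "send me X, Y, Z": deltas grouped by file, each file's
        deltas kept in changeset order so they can be applied in one pass."""
        by_file = defaultdict(list)
        for rev in self.history:
            if rev in wanted:
                for d in self.deltas.get(rev, []):
                    by_file[d.path].append(d)
        return dict(sorted(by_file.items()))


def pull(server, last_seen, already_have=()):
    """Client side: two request/response exchanges, however many changesets."""
    newer = server.newer_than(last_seen)
    wanted = [rev for rev in newer if rev not in already_have]
    return wanted, server.send(set(wanted))


history = ["N", "W", "X", "Y", "Z"]
deltas = {
    "W": [Delta("W", "a.c", "...")],
    "X": [Delta("X", "a.c", "..."), Delta("X", "b.c", "...")],
    "Y": [Delta("Y", "b.c", "...")],
    "Z": [Delta("Z", "a.c", "...")],
}
# Assumption: the client already has W (say, from another peer), so it asks only for X, Y, Z.
wanted, batch = pull(Server(history, deltas), "N", already_have={"W"})
print(wanted)
for path, ds in batch.items():
    print(path, [d.rev for d in ds])
```

With the sample data this prints `['X', 'Y', 'Z']`, then `a.c ['X', 'Z']` and `b.c ['X', 'Y']`: each file's changes arrive together, which is what lets the receiving end apply them with fewer seeks.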

16. Linux-iSCSI High-Performance Initiator Version Released

4 May 2005 (1 post) Archive Link: "[ANNOUNCE] Linux-iSCSI High-Performance Initiator"

Topics: Disks: SCSI

People: Alex Aizman

Alex Aizman said:

This is to announce a new release of the iSCSI Initiator for Linux: v5.0.0.3rc2 for 2.6.12 kernel. The previous (2nd) submission (posted 04/12/05) can be located at:

The very first submission is here:

Current release is result of the ongoing effort by the combined linux-iscsi team. In-depth information on the project, including the latest download, performance results, etc. documentation can be found at:


He added, "This Initiator will work with the new iSCSI transport class from the (very) recent submission by Mike Christie. The related (and required) submission can be located at:" He also said, "The associated userspace tools can be downloaded from"

17. NTFS Development Switches From BitKeeper To git

5 May 2005 (1 post) Archive Link: "ntfs development git tree for -mm"

Topics: FS: NTFS, Version Control

People: Anton Altaparmakov

Anton Altaparmakov said:

The former ntfs-2.6-devel BK repository is now converted to a GIT repository and is available from:


It would be great if you could add it to the -mm tree.

Please let me know if you have any problems with this tree or indeed if you prefer a patch / patches (against what?) that I can make available to you instead.

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.