Kernel Traffic

Kernel Traffic #307 For 26 Apr 2005

By Zack Brown


Mailing List Stats For This Week

We looked at 2975 posts in 16MB. See the Full Statistics.

There were 808 different contributors. 316 posted more than once. The average length of each message was 85 lines.

The top posters of the week were:

259 posts in 2MB by Andrew Morton
102 posts in 555KB by Ingo Molnar
83 posts in 502KB by Greg KH
80 posts in 453KB by Adrian Bunk
72 posts in 390KB by Linus Torvalds

The top subjects of the week were:

192 posts in 794KB for "kernel scm saga.."
110 posts in 543KB for "non-free firmware in kernel modules, aggregation and unclear copyright notice."
83 posts in 347KB for "more git updates.."
78 posts in 398KB for "non-free firmware in kernel modules, aggregation and unclear"
69 posts in 440KB for "2.6.12-rc2-mm1"

These stats generated by mboxstats version 2.7

1. Some Intricacies Of klists And Semaphores In The Driver Model

26 Mar 2005 - 7 Apr 2005 (28 posts) Archive Link: "klists and struct device semaphores"

Topics: Disks: SCSI, FS: sysfs, USB

People: Alan Stern, Patrick Mochel

Alan Stern said to Patrick Mochel:

Your recent series of driver model patches naturally divides up into two parts: those involved with implementing and using klists, and those involved with adding a semaphore to struct device. Let's consider these parts separately.

The klist stuff embodies a good idea: safe traversal of lists while nodes are added and removed. Your klist library is intended for generic use and hence it necessarily is not tightly integrated with the data type of the list objects. This can lead to problems, as discussed below. I'll start with the least important issues and work up to some big ones.

  1. Your structures contain more fields than I would have used. klist_node.n_klist isn't really needed; it only gets used when removing a klist_node from a klist, and at such times the caller could easily pass the klist directly. Also, klist_iter.i_head isn't needed because it's easily derivable from klist_iter.i_klist. (Doesn't really matter because there never are very many iterators in existence.)
  2. Likewise I'm not so sure about klist_node.n_removed. Waiting for removal operations to complete is generally a bad idea; in the driver-model core it's important only when deregistering a driver. The only other scenario I can think of where you would need to wait is when you remove an object from a klist and then want to put it on another (see also (5)). While this might be necessary for general-purpose usage, the driver-model core doesn't do it. It also means that moving an object from one klist to another can't be carried out in_interrupt. (Not to mention you're adding a lot of struct completions all over the place, most of which shouldn't need to be used.)
  3. Your iterators don't allow for some simple optimizations. For example, a bus's driver<->device matching routine should be able to run while retaining the klist's spinlock.
  4. You don't have any way of marking klist_nodes that have been removed from their klist. So an iterator will return these nodes instead of skipping right past them, as one would expect. This can have unpleasant consequences, such as probing a new device against a driver that has been unregistered. The klist_node's container will be forced to have its own "deleted" marker, and callers will have to skip deleted entries by hand.
  5. Most importantly, klist_nodes aren't protected against their containers being deallocated. Or rather, the way in which you've set up the protection is inappropriate. Right now you force a caller to wait until list-removal is complete, and only later is it allowed to deallocate the container object. This is a violation of the principles underlying the reference-counting approach; instead a klist_node should take a reference to its container. But your generic library-based approach doesn't allow that, at least, not without some awkward coding. This also means that iterating through a klist requires acquiring and releasing two references at each step: one for the klist_node and one for the container.

Some of these weaknesses are unavoidable (they're also present in the outline Dmitry proposed).

Let's move on to consider the new struct device.sem. You've recognized, like other people, that such a thing is necessary to protect drivers against simultaneous callbacks. But things aren't as simple as just sticking the semaphore into the structure and acquiring it for each callback! It requires much more careful thought.

  1. Your code doesn't solve the race between driver_unregister and device_add. What happens if a driver is unregistered at the same time as it's being probed for a new device? Maybe I missed something, but it looks like you might end up with the device bound to a non-existent driver. (And what about the other way 'round, where a device is unregistered at the same time as it's being probed by a new driver? That's easier to get right but I haven't checked it.)
  2. Adding lots of semaphores also adds lots of new possibilities for deadlock. You haven't provided any rules for locking multiple semaphores. Here are a few potentially troublesome scenarios:

    A driver being probed for newly-discovered hardware registers a child device (e.g., a SCSI adapter driver registers its SCSI host).

    Autoresume of child device requires the parent to be resumed as well.

    During remove a driver wants to suspend or resume its device.

    There's also a possibility for deadlock with respect to some lock private to a subsystem. Of course such things can't be solved in the core; the subsystem has to take care of them.

  3. A subsystem driver might want to retain control over a new device around the device_add call -- it might want to hold the device's semaphore itself the whole time. There need to be entry points in which the caller holds the necessary semaphores.
  4. Your scheme doesn't allow any possibility for complete enumeration of all children of a device; new children can be added at any time. So for example, checking that all the children are suspended (and preventing any from being resumed!) while suspending a device becomes very difficult.

To solve this last problem, my thought has always been that adding a device to the list of its parent's children should require holding the parent's lock. There's room for disagreement. But note that there's code in the kernel which does tree-oriented device traversals; by locking each node in turn the traversal could be protected against unwelcome changes in the device tree.

The final issue I have is more complex; it has to do with the peculiar requirements of the USB subsystem. In case you're not familiar with the details of how USB works (and for the benefit of anyone else reading this message), here's a capsule description:

A USB device can have multiple functional units, called "interfaces" for some strange unknown reason. It's the interfaces that do the actual work; each one gets its own struct device (in addition to the struct device allocated for the USB device itself) and its own driver. While most actions are local to a single interface, there are a few that affect the entire device including all the interfaces. The most obvious ones are suspend, resume, and device-reset. More important and harder to deal with is the Set-Configuration command, which destroys all the current interfaces and creates a whole set of new ones.

For proper protection, the USB subsystem requires that the overall device be locked during suspend, resume, reset, and Set-Config. This also involves locking the device during any call to a driver callback -- but now the struct device being locked is the _parent_ of the one bound to the driver (i.e., the interface's parent).

At the moment this locking is handled internally by the subsystem. But in one respect it conflicts badly with the operation of the driver-model core: when a driver is registered or unregistered. At such times the subsystem isn't in control of which devices are probed or unbound. I ended up "solving" this by adding a second layer of locking, which effectively permits _all_ the USB devices to be locked during usb_register and usb_deregister. It's awkward and it would benefit from better support from the driver-model core.

Such support would have to take the form of locking a device's parent as well as the device itself, minimally when probing a new driver and unbinding a deregistered driver, possibly at other times as well. As far as I know, USB is the only subsystem to require this and it's probably something you don't want to do if you don't have to. I don't know what the best answer is. It was a struggle to get where we are now and we only had to worry about locking USB devices; locking the interfaces too adds a whole new dimension.

Patrick liked Alan's point 1, and thanked him for the suggestions. For Alan's point 2 (waiting for removal operations to complete), Patrick replied, "It's important when removing a containing object's knode from the list when that object is about to be freed. This happens during both device and driver unregistration. In most cases, the removal operation will return immediately. When it doesn't, it means another thread is using that particular knode, which means it's imperative that the containing object not be freed." He asked if Alan had any alternative code to suggest. Alan did share some code, but Patrick was not entirely convinced, and the discussion went off-list.

To Alan's point 3 (iterator optimization), Patrick replied, "It's trivial to add a helper that holds a lock across an entire iteration. However, we currently don't separate the bus->match() and the driver->probe() operations. We must not hold a spinlock across ->probe(), which means we drop the lock before both ->match() and ->probe(). However, it might be interesting to split those up and do a locked iteration just to find a match, grab a reference for the driver, break out of the iteration, and call ->probe()."

To Alan's point 4 (skipping deleted klist entries by hand), Patrick replied, "Good point. It's trivial to add an atomic flag (.n_attached) which is checked during an iteration. This can also be used for the klist_node_attached() function that I posted a few days ago (and you may have missed)." Alan remarked, "There's no need for the flag to be atomic, since it's only altered while the klist's lock is held." Patrick replied:

In principle, you're right. Kind of. We need to tie the "usage" reference count of the klist_node to the containing objects' "lifetime" count. But, there is no need to confuscate the klist code to do it. At least not at this point.

The subsystems that use the code must be sure to appropriately manage the lifetime rules of the containing objects. That is true no matter what. When they add a node, they should increment the reference count of the containing object and decrement when the node is removed. If practice shows that there is more that can be rolled into the model, then we can revisit it later.

[ Sidebar: Perhaps we can add a callback parameter to klist_remove() to call when the node has been removed, instead of the struct completion. ]

To Alan's point 5 (deallocation protection), Patrick replied, "It's assumed that the controlling subsystem will handle lifetime-based reference counting for the containing objects. If you can point me to a potential usage where this assumption would get us into trouble, I'd be interested in trying to work around this." But Alan replied:

It's not that you get into trouble; it's that you're forced to wait for klist_node.n_removed when you shouldn't have to. To put it another way, one of the big advantages of the refcounting approach is that it allows you to avoid blocking on deallocations -- the deallocation happens automatically when the last reference is dropped. Your code loses this advantage; it's not the refcounting way.

If you replace the struct completion with the offset to the container's kref and make the klist_node hold a reference to its container, as described above, then this unpleasantness can go away.

To Alan's point 6 (driver_unregister and device_add race), Patrick replied:

The only race I see is the klist_remove() in bus_remove_driver() racing with the iteration of the klist in device_attach(). The former will block until the driver.knode_bus reference count reaches 0, which will happen when the ->probe() is over and the iteration stops. The klist_remove() will finish, then each device attached to the driver will be removed. That's less than ideal, but it should work.

To help that a bit, we could add a get_driver()/put_driver() pair to __device_attach(), which would prevent the driver from being removed while we're calling ->probe().

Alan took another look, and said, "You're right. Instead of a race that needs to be resolved, you have a potential for an extra sleep in bus_remove_driver(). It's not a problem."

To Alan's point 7 (deadlocks due to semaphore proliferation), Patrick replied, "For now, I'm willing to punt on those and consider them subsystem-specific until more is known about those situations' characteristics. As it currently stands, the core will not lock more than 1 device at a time. The subsystems can know that and lock devices appropriately." But Alan replied:

That's absolutely not true. Whenever a probe() routine registers a new device, the core will acquire nested locks. This happens in a number of places. Likewise when a remove() routine unregisters a child device.

You need to formalize the locking rule: Never lock a device while holding one of its descendants' locks.

Patrick agreed that this latter statement was a good rule, but was definitely subsystem-specific. He rephrased his own objections to Alan's point, saying, "The driver core will never explicitly take more than 1 lock. It will lock a device during calls to the driver routines, which the drivers should be aware of when re-entering the driver model via those functions." This made sense to Alan, but he suggested that his formalized locking rule - that Patrick had agreed with - "should be mentioned somewhere in the kerneldoc, maybe near the declaration of your new device_lock() routine. You might also want to mention explicitly that a probe() routine shouldn't call through the driver core to suspend its device because it would deadlock; it should simply do the suspend directly."

To Alan's point 8 (subsystem driver wanting to retain control of a new device around the device_add() call), Patrick asked:

Out of curiosity, why would a subsystem want to do this? Would it be something like this:

create device
lock device
do other stuff
unlock device (and let other things happen to it)

? If so, what do you want to protect against, suspend/resume? In cases like this, do you still want to do driver probing, or do you know a priori what the driver is?

Alan confirmed that this was the scenario he'd envisioned. He explained, "The case I had in mind was adding a new USB device. The USB core wants to retain control of the device at least through the point where it chooses and sets a new configuration -- otherwise userspace might do so first. We ought to be able to work around this by locking the device after calling device_add() and before usb_create_sysfs_dev_files(). In this case the driver is known a priori." Patrick asked, "How is this a driver model problem if it can be fixed locally?" Alan replied, "It isn't a problem. Forget I brought it up -- if anything ends up going wrong I'll let you know."

To Alan's point 9 (child enumeration), Patrick said he didn't quite get Alan's point. There didn't seem to be any problem with adding children at any time, since there was no way to prevent it; and any suspend/resume issues were not clear to him either. Alan replied that it was precisely because there was no way to prevent adding children at any time in Patrick's scheme, that the suspend/resume part became a problem. He explained:

Look at what happens when a driver wants to suspend a device. If there are any unsuspended children it will lead to trouble. (Note this concerns runtime PM only; for system PM we already know that all the children are suspended.) So the driver loops through all the children first, making sure each one is already suspended (if not then the suspend request must fail). At the end it knows it can safely suspend the device.

But! What if another child is added in the interim, so the loop misses it? And there's a related problem: What if one of the existing children gets resumed after it was checked but before the parent can be suspended?

The first problem could be solved at the driver level, by using an additional private semaphore to block attempts at adding new children. On the other hand, if the core always locked the parent while adding a child then a separate private semaphore wouldn't be needed. The driver could simply use the pre-existing device->sem.

The second problem can be solved in a couple of ways. The most obvious is for the driver to lock all the children while checking that they are already suspended, then unlock all of them after suspending the parent. Alternatively the resume pathway could be changed, so that to resume a device both it and its parent have to be locked. (The alternative might not work as well in practice, because drivers are likely to resume devices on demand directly, without detouring through the core routines. Even if the core was careful always to lock the parent before a resume, drivers might not be so careful when bypassing the core.)

Patrick replied:

I don't like the idea of locking every single child just to check if they're suspended. Sounds messy.

How about we just add a counter to the parent for all of the running (not suspended) children. When a child is suspended, this counter is decremented. When the counter reaches 0, the parent can suspend. This makes the check simple when a parent is directed to suspend.

We can have the counter modifications block on the semaphore, so if there is an update to it while the parent is doing something, it will be queued up appropriately. Incrementing this past 0 should automatically resume the parent (or at least check that it's resumed). This will guarantee that the parent is woken up for any children that are added.

In fact, we probably want to add a counter to every device for all "open connections" so the device doesn't try to automatically sleep while a device node is open. Once it reaches 0, we can have it enter a pre-configured state, which should save us a bit of power for very little pain.

Alan asked:

By "open connections", do you mean something more than unsuspended children?

Are you proposing to add these counters to struct device? If so, would they be used and maintained by the core or by the driver/subsystem? I should think the core wouldn't know enough about the requirements of different devices to do anything useful. But then if the core doesn't use the counters they should be stored in a private data structure, not in struct device.

Patrick confirmed that he did indeed mean something more than unsuspended children by "open connections". He explained, "I mean anything that requires the device be awake and functional. This would include open device nodes for many devices, open network connections for network devices, active children for bridges and controllers, etc. This will require modification of at least the open() routines at the subsystem level. They can simply access the class device and call down to the driver, with some help from some core utility functions and some hand waving. The driver (or bus subsystem) can determine if the parent needs to be awakened at that same time, and awaken it if necessary." Regarding adding the counters to struct device, Patrick said:

The core would know very little to be useful. However, it would most likely need to modify them around calls to e.g. probe()/remove() to make sure the device is functional when accessing it. Maybe.

At the very least, the shortest path to getting every device working with this is to modify the subsystems' open calls. The only way to bridge their notion of class-specific objects (and class_devices) with physical devices is through the core.

So, I think we need the counter in struct device, along with some helper functions.

2. Linux 2.6.12-rc2 Released

4 Apr 2005 - 8 Apr 2005 (27 posts) Archive Link: "Linux 2.6.12-rc2"

Topics: Kernel Release Announcement

People: Linus Torvalds, Gene Heskett, Dave Jones, Andres Salomon, Panagiotis Issaris

Linus Torvalds announced Linux 2.6.12-rc2, saying:

this is a lot of very small changes, ie tons of small cleanups and bug fixes. With a few new drivers thrown in for good measure.

This is also the point where I ask people to calm down, and not send me anything but clear bug-fixes etc. We're definitely well into -rc land. So keep it quiet out there

Andres Salomon pointed out that a couple of IRQ fixes Linus had attributed to him had actually been produced by Panagiotis Issaris; Andres had merely forwarded them on.

Gene Heskett said to Linus, "I'm happy to report that it not only built, it booted, and that the one program that's been a no-show for video, tvtime, in any kernel newer than -rc1, failing in all the . releases after .2, or any -mm I tried, is working quite nicely thank you in -rc2."

Elsewhere, Bob Gill reported that NVidia modules no longer worked for him, and Dave Jones replied, "Totally unsurprising. They'll need serious brain surgery to work with the multi-gart support. I'm amazed they even compiled for you."

3. Linux 2.6.12-rc2-mm1 Released

4 Apr 2005 - 12 Apr 2005 (90 posts) Archive Link: "2.6.12-rc2-mm1"

Topics: FS: XFS, Kernel Release Announcement, Microsoft, Software Suspend, USB

People: Andrew Morton, Dave Airlie, Christophe Saout, Barry K. Nathan, Pavel Machek, Nathan Scott

Andrew Morton announced Linux 2.6.12-rc2-mm1, and asked whether the PM resume and DRI behavior problems reported against earlier kernels had been fixed.

About the PM resume and DRI behavior, Dave Airlie said, "Well the DRI is, both reports of bugs have been fixed :-), the bug should be closed on I think, and it looks rock solid on my box both FC3 and Debian sarge.." Christophe Saout added elsewhere, "works for me. DRI (i915) is working again and USB is now happy after a PM resume too." Barry K. Nathan also replied to Andrew's question about whether the PM resume and DRI behavior problems had been fixed. Barry said:

No, I just didn't get a chance to send mail yet.

Compared to 2.6.11-ac5, I'm seeing one regression: the part of the resume where it says something like:

swsusp: reading slkf;jalksfsadflkjas;dlfasdfkl (12345 pages): 34% [sorry, I just got up so my short-term memory isn't working that well yet]

takes 10-30 minutes (depending on whether it's closer to 11000 pages or 20000) rather than the 5-10 seconds or so that it takes under 2.6.11-ac5 (or mainline 2.6.11 if I remember correctly).

However, this is not vanilla 2.6.12-rc1-mm4. It has my own modified version of the Win4Lin patch applied; this is GPL, but the userspace program that uses this patch (Win4Lin 5.1) isn't, nor is the software ultimately executed by this patch via Win4Lin (Microsoft Windows Millennium Edition[*]). And I didn't try swsusp on any kernels between 2.6.11 and 2.6.12-rc1-mm4.

I'll try to do some more testing to see (a) when this problem started and (b) whether it still exists in 2.6.12-rc2 or later. This is going to be ridiculously difficult for me to fit into my schedule right now, but I'll try....

BTW, ieee1394 is still broken after resume (impossible to rmmod, too) and snd_cmipci (but this can be resurrected by quitting anything that uses sound, rmmod snd_cmipci, then modprobe snd_cmipci). But these are long-standing issues, and under 2.6.12-rc1-mm4, ieee1394/sbp2 can at least stay up indefinitely as long as I don't suspend -- that's a tremendous improvement over 2.6.11.

Barry tried to dig a little deeper, and several posts later, said:

Ok, I've narrowed the problem down to one patch. In 2.6.11-mm3, the problem goes away if I remove this patch:

(Recap of the problem in case this gets forwarded: Resume is almost instant without the apparently-guilty patch. With the patch, resume takes almost half an hour.)

BTW, there's another strange thing that's introduced by 2.6.11-rc2-mm1: With that kernel, suspend is also ridiculously slow (speed is comparable to the slow resume with the aforementioned patch). 2.6.11-rc2 does not have that problem.

Also, with 2.6.12-rc2-mm1, this computer happens to hit the bug where all the printk timestamps are 0000000.0000000 (don't take the # of digits too literally). Probably unrelated, but I may as well mention it. (System is an Athlon XP 2200+ with SiS chipset. I can't remember which model of SiS chipset.)

Andrew said that swsusp-enable-resume-from-initrd.patch seemed innocent, and forwarded the report along to its developers. Barry agreed the patch seemed innocent enough, but said he'd done a binary search that isolated just that patch as the culprit.

Pavel Machek came in at this point. He suspected the problem might be related to XFS. After confirming that Barry was indeed using XFS as his root filesystem, Pavel asked if Barry could try again, modularizing XFS and using an initrd. Barry tried this, and the problem immediately disappeared. Pavel replied, "I reproduced it locally. Problem is that xfsbufd goes refrigerated, but someone still tries to wake it up *very* often. Probably something else in xfs needs refrigerating, too, but I'm not a XFS wizard..." Nathan Scott replied:

Thanks Pavel - I've been reading the thread from the other side of the fence, not understanding the swsusp side of things. :)

There are two ways the xfsbufd thread will wake up - either by its timer going off (for it to flush delayed write metadata buffers) or by being explicitly woken up when we're low on memory (in which case it also flushes out dirty metadata, such that pages can be cleaned and made available to the system).

Since the refrigerator() call is in place in the main xfsbufd loop, I suspect we're hitting that second case here, where a low memory situation is resulting in someone attempting to wakeup xfsbufd -- I'm not sure if this is the right way to check if we're in that state, but does this patch help? (it would certainly prevent the spurious wakeups, but only if the caller has PF_FREEZE set - will that be the case here?)

He added later, "if that doesn't help, here's an alternate approach - this lets xfsbufd track when its entering the refrigerator(), so that other callers know that attempts to wake it are futile." Pavel confirmed that the patch helped. Barry added, "I can confirm, the 2nd patch worked and the 1st one didn't. (This is against 2.6.12-rc2-mm1 with sched-x86-patch-name-is-way-too-long.patch backed out. ;) )"

4. Review Period Leading Toward A New -stable Release

5 Apr 2005 - 7 Apr 2005 (36 posts) Archive Link: "[00/11] -stable review"

Topics: Networking, Sound: ALSA

People: Greg KH, David Howells, Theodore Y. Ts'o, David S. Miller, Stephen Hemminger, Renate Meijer, Paolo Giarrusso, David Woodhouse, Linus Torvalds

Greg KH announced:

This is the start of the stable review cycle for the release. There are 8 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let us know. If anyone is a maintainer of the proper subsystem, and wants to add a Signed-off-by: line to the patch, please respond with it.

These patches are sent out with a number of different people on the Bcc: line. If you wish to be a reviewer, please email to add your name to the list. If you want to be off the reviewer list, also email us.

Responses should be made by Thursday, April 7, 17:00 UTC. Anything received after that time, might be too late.

He posted 8 patches: an ALSA timer bug fix, a jbd race condition fix, an eeprom driver oops fix, and an IPsec deadlock fix. He also posted a patch from David Howells, and gave David's changelog text:

The attached patch makes read/write semaphores use interrupt disabling spinlocks in the slow path, thus rendering the up functions and trylock functions available for use in interrupt context. This matches the regular semaphore behaviour.

I've assumed that the normal down functions must be called with interrupts enabled (since they might schedule), and used the irq-disabling spinlock variants that don't save the flags.

Greg said, "We should merge this backport - it's needed to prevent deadlocks when dio_complete() does up_read() from IRQ context. And perhaps other places."

Greg also posted another patch, saying:

Attached is a patch against David's audit.17 kernel that adds checks for the TIF_SYSCALL_AUDIT thread flag to the ia64 system call and signal handling code paths. The patch enables auditing of system calls set up via fsys_bubble_down, as well as ensuring that audit_syscall_exit() is called on return from sigreturn.

Neglecting to check for TIF_SYSCALL_AUDIT at these points results in incorrect information in audit_context, causing frequent system panics when system call auditing is enabled on an ia64 system.

I have tested this patch and have seen no problems with it.

[Original patch from Amy Griffis ported to current kernel by David Woodhouse]

He also posted another patch, saying:

Since BIC is the default congestion control algorithm enabled in every 2.6.x kernel out there, fixing errors in it becomes quite critical.

A flaw in the loss handling caused it to not perform the binary search regimen of the BIC algorithm properly.

The fix below from Stephen Hemminger has been heavily verified.

[TCP]: BIC not binary searching correctly

While redoing BIC for the split up version, I discovered that the existing 2.6.11 code doesn't really do binary search. It ends up being just a slightly modified version of Reno. See attached graphs to see the effect over simulated 1mbit environment.

The problem is that BIC is supposed to reset the cwnd to the last loss value rather than ssthresh when loss is detected. The correct code (from the BIC TCP code for Web100) is in this patch.

Theodore Y. Ts'o objected:

I hate to be a stickler for the rules, but does this really meet this criteria?

If the congestion control algorithm is "Reno-like", what is the user-visible impact to users? There are OS's out there with TCP/IP stacks that are still using Reno, aren't there?

Knowing the answer to the question, "How does this bug `bother' either users or network administrators?" would probably be really helpful.

David S. Miller replied:

An incorrect implementation of any congestion control algorithm has ramifications not considered when the congestion control author verified the design of his algorithm.

This has a large impact on every user on the internet, not just Linux machines.

Perhaps on a microscopic scale "this" part of the BIC algorithm was just behaving Reno-like due to the bug, but what implications does that error have as applied to the other heuristics in BIC? This is what I'm talking about. BIC operates in several modes, one of which is a pseudo binary search mode, and another is a less aggressive slower increase mode.

Therefore I think fixes to congestion control algorithms which are enabled by default always should take a high priority in the stable kernels.

And Stephen Hemminger remarked, "Also, hopefully distro vendors will pick up 2.6.11.X fixes and update their customers."

That was the end of that subthread, but Greg KH posted another patch in his series, to cause the kernel to use "__va_copy instead of va_copy since some old versions of gcc (2.95.4 for instance) don't accept va_copy." Renate Meijer said:

Are there many kernels still being built with 2.95.4? It's quite antiquated, as far as i'm aware.

The use of '__' violates compiler namespace. If 2.95.4 were not easily replaced by a much better version (3.3.x? 3.4.x) I would see a reason to disregard this, but a fix merely to satisfy an obsolete compiler?

In my humblest of opinions you are fixing a bug that is better solved by downloading a more recent version of gcc.

Paolo Giarrusso encouraged folks to avoid a compiler war, and added, "Linus Torvalds said "we support GCC 2.95.3, because the newer versions are worse compilers in most cases". One user complained, even because he uses Debian, and I cannot do less than make sure that we comply with the requirements we have chosen (compiling with that GCC)."

The discussion did begin to degenerate into a flame war, but at some point Greg said that Renate's objections were not significant enough to stop the patch; and the thread petered out.

5. Linus No Longer Using BitKeeper; Creates 'git' Replacement

6 Apr 2005 - 12 Apr 2005 (200 posts) Archive Link: "Kernel SCM saga.."

Topics: BSD: OpenBSD, Disks: IDE, MAINTAINERS File, Version Control

People: Linus TorvaldsGreg KHRik van RielJesse BarnesDaniel PhillipsKarl FogelMarcin DaleckiCatalin MarinasJan HudecPaul MackerrasAlbert D. CahalanMartin PoolJeff GarzikDavid LangWalter LandryAndrea ArcangeliDavid WoodhouseMatthias UrlichsAndrew WalrondAlexander ViroMiles BaderChris Wedgwood

Linus Torvalds said:

as a number of people are already aware (and in some cases have been aware over the last several weeks), we've been trying to work out a conflict over BK usage over the last month or two (and it feels like longer ;). That hasn't been working out, and as a result, the kernel team is looking at alternatives.

[ And apparently this just hit slashdot too, so by now _everybody_ knows ]

It's not like my choice of BK has been entirely conflict-free ("No, really? Do tell! Oh, you mean the gigabytes upon gigabytes of flames we had?"), so in some sense this was inevitable, but I sure had hoped that it would have happened only once there was a reasonable open-source alternative. As it is, we'll have to scramble for a while.

Btw, don't blame BitMover, even if that's probably going to be a very common reaction. Larry in particular really did try to make things work out, but it got to the point where I decided that I don't want to be in the position of trying to hold two pieces together that would need as much glue as it seemed to require.

We've been using BK for three years, and in fact, the biggest problem right now is that a number of people have gotten very very picky about their tools after having used the best. Me included, but in fact the people that got helped most by BitKeeper usage were often the people _around_ me who had a much easier time merging with my tree and sending their trees to me.

Of course, there's also probably a ton of people who just used BK as a nicer (and much faster) "anonymous CVS" client. We'll get that sorted out, but the immediate problem is that I'm spending most of my time trying to see what the best way to co-operate is.

NOTE! BitKeeper isn't going away per se. Right now, the only real thing that has happened is that I've decided to not use BK mainly because I need to figure out the alternatives, and rather than continuing "things as normal", I decided to bite the bullet and just see what life without BK looks like. So far it's a gray and bleak world ;)

So don't take this to mean anything more than it is. I'm going to be effectively off-line for a week (think of it as a normal "Linus went on a vacation" event) and I'm just asking that people who continue to maintain BK trees at least try to also make sure that they can send me the result as (individual) patches, since I'll eventually have to merge some other way.

That "individual patches" is one of the keywords, btw. One thing that BK has been extremely good at, and that a lot of people have come to like even when they didn't use BK, is how we've been maintaining a much finer-granularity view of changes. That isn't going to go away.

In fact, one impact BK has had is to very fundamentally make us (and me in particular) change how we do things. That ranges from the fine-grained changeset tracking to just how I ended up trusting submaintainers with much bigger things, and not having to work on a patch-by-patch basis any more. So the three years with BK are definitely not wasted: I'm convinced it caused us to do things in better ways, and one of the things I'm looking at is to make sure that those things continue to work.

So I just wanted to say that I'm personally very happy with BK, and with Larry. It didn't work out, but it sure as hell made a big difference to kernel development. And we'll work out the temporary problem of having to figure out a set of tools to allow us to continue to do the things that BK allowed us to do.

Let the flames begin.

PS. Don't bother telling me about subversion. If you must, start reading up on "monotone". That seems to be the most viable alternative, but don't pester the developers so much that they don't get any work done. They are already aware of my problems ;)

Greg KH said, "I'd also like to publicly say that BK has helped out immensely in the past few years with kernel development, and has been one of the main reasons we have been able to keep up such a high patch rate over such a long period of time. Larry, and his team, have been nothing but great in dealing with all of the crap that we have been flinging at him due to the very odd demands such a large project as the kernel has caused. And I definitely owe him a beer the next time I see him." Rik van Riel added, "Seconded. Besides, now that the code won't be on bkbits any more, it's safe to get Larry drunk ;) Larry, thanks for the help you have given us by making bitkeeper available for all these years." Jesse Barnes added, "A big thank you from me too, I've really enjoyed using BK and I think it's made me much more productive than I would have been otherwise. I don't envy you having to put up with the frequent flamefests..."

From the other side of the fence, Daniel Phillips said to Linus, "Well I'm really pleased to hear that you won't be drinking this koolaid any more. This is a really uplifting development for me, thanks."

Elsewhere, regarding Subversion, Karl Fogel remarked, "By the way, the Subversion developers have no argument with the claim that Subversion would not be the right choice for Linux kernel development. We've written an open letter entitled "Please Stop Bugging Linus Torvalds About Subversion" to explain why." Marcin Dalecki replied, "Thumbs up "Subverters"! I just love you. I love your attitude toward high engineering quality. And I appreciate actually very much what you provide as software. Both: from function and in terms of quality of implementation."

Elsewhere in various places regarding monotone, there were a couple of different requests from Andrew Walrond for Linus to publish a 'monotone wishlist', but Linus did not respond to these. However, elsewhere, folks did try experimenting with monotone. Chris Wedgwood was particularly impressed with the features, but noted that a big drawback was speed. Monotone seemed incredibly slow. He did some quick calculations, and estimated that for a developer to pull the full kernel repository would take about 6 hours. Catalin Marinas also remarked, "I tried some time ago to import the BKCVS revisions since Linux 2.6.9 into a monotone-0.16 repository. I later tried to upgrade the database (repository) to monotone version 0.17. The result - converting ~3500 revisions would have taken more than *one year*, a fact confirmed by the monotone developers. The bottleneck seemed to be the big size of the manifest (which stores the file names and the corresponding SHA1 values) and all the validation performed when converting."

Linus also replied to Chris, saying:

The silly thing is, at least in my local tests it doesn't actually seem to be _doing_ anything while it's slow (there are no system calls except for a few memory allocations and de-allocations). It seems to have some exponential function on the number of pathnames involved etc.

I'm hoping they can fix it, though. The basic notions do not sound wrong.

In the meantime (and because monotone really _is_ that slow), here's a quick challenge for you, and any crazy hacker out there: if you want to play with something _really_ nasty (but also very _very_ fast), take a look at

First one to send me the changelog tree of sparse-git (and a tool to commit and push/pull further changes) gets a gold star, and an honorable mention. I've put a hell of a lot of clues in there (*).

I've worked on it (and little else) for the last two days. Time for somebody else to tell me I'm crazy.

(*) It should be easier than it sounds. The database is designed so that you can do the equivalent of a nonmerging (ie pure superset) push/pull with just plain rsync, so replication really should be that easy (if somewhat bandwidth-intensive due to the whole-file format).

Never mind merging. It's not an SCM, it's a distribution and archival mechanism. I bet you could make a reasonable SCM on top of it, though. Another way of looking at it is to say that it's really a content-addressable filesystem, used to track directory trees.
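Linus's description of git as a content-addressable store whose replication is a pure superset operation can be sketched in a few lines of shell. This is an illustrative toy, not git itself; the `put` helper and the directory layout are invented for the example.

```shell
# Toy content-addressable object store (illustrative, not git itself).
# Each object is named by the SHA-1 of its contents, so replication is
# a pure superset operation: copy over whatever the other side lacks,
# and nothing ever needs to be rewritten in place.
set -e
store=$(mktemp -d)    # "upstream" object database
mirror=$(mktemp -d)   # replica being pulled into

put() {               # store stdin as an object; print its name
    tmp=$(mktemp)
    cat > "$tmp"
    h=$(sha1sum "$tmp" | cut -d' ' -f1)
    mv "$tmp" "$store/$h"
    echo "$h"
}

a=$(printf 'hello\n' | put)
b=$(printf 'world\n' | put)

# A non-merging "pull": add missing objects, never modify existing
# ones -- this is the job rsync would do over the network.
for f in "$store"/*; do
    n=$(basename "$f")
    [ -e "$mirror/$n" ] || cp "$f" "$mirror/$n"
done
```

Because object names are derived from content, a pull can never conflict with or overwrite existing history, which is exactly why plain rsync suffices for replication.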

Elsewhere, Jan Hudec surveyed some existing version control systems, and found that GNU Arch or Bazaar looked good, as did SVK, the distributed version of Subversion. He added, "I have looked at Monotone too, of course, but I did not find any way for doing cherry-picking (ie. skipping some changes and pulling others) in it and I feel it will need more rework of the meta-data before it is possible. As for the sqlite backend, I'd not consider that a problem." Matthias Urlichs favored Bazaar, listing several good features (multiple implementations, trivial export of patches). He also asked for Linus's 'SCM wishlist', but Linus again did not reply to this. Marcin spoke up against Arch and its variants, saying arch was not an example of a well-designed tool. Specifically, he found that arch had an unintuitive command interface, and arbitrary and typing-error-prone filename restrictions and enforced conventions. He was in fact very down on arch entirely, and recommended just using diff and patch, with miscellaneous scripts to keep things organized. Miles Bader said that Matthias's arguments were trivial. The two started to flame each other, and the arch subthread petered out.

Elsewhere, Paul Mackerras asked Linus what the plan was for the immediate-term. Would Linus go back to processing patches by hand, one per email? Linus replied:

Yes. That's going to be my interim, I was just hoping that with 2.6.12-rc2 out the door, and us in a "calming down" period, I could afford to not even do that for a while.

The real problem with the email thing is that it ends up piling up: what BK did in this respect was that anything that piled up in a BK repository ended up still being there, and a single "bk pull" got it anyway - so if somebody got ignored because I was busy with something else, it didn't add any overhead. The queue didn't get "congested".

And that's a big thing. It comes from the "Linus pulls" model where people just told me that they were ready, instead of the "everybody pushes to Linus" model, where the destination gets congested at times.

So I do not want the "send Linus email patches" (whether mboxes or a single patch per email) to be a very long-term strategy. We can handle it for a while (in particular, I'm counting on it working up to the real release of 2.6.12, since we _should_ be in the calm period for the next month anyway), but it doesn't work in the long run.

Paul also asked about the overhead of this solution, in particular, "Do you have it automated to the point where processing emailed patches involves little more overhead than doing a bk pull?" Linus replied:

It's more overhead, but not a lot. Especially nice numbered sequences like Andrew sends (where I don't have to manually try to get the dependencies right by trying to figure them out and hope I'm right, but instead just sort by Subject: line) is not a lot of overhead. I can process a hundred emails almost as easily as one, as long as I trust the maintainer (which, when it's used as a BK replacement, I obviously do).

However, the SCM's I've looked at make this hard. One of the things (the main thing, in fact) I've been working at is to make that process really _efficient_. If it takes half a minute to apply a patch and remember the changeset boundary etc (and quite frankly, that's _fast_ for most SCM's around for a project the size of Linux), then a series of 250 emails (which is not unheard of at all when I sync with Andrew, for example) takes two hours. If one of the patches in the middle doesn't apply, things are bad bad bad.

Now, BK wasn't a speed demon either (actually, compared to everything else, BK _is_ a speed demon, often by one or two orders of magnitude), and took about 10-15 seconds per email when I merged with Andrew. HOWEVER, with BK that wasn't as big of an issue, since the BK<->BK merges were so easy, so I never had the slow email merges with any of the other main developers. So a patch-application-based SCM "merger" actually would need to be _faster_ than BK is. Which is really really really hard.

So I'm writing some scripts to try to track things a whole lot faster. Initial indications are that I should be able to do it almost as quickly as I can just apply the patch, but quite frankly, I'm at most half done, and if I hit a snag maybe that's not true at all. Anyway, the reason I can do it quickly is that my scripts will _not_ be an SCM, they'll be a very specific "log Linus' state" kind of thing. That will make the linear patch merge a lot more time-efficient, and thus possible.

(If a patch apply takes three seconds, even a big series of patches is not a problem: if I get notified within a minute or two that it failed half-way, that's fine, I can then just fix it up manually. That's why latency is critical - if I'd have to do things effectively "offline", I'd by definition not be able to fix it up when problems happen).
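The numbered-series workflow Linus describes (apply patches in Subject-line order, and find out quickly when one fails mid-series) can be played out with plain diff and patch. Everything below -- the tree, the patch file names, and their contents -- is fabricated for the example.

```shell
# Sketch of applying a numbered patch series in lexical (i.e.
# Subject-number) order, stopping at the first patch that fails.
# All paths and patch names here are invented for illustration.
set -e
work=$(mktemp -d); cd "$work"
mkdir tree patches
printf 'one\n' > tree/file

# Fabricate two numbered patches against successive tree versions
# (diff exits 1 when files differ, hence the || true guards).
cp -r tree tree.b;  printf 'one\ntwo\n'        > tree.b/file
diff -u tree/file   tree.b/file > patches/001-add-two.patch   || true
cp -r tree.b tree.c; printf 'one\ntwo\nthree\n' > tree.c/file
diff -u tree.b/file tree.c/file > patches/002-add-three.patch || true

# Apply in order; a dry run first means a mid-series failure leaves
# the tree untouched and reports within seconds, not hours.
cd tree
for p in ../patches/0*.patch; do
    patch -p1 --dry-run -s < "$p" || { echo "stopped at $p" >&2; exit 1; }
    patch -p1 -s < "$p"
done
```

The dry-run-then-apply loop is what keeps latency low: the maintainer learns about a bad patch in the middle of a 250-email series immediately, rather than after the whole run.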

He also added:

I detest the centralized SCM model, but if push comes to shove, and we just _can't_ get a reasonable parallel merge thing going in the short timeframe (ie month or two), I'll use something like SVN on a trusted site with just a few committers, and at least try to distribute the merging out over a few people rather than making _me_ be the throttle.

The reason I don't really want to do that is once we start doing it that way, I suspect we'll have a _really_ hard time stopping. I think it's a broken model. So I'd much rather try to have some pain in the short run and get a better model running, but I just wanted to let people know that I'm pragmatic enough that I realize that we may not have much choice.

Albert D. Cahalan said:

I think you at least instinctively know this, but...

Centralized SCM means you have to grant and revoke commit access, which means that Linux gets the disease of ugly BSD politics.

Under both the old pre-BitKeeper patch system and under BitKeeper, developer rank is fuzzy. Everyone knows that some developers are more central than others, but it isn't fully public and well-defined. You can change things day by day without having to demote anyone. While Linux development isn't completely without jealousy and pride, few have stormed off (mostly IDE developers AFAIK) and none have forked things as severely as OpenBSD and DragonflyBSD.

You may rank developer X higher than developer Y, but they have only a guess as to how things are. Perhaps developer X would be a prideful jerk if he knew. Perhaps developer Y would quit in resentment if he knew.

Whatever you do, please avoid the BSD-style politics.

(the MAINTAINERS file is bad enough; it has caused problems)

Completely elsewhere, Paul P. Komkoff Jr. pointed out that Canonical had started up the Bazaar-NG project. Martin Pool replied:

I'd like bazaar-ng to be considered too. It is not ready for adoption yet, but I am working (more than) full time on it and hope to have it be usable in a couple of months.

bazaar-ng is trying to integrate a lot of the work done in other systems to make something that is simple to use but also fast and powerful enough to handle large projects.

The operations that are already done are pretty fast: ~60s to import a kernel tree, ~10s to import a new revision from a patch.

Please check it out and do pester me with any suggestions about whatever you think it needs to suit your work.

Jeff Garzik asked, "By "importing", are you saying that importing all 60,000+ changesets of the current kernel tree took only 60 seconds?" Martin replied:

Now that would be impressive.

No, I mean this:

 % bzcat ../linux.pkg/patch-2.5.14.bz2| patch -p1

 % time bzr add -v .
 (find any new non-ignored files; deleted files automatically noticed)
 6.06s user 0.41s system 89% cpu 7.248 total

 % time bzr commit -v -m 'import 2.5.14'
 7.71s user 0.71s system 65% cpu 12.893 total

(OK, a bit slower in this case but it wasn't all in core.)

This is only v0.0.3, but I think the interface simplicity and speed compares well.

I haven't tested importing all 60,000+ changesets of the current bk tree, partly because I don't *have* all those changesets. (Larry said previously that someone (not me) tried to pull all of them using bkclient, and he considered this abuse and blacklisted them.)

I have been testing pulling in release and rc patches, and it scales to that level. It probably could not handle 60,000 changesets yet, but there is a plan to get there. In the interim, although it cannot handle the whole history forever it can handle large trees with moderate numbers of commits -- perhaps as many as you might deal with in developing a feature over a course of a few months.

The most sensible place to try out bzr, if people want to, is as a way to keep your own revisions before mailing a patch to linus or the subsystem maintainer.

David Lang suggested, "pull the patches from the BK2CVS server. yes some patches are combined, but it will get you in the ballpark." Martin replied:

OK, I just tried that. I know there are scripts to resynthesize changesets from the CVS info but I skipped that for now and just pulled each day's work into a separate bzr revision. It's up to the end of March and still running.

Importing the first snapshot (2004-01-01) took 41.77s user, 1:23.79 total. Each subsequent day takes about 10s user, 30s elapsed to commit into bzr. The speeds are comparable to CVS or a bit faster, and may be faster than other distributed systems. (This on a laptop with a 5400rpm disk.) Pulling out a complete copy of the tree as it was on a previous date takes about 14s user, 60s elapsed.

I don't want to get too distracted by benchmarks now because there are more urgent things to do and anyhow there is still lots of scope for optimization. I wouldn't be at all surprised if those times could be more than halved. I just wanted to show it is in (I hope) the right ballpark.

This caught Linus's attention. He asked Martin to make a full kernel tree public so he could check it out. He added:

The reason I mention that is just that I know several SCM's bog down under load horribly, so it actually matters what the size of the tree is.

And I'm absolutely _not_ asking you for the 60,000 changesets that are in the BK tree, I'd be perfectly happy with a 2.6.12-rc2-based one for testing.

I know I can import things myself, but the reason I ask is because I've got several SCM's I should check out _and_ I've been spending the last two days writing my own fallback system so that I don't get screwed if nothing out there works right now.

Which is why I'd love to hear from people who have actually used various SCM's with the kernel. There's bound to be people who have already tried.

I've gotten a lot of email of the kind "I love XYZ, you should try it out", but so far I've not seen anybody say "I've tracked the kernel with XYZ, and it does ..."

So, this is definitely not a "Martin Pool should do this" kind of issue: I'd like many people to test out many alternatives, to get a feel for where they are, especially for a project the size of the kernel.

Walter Landry replied:

At the end of my Codecon talk, there is a performance comparison of a number of different distributed SCM's with the kernel.

I develop ArX. You may find it of interest ;)

Catalin Marinas also replied to Linus, saying:

I (successfully) tried GNU Arch with the Linux kernel. I mirrored all the BKCVS changesets since Linux 2.6.9 (5300+ changesets) using this script:

My repository size is 1.1GB but this is because the script I use creates a snapshot (i.e. a full tarball) of every main and -rc release. For each individual changeset, an arch repository has a patch-xxx directory with a compressed tarball containing the patch, a log file and a checksum file.

GNU Arch may have some annoying things (file naming, long commands, harder to get started, imposed version naming) and I won't try to advocate them but, for me, it looked like the best (free) option available regarding both features and speed. Being changeset oriented also has some advantages from my point of view. Being distributed means that you can create a branch on your local repository from a tree stored on a (read-only) remote repository (hosted on an ftp/http server).

I can't compare it with BK since I haven't used it.

The way I use it:

The main merge algorithm is called star-merge and does a three-way merge between the local tree, the remote one and the common ancestor of these. Cherry picking is also supported for those that like it (I found it very useful if, for example, I fix a general bug in a branch that should be integrated in the main tree but the branch is not yet ready for inclusion).
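The three-way merge that star-merge performs (local tree, remote tree, and their common ancestor) is the same operation `diff3 -m` does on a single file. A minimal sketch, with all file names invented for the example:

```shell
# Three-way merge of local and remote changes against a common
# ancestor -- the per-file operation behind arch's star-merge.
# File names are invented for illustration.
set -e
d=$(mktemp -d); cd "$d"
printf 'one\ntwo\nthree\n' > ancestor
printf 'ONE\ntwo\nthree\n' > local     # our change: first line
printf 'one\ntwo\nTHREE\n' > remote    # their change: last line

# diff3 -m combines both sides; had the edits overlapped, it would
# emit cvs-style conflict markers instead (and exit non-zero).
diff3 -m local ancestor remote > merged
```

Because the two sides touched different lines relative to the ancestor, both changes land in `merged` with no conflict markers.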

All the standard commands like commit, diff, status etc. are supported by arch. A useful command is "missing" which shows what changes are present in a tree and not in the current one. It is handy to see a summary of the remote changes before doing a merge (and faster than a full diff). It also supports file/directory renaming.

To speed things up, arch uses a revision library with a directory for every revision, the files being hard-linked between revisions to save space. You can also hard-link the working tree to the revision library (which speeds the tree diff operation) but you need to make sure that your editor renames the original file before saving a copy.
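The hard-linked revision library, and the rename-before-save caveat that goes with it, can be demonstrated in a few lines; `cp -al` is the GNU coreutils way to hard-link a whole tree, and the `rev-1`/`rev-2` layout here is invented for the example.

```shell
# Hard-linked revision library: unchanged files share inodes between
# revision directories, so a new revision is nearly free -- but an
# in-place edit would silently rewrite every revision sharing the
# inode. Paths are invented for illustration; cp -al is GNU-specific.
set -e
lib=$(mktemp -d); cd "$lib"
mkdir rev-1
printf 'a\n' > rev-1/file
cp -al rev-1 rev-2            # hard-link the whole tree

# Safe edit: write a new file and rename it over the old name, so
# the old inode (still referenced from rev-1) is left untouched.
printf 'b\n' > rev-2/file.tmp
mv rev-2/file.tmp rev-2/file
```

This is exactly why an editor that saves in place (rather than renaming the original first) is dangerous against a hard-linked working tree.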

Having snapshots might take space but they are useful for both fast getting a revision and creating a revision in the library.

A diff command takes usually around 1 min (on a P4 at 2.5GHz with IDE drives) if the current revision is in the library. The tree diff is the main time consuming operation when committing small changes. If the revision is not in the library, it will try to create it by hard-linking with a previous one and applying the corresponding patches (later version I think can reverse-apply patches from newer revisions).

The merge operation might take some time (minutes, even 10-20 minutes for 1000+ changesets) depending on the number of changesets and whether the revisions are already in the revision library. You can specify a three-way merge that places conflict markers in the file (like diff3 or cvs) or a two-way merge which is equivalent to applying a patch (if you prefer a two-way merge, the "replay" command is actually the fastest, it takes ~2 seconds to apply a small changeset and doesn't need go to the revision library). Once a merge operation completes, you would need to fix the conflicts and commit the changes. All the logs are preserved but the newly merged individual changes are seen as a single commit in the local tree.

In the way I use it (with a linux--main--2.6 tree similar to bk-head) I think arch would get slow with time as changesets accumulate. The way its developers advise to be used is to work, for example, on a linux--main--2.6.12 tree for preparing this release and, once it is ready, seal it (commit --seal). Further commits need to have a --fix option and they should mainly be bug fixes. At this point you can branch the linux--main--2.6.13 and start working on it. This new tree can easily merge the bug fixes applied to the previous version. Arch developers also recommend to use a new repository every year, especially if there are many changesets.

A problem I found, even if the library revisions are hard-linked, they still take a lot of space and should be cleaned periodically (a cron script that checks the last access to them is available).

By default, arch also complains (with exit) about unknown files in the working tree. Its developer(s) believe that the compilation should be done in a different directory. I didn't find this a problem since I use the same tree to compile for several platforms. Anyway, it can be configured to ignore them, based on regexp.

I also tried monotone and darcs (since these two, unlike svn, can do proper merging and preserve the merge history) but arch was by far the fastest (CVS/RCS are hard to beat on speed).

Unfortunately, I can't make my repository public because of IT desk issues but let me know if you'd like me to benchmark different operations (or if you'd like a simple list of commands to create your own).

Hope you find this useful.

Andrea Arcangeli also replied to Linus, saying:

The huge number of changesets is the crucial point, there are good distributed SCM already but they are apparently not efficient enough at handling 60k changesets.

We'd need a regenerated coherent copy of BKCVS to pipe into those SCM to evaluate how well they scale.

OTOH if your git project already allows storing the data in there, that looks nice ;). I don't yet fully understand how the algorithms of the trees are meant to work (I only understand well the backing store and I tend to prefer DBMS over tree of dirs with hashes). So I've no idea how it can plug in well for a SCM replacement or how you want to use it. It seems a kind of fully lockless thing where you can merge from one tree to the other without locks and where you can make quick diffs. It looks similar to a diff -ur of two hardlinked trees, except this one can save a lot of bandwidth to copy with rsync (i.e. hardlinks becomes worthless after using rsync in the network, but hashes not). Clearly the DBMS couldn't use the rsync binary to distribute the objects, but a network protocol could do the same thing rsync does.

Completely elsewhere, David Woodhouse suggested, "One feature I'd want to see in a replacement version control system is the ability to _re-order_ patches, and to cherry-pick patches from my tree to be sent onwards. The lack of that capability is the main reason I always hated BitKeeper." Linus replied:

I really disliked that in BitKeeper too originally. I argued with Larry about it, but Larry (correctly, I believe) argued that efficient and reliable distribution really requires the concept of "history is immutable". It makes replication much easier when you know that the known subset _never_ shrinks or changes - you only add on top of it.

And that implies no cherry-picking.

Also, there's actually a second reason why I've decided that cherry-picking is wrong, and it's non-technical.

The thing is, cherry-picking very much implies that the people "up" the foodchain end up editing the work of the people "below" them. The whole reason you want cherry-picking is that you want to fix up somebody elses mistakes, ie something you disagree with.

That sounds like an obviously good thing, right? Yes it does.

The problem is, it actually results in the wrong dynamics and psychology in the system. First off, it makes the implicit assumption that there is an "up" and "down" in the food-chain, and I think that's wrong. It's increasingly a "network" in the kernel. I'm less and less "the top", as much as a "fairly central" person. And that is how it should be. I used to think of kernel development as a hierarchy, but I long since switched to thinking about it as a fairly arbitrary network.

The other thing it does is that it implicitly puts the burden of quality control at the upper-level maintainer ("I'll pick the good things out of your tree"), while _not_ being able to cherry-pick means that there is pressure in both directions to keep the tree clean.

And that is IMPORTANT. I realize that not cherry-picking means that people who want to merge upstream (or sideways or anything) are now forced to do extra work in trying to keep their tree free of random crap. And that's a HUGELY IMPORTANT THING! It means that the pressure to keep the tree clean flows in all directions, and takes pressure off the "central" point. In other words it distributes the pain of maintenance.

In other words, somebody who can't keep their act together, and creates crappy trees because he has random pieces of crud in it, quite automatically gets actively shunned by others. AND THAT IS GOOD! I've pushed back on some BK users to clean up their trees, to the point where we've had a number of "let's just re-do that" over the years. That's WONDERFUL. People are irritated at first, but I've seen what the end result is, and the end result is a much better maintainer.

Some people actually end up doing the cleanup different ways. For example, Jeff Garzik kept many separate trees, and had a special merge thing. Others just kept a messy tree for development, and when they are happy, they throw the messy tree away and re-create a cleaner one. Either is fine - the point is, different people like to work different ways, and that's fine, but making _everybody_ work at being clean means that there is no train wreck down the line when somebody is forced to try to figure out what to cherry-pick.

So I've actually changed from "I want to cherry-pick" to "cherry-picking between maintainers is the wrong workflow". Now, as part of cleaning up, people may end up exporting the "ugly tree" as patches and re-importing it into the clean tree as the fixed clean series of patches, and that's "cherry-picking", but it's not between developers.

NOTE! The "no cherry-picking" model absolutely also requires a model of "throw-away development trees". The two go together. BK did both, and an SCM that does one but not the other would be horribly broken.

(This is my only real conceptual gripe with "monotone". I like the model, but they make it much harder than it should be to have throw-away trees due to the fact that they seem to be working on the assumption of "one database per developer" rather than "one database per tree". You don't have to follow that model, but it seems to be what the setup is geared for, and together with their "branches" it means that I think a monotone database easily gets very cruddy. The other problem with monotone is just performance right now, but that's hopefully not _too_ fundamental).

Alexander Viro pointed out that there was a very good use for cherry picking, and that was for reorganizing one's own patches, as opposed to trying to clean up other people's work. Linus replied:

Yes. I agree. There should be some support for cherry-picking in between a temporary throw-away tree and a "cleaned-up-tree". However, it should be something you really do need to think about, and in most cases it really does boil down to "export as patch, re-import from patch". Especially since you potentially want to edit things in between anyway when you cherry-pick.

(I do that myself: If I have been a messy boy, and committed mixed-up things as one commit, I export it as a patch, and then I split the patch by hand into two or more pieces - sometimes by just editing the patch directly, but sometimes with a combination of by applying it, and editing the result, and then re-exporting it as the new version).

And in the cases where this happens, you in fact often have unrelated changes to the _same_file_, so you really do end up having that middle step.

In other words, this cherry-picking can generally be scripted and done "outside" the SCM (you can trivially have a script that takes a revision from one tree and applies it to the other). I don't believe that the SCM needs to support it in any fundamentally inherent manner. After all, why should it, when it really boils down to

(cd old-tree ; scm export-as-patch-plus-comments) | (cd new-tree ; scm import-patch-plus-comments)

where the "patch-plus-comments" part is just basically an extended patch (including rename information etc, not just the comments).

Btw, this method of cherry-picking again requires two _separate_ active trees at the same time. BK is great at that, and really, that's what distributed SCM's should be all about anyway. It's not just distributed between different machines, it's literally distributed even on the same machine, and it's actively _used_ that way.

6. Linux Released; Some Kernel Branches Still Using BitKeeper

7 Apr 2005 (2 posts) Archive Link: "Linux"

Topics: Version Control

People: Greg KH

Greg KH announced Linux, adding, "Yes, I'm still using bk for this release, as we don't have any other system in place just yet..."

7. Linux 2.6.12-rc2-mm2 Released

8 Apr 2005 - 12 Apr 2005 (24 posts) Archive Link: "2.6.12-rc2-mm2"

Topics: I2C, Kernel Release Announcement, PCI, Version Control

People: Andrew Morton

Andrew Morton announced Linux 2.6.12-rc2-mm2, saying:

8. Attempt At System-On-Chip Support

8 Apr 2005 - 11 Apr 2005 (5 posts) Archive Link: "PATCH add support for system on chip (SoC) devices."

Topics: Version Control

People: Ian MoltonRussell KingGreg KH

Ian Molton said:

This patch adds support for a new 'System on Chip' or SoC bus type.

This allows common drivers used in different SoC devices to be shared in a clean and healthy manner, for example, the MMC function on toshiba t7l66xb, tc6393xb, and Compaq IPAQ ASIC3.

This is in common use in the CVS tree.

The only real issue is that drivers using this currently tend to assume that the SoC is attached to a platform_bus. This can be resolved as and when it becomes an issue for people.

This got a fairly lukewarm response. The most generous was Russell King's "Here's some comments on the patch itself. I'm not endorsing it by replying to it though." And Greg KH said, "Sorry, but I agree with everyone else. Why do you need this?" There was no further discussion.

9. git Development Continues

9 Apr 2005 - 13 Apr 2005 (177 posts) Archive Link: "more git updates.."

Topics: Big O Notation, Compression, FS: ext3, Patents, Spam, Version Control

People: Linus TorvaldsTony LuckChristopher LiPetr BaudisPaul Jackson

Linus Torvalds said:

several of you have sent me small fixes and scripts to "git", but I've been busy on breaking/changing the core infrastructure, so I didn't get around to looking at the scripts yet.

The good news is, the data structures/indexes haven't changed, but many of the tools to interface with them have new (and improved!) semantics:

In particular, I changed how "read-tree" works, so that it now mirrors "write-tree", in that instead of actually changing the working directory, it only updates the index file (aka "current directory cache" file from the tree).

To actually change the working directory, you'd first get the index file setup, and then you do a "checkout-cache -a" to update the files in your working directory with the files from the sha1 database.

Also, I wrote the "diff-tree" thing I talked about:

torvalds@ppc970:~/git> ./diff-tree 8fd07d4b7778cd0233ea0a17acd3fe9d710af035 8c6d29d6a496d12f1c224db945c0c56fd60ce941 | tr '\0' '\n'

        <100664 4870bcf91f8666fc788b07578fb7473eda795587 Makefile
        >100664 5493a649bb33b9264e8ed26cc1f832989a307d3b Makefile
        <100664 9e1bee21e17c134a2fb008db62679048fc819528 cache.h
        >100664 56ef561e590fd99e938bd47fd1f2c7ed46126ff0 cache.h
        <100664 fd690acc02ef9c06d7c4c3541f69b10ca4b4f8c9 cat-file.c
        >100664 6e6d89291ced17a406e64b97fe8bb96a22eefc9d cat-file.c
        +100664 fd00e5603dcc4a93acceda0b8cb914fabc8645d5 checkout-cache.c
        <100664 a4a8c3d9ef0c4cc6c82b96b5d1a91ac6d3bed466 commit-tree.c
        >100664 236ceb7646e3f5d110fd83f815b82e94cc5b2927 commit-tree.c
        +100664 01c92f2620a8e13e7cb7fd98ee644c6b65eeccb7 fsck-cache.c
        <100664 0eaa053919e0cc400ab9bc40d9272360117e6978 init-db.c
        >100664 815743e92dad7e451c65bab01448ee8ae9deeb56 init-db.c
        <100664 e7bfaadd5d2331123663a8f14a26604a3cdcb678 read-cache.c
        >100664 71d0cb6fe9b7ff79e3b2c5a61e288ac9f62b39dc read-cache.c
        <100664 ec0f167a6a505659e5af6911c97f465506534c34 read-tree.c
        >100664 f5c50ba79d02f002b9675fd8f129fa388e3282c6 read-tree.c
        <100664 00a29c403e751c2a2a61eb24fa2249c8956d1c80 show-diff.c
        >100664 b963dd738989bc92bf02352bbedad13a74e66a7d show-diff.c
        <100664 aff074c63ac827801a7d02ff92781365957f1430 update-cache.c
        >100664 3a672397164d5ff27a19a6888b578af96824ede7 update-cache.c
        <100664 7abeeba116b2b251c12ae32c7b38cb048199b574 write-tree.c
        >100664 9525c6fc975888a394477339db86216cd5bd5d7c write-tree.c

(ie the output of "diff-tree" has the same NUL-termination, but if you insist on getting ASCII output, you can just use "tr" to change the NUL into a NL).

The format of the "diff-tree" output is that the first character is "-" for "remove file", "+" for "add file" and "<"/">" for "change file" (where the "<" shows the old state, and ">" shows the new state).

Btw, the NUL-termination makes this really easy to use even in shell scripts, ie you can do

diff-tree <sha1> <sha1> | xargs -0 do_something

and you'll get each line as one nice argument to your "do_something" script. So a do_diff could be based on something like

        while [ "$1" != "" ]; do
                filename="$(echo $1 | cut -d' ' -f3-)"
                first_sha="$(echo $1 | cut -d' ' -f2)"
                second_sha="$(echo $2 | cut -d' ' -f2)"
                c="$(echo $1 | cut -c1)"
                case "$c" in
                        echo diff -u /dev/null "$filename($first_sha)";;
                        echo diff -u "$filename($first_sha)" /dev/null;;
                        echo diff -u "$filename($first_sha)" "$filename($second_sha)"
                        echo WHAT?
                        exit 1;;

which really shows what a horrid shell-person I am (I still use the old tools I learnt to use fifteen years ago. I bet you can do it trivially in perl or something sane, and I'm just stuck in the stone age of UNIX).

That makes it _very_ easy to parse. The example above is the diff between the initial commit and one of the more recent trees, so it has changes to everything, but a more normal thing would be

torvalds@ppc970:~/git> diff-tree 787763499dc4f8cc345bc6ed8ee1e0ae31adedd6 5b0c2695634b5bab2f5d63c7bb30f7e5815af470 | tr '\0' '\n'

        <100664 01c92f2620a8e13e7cb7fd98ee644c6b65eeccb7 fsck-cache.c
        >100664 81aa7bee003264ea302db835158e725eefa4012d fsck-cache.c

which tells you that the last commit changed just one file (it's from this one:

torvalds@ppc970:~/git> cat-file commit `cat .dircache/HEAD`

        tree 5b0c2695634b5bab2f5d63c7bb30f7e5815af470
        parent 81c53a1d3551f358860731481bb2d87179d221e6
        author Linus Torvalds <> Sat Apr 9 12:02:30 2005
        committer Linus Torvalds <> Sat Apr 9 12:02:30 2005

Make "fsck-cache" print out all the root commits it finds.

Once I do the reference tracking, I'll also make it print out all the HEAD commits it finds, which is even more interesting.

in case you care).

I've rsync'ed the new git repository to, it should all be there in /pub/linux/kernel/people/torvalds/git.git/ (and it looks like the mirror scripts already picked it up on the public side too).

Can you guys re-send the scripts you wrote? They probably need some updating for the new semantics. Sorry about that ;(

Petr Baudis, a.k.a. Pasky, showed great interest in the project, and had been maintaining some wrapper scripts for git. He and a bunch of other developers, including a number of the big-time kernel folks, piled into discussion of the design and implementation of git.

Many of Linus's posts were explanations of his basic design ideas. At one point he remarked, "My goal here is that the speed of "git" really should be almost totally independent of the size of the project. You clearly cannot avoid _some_ size-dependency: my "diff-tree" clearly also has to work through the same 1MB of data, but I think it's worth making the constant factor be as small as humanly possible." He posted some numbers, using his current implementation; and his explanation slid into consideration of further design issues:

I just tried checking in a kernel tree tar-file, and the initial checkin (which is all the compression and the sha1 calculations for every single file) took about 1:35 (minutes, not hours ;).

Doing a commit (trivial change to the top-level Makefile) and then doing a "treediff" between those two things took 0.05 seconds using my C thing. Ie we're talking so fast that we really don't care.

Doing a "show-diff" takes 0.15 secs or so (that's all the "stat" calls), and now that I test it out I realize that the most expensive operation is actually _writing_ the "index" file out. These are the two most expensive steps:

        torvalds@ppc970:~/lx-test/linux-2.6.12-rc2> time update-cache Makefile

        real    0m0.283s
        user    0m0.171s
        sys     0m0.113s

        torvalds@ppc970:~/lx-test/linux-2.6.12-rc2> time write-tree

        real    0m0.441s
        user    0m0.354s
        sys     0m0.087s

ie with the current infrastructure it looks like I can do a "patch + commit" in less than one second on the kernel, and 0.75 secs of that is because the "tree" file actually grows pretty large:

        cat-file tree 5ca21c9d808fa4bee1eb6948a59dfb9c7d73f36a | wc -c

says that the uncompressed tree-file is 950,874 bytes. Compressing it means that the archival version of it is "just" 462,546 bytes, but this is really the part that is going to eat _tons_ of disk-space.

In other words, each "commit" file is very small and cheap, but since almost every commit will also imply a totally new tree-file, "git" is going to have an overhead of half a megabyte per commit. Oops.

Damn, that's painful. I suspect I will have to change the format somehow.

One option (which I haven't tested yet) is that since the tree-file is already sorted, I could always write it out with the common subdirectory part "collapsed", ie instead of writing


I'd write just


since the directory names are implied by the predecessor.

However, that doesn't help with the 20-byte sha1 associated with each file, which is also obviously uncompressible, so with 17,000+ files, we have a minimum overhead of about 350kB per tree-file.

So even if I did the pathname compression, it wouldn't help all that much. I'd only be removing the only part of the file that _is_ very compressible, and I'd probably end up with something that isn't all that far away from the 450kB+ it is now.

I suspect that I have to change the file format. Maybe make the "tree" object a two-level thing, and have a "directory" object.

Then a "tree" object would point to a "directory" object, which would in turn point to the individual files (and other "directory" objects, of course). That way a commit that only changes a few files will only need to create a few new "directory" objects, instead of creating one huge "tree" object.

Sadly, that will make "tree-diff" potentially more expensive. On the other hand, maybe not: it will also speed it _up_, since directories that are totally shared will be trivially seen as such and need no further operation.

Thoughts? That would break the current repository formats, and I'd have to create a converter thing (which shouldn't be that bad, of course).

I don't have to do it right now. In fact, I'd almost prefer for the current thing to become good enough that it's not painful to work with, since right now I'm using it to develop itself. Then I can convert the format with an automated script later, before I actually start working on the kernel...

One issue that came up was the flat nature of the '.git' directory. Linus's design included a '.git' directory to contain all the revisioning information of a given project. Within this were a big batch of subdirectories, which would contain files holding revision data. As Tony Luck pointed out, the number of files in these directories would grow rapidly. He said, "A changeset that touches just one file a few levels down from the top of the tree (say arch/i386/kernel/setup.c) will make six new files in the git repository (one for the changeset, four tree files, and a new blob for the new version of the file). More complex changes make more files ... but say the average is ten new files per changeset since most changes touch few files. With 60,000 changesets in the current tree, we will start out our git repository with about 600,000 files. Assuming the first byte of the SHA1 hash is random, that means an average of 2343 files in each of the objects/xx directories. Give it a few more years at the current pace, and we'll have over 10,000 files per directory."

Linus replied:

The good news is that git itself doesn't really care. I think it's literally _one_ function ("get_sha1_filename()") that you need to change, and then you need to write a small script that moves files around, and you're pretty much done.

Also, I did actually debate that issue with myself, and decided that even if we do have tons of files per directory, git doesn't much care. The reason? Git never _searches_ for them. Assuming you have enough memory to cache the tree, you just end up doing a "lookup", and inside the kernel that's done using an efficient hash, which doesn't actually care _at_all_ about how many files there are per directory.

So I was for a while debating having a totally flat directory space, but since there are _some_ downsides (linear lookup for cold-cache, and just that "ls -l" ends up being O(n**2) and things), I decided that a single fan-out is probably a good idea.

He admitted that he could end up being wrong, and that a two-layer directory tree would be required. But he said, "The good news is that we can trivially fix it later (even dynamically - we can make the "sha1 object tree layout" be a per-tree config option, and there would be no real issue, so you could make small projects use a flat version and big projects use a very deep structure etc). You'd just have to script some renames to move the files around.."
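The single fan-out Linus settled on maps a 40-character hex sha1 to `objects/<first two hex digits>/<remaining 38>`. A throwaway shell sketch of that naming scheme (the hash is just an example value taken from the diff-tree output earlier in this article):

```shell
#!/bin/sh
# One level of fan-out: the first byte (two hex digits) of the sha1 names
# the subdirectory, the remaining 38 digits name the file inside it.
sha1=5b0c2695634b5bab2f5d63c7bb30f7e5815af470
dir=$(printf '%s' "$sha1" | cut -c1-2)
rest=$(printf '%s' "$sha1" | cut -c3-)
path=".git/objects/$dir/$rest"
echo "$path"
# prints .git/objects/5b/0c2695634b5bab2f5d63c7bb30f7e5815af470
```

With a roughly uniform hash, this divides the object count per directory by 256, which is exactly Tony Luck's 600,000 / 256 ≈ 2343 estimate.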

Another issue that came up was that of renaming files and directories. Linus said:

You can represent renames on top of git - git itself really doesn't care. In many ways you can just see git as a filesystem - it's content- addressable, and it has a notion of versioning, but I really really designed it coming at the problem from the viewpoint of a _filesystem_ person (hey, kernels is what I do), and I actually have absolutely _zero_ interest in creating a traditional SCM system.

So to take renaming a file as an example - why do you actually want to track renames? In traditional SCM's, you do it for two reasons:

So consider me deficient, or consider me radical. It boils down to the same thing. Renames don't matter.

That said, if somebody wants to create a _real_ SCM (rather than my notion of a pure content tracker) on top of GIT, you probably could fairly easily do so by imposing a few limitations on a higher level. For example, most SCM's that track renames require that the user _tell_ them about the renames: you do a "bk mv" or a "svn rename" or something.

If you want to do the same on top of GIT, then you should think of GIT as what it is: GIT just tracks contents. It's a filesystem - although a fairly strange one. How would you track renames on top of that? Easy: add your own fields to the GIT revision messages: GIT enforces the header, but you can add anything you want to the "free-form" part that follows it.
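One way to read that suggestion: keep the enforced header intact and append your own metadata to the free-form part. The `rename:` field below is entirely hypothetical (git itself would ignore it, and only a higher-level SCM built on top would parse it), and the author identity is made up; the tree and parent ids are reused from the article's earlier example:

```
tree 8fd07d4b7778cd0233ea0a17acd3fe9d710af035
parent 81c53a1d3551f358860731481bb2d87179d221e6
author J. Hacker <> Sat Apr 9 12:02:30 2005
committer J. Hacker <> Sat Apr 9 12:02:30 2005

Move setup.c to its new home.

rename: arch/i386/setup.c -> arch/i386/kernel/setup.c
```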

Same goes for any other information where you care about what happens "within" a file. GIT simply doesn't track it. You can build things on top of GIT if you want to, though. They may not be as efficient as they would be if they were built _into_ GIT, but on the other hand GIT does a lot of other things a hell of a lot faster thanks to its design.

So whether you agree with the things that _I_ consider important probably depends on how you work. The real downside of GIT may be that _my_ way of doing things is quite possibly very rare.

But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong.

Paul Jackson had an epiphany along these lines, and found that when he thought of git in filesystem terms, it started to make much more sense to him than when he tried to think of it in version control terms. Linus replied:

Yes. You really should think of GIT as a filesystem, and of me as a _systems_ person, not an SCM person. In fact, I tend to detest SCM's. I think the reason I worked so well with BitKeeper is that Larry used to do operating systems. He's also a systems person, not really an SCM person. Or at least he's in between the two.

My operations are like the "system calls". Useless on their own: they're not real applications, they're just how you read and write files in this really strange filesystem. You need to wrap them up to make them do anything sane.

For example, take "commit-tree" - it really just says that "this is the new tree, and these other trees were its parents". It doesn't do any of the actual work to _get_ those trees written.

So to actually do the high-level operation of a real commit, you need to first update the current directory cache to match what you want to commit (the "update-cache" phase).

Then, when your directory cache matches what you want to commit (which is NOT necessarily the same thing as your actual current working area - if you don't want to commit some of the changes you have in your tree, you should avoid updating the cache with those changes), you do stage 2, ie "write-tree". That writes a tree node that describes what you want to commit.

Only THEN, as phase three, do you do the "commit-tree". Now you give it the tree you want to commit (remember - that may not even match your current directory contents), and the history of how you got here (ie you tell commit what the previous commit(s) were), and the changelog.

So a "commit" in SCM-speak is actually three totally separate phases in my filesystem thing, and each of the phases (except for the last "commit-tree" which is the thing that brings it all together) is actually in turn many smaller parts (ie "update-cache" may have been called hundreds of times, and "write-tree" will write several tree objects that point to each other).

Similarly, a "checkout" really is about first finding the tree ID you want to check out, and then bringing it into the "directory cache" by doing a "read-tree" on it. You can then actually update the directory cache further: you might "read-tree" _another_ project, or you could decide that you want to keep one of the files you already had.

So in that scenario, after doing the read-tree you'd do an "update-cache" on the file you want to keep in your current directory structure, which updates your directory cache to be a _mix_ of the original tree you now want to check out _and_ of the file you want to use from your current directory. Then doing a "checkout-cache -a" will actually do the actual checkout, and only at that point does your working directory really get changed.

Btw, you don't even have to have any working directory files at all. Let's say that you have two independent trees, and you want to create a new commit that is the join of those two trees (where one of the trees takes precedence). You'd do a "read-tree <a> <b>", which will create a directory cache (but not check out) that is the union of the <a> and <b> trees (<b> will override). And then you can do a "write-tree" and commit the resulting tree - without ever having _any_ of those files checked out.
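Put together, the phases Linus describes might look like the following session. The file names, ids, and changelog are placeholders, and the commands are the 2005-era plumbing named in the text, not today's interface (modern git renamed them, e.g. `update-cache` became `git update-index`):

```sh
# phase 1: bring the directory cache to the state you want to commit
update-cache Makefile kernel/sched.c

# phase 2: write tree objects describing that staged state
tree=$(write-tree)

# phase 3: tie the tree to its history and changelog
echo 'Describe the change here' | commit-tree $tree -p $parent

# checkout is the reverse: populate the cache, then write the files out
read-tree $tree
checkout-cache -a
```

Note that only phase 3 records history; phases 1 and 2 can be repeated and mixed (as in the two-tree join above) before anything is committed.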

Christopher Li also said that he'd been having trouble visualizing git until he started thinking of it as a filesystem rather than a version control system. He suggested, "one thing I learned from ext3 is that it is very useful to have some compatible flag for future development. I think if we want to reserve some room in the file format for further development of git, it is the right time to do it before it gets big." Linus replied:

Way ahead of you.

This is (one reason) why all git objects have the type embedded inside of them. The format of all objects is totally regular: they are all compressed with zlib, they are all named by the sha1 file, and they all start out with a magic header of "<typename> <typesize><nul byte>".

So if I want to create a new kind of tree object that does the same thing as the old one but has some other layout, I'd just call it something else. Like "dir". That was what I initially planned to do about the change to recursive tree objects, but it turned out to actually be a lot easier to just encode it in the old type (that way the routines that read it don't even have to care about old/new types - it's all the same to them).
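The regular object layout Linus describes is easy to mock up, minus the zlib step; the body text and file name below are arbitrary:

```shell
#!/bin/sh
# Build a payload in the "<typename> <typesize><nul byte>" shape that all
# git objects share (real objects are additionally zlib-compressed and
# named by their sha1).
body='hello world'
printf '%s %d' blob "${#body}" > obj.raw   # header text: "blob 11"
printf '\0' >> obj.raw                     # the NUL that ends the header
printf '%s' "$body" >> obj.raw             # the object body

# recover the header: everything before the first NUL
header=$(tr '\0' '\n' < obj.raw | head -n 1)
echo "$header"
# prints: blob 11
```

Because the type name leads the payload, a reader can dispatch on it before touching the body, which is what makes adding a new type like his proposed "dir" object backward-compatible.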

Completely elsewhere, Linus asked:

Btw, does anybody have strong opinions on the license? I didn't put in a COPYING file exactly because I was torn between GPLv2 and OSL2.1.

I'm inclined to go with GPLv2 just because it's the most common one, but I was wondering if anybody really had strong opinions. For example, I'd really make it "v2 by default" like the kernel, since I'm sure v3 will be fine, but regardless of how sure I am, I'm _not_ a gambling man.

There was a small amount of discussion, but no one spoke up in favor of OSL2.1. The idea of remaining compatible with other GPLed software was important to some folks, as well as the idea that there might be ways for unprincipled companies to circumvent OSL2.1's patent clauses.

Completely elsewhere, Petr announced:

here goes git-pasky-0.2, my set of patches and scripts upon Linus' git, aimed at human usability and, to an extent, SCM-like usage.

If you already have a previous git-pasky version, just git pull pasky to get it. Otherwise, you can get it from:

Please see the README there and/or the parent post for detailed instructions. You can find the changes from the last announcement in the ChangeLog (releases have separate commits so you can find them easily; they are also tagged for purpose of diffing etc).

This release contains mostly bugfixes, performance enhancements (especially w.r.t. git diff), and some merges with Linus (except for diff-tree, where I merged only the new output format). New features are trivial - support for tagging and short SHA1 ids; you can use only the start of the SHA1 hash, long enough to be unambiguous.

My immediate plan is implementing git merge, which I will do tomorrow, if no one will do it before, that is. ;-)

Any feedback/opinions/suggestions/patches (especially patches) are welcome.

There was some brief discussion, and Petr released version 0.3. Discussion continued briefly, and Linus announced:

I'm going to stop cc'ing linux-kernel on git issues (after this email, which also acts as an announcement for people who haven't noticed already), since anybody who is interested in git can just use the "" mailing list:

echo 'subscribe git' | mail

to get you subscribed (and you'll get a message back asking you to authorize it to avoid spam - if you don't get anything back, it failed).

10. Status Of New Hardware

9 Apr 2005 - 10 Apr 2005 (5 posts) Archive Link: "Status of new servers"

People: H. Peter Anvin

H. Peter Anvin said:

The second new server, named, ironically enough, went into the racks and, shortly thereafter, into production Friday. The only service that isn't served from both servers at this time is for the simple reason that it's still syncing on zeus1; it will probably be turned on early next week.

As a consequence, all services are now round-robin between two hosts. You can also hit a specific host by appending a digit, e.g., or

When one server is offline, we'll change the round-robin names to only point to the one host that is in operation.

As some of you noticed, there was a problem early on which made uploads lag behind; it has now been corrected and uploads should be significantly faster than they have been in a long time.

As usual, please report problems to As far as we know, the transition problems have been taken care of and all should be well.

The next day, he reported, "Both the new servers are now in full production, including mirrors. Enjoy, and as usual, report problems to <>."

Andre Tomt was sad not to see the old bandwidth stats on the front page, and H. Peter replied, "the issue there is that with multiple servers we have to change the way they're generated and distributed. Nathan Laredo is working on that, but it's so obviously not a high priority item."

11. ACP/MWave Modem Maintainership

9 Apr 2005 - 11 Apr 2005 (3 posts) Archive Link: "[patch] MAINTAINERS: remove obsolete ACP/MWAVE MODEM entry"

Topics: Modems

People: Marcelo TosattiAdrian BunkAndrew Morton

Adrian Bunk noticed that the ACP/Mwave Modem maintainer emails for Paul B. Schroeder and Mike Sullivan were bouncing, and the listed web site was no longer valid. He posted a patch to remove this entry. Marcelo Tosatti replied, "./drivers/char/mwave/Makefile also references Paul's email address, at least in v2.4." But Adrian replied, "I've given up on removing and correcting obsolete email addresses. This created more discussions than it was worth..." Marcelo accepted the patch as it was, and Andrew Morton also applied the patch to the 2.6 tree.

12. Status Of Patch Commit Mailing List; Some Discussion Of Git

10 Apr 2005 - 16 Apr 2005 (23 posts) Archive Link: "New SCM and commit list"

Topics: Version Control

People: Benjamin HerrenschmidtDavid WoodhouseLinus TorvaldsJames BottomleyIngo MolnarChris MasonJeff Garzik

Benjamin Herrenschmidt asked Linus Torvalds:

Do you intend to continue posting "commited" patches to a mailing list like bk scripts did to bk-commits-head@vger ? As I said a while ago, I find this very useful, especially with the actual patch included in the commit message (which isn't the case with most other projects CVS commit lists, and I find that annoying).

If yes, then I would appreciate if you could either keep the same list, or if you want to change the list name, keep the subscriber list so those of us who actually archive it don't miss anything ;)

David Woodhouse replied:

The commits lists currently only accept posts from dwmw2@hera, I believe. That can relatively easily be changed if the mail is going to come from somewhere else.

I did ask Linus to let me know as soon as possible when he starts to commit patches, so we can come up with a way to keep the list fed. Since he thinks I'm James, however, I suspect that part of the message didn't get through. Perhaps he was just distracted by the Britishness?

Linus also replied to Benjamin, confirming that he did indeed plan to post committed patches to a mailing list. But he added, "GIT isn't quite at the point where I can start using it yet, though. I could actually start committing patches, but I want to make sure that I can also do automated simple merges, so that there is any _point_ to doing this in the first place. My plan is to not be very good at merging (in particular, I don't see GIT resolving renames _at_all_), but my hope is that the people who used to merge with me using BK might be able to still do so using GIT, as long as we try actively to be very careful." Regarding Benjamin's suggestion to use the same list or else keep the subscribers, Linus said, "I didn't even set up the list. I think it's Bottomley. I'm cc'ing him just so that he sees the message, but I don't actually expect him to do anything about it. I'm not even ready to start _testing_ real merges yet. But I hope that I can get non-conflicting merges done fairly soon, and maybe I can con James or Jeff or somebody to try out GIT then..."

James Bottomley disclaimed responsibility for the existing mailing list, saying, "If I remember correctly, the list was set up by the vger list maintainers (davem and company). It was tied to a trigger in one of your trees (which I think Larry did). It shouldn't be too difficult to add to git ... it just means traversing all the added patches on a merge and sending out mail."

Jeff Garzik said he was one of the good people to talk to about the mailing list situation. Along the way, folks also started talking about Git. Jeff was interested in trying out a source tree 'merge', when Linus thought it was ready. At this point in the discussion, Linus said:

I can tell you what merges are going to be like, just to prepare you.

First, the good news: I think we can make the workflow look like bk, ie pretty much like "git pull" and "git push". And for well-behaved stuff (ie minimal changes to the same files on both sides) it will even be fast. I think.

Then the bad news: the merge algorithm is going to suck. It's going to be just plain 3-way merge, the same RCS/CVS thing you've seen before. With no understanding of renames etc. I'll try to find the best parent to base the merge off of, although early testers may have to tell the piece of crud what the most recent common parent was.

So anything that got modified in just one tree obviously merges to that version. Any file that got modified in two trees will end up just being passed to the "merge" program. See "man merge" and "man diff3". The merger gets to fix up any conflicts by hand.

Quite frankly, that means that we really want to avoid any "exciting" merges with GIT. Maybe somebody can come up with something smarter. Eventually. Don't count on it, at least not in the near future.

The good news is that it's not like a three-way file merge is any worse than many people are used to. The bad news is that BK is just a hell of a lot better. So anybody who has been depending heavily on BK merges (and hey, the beauty of them is that you often don't even _know_ that you are depending on them) will be a bit bummed by the "Welcome back to the 1980's" message from a three-way merge.

Ingo Molnar replied:

at that point Chris Mason's "rej" tool is pretty nifty:

it gets the trivial rejects right, and is pretty powerful to quickly cycle through the nontrivial ones too. It shows the old and new code side by side too, etc.

(There is no fully automatic mode where it would not bother the user with the really trivial rejects - but it has an automatic mode where you basically have to do nothing - maybe a fully automatic one could be added that would resolve low-risk rejects?)

it's really easy to use (but then again i'm a vim user, so i'm biased), just try it on a random .rej file you have ("rej -a kernel/sched.c.rej" or whatever).

13. BitMover Still Supporting

11 Apr 2005 - 13 Apr 2005 (9 posts) Archive Link: " is down"

Topics: Version Control

People: Larry McVoyMartin DaleckiAlexander NybergDiego CallejaJesper Juhl

Larry McVoy announced that the server "Seems to have crashed, we don't know the cause yet. Is there anyone who is dependent on this tonight? If so I'll drive down and fix it (yeah, very lame of us, we moved it to a different rack which was too far away from our remote power so I can't power cycle it remotely. Our bad.) Let me know, if I don't hear from anyone we'll get it in about 14 hours. If that's too long I'll understand, it's 20 minutes away and I can go deal." Martin Dalecki replied, "Excuse me, but: who gives a damn shit?" Alexander Nyberg said, "Anyone who wants to have access to the history or any other functioning of the repository." Diego Calleja also replied to Martin, saying, "All the kernel hackers who used BK for years and are still using it, I'd guess." Jesper Juhl also replied to Martin, saying:

Lots of people do; those who use bitkeeper, and even people (like me) who don't use it to manage source but still use the info at bkbits.net to track what patches got merged etc.

Ohh and by the way, Larry doesn't deserve comments like that. He's done a lot of hard work for everyone here (not to mention spent a lot of money) and he's provided an excellent tool. He deserves gratitude and respect, not childish BS like the above.

And Toon van der Pas agreed wholeheartedly with this.

14. open-iscsi And linux-iscsi Projects Merge

11 Apr 2005 - 14 Apr 2005 (5 posts) Archive Link: "[ANNOUNCE] open-iscsi and linux-iscsi project teams have merged!"

Topics: Disks: SCSI, Version Control

People: Christoph HellwigDmitry YusupovLee Revell

A bunch of folks posted anonymously, saying:

The linux-iscsi and open-iscsi developers would like to announce that they have combined forces on a single iSCSI initiator effort!

This mail gives an overview of this combined effort and will be followed by a set of iSCSI patches the combined team submits for review as a candidate for inclusion into the mainline kernel.


After some dialog with the developers on each team, it was decided that although each team started out with independent code and some differences in architecture, it was worth considering the possibility of combining the two efforts into one. Alternatives were considered for the combined architecture of the two projects, including adding an option for a kernel control plane. After discussions, it was decided by consensus that the open-iscsi architecture and code would be the best starting point for the "next gen" linux-iscsi project. The advantages of this starting point include open-iscsi's optimized I/O paths that were built from scratch, as well as incorporation of well tested iscsi-sfnet components for the control plane and userspace components. The combined open-iscsi and linux-iscsi teams believe this will result in the highest performing and most stable iSCSI stack for Linux.

Overview of Combined Project

This new combined effort will consist of the open-iscsi code and developers moving over to the linux-iscsi project on SourceForge. The open-iscsi architecture will be the basis for the "next gen" of linux-iscsi, which will be numbered the linux-iscsi-5.x release series.

Release Numbering

If you were following the open-iscsi series, here is the mapping between the open-iscsi numbering and the linux-iscsi-5.x numbering: - open-iscsi-0.2 == linux-iscsi-

Kernel Submission

The kernel component of the first release in this linux-iscsi 5.x series will follow shortly, and the combined teams wish to submit this as a candidate for inclusion into the mainline kernel. If you've reviewed the previous open-iscsi patch set, you'll find that this patchset is very similar, with previous reviewer comments incorporated.

Christoph Hellwig asked, "What SCM will the code be in? I must admit I really, really prefer the SVN hosting of open-iscsi over the CVS mess." Dmitry Yusupov replied:

Consider the linux-iscsi-5.x CVS branch as the "mainline". The current open-iscsi SVN repository is the place where all hard-core development will happen, at least for the near future.

I really hope they will provide SVN hosting very soon; then we will see how it goes, and maybe we might just migrate the current hosting there.

Christoph Hellwig felt that SourceForge did not provide the best services, and Lee Revell replied, "FWIW, Sourceforge sent out an email just the other day announcing that they've replaced the CVS servers, and promising that these issues will go away. We'll see how that works out..."

15. /dev/random Maintainership

12 Apr 2005 (1 post) Archive Link: "[patch 143/198] update maintainer for /dev/random"

Topics: CREDITS File, MAINTAINERS File, Random Number Generation

People: Matt Mackall

Matt Mackall said that Theodore Y. Ts'o "has agreed to let me take over as maintainer of /dev/random and friends. I've gone ahead and added a line to his entry in CREDITS." Matt posted a patch to add his own copyright notice to the code, and to update the /dev/random entry in the MAINTAINERS and CREDITS files.

16. Network Driver Discussion Most Appropriate On lkml

12 Apr 2005 - 13 Apr 2005 (4 posts) Archive Link: "[patch 152/198] Maintainers list update: linux-net -> netdev"

People: George AnzingerAndrew Morton

Andrew Morton posted a patch originally from Simon Horman, updating the preferred mailing list for network driver discussion from the "mostly dead" linux-net list to the netdev list. However, George Anzinger felt that the linux-kernel mailing list was really the proper place for these sorts of discussions; so Simon posted a new patch listing linux-kernel as the preferred list.

17. Radeon Framebuffer And Rage128 Framebuffer Maintainership

12 Apr 2005 (1 post) Archive Link: "[patch 196/198] fbdev MAINTAINERS update"

Topics: Framebuffer

People: Ani JoshiBenjamin HerrenschmidtAndrew Morton

Andrew Morton posted a patch originally from Benjamin Herrenschmidt, changing the maintainer of the Radeon Framebuffer display driver and the Rage128 Framebuffer display driver from Ani Joshi, the old maintainer, to Benjamin. Benjamin remarked in his changelog entry that this update was "long overdue".

18. Arch Suggested As Potential BitKeeper Replacement

12 Apr 2005 - 13 Apr 2005 (3 posts) Archive Link: "Why not GNU Arch instead of BitKeeper?"

Topics: Version Control

People: Asfand Yar QaziMiles BaderRalf BaechleZack BrownLinus Torvalds

Asfand Yar Qazi asked why Linus Torvalds was developing a new tool instead of using GNU arch. He pointed out that "it was probably started in direct response to the Linux Kernel using a non-free tool." Miles Bader said he used arch as his version control tool of preference, but added, "it has its own issues. For instance it has a _lot_ less attention paid to optimization than one might wish (judging from "git", this is very important to Linus :-). The concept of "archives" and their associated namespace offer some nice advantages, but is a very different model than BK uses, and I presume sticking with the familiar and simple BK model was attractive." And Ralf Baechle put in, "You can get somebody to be doing some work with bitkeeper within a few minutes. Arch has a much longer getting started phase."

(ed. [Zack Brown] Many of the free version control systems have been Linux kernel 'hopefuls' for a long time. It'll be interesting to see how development on all of these projects continues to unfold, as the kernel developers select or create one for their use. Currently it looks as though git will indeed be a permanent solution. Linus is having a really great time developing it, as are a whole bunch of top kernel folks; and the new 'cogito' layer above the git filesystem is starting to look very nice.)

19. New Script For Tracking Official Git Commits

13 Apr 2005 (1 post) Archive Link: "TESTING: new git commits mail"

Topics: Version Control

People: David Woodhouse

David Woodhouse said:

I've set up a script to replace the old one which mailed commits to the bk-commits-head mailing list. It's fed from Linus' "kernel-test.git" repository, which isn't necessarily going to end up going into the real 2.6.12 release -- but in the absence of other information or indeed any tree which definitely _is_ leading to the next release, we might as well see these commits.

I've stopped setting the Date: header of the mail to match the timestamp of the commit. That's partly because the current version of git doesn't actually _include_ the full timestamp information properly, but mostly because some people were requesting that I do that for the old bkexport script anyway.

20. Git Mailing List Archives Available

14 Apr 2005 - 15 Apr 2005 (5 posts) Archive Link: "Git archive now available"

People: David S. Miller

Darren Williams said that the folks at the University of New South Wales in Australia had put the git mailing list archives online. David S. Miller noticed some file permissions problems on that site, which Darren fixed. Meanwhile, Kenneth Johansson pointed out, and Darren later also reported, that a second archive existed as well.







Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.