Kernel Traffic #146 For 17 Dec 2001

By Zack Brown


Mailing List Stats For This Week

We looked at 1953 posts in 7867K.

There were 504 different contributors. 248 posted more than once. 197 posted last week too.

The top posters of the week were:

1. Coding Style; Development Philosophy

28 Nov 2001 - 10 Dec 2001 (383 posts) Archive Link: "Coding style - a non-issue"

Topics: BSD, Clustering, FS: NTFS, Microsoft, Networking, PCI, SMP, Virtual Memory

People: Alexander Viro, Larry McVoy, Henning P. Schmiedehausen, Jeff Garzik, Alan Cox, David S. Miller, Rik van Riel, Daniel Phillips, Linus Torvalds, Tim Hockin, Victor Yodaiken, Horst von Brand, Ingo Molnar, Stanislav Meduna, Donald Becker, Chris Ricker

Peter Waltenberg suggested using the 'indent' program to clean up the coding styles folks have been complaining about recently. Several folks pointed him to 'Lindent', which is distributed with the kernel sources; and Alexander Viro also said:

indent does _not_ solve the problem of:

He added that he was "close to setting up a Linux Kernel Hall of Shame - one with names of wankers (both individual and corporate ones) responsible, their code and commentary on said code..." Larry McVoy (tongue hanging out) replied, "Please, please, please, I'm begging you, please do this. It's the only way people learn quickly. Being nice is great, but nothing works faster than a cold shower of public humiliation :-)" Henning P. Schmiedehausen replied:

Cool. I can really see it. "foo inc." releases a GPL driver for Linux and gets publicly humiliated in the "linux source code hall of shame". Perfect. I can hear the laughter from Redmond over here (and it is 12000km away).

Press release:

"If you support Linux, you may get flamed and humiliated in public for doing so. Don't do it."

Grow up. There is more to life than coding style that is not "Al Viro compatible (TM)".

Sigh, sometimes, sometimes, I _really_ understand the BSD folks when they call the Linux community "a bunch of unkempt nerds that need to get a life". If they only would use sane init scripts. ;-)

Larry said, "Perhaps it would be illuminating to know that I was a BSD hacker, and that I learned the value of this particular technique from Sun's kernel group, who once upon a time were the best BSD group on the planet. It might also be illuminating to consider that this technique of getting people to listen is still in use at Sun to this day. Perhaps Sun's professionalism is not what you'd like to see here." To which Henning said, "It might be enlightening to you that a closed users group of SUN coders is not comparable to a worldwide distributed environment of thousands of people and companies." He added:

You tell me, that SUN treated _CUSTOMERS_ and companies that wanted to support SunOS 4.1.x like that? If yes, then I definitely know why they went SysV. Surely noone wanted BSD any longer.

I would consider the internal development groups in SUN that treated each other like this also "in need of a change". :-)

But Jeff Garzik agreed with Larry, saying, "The security community has shown us time and again that public shaming is often the only way to motivate vendors into fixing security problems. Yes, even BSD security guys do this :)" He said he'd like to see a "Top 10 Ugliest Linux Kernel Drivers" list. But Henning objected:

A security issue is an universal accepted problem that most of the time has a reason and a solution.

Coding style, however, is a very personal thing that starts with "shall we use TABs or not?" (Jakarta: No. Linux: Yes ...) and goes on to "Is a preprocessor macro a good thing or not" until variable names (Al Viro: Names with more than five letters suck. :-) Java: Non-selfdescriptive names suck. Microsoft: Non-hungarian names suck) and so on.

And you really want to judge code just because someone likes to wrap code in preprocessor macros or use UPPERCASE variable names?

Come on. That's a _fundamentally_ different issue than dipping vendors in their own shit if they messed up and their box/program has a security issue. Code that you consider ugly as hell may be seen as "easily understandable and maintainable" by the author. If it works and has no bugs, so what? Just because it is hard for you and me to understand (cf. "mindboggling unwind routines in the NTFS"; I think Jeff Merkey stated it like this)? It still seems to work quite well.

Are you willing to judge "ugliness" of kernel drivers? What is ugly? Are Donald Becker's drivers ugly just because they use (at least on 2.2) their own pci helper library? Is the aic7xxx driver ugly because it needs libdb? Or is ugly defined as "Larry and Al don't like them"? :-)

Flaming about coding style is about as pointless as flaming someone because he supports another sports team. There is no universally accepted coding style. Not even in C.

Alan Cox replied:

The kernel has an accepted coding style, both the documented and the tradition part of it. Using that makes life a lot lot easier for maintaining the code. Enforcing it there is a good idea, except for special cases (headers shared with NT has been one example of that).

There are also some nice tools around that will do the first stage import of a Hungarian NT'ese driver and linuxise it.

Jeff added, close by, that "Diverse coding styles in the Linux kernel create long term maintenance problems." Also close by, Alexander said:

Fact of life: we all suck at reviewing our own code. You, me, Ken Thompson, anybody - we tend to overlook bugs in the code we'd written. Depending on the skill we can compensate - there are techniques for that, but it doesn't change the fact that review by clued people who didn't write the thing tends to show bugs we'd missed for years.

If you really don't know that by your own experience - you don't _have_ experience. There is a damn good reason for uniform style within a project: peer review helps. I've lost the count of bugs in the drivers that I'd found just grepping the tree. Even on that level review catches tons of bugs. And I have no reason to doubt that authors of respective drivers would fix them as soon as they'd see said bugs.

"It's my code and I don't care if nobody else can read it" is an immediate firing offense in any sane place. It may be OK in academentia, but in the real life it's simply unacceptable.

It's all nice and dandy to shed tears for poor, abused, well-meaning company that had made everyone happy by correct but unreadable code and now gets humiliated by mean ingrates. Nice image, but in reality the picture is quite different. Code _is_ buggy. That much is a given, regardless of the origin of that code. The only question is how soon are these bugs fixed. And that directly depends on the amount of efforts required to read through that code.

The discussion went on and on, and then took a dramatic turn when Larry quipped, "If you think that Linux is at the same level as Sun's OS or ever will be, you're kidding yourself. Linux is really cool, I love it, and I use it every day. But it's not comparable to Solaris, sorry, not even close. I'm not exactly known for my love of Solaris, you know, in fact I really dislike it. But I respect it, it can take a licking and keep on ticking. Linux isn't there yet and unless the development model changes somewhat, I'll stand behind my belief that it is unlikely to ever get there."

Several folks wanted to hear why Larry felt Linux would never touch Sun, and Larry said that the open source development model was not as good as the closed, proprietary model. Large groups of volunteer contributors, he felt, were not able to compete with small groups of paid professionals.

He also added that the best thing to do would be to split kernel development into the single-processor and multi-processor versions, because simplifying the technical problems would be the only way for the less qualified volunteers to have a shot at competing with the proprietary team.

In the course of a long, surprisingly flame-free technical discussion in response to that, David S. Miller said at one point:

Coming from the background of having threaded from scratch a complete networking stack (ie. my shit doesn't stink and I'm here to tell you about it :-))) I think your claims are pretty much out of whack.

Starting from initial implementation to having all the locking disappear from the kernel profiles during any given load was composed of three tasks:

  1. Is this object almost entirely reader based (or the corollary)? Use special locking that exploits this. See linux/brlock.h for one such "special locking" we invented to handle these cases optimally.

    This transformation results in ZERO shared dirty cache lines if the initial analysis is correct.

  2. Can we "fan out" the locking so that people touching separate objects 99% of the time touch different cache lines?

    This doesn't mean "more per-object locking", it means more like "more per-zone locking". Per-hashchain locking falls into this category and is very effective for any load I have ever seen.

  3. Is this really a per-cpu "thing"? The per-cpu SKB header caches are an example of this. The per-cpu SLAB magazines are yet another.
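To make the second item concrete, here is a minimal userspace sketch of per-hash-chain ("fan out") locking. It uses POSIX threads rather than kernel spinlocks, and the table size, the 64-byte cache line and every function name are assumptions chosen for illustration; none of it is taken from the kernel's networking code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS        64    /* hypothetical table size */
#define NTHREADS        4
#define KEYS_PER_THREAD 1000
#define CACHE_LINE      64    /* assumed cache line size in bytes */

struct node {
    unsigned key;
    struct node *next;
};

/*
 * One lock per hash chain. Aligning the first member pads each bucket
 * to its own (assumed) cache line, so threads working on different
 * chains do not dirty the same line.
 */
struct bucket {
    alignas(CACHE_LINE) pthread_mutex_t lock;
    struct node *head;
};

static struct bucket table[NBUCKETS];

static void table_init(void)
{
    for (size_t i = 0; i < NBUCKETS; i++) {
        pthread_mutex_init(&table[i].lock, NULL);
        table[i].head = NULL;
    }
}

/* Insert takes only the lock of the chain the key hashes to. */
static int table_insert(unsigned key)
{
    struct bucket *b = &table[key % NBUCKETS];
    struct node *n = malloc(sizeof(*n));

    if (n == NULL)
        return -1;
    n->key = key;
    pthread_mutex_lock(&b->lock);
    n->next = b->head;
    b->head = n;
    pthread_mutex_unlock(&b->lock);
    return 0;
}

/* Count all nodes, locking one chain at a time. */
static size_t table_count(void)
{
    size_t total = 0;

    for (size_t i = 0; i < NBUCKETS; i++) {
        pthread_mutex_lock(&table[i].lock);
        for (struct node *n = table[i].head; n != NULL; n = n->next)
            total++;
        pthread_mutex_unlock(&table[i].lock);
    }
    return total;
}

static void *worker(void *arg)
{
    unsigned base = (unsigned)(size_t)arg * KEYS_PER_THREAD;

    for (unsigned i = 0; i < KEYS_PER_THREAD; i++) {
        int rc = table_insert(base + i);

        assert(rc == 0);
        (void)rc;
    }
    return NULL;
}

/* Run NTHREADS concurrent inserters and return the final node count. */
static size_t run_fanout_demo(void)
{
    pthread_t tid[NTHREADS];

    table_init();
    for (size_t t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    for (size_t t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    return table_count();
}
```

Calling run_fanout_demo() from a main() and building with -pthread should report every inserted key. Inserters contend only when their keys hash to the same chain, which is the "per-zone" rather than "per-object" granularity described above. The sketch never frees its nodes.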

Another source of scalability problems has nothing to do with whether you do some kind of clustering or not. You are still going to get into situations where multiple cpus want (for example) page 3 of libc.so :-) (what I'm trying to say is that it is a hardware issue in some classes of situations)

Frankly, after applying #1 and/or #2 and/or #3 above to what shows up to have contention (I claim the ipv4 stack to have had this done for it) there isn't much you are going to get back. I see zero reasons to add any more locks to ipv4, and I don't think we've overdone it in the networking either.

Even more boldly, I claim that Linux's current ipv4 scales further than anything coming out of Sun engineering. From my perspective Sun's scalability efforts are more in line with "rubber-stamp" per-object locking when things show up in the profiles than anything else. Their networking is really big and fat. For example the Solaris per-socket TCP information is nearly 4 times the size of that in Linux (last time I checked their headers). And as we all know their stuff sits on top of some thick infrastructure (ie. STREAMS) (OK, here is where someone pops up a realistic networking benchmark where we scale worse than Solaris. I would welcome such a retort because it'd probably end up being a really simple thing to fix.)

My main point: I think we currently scale as far as we could in the places we've done the work (which would include networking) and it isn't "too much locking".

The problem areas of scalability, for which no real solution is evident yet, involve the file name lookup tree data structures, ie. the dcache under Linux. All accesses here are tree based, and everyone starts from similar roots. So you can't per-node or per-branch lock as everyone traverses the same paths. Furthermore you can't use "special locks" as in #1 since this data structure is neither heavy reader nor heavy writer.

But the real point here is that SMP/cc clusters are not going to solve this name lookup scaling problem.

The dcache_lock shows up heavily on real workloads under current Linux. And it will show up just as badly on a SMP/cc cluster. SMP/cc clusters talk a lot about "put it into a special filesystem and scale that all you want" but I'm trying to show that frankly that's where the "no solution evident" scaling problems actually are today.

If LLNL was not too jazzed up about your proposal, I right now don't blame them. Because with the information I have right now, I think your claims about its potential are bogus.

I really want to be shown wrong, simply because the name path locking issue is one that has been giving me mental gas for quite some time.
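The name lookup problem can be illustrated with a toy path walk. The sketch below is a single-threaded model, not the kernel's dcache: struct toy_dentry, its fixed child array and the visit counter are all invented here. It only shows why every lookup passes through the same few nodes near the root, which is what defeats per-node or per-branch locking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4   /* toy limit, purely for illustration */

/* A toy stand-in for a name cache node; not the kernel's struct dentry. */
struct toy_dentry {
    const char *name;
    struct toy_dentry *child[MAX_CHILDREN];
    unsigned long visits;   /* how many lookups passed through this node */
};

static struct toy_dentry *find_child(struct toy_dentry *dir,
                                     const char *name, size_t len)
{
    for (size_t i = 0; i < MAX_CHILDREN && dir->child[i] != NULL; i++) {
        const char *cname = dir->child[i]->name;

        if (strlen(cname) == len && memcmp(cname, name, len) == 0)
            return dir->child[i];
    }
    return NULL;
}

/* Walk an absolute path one component at a time, starting at the root. */
static struct toy_dentry *path_walk(struct toy_dentry *root, const char *path)
{
    struct toy_dentry *cur = root;

    assert(root != NULL && path != NULL);
    cur->visits++;
    for (;;) {
        size_t len;

        path += strspn(path, "/");      /* skip separators */
        if (*path == '\0')
            return cur;
        len = strcspn(path, "/");       /* length of next component */
        cur = find_child(cur, path, len);
        if (cur == NULL)
            return NULL;                /* component not found */
        cur->visits++;
        path += len;
    }
}

/* Build a tiny tree, look up three paths, and report per-node traffic. */
static void demo_walks(unsigned long *root_hits, unsigned long *usr_hits,
                       unsigned long *etc_hits)
{
    struct toy_dentry libc = { .name = "libc.so" };
    struct toy_dentry ls = { .name = "ls" };
    struct toy_dentry passwd = { .name = "passwd" };
    struct toy_dentry lib = { .name = "lib", .child = { &libc } };
    struct toy_dentry bin = { .name = "bin", .child = { &ls } };
    struct toy_dentry usr = { .name = "usr", .child = { &lib, &bin } };
    struct toy_dentry etc = { .name = "etc", .child = { &passwd } };
    struct toy_dentry root = { .name = "/", .child = { &usr, &etc } };

    path_walk(&root, "/usr/lib/libc.so");
    path_walk(&root, "/usr/bin/ls");
    path_walk(&root, "/etc/passwd");

    *root_hits = root.visits;
    *usr_hits = usr.visits;
    *etc_hits = etc.visits;
}
```

In this model the root is touched by all three lookups and "usr" by two of them, even though the final targets are unrelated files. If each node carried its own lock, the locks nearest the root would still be shared by almost every walker.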

Another thing I've found is that SMP scalability changes that help the "8, 16, 32, 64" cpu case almost never harm the "4, 2" cpu cases. Rather, they tend to improve the smaller cpu number cases. Finally, as I think Ingo pointed out recently, some of the results of our SMP work have even improved the uniprocessor cases.

Elsewhere, Rik van Riel said, "I'll have to agree with Larry that Linux really isn't going anywhere in particular and seems to be making progress through sheer luck." Daniel Phillips quipped, "You just reminded me of Minnesota Fats' most famous quote: 'The more I practice, the luckier I get'" And Linus Torvalds also said to Rik:

Hey, that's not a bug, that's a FEATURE!

You know what the most complex piece of engineering known to man in the whole solar system is?

Guess what - it's not Linux, it's not Solaris, and it's not your car.

It's you. And me.

And think about how you and me actually came about - not through any complex design.

Right. "sheer luck".

Well, sheer luck, AND:

I'm deadly serious: we humans have _never_ been able to replicate something more complicated than what we ourselves are, yet natural selection did it without even thinking.

Don't underestimate the power of survival of the fittest.

And don't EVER make the mistake that you can design something better than what you get from ruthless massively parallel trial-and-error with a feedback cycle. That's giving your intelligence _much_ too much credit.

Quite frankly, Sun is doomed. And it has nothing to do with their engineering practices or their coding style.

Rik replied venomously:

Don't forget the fact that 2.4 is the first kernel you managed to get stable under high load since 1.2.

Both 2.0 and 2.2 didn't get stable until Alan took over and Alan's 2.4 fork got stable some 4 months before your 2.4 tree got stable.

I think you've pretty much proven how well random development works.

Close by, Alan backed up some of Rik's criticism, using the Virtual Memory subsystem as the example:

Linus kept ignoring, refusing and merging conflicting patches. The -ac tree since 2.4.9-ac or so with Rik's actual fixes he wanted Linus to take passes the Red Hat test suite. 2.4.16 kind of passes it now.

It had nothing to do with the VM being broken and everything to do with what Linus applied. As it happens it looks like the new VM is better performing for low loads which is good, but the whole VM mess wasn't bad QA and wasn't bad design.

Linus was even ignoring patches that fixed comments in the VM code that referenced old behaviour. And due to the complete lack of VM documentation at the moment I can only assume he's been dropping Andrea's VM comments/docs too.

Stanislav Meduna asked if Red Hat's test suite was available anywhere; and Chris Ricker and David gave links to http://people.redhat.com/bmatthews/cerberus/.

Elsewhere, several folks seemed to think Linus was advocating making random changes to the code in the hopes that evolution would sort it all out over the millennia. As Tim Hockin put it, "a very interesting argument, but not very pertinent - we don't have 10's of thousands of years or even really 10's of years. We have to use intellect to root out the obviously bad ideas, and even more importantly the bad-but-not-obviously-bad ideas." Linus clarified:

Directed evolution - ie evolution that has more specific goals, and faster penalties for perceived failure, works on the scale of tens or hundreds of years, not tens of thousands. Look at dog breeding, but look even more at livestock breeding, where just a few decades have made a big difference.

The belief that evolution is necessarily slow is totally unfounded.

HOWEVER, the belief that _too_ much direction is bad is certainly not unfounded: it tends to show up major design problems that do not show up with less control. Again, see overly aggressive breeding of some dogs causing bad characteristics, and especially the poultry industry.

And you have to realize that the above is with entities that are much more complex than your random software project, and where historically you have not been able to actually influence anything but selection itself.

Being able to influence not just selection, but actually influencing the _mutations_ that happen directly obviously cuts down the time by another large piece.

In short, your comment about "not pertinent" only shows that you are either not very well informed about biological changes, or, more likely, it's just a gut reaction without actually _thinking_ about it.

Biological evolution is alive and well, and does not take millions of years to make changes. In fact, most paleontologists consider some of the changes due to natural disasters to have happened surprisingly fast, even in the _absence_ of "intelligent direction".

Of course, at the same time evolution _does_ heavily tend to favour "stable" life-forms (sharks and many amphibians have been around for millions of years). That's not because evolution is slow, but simply because good lifeforms work well in their environments, and if the environment doesn't change radically they have very few pressures to change.

There is no inherent "goodness" in change. In fact, there are a lot of reasons _against_ change, something we often forget in technology. The fact that evolution is slow when there is no big reason to evolve is a _goodness_, not a strike against it.

Elsewhere, Victor Yodaiken also didn't buy the "sheer luck" argument. He said, "Linux is what it is because of design, not accident. And you know that better than anyone." But Linus came back with:

Let's just be honest, and admit that it wasn't designed.

Sure, there's design too - the design of UNIX made a scaffolding for the system, and more importantly it made it easier for people to communicate because people had a mental _model_ for what the system was like, which means that it's much easier to discuss changes.

But that's like saying that you know that you're going to build a car with four wheels and headlights - it's true, but the real bitch is in the details.

And I know better than most that what I envisioned 10 years ago has _nothing_ in common with what Linux is today. There was certainly no premeditated design there.

And I will claim that nobody else "designed" Linux any more than I did, and I doubt I'll have many people disagreeing. It grew. It grew with a lot of mutations - and because the mutations were less than random, they were faster and more directed than alpha-particles in DNA.

Victor had also said (getting back to Larry's point), "The question is whether Linux can still be designed at current scale." To which Linus said in his same post:

Trust me, it never was.

And I will go further and claim that _no_ major software project that has been successful in a general marketplace (as opposed to niches) has ever gone through those nice lifecycles they tell you about in CompSci classes. Have you _ever_ heard of a project that actually started off with trying to figure out what it should do, a rigorous design phase, and an implementation phase?

Dream on.

Software evolves. It isn't designed. The only question is how strictly you _control_ the evolution, and how open you are to external sources of mutations.

And too much control of the evolution will kill you. Inevitably, and without fail. Always. In biology, and in software.

Amen.

Horst von Brand also replied to Victor:

I'd say it is better because the mutations themselves (individual patches) go through a _very_ harsh evaluation before being applied in the "official" sources. There is a population of kernels out there (each developer has a few, distributions check out others, ...), only what survives there has got a chance to be considered for inclusion.

Sure, this is not the only way Linux moves forward. But it is a large factor in the success of "Release early. Release often. Take patches from everywhere."

Larry felt that all this "evolution" stuff was just pure nonsense, and Linus "spouting off". Elsewhere, he replied directly to Linus, saying:

Yeah, right, Linus. We should all go home and turn loose the monkeys and let them pound on the keyboard. They'll do just as good a job, in fact, by your reasoning they'll get there faster, they aren't so likely to waste time trying to design it.

I can't believe the crap you are spewing on this one and I don't think you do either. If you do, you need a break. I'm all for letting people explore, let software evolve, that's all good. But somebody needs to keep an eye on it.

If that's not true, Linus, then bow out. You aren't needed and *you* just proved it. You can't have it both ways. Either you are here for a reason or you aren't. So which is it? You're arguing that you don't matter. I personally don't agree, I think Linux would be a pile of dog doo without you. If you don't agree, back off and see what happens.

Daniel Phillips replied, "If you've been involved in any design sessions with Linus - if you could even grace them with that name - you know he relies way more on intuition than process. Actually, far from taking the role of the omniscient creator, he tends to act more like the survival test. Not that he's short of the necessary skills to do it your way. I think he does it the way he does it because it's fun and interesting and, oh yes, effective."

Linus also replied to Larry. To the idea that someone had to "keep an eye" on software evolution, Linus chuckled, "Like somebody had to keep an eye on our evolution so that you had a chance to be around?" And to the idea of his indispensability, he said:

Oh, absolutely.

I wish more people realized it. Some people realize it only when they get really pissed off at me and say "Go screw yourself, I can do this on my own". And you know what? They are right too, even if they come to that conclusion for what I consider the wrong reasons.

The reason I'm doing Linux is not because I think I'm "needed". It's because I enjoy it, and because I happen to believe that I'm better than most at it. Not necessarily better than everybody else around there, but good enough, and with the social ties to make me unbeatable right now.

But "indispensable"? Grow up, Larry. You give me too much credit.

And why should I bow out just because I'm not indispensable? Are you indispensable for the continued well-being of humanity? I believe not, although you are of course free to disagree. Should you thus "bow out" of your life just because you're strictly speaking not really needed?

Do I direct some stuff? Yes. But, quite frankly, so do many others. Alan Cox, Al Viro, David Miller, even you. And a lot of companies, which are part of the evolution whether they realize it or not. And all the users, who end up being part of the "fitness testing".

And yes, I actually do believe in what I'm saying.

Elsewhere, he added, "the people who think you "design" software are seriously simplifying the issue, and don't actually realize how they themselves work." And again elsewhere, he said:

The impressive part is that Linux development could _look_ to anybody like it is that organized.

Yes, people read literature too, but that tends to be quite spotty. It's done mainly for details like TCP congestion control timeouts etc - they are _important_ details, but at the same time we're talking about a few hundred lines out of 20 _million_.

And no, I'm not claiming that the rest is "random". But I _am_ claiming that there is no common goal, and that most development ends up being done for fairly random reasons - one person's particular interest or similar.

It's "directed mutation" on a microscopic level, but there is very little macroscopic direction. There are lots of individuals with some generic feeling about where they want to take the system (and I'm obviously one of them), but in the end we're all a bunch of people with not very good vision.

And that is GOOD.

A strong vision and a sure hand sound like good things on paper. It's just that I have never _ever_ met a technical person (including me) whom I would trust to know what is really the right thing to do in the long run.

Too strong a vision can kill you - you'll walk right over the edge, firm in the knowledge of the path in front of you.

I'd much rather have "brownian motion", where a lot of microscopic directed improvements end up pushing the system slowly in a direction that none of the individual developers really had the vision to see on their own.

And I'm a firm believer that in order for this to work _well_, you have to have a development group that is fairly strange and random.

To get back to the original claim - where Larry idolizes the Sun engineering team for their singlemindedness and strict control - and the claim that Linux seems to get better "by luck": I really believe this is important.

The problem with "singlemindedness and strict control" (or "design") is that it sure gets you from point A to point B in a much straighter line, and with less expenditure of energy, but how the HELL are you going to consistently know where you actually want to end up? It's not like we know that B is our final destination.

In fact, most developers don't know even what the right _intermediate_ destinations are, much less the final one. And having somebody who shows you the "one true path" may be very nice for getting a project done, but I have this strong belief that while the "one true path" sometimes ends up being the right one (and with an intelligent leader it may _mostly_ be the right one), every once in a while it's definitely the wrong thing to do.

And if you only walk in single file, and in the same direction, you only need to make one mistake to die.

In contrast, if you walk in all directions at once, and kind of feel your way around, you may not get to the point you _thought_ you wanted, but you never make really bad mistakes, because you always ended up having to satisfy a lot of _different_ opinions. You get a more balanced system.

This is what I meant by inbreeding, and the problem that UNIX traditionally had with companies going for one niche.

(Linux companies also tend to aim for a niche, but they tend to aim for _different_ niches, so the end result is the very "many different directions" that I think is what you _want_ to have to survive).

Ingo Molnar agreed with Linus, and responded to his "we're all a bunch of people with not very good vision", saying:

the fundamental reasons why this happens are the following, special conditions of our planet:

due to these fundamental issues the development of 'technology' becomes very, very unpredictable. We only have 5 billion brain cells, and there's no upgrade path for the time being. Software projects such as Linux are already complex enough to push this limit. And the capacity limits of the human brain, plus the striving towards something better (driven by human needs) cause thousands of ambitious people working on parts of Linux in parallel.

a few things are clear:

but the reality is that we humans have severe limits, and we do not understand this random world yet, so we'll inevitably have lots of fun writing random pieces of Linux (or other) code in many many years to come.

in fact, most computer science books are a glaring example of how limited the human brain is, and how small and useless part of the world we are able to explain exactly ;-)

and frankly, i'd be *very* disappointed if it was possible to predict the future beyond our lifetime, and if it was possible to design a perfect enough OS that does not need any changing in the foreseeable future. I'd be a pretty disappointed and bored person, because someone would predict what happens and we'd only be dumbly following that grand plan. But the reality is that such a grand plan does not exist. One of the exciting things about developing an OS is that we almost never know what's around the next corner.

This whole effort called Linux might look structured on the micro-level (this world *does* appear to have some rules apart from chaos), but it simply *cannot* be anything better than random on the macro (longterm) level. So we better admit this to ourselves and adapt to it.

And anyone who knows better knows something that is worth a dozen Nobel prizes.

It was a very long discussion.

2. Networking Documentation

30 Nov 2001 - 9 Dec 2001 (21 posts) Archive Link: "Finally, CBQ nearly completely documented"

People: Bert Hubert

Bert Hubert announced:

After preparing my talk on CBQ/HTB (http://ds9a.nl/cbq-presentation), I finally understood how CBQ and filters etc truly work. And I wrote it down. Check out the Linux Advanced Routing & Shaping HOWTO, it's changed a lot!

Especially this part is very new, please check it for mistakes and inconsistencies:

http://ds9a.nl/2.4Routing/HOWTO//cvs/2.4routing/output/2.4routing-9.html

I even got 'split' and 'defmap' figured out, which should be a first. There is not a single other page online that tells you correctly what these do.

One thing - does *anybody* understand how hash tables work in tc filter, and what they do? Furthermore, I could use some help with the tc filter police things.

So if you do understand how these work, please drop me a line.

He replied to himself a couple days later, with more enhancements:

Thanks to Andreas Steinmetz and David Sauer, tc hash tables are now documented as well, thanks!

See:

http://ds9a.nl/2.4Routing/HOWTO//cvs/2.4routing/output/2.4routing-12.html

And then 'Hashing filters for very fast massive filtering'.

I also finished documenting all parameters for TBF, CBQ, SFQ, PRIO, bfifo, pfifo and pfifo_fast. All queues in the Linux kernel are now described in the Linux Advanced Routing & Shaping HOWTO, which can be found on

http://ds9a.nl/2.4Routing

I want to send this off to the LDP and Freshmeat somewhere next week, I *would really* like people who are knowledgeable about this subject (this means you, ANK & Jamal 8) ) to read through this.

This HOWTO is rapidly becoming the perceived authoritative source for traffic control in linux (google on 'Linux Routing' finds it), it might as well be right! So if you have any time at all, check the parts you know about. I expect mistakes.

Someone else who'd been involved in this kind of documentation chimed in, and they had a technical discussion about it.

3. New Build Tools

2 Dec 2001 - 11 Dec 2001 (134 posts) Archive Link: "Converting the 2.5 kernel to kbuild 2.5"

Topics: Kernel Build System

People: Keith Owens, Eric S. Raymond, Christoph Hellwig, David Woodhouse, Giacomo Catenazzi, Matthias Andree, Alan Cox, Dave Jones, Edward Muller, Horst von Brand, Linus Torvalds

Keith Owens proposed:

Linus, the time has come to convert the 2.5 kernel to kbuild 2.5. I want to do this in separate steps to make it easier for architectures that have not been converted yet.

2.5.1 Semi-stable kernel, after bio is working.

2.5.2-pre1 Add the kbuild 2.5 code, still using Makefile-2.5. i386, sparc, sparc64 can use either kbuild 2.4 or 2.5, 2.5 is recommended. ia64 can only use kbuild 2.5. Other architectures continue to use kbuild 2.4. Wait 24 hours for any major problems then -

2.5.2-pre2 Remove kbuild 2.4 code, rename Makefile-2.5 to Makefile. i386, ia64, sparc, sparc64 can compile using kbuild 2.5. Other architectures cannot compile until they convert to kbuild 2.5. The kbuild group can help with the conversion but without access to a machine we cannot test other architectures. Until the other archs have been converted, they can stay on 2.5.2-pre1.

Doing the change in two steps provides a platform where both kbuild 2.4 and 2.5 work. This allows other architectures to parallel test the old and new kbuild during their conversion, I found that ability was very useful during conversion.

The CML1 to CML2 conversion comes later, either in 2.5.3 or 2.5.4.

Linus, is this acceptable?

Linus Torvalds didn't participate in this thread, but Eric S. Raymond replied to Keith, saying, "The schedule I heard from Linus at the kernel summit was that both changes were to go in between 2.5.1 and 2.5.2. I would prefer sooner than later because I'm *really* *tired* of maintaining a parallel rulebase." Elsewhere, Christoph Hellwig asked, "Is the CML2 merge actually agreed on? I still strongly object to it and I know lots of kernel hackers are of the same opinion." Eric confirmed that CML2 was going in. In the course of discussion, he also said, "by the way, there is no CML1 :-). Instead, there are four mutually incompatible dialects and a rulebase that breaks in different ways depending on which interpreter you use. Well, maybe just three mutually incompatible dialects and one clone -- but it's notoriously hard to verify that two interpreters have the same accept language, so nobody knows for sure." Christoph replied:

There is a CML1 language specification, as written down in a file, namely Documentation/kbuild/config-language.txt in the kernel tree. There is one tool (mconfig) which has a yacc-parser that implements that specification completly, and some horrid ugly scripts in the tree that parse them in a more or less working way. There also are a number of other tools I don't know to much about that understand the language as well.

All of these tools just require the runtime contained in the LSB and no funky additional script languages. Also none needs a binary intermediate representation of the config.

Eric zeroed in on the language reference:

I quote Linus at the 2.5 kernel summit: "Python is not an issue." Unless and until he changes his mind about that, waving around this kind of argument is likely to do your case more harm than good.

If you want to re-open the case for saving CML1, you'd be better off demonstrating how CML1 can be used to (a) automatically do implied side-effects when a symbol is changed, (b) guarantee that the user cannot generate a configuration that violates stated invariants, and (c) unify the configuration tree so that the equivalents of arch/* files never suffer from lag or skew when an architecture-independent feature is added to the kernel.

Christoph replied that Python was an issue, for him and others. He claimed that CML1 could do all the things Eric wanted, though he didn't see the need to do some of them. Eric said to this, "You can spend all week telling us how easy it would be to implement all the CML2 benefits that CML1 doesn't have, if you like -- but one of the rules of this game is that an ounce of working code beats a pound of handwaving." David Woodhouse replied:

FWIW I have no particular problem with CML2. I agree that CML1 is fairly limited, and can see the advantages in ditching it for a new language.

I do have objections to some of the other ideas which have been floated for changing the behaviour of the config rules, which aren't strictly related to the change in language.

I just want to make sure that the introduction of CML2 doesn't sneak in controversial changes to the config behaviour to make my Aunt Tilley happy, when those changes should be given individual consideration, not presented as a fait accomplis.

If I can't have one without the other, I'd rather not have either - CML1 may indeed suck, but it doesn't suck _that_ much.

But I figure we can trust you not to do that - can't we?

Eric asked what in particular David wanted, and David replied, "Simply assure me that I don't have to scan every line of the CML2 files for such changes, and that you'll make a reasonable effort to make the first set of CML2 rules match the existing CML1 behaviour, without introducing any controversial changes." And Eric said, "People like Giacomo and my other beta testers are keeping an eye on me. Don't sweat it." Close by, Giacomo Catenazzi also replied to David's concerns, explaining:

The rules are nearly the same (but written in another language). The problem was in converting rules: esr found a lot of error: these error should be corrected, also some rules are different.

Also converting rules, you surelly found some error: i.e. wrong dependencies syntax, wrong implementation,....

I don't think esr changed non problematic rules, but one: all rules without help become automatically dependent to CONFIG_EXPERIMENTAL. I don't like it, but I understand why he makes this decision.

Remember: The config.in files contain a lot of errors, and automatic tools can not find all.

David and others started to object to Giacomo's third paragraph, but Eric quickly corrected Giacomo, saying that it was CONFIG_EXPERT, not CONFIG_EXPERIMENTAL. And he said:

this change is not wired in. Comment out this declaration in the top-level rulesfile:

condition nohelp on EXPERT

and it reverts to old behavior.
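
As a rough illustration of the behaviour Eric describes (this is neither CML2 syntax nor its implementation), the rule amounts to hiding symbols that have no help text unless an expert switch is turned on. In the C sketch below, `struct symbol`, `symbol_visible()`, and the `nohelp_rule` flag are hypothetical names invented for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a configuration symbol. */
struct symbol {
    const char *name;
    const char *help;   /* NULL when the symbol has no help text */
};

/*
 * Decide whether a symbol is offered to the user.
 * nohelp_rule models the "condition nohelp on EXPERT" declaration:
 * when it is active, help-less symbols appear only if expert is set.
 * Commenting the declaration out corresponds to nohelp_rule == false.
 */
static bool symbol_visible(const struct symbol *sym, bool expert,
                           bool nohelp_rule)
{
    assert(sym != NULL);
    if (nohelp_rule && sym->help == NULL)
        return expert;
    return true;
}
```

With the rule disabled, every symbol is visible regardless of the expert setting, which corresponds to the "old behavior" mentioned above.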

David replied, "Good. Please make that the default when submitting the first version of CML2. You can submit patches which effect the change in behaviour later, and they can be individually considered."

Elsewhere, getting back to the subject of whether the kernel should depend on having Python installed in the system, Matthias Andree asked Christoph what exactly was his objection to Python. Alan Cox pointed out that the dependency would be on Python2, which meant that most users wouldn't have it. Matthias replied:

Every new kernel version required new tools, 2.2 particularly many, 2.4 also some, so it's just one more tool to add in the end.

Current distributions already ship with Python2, and probably all will when distributors know that Python2 will be needed to configure Linux 2.5 or 2.6.

Eric also thought Alan was overstating the unavailability of Python2, but the discussion skewed off at that point.

Earlier in the discussion Matthias had said, "Seriously: what do you fear? Losing the efforts you put into mconfig? Linux 2.2 and 2.4 will be around for quite some time (not sure about mconfig on 2.0, I don't use 2.0.x ATM)." At this point Eric confessed:

Oops. I wasn't going to tell anyone this yet, but since you've made this argument I feel I must be up front here....

After CML2 has proven itself in 2.5, I do plan to go back to Marcelo and lobby for him accepting it into 2.4, on the grounds that doing so will simplify his maintainance task no end. That's why I'm tracking both sides of the fork in the rulebase, so it will be an easy drop-in replacement for Marcelo as well as Linus.

Giacomo yelled out:

Don't do it! A stable kernel should be stable also on the building tools. When Marcelo will correct some grave potential security problem, the user will rebuild the kernel and it will found that it must install some other package (machine with 2.4 are now common, python2 not yet so common) to secure his kernel, it would be happy.

This is an example, but for a better maintainability you will give serious problem to the novice kernel user.

Eric said simply that the final decision would rest with Marcelo. But Alan also felt that a 2.4 backport would be "somewhat impractical. You will break all the existing additional configuration tools for the 2.4 stable tree that people expect to continue to work. Breaking them in 2.5 isnt a big issue, but breaking stable kernel trees is a complete nono." And Dave Jones summed up his own objections, saying, "So anyone perfectly happy with an older distro that didn't ship python2-and-whatever-else gets screwed when they want to build a newer kernel. Nice." Edward Muller pointed out, "That's been the case all along, sans python2. Newer kernels need newer tools. That's always been the case." But Alan said, "Not during stable releases. In fact we've jumped through hoops several times to try and keep egcs built kernels working." Trevor Smith, a little behind in the conversation, thought they were only talking about 2.5, but Alan quipped, "Erik is talking about crapping in both trees, as opposed to 2.5 only."

Elsewhere along the same sub-thread, Horst von Brand bemoaned, "I just shudder when thinking that I'll have to learn yet another weird language to be able to hack on Linux... C, gcc-isms with asm() and all, a bit of CML1, now CML2, are OK; and now Python..." A lot of people pointed out that no one would need Python to use CML2, but Horst replied that hacking on CML2 was kernel hacking, and required learning Python.

4. 2.4 Development

2 Dec 2001 - 8 Dec 2001 (25 posts) Archive Link: "[PATCH] 2.4.16 kernel/printk.c (per processor initialization check)"

People: Andrew Morton, Marcelo Tosatti, David Mosberger, Alan Cox, William Lee Irwin III

In the course of a bug hunt, Andrew Morton proposed for 2.4:

Marcelo,

after a fairly lengthy off-list discussion, it turns out that special-casing printk is probably the best way to proceed to fix this one.

The problem is that the boot processor sets up the console drivers, and is able to call them via printk(), but the application processors at that time are not able to call printk() because the console device driver mappings are not set up. Undoing this inside the ia64 boot code is complex and fragile. Possibly the problem exists on other platforms but hasn't been discovered yet.

So the patch defines an architecture-specific macro `arch_consoles_callable()' which, if not defined, defaults to `1', so the impact to other platforms (and to uniprocessor ia64) is zero.
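
The default-to-1 macro pattern Andrew describes can be sketched roughly as follows. This is not the actual patch: only the name `arch_consoles_callable()` comes from the post, while the `emit_message()` helper and its return convention are hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * An architecture that needs the check would define this macro in one of
 * its own headers (placement assumed here). Every other platform falls
 * through to the constant 1 below, so the test costs nothing there.
 */
#ifndef arch_consoles_callable
#define arch_consoles_callable() (1)
#endif

/*
 * Hypothetical stand-in for printk(): returns the number of characters
 * written, or 0 when the console cannot be called from this CPU yet.
 */
static int emit_message(const char *msg)
{
    assert(msg != NULL);
    if (!arch_consoles_callable())
        return 0;   /* console mappings not set up: drop the message */
    return printf("%s", msg);
}
```

Because the fallback expands to a constant, a compiler can discard the early-return branch on platforms that never define the macro, which is the sense in which the impact elsewhere is zero.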

Marcelo Tosatti replied, "How hard would it be to fix that on ia64 code? I'm really not willing to apply this kludge..." William Lee Irwin III gave some technical reasons for doing it, and David Mosberger felt the real question was just, "Do you agree that it should always be safe to call printk() from C code?" Alan Cox said that this might sound good in theory, but that it was really up to the maintainers of each architecture to get it right. And Marcelo also replied to David, answering:

No if you can't access the console to print the message :)

Its just that I would prefer to see the thing fixed in arch-dependant code instead special casing core code.

But David replied:

Only architecture specific problems should be fixed with architecture specific code.

I'm not entirely sure whether this particular problem is architecture specific. Perhaps it is and, if so, I'm certainly happy to fix it in the ia64 specific code.

But, he went on, he was skeptical of that. Marcelo replied, "Prove, please. If you show me it can also happen on other architectures, I'll be glad to apply the patch." David and Alan went back and forth for a bit on implementation details, and the thread petered out.

5. 2.4 Release Schedule

6 Dec 2001 - 11 Dec 2001 (52 posts) Archive Link: "Linux 2.4.17-pre5"

People: Marcelo Tosatti

Marcelo Tosatti released Linux 2.4.17-pre5 and added, "I'm going to release -pre versions more often from now on so people can "see" what I'm doing with less latency: I hope that can make developer's life easier."

6. Developer Scuffle In 2.4

6 Dec 2001 - 10 Dec 2001 (23 posts) Archive Link: "devfs unable to handle permission: 2.4.17-pre[4,5] / ALSA-0.9.0beta[9,10]"

Topics: Backward Compatibility, FS: devfs

People: Richard Gooch, Roman Zippel, Marcelo Tosatti, Rene Rebe

Rene Rebe reported some anomalous devfs behavior in 2.4.17-pre4, and Richard Gooch replied that he had improved devfs' behavior in the latest version, to discard duplicate entries. But Roman Zippel replied that Richard shouldn't change the behavior in the stable series, since the option had always been valid up to that point. Richard replied, "Well, no, it was never a valid option. It was always a bug. In any case, the stricter behaviour isn't preventing people from using their drivers, it's just issuing a warning. The user-space created device node still works." Roman replied, "But the driver doesn't. You changed the driver API in subtle way! You cannot change the behaviour of devfs_register during 2.4. Do whatever you want in 2.5, but drivers depend on the current behaviour and devfs has to be fixed not these drivers." Richard disagreed that the drivers would no longer work. He reiterated that the new code only produced a warning. Roman countered, "devfs_mk_dir returns an error now, so the driver won't be able to make new dev nodes available. So far it was legal to manually create a directory under devfs, now it's suddenly an error." Richard replied, "It was always an error, you just got away with it." Roman blew his stack at this point, asking how Richard could expect anyone to take devfs seriously if "You suddenly change the behaviour of devfs during a stable release in a noncompatible way. Distributions and their users that followed _your_ advice are suddenly fucked up, that's irresponsible." He pointed to one of Richard's own scripts that committed the very same error, but Richard only replied blithely, "Oh, the "tar kludge". That script has been obsolete for over a year and a half. I should have removed it ages ago. I really should get around to doing that one day." Roman replied (Ccing Marcelo Tosatti), "You should have done this a year ago. Permission management with the "tar kludge" was a valid option so far and is currently in use. There was no warning period that this future would be obsolete." [...] "The tar solution only works until 2.4.16, the new devfsd provides this only with 2.4.17. I'll leave the final decision to Marcelo, whether he accepts this or not. I shut up now, may someone else explain the meaning of compatibility to you." Marcelo, not having read the whole thread, asked Roman to explain the problem in detail. Richard replied:

I can explain it. The old devfs core was forgiving of duplicate entries, while the new one is not (it now gives EEXIST errors).

There are some broken boot scripts (modelled after the long obsolete rc.devfs script) which dump a whole bunch of inodes in /dev, prior to loading various modules. So the drivers which load after this will not be able to create their devfs entries (because said entries already exist).

This is not actually a problem for leaf nodes, since the user-space created device nodes will still work. It just results in a warning message. It is potentially a problem for directories, if the following conditions are all met:

This is a fairly rare case. Usually, if you are "restoring" some inodes, you will be restoring the individual device nodes as well as the parent directory (otherwise, what's the point of restoring?). So, in this case, the device nodes that the user wants to use will still be there (created by the boot script) and will work fine. There will just be a bunch of warning messages.

Possibly, depending on the driver, the device nodes it tried to create may appear in the /dev directory, rather than the intended subdirectory. While perhaps messy, this isn't actually harmful.

This thread was spawned because of a bug report with two issues. The first issue was the harmless warning messages about duplicate leaf node entries. Nothing broke.

The second issue was due to a broken devfsd configuration file which caused the wrong permissions to be set on a directory. This led to Roman thinking that the new devfs core was breaking stuff. As I've shown above, the breakage is a rare corner case involving an obsolete script.
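
The difference Richard describes between the old and new devfs cores can be modelled with a toy registry. The C sketch below is not devfs code; `struct registry`, `registry_add()`, and the `strict` flag are hypothetical names, and the only detail taken from the thread is that the new core answers a duplicate registration with an EEXIST error where the old one let it pass.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Toy stand-in for a device-node namespace. */
struct registry {
    const char *names[MAX_ENTRIES];
    int count;
    int strict;   /* 0: forgive duplicates (old core), 1: reject them (new core) */
};

/* Return the index of name, or -1 if it has not been registered. */
static int registry_find(const struct registry *r, const char *name)
{
    for (int i = 0; i < r->count; i++)
        if (strcmp(r->names[i], name) == 0)
            return i;
    return -1;
}

/* Register name; returns 0 on success or a negative errno value. */
static int registry_add(struct registry *r, const char *name)
{
    assert(r != NULL && name != NULL);
    if (registry_find(r, name) >= 0)
        return r->strict ? -EEXIST : 0;   /* the behaviour under dispute */
    if (r->count == MAX_ENTRIES)
        return -ENOSPC;
    r->names[r->count++] = name;
    return 0;
}
```

In this model, a boot script that creates an entry before the driver loads makes the driver's own `registry_add()` call fail in strict mode. A driver that stops at the first error would then skip whatever it meant to create underneath that entry, which is the kind of knock-on effect the thread argues about.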

To Richard's reference to broken boot scripts, Roman followed with, "Which is still included in the kernel tree and at least Mandrake is currently using it. There were no signs of deprecation, so people are legally using it." And to Richard's statement that the behavior was still the same, but just produced additional warning messages, Roman tacked on, "Wrong, these are not just warning messages, the driver API has changed." To Richard's statement that the device nodes would still be available and would work fine, Roman said, "Except the dynamic update of device nodes won't happen anymore, so it affects also all leaf nodes in the directories (e.g. partition entries won't be created/removed anymore). Events won't be created for these nodes as well, so configurations depending on this are broken as well." And to Richard's statement that the only actual breakage was with a rare corner case, Roman exploded again, with, ""rare corner case"??? Richard, this isn't funny anymore. :-( BTW restoring backward compatibility is probably just a couple of lines code, but first you had to admit that it's broken."

Marcelo asked, "Richard, Are the above problems really introduced by the changes?" Richard admitted they were, but added that he still felt it was a very rare circumstance that would cause any trouble. He said:

In general, if you are tarring and untarring inodes, you take the whole directory and put it all back again. Even the partitioning event is a corner case, since you're most likely to install a new drive (and thus have no inodes to "restore") and then partition. And even the obsolete rc.devfs only saved away inodes which had been changed, not everything.

However, if this concerns you, I can send a patch that effectively restores the old behaviour for directories. It's just a matter of grabbing the right lock, fiddling a flag and returning a different entry. But I definately want to keep a warning message. I want there to be some pain for broken or really obsolete configurations.

Marcelo replied that it would be fine to leave the warning message in, but that the change of behavior should go into 2.5. End Of Thread (tm).

7. Divergence Of 2.4 And 2.5

10 Dec 2001 (9 posts) Archive Link: "Patches in 2.4.17-pre2 that aren't in 2.5.1-pre8"

Topics: MAINTAINERS File, Sound: i810

People: Alan Cox, Marcelo Tosatti, Nicolas Aspert, Robert Love, Adrian Bunk

Adrian Bunk felt that patches going into 2.4 should also go into 2.5 by default, but he'd noticed that a lot of patches went into 2.4 right away without going into 2.5. Alan Cox explained, "In many cases that isnt true, and for a lot of the pending patches its pointless merging them into 2.5 until 2.5 gets into better shape. Going back over them as you have done is something that does need doing, but not until the block layer has some semblance of completion about it." Nicolas Aspert had the opposite problem. He'd submitted a patch that had gone right away into 2.5, but Marcelo Tosatti hadn't accepted it for 2.4. Marcelo replied:

Who is the maintainer of the driver ?

Try to think from my side: I may have no hardware or time to test all patches which come to me.

Please, people, send this kind of driver changes to the people who know all hardware specific details.

If there is no maintainer for i810, I'll be glad to apply it on 2.4.18pre and wait for reports. Not going to be on 2.4.17, though.

Alan replied to Marcelo's second paragraph with a smile, "Ditto. Thats what 2.4.x-pre_1_ is for 8)"

Nicolas said that the MAINTAINERS file listed no one as the maintainer, and so all patches had to go to Marcelo by default. But he added, "You're the boss ;-) As I said, I got at least 3 successful feedbacks for this patch, and the fact that Linux got it into the tree kinda confirms my thought that there might not be too many things broken in it. The choice of putting it into 2.4.17 or 2.4.18 is entirely yours, and I accept it whatever it is. Keep me informed ;-)" At this point, Robert Love chimed in, "The maintainer is MIA. I have been doing recent work on the driver. I can confirm Nicolas patch is correct." And Marcelo replied simply, "Lets wait 2.4.18pre for this one, OK ?"

8. 2.4 Development Philosophy

10 Dec 2001 (5 posts) Archive Link: "Linux 2.4.17-pre8"

People: Marcelo Tosatti, Oliver Xymoron

Marcelo Tosatti announced 2.4.17-pre8, saying, "Here goes pre8: The next one is going to be -rc1 so please don't send me any more updates and only bugfixes now. Updates will be queued for 2.4.18-pre1." He included a set of one-line changelog information, and Oliver Xymoron replied, "I'd like to suggest again having patches include change logs. The basic idea is for a patch to contain a file like patch.foo in the top-level that includes the changelog entry and the maintainer runs a release script that build a changelog by concatenating all the patch.* files and then either appending them to an actual ChangeLog (preferred) or deleting them. This would make it much easier for people to know the details of what got fixed without further burdening The Maintainer. One way to get this started would be to gather up all the existing change logs, add them to a ChangeLog file, and add a note about the file being auto-generated." Marcelo replied, "I know this is a much better thing to do for the changelog... However I really want to spend my available time now on letting 2.4 in a better state." To which Oliver said, "Fair enough. Perhaps I'll send you a patch/script after a few dot releases go by."
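
Oliver's idea of a release step that concatenates per-patch changelog fragments can be sketched in a few lines. No such script appears in the thread, so everything below is an assumption made for illustration: the function name `append_changelog_fragments()`, the use of POSIX `glob()` to find the `patch.*` files, and the choice to append rather than delete.

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>

/*
 * Append the contents of every file matching pattern (for example
 * "patch.*") to the file at changelog_path, in the sorted order that
 * glob() returns by default. Returns the number of fragments appended,
 * or -1 on error.
 */
static int append_changelog_fragments(const char *pattern,
                                      const char *changelog_path)
{
    assert(pattern != NULL && changelog_path != NULL);

    glob_t matches;
    int rc = glob(pattern, 0, NULL, &matches);
    if (rc == GLOB_NOMATCH)
        return 0;               /* no fragments: nothing to do */
    if (rc != 0)
        return -1;

    FILE *out = fopen(changelog_path, "a");
    if (out == NULL) {
        globfree(&matches);
        return -1;
    }

    int appended = 0;
    for (size_t i = 0; i < matches.gl_pathc; i++) {
        FILE *in = fopen(matches.gl_pathv[i], "r");
        if (in == NULL)
            continue;           /* skip unreadable fragments */
        char buf[4096];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, in)) > 0)
            fwrite(buf, 1, n, out);
        fclose(in);
        appended++;
    }

    fclose(out);
    globfree(&matches);
    return appended;
}
```

A maintainer-side wrapper could call this once per release and then remove the fragment files, which would cover both variants Oliver mentions.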

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.