Kernel Traffic #303 For 3 Apr 2005

By Zack Brown

Mailing List Stats For This Week

We looked at 2524 posts in 14MB. See the Full Statistics.

There were 787 different contributors. 277 posted more than once. The average length of each message was 92 lines.

The top posters of the week were:

148 posts in 689KB by Andrew Morton
128 posts in 569KB by Greg KH
68 posts in 649KB by Adrian Bunk
67 posts in 410KB by Evgeniy Polyakov
59 posts in 256KB by Jeff Garzik

The top subjects of the week were:

341 posts in 2MB for "RFD: Kernel release numbering"
36 posts in 160KB for "[PATCH] [request for inclusion] Realtime LSM"
35 posts in 167KB for "BUG: Slowdown on 3000 socket-machines tracked down"
35 posts in 140KB for "[RFC] -stable, how it's going to work."
26 posts in 153KB for "current linus bk, error mounting root"

These stats generated by mboxstats version 2.2

1. Linux 2.6.11 Released

2 Mar 2005 - 12 Mar 2005 (19 posts) Archive Link: "Linux 2.6.11"

Topics: Kernel Release Announcement

People: Linus Torvalds

Linus Torvalds announced Linux 2.6.11, saying, "there it is. Only small stuff lately - as promised. Shortlog from -rc5 appended, nothing exciting there, mostly some fixes from various code checkers (like fixed init sections, and some coverity tool finds). So it's now _officially_ all bug-free."

2. Discussion Of Kernel Version Numbering

2 Mar 2005 - 13 Mar 2005 (352 posts) Archive Link: "RFD: Kernel release numbering"

Topics: FS: NFS, Forward Port, Microkernels, Version Control

People: Linus Torvalds, Greg KH, Andrew Morton, Willy Tarreau, Jeff Garzik, Theodore Ts'o, Alan Cox, Russell King, Chris Wright, Dave Jones, Lars Marowsky-Bree

This thread was so long that it spilled over into this issue of Kernel Traffic, even though several shorter threads that resulted from it were covered in last week's issue.

Linus Torvalds said:

This is an idea that has been brewing for some time: Andrew has mentioned it a couple of times, I've talked to some people about it, and today Davem sent a suggestion along similar lines to me for 2.6.12.

Namely that we could adopt the even/odd numbering scheme that we used to do on a minor number basis, and instead of dropping it entirely like we did, we could have just moved it to the release number, as an indication of what was the intent of the release.

The problem with major development trees like 2.4.x vs 2.5.x was that the release cycles were too long, and that people hated the back- and forward-porting. That said, it did serve a purpose - people kind of knew where they stood, even though we always ended up having to have big changes in the stable tree too, just to keep up with a changing landscape.

So the suggestion on the table would be to go back to even/odd, but do it at the "micro-level" of single releases, rather than make it a two- or three-year release cycle.

In this setup, all kernels would still be _stable_, in the sense that we don't anticipate any real breakage (if we end up having to rip up so much basic stuff that we have to break anything, we'd go back to the 2.7.x kind of numbering scheme). So we should fear odd releases, but track them, to make sure that they are good (if you don't track them, and problems won't be fixed in the even version either)

But we'd basically have stricter concerns for an even release, and in particular the plan would be that the diff files would alternate between bigger ones (the 2.6.10->11 full diff was almost 5MB) and smaller ones (a 2.6.11->12 release would be a "stability only" thing, and hopefully the diff file would be much smaller).

We'd still do the -rcX candidates as we go along in either case, so as a user you wouldn't even _need_ to know, but the numbering would be a rough guide to intentions. Ie I'd expect that distributions would always try to base their stuff off a 2.6.<even> release.

It seems like a sensible approach, and it's not like the 2.4.x vs 2.5.x kind of even/odd thing didn't _work_, the problems really were an issue of too big granularity making it hard for user and developers alike. So I see this as a tweak of the "let's drop the notion altogether for now" decision, and just modify it to "even/odd is meaningful at all levels".

In other words, we'd have an increasing level of instability with an odd release number, depending on how long-term the instability is.

with the odd numbers going like:

The reason I put a shorter timeframe on the "all-even" kernel is because I don't want developers to be too itchy and sitting on stuff for too long if they did something slightly bigger. In theory, the longer the better there, but in practice this release numbering is still nothing but a hint of the _intent_ of the developers - it's still not a guarantee of "we fixed all bugs", and anybody who expects that (and tries to avoid all odd release entirely) is just setting himself up for not testing - and thus bugs.
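As a rough illustration of the even/odd idea in the quote above, the release component of a version string could be classified by its parity. This is only a sketch: the function name release_intent and the two labels are hypothetical, and it covers just the simple case Linus describes, where a 2.6.<even> release is the stricter "stability only" one.

```python
def release_intent(version: str) -> str:
    """Classify a w.x.y version under the proposed even/odd scheme.

    Hypothetical helper: an even release number (the third component)
    marks a stricter "stability only" release, an odd one marks a
    release that takes the bigger changes.
    """
    parts = version.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected w.x.y, got {version!r}")
    # Drop any -rcX suffix from the release component before parsing it.
    release = int(parts[2].split("-")[0])
    return "stability" if release % 2 == 0 else "development"
```

Under this sketch, release_intent("2.6.11") would be labelled "development" and release_intent("2.6.12") "stability", matching the 2.6.10->11 and 2.6.11->12 examples in the quote.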

There was a wide array of responses to this. Lars Marowsky-Bree thought the whole idea was overblown and unnecessary, and would only end up confusing users instead of helping them. Various other folks agreed with this, but theirs were not the loudest voices.

Greg KH was willing to give Linus' idea a try, but he said, "this puts a bigger burden on the maintainers to queue up patches for you. It's not that big of a deal, just something to be aware of." Elsewhere, Russell King had similar objections, pointing out that if maintainers had to sit on their patches for longer periods of time, they were likely to suffer from bit-rot. Linus agreed that these were valid concerns, and that it might not always be obvious which kernel version a patch belonged in; but this didn't deter him.

Dave Jones suggested simply using the w.x.y.z approach initiated with 2.6.8.1, as a means of stabilizing the tree. This idea would be mentioned in various posts throughout the thread, gradually gaining force.

Elsewhere, in the midst of the fray, Andrew Morton gave his opinion on the state of things:

I would maintain that we're still fixing stuff faster than we're breaking stuff. If you look at the fixes which are going into the tree (and there are a HUGE number of fixes), many of them are addressing problems which have been there for a long time.

So as long as we remain in this state, we don't need to do anything. The technology gets closer to a product until we reach the stage where the fixage rate equals the breakage rate. And we're not there yet.

(It's nice that patches are called "fix the frobnozzle gadget", but this analysis would be a lot easier if people would also label their patches "break the frobnozzle gadget" when that's what they do. Oh well).

So I'd suspect that on average, kernel releases are getting more stable. But the big big problem we have is that even though we fixed ten things for each one thing we broke, those single breakages tend to be prominent, and people get upset. It's fairly bad PR that Dell Inspiron keyboards don't work in 2.6.11, for example...

And people will incorrectly (and even wildly) generalise as a result of such silly little isolated bugs. We can wholly address such problems with a 2.6.x.y productisation series.

And something else:

I don't think 2.2 and 2.4 models are applicable any more. There are more of us, we're better (and older) than we used to be, we're better paid (and hence able to work more), our human processes are better and the tools are better. This all adds up to a qualitative shift in the rate and accuracy of development. We need to take this into account when thinking about processes.

It's important to remember that all those kernel developers out there *aren't going to stop typing*. They're just going to keep on spewing out near-production-quality code with the very reasonable expectation that it'll become publically available in less than three years. We need processes which will allow that.

And yet another point:

Many people on this mailing list want a super-stable kernel as their first (and sometimes only) priority (the product group). But others have other requirements: to make their code available, or to get their hardware supported, or to fix that scalability problem (the technology group). The product group's interests are in conflict with the technology group's.

There will be no solution to this problem which is completely satisfactory to either party.

Elsewhere, there were other mixed reactions to Linus' initial request for discussion. Josh Boyer liked the idea overall, but thought a w.x.y.z versioning system needed to be shoe-horned into it somehow. Willy Tarreau said to Linus, "For a long time, I've been hoping/asking for a more frequent stable/unstable cycle, so clearly you can count my vote on this one (even though it might count for close to zero). This is a very good step towards a better stability IMHO."

Jeff Garzik joined the "it's too confusing" chorus, adding:

it exacerbates an on-going issue: we are moving away from "release early, release often", as this proposal just pushes the list of pending stuff back even further.

Developers right now are sitting on big piles, and pushing that back even further means every odd release means you are creating a 2.4.x/2.5.x backport situation every two releases.

To take a radical position on the other side, I would prefer a weekly snapshot as the release, staging invasive things in -mm.

And I think -mm is not enough, even. We have to come up with new ways to manage this ever-increasing flow of data into our tree.

Elsewhere, Jeff said:

If we want a calming period, we need to do development like 2.4.x is done today. It's sane, understandable and it works.

2.6.x-pre: bugfixes and features
2.6.x-rc: bugfixes only

Linus, however, came down hard on this approach. He said, "No. It's insane, and the only reason it works is that 2.4.x is a totally different animal. Namely it doesn't have the kind of active development AT ALL any more. It _only_ has the "even" number kind of things, and quite frankly, even those are a lot less than 2.6.x has." He went on:

the reason it does _not_ work is that all the people we want testing sure as _hell_ won't be testing -rc versions.

That's the whole point here, at least to me. I want to have people test things out, but it doesn't matter how many -rc kernels I'd do, it just won't happen. It's not a "real release".

In contrast, making it a real release, and making it clear that it's a release in its own right, might actually get people to use it.

Might. Maybe.

At one point Jeff said, "We have all these problems precisely because _nobody_ is saying "I'm only going to accept bug fixes". We _need_ some amount of release engineering. Right now we basically have none." Linus replied:

I agree that this is one of the main problems.

But look at how to solve it. The _logical_ solution is to have a third line of defense: we have the -mm trees (wild and wacky patches), and we have my tree (hopefully not wacky any more), and it would be good to have a third level tree (which I'm just not interested in, because that one doesn't do any development any more) which only takes the "so totally not wild that it's really boring" patches.

In fact, if somebody maintained that kind of tree, especially in BK, it would be trivial for me to just pull from it every once in a while (like every _day_ if necessary). But for that to work, then that tree would have to be about so _obviously_ not wild patches that it's a no-brainer.

So what's the problem with this approach? It would seem to make everybody happy: it would reduce my load, it would give people the alternate "2.6.x base kernel plus fixes only" parallel track, and it would _not_ have the testability issue (because I think a lot of people would be happy to test that tree, and if it was always based on the last 2.6.x release, there would be no issues).

Anybody?

I'll tell you what the problem is: I don't think you'll find anybody to do the parallel "only trivial patches" tree. They'll go crazy in a couple of weeks. Why? Because it's a _damn_ hard problem. Where do you draw the line? What's an acceptable patch? And if you get it wrong, people will complain _very_ loudly, since by now you've "promised" them a kernel that is better than the mainline. In other words: there's almost zero glory, there are no interesting problems, and there will absolutely be people who claim that you're a dick-head and worse, probably on a weekly basis.

That said, I think in theory it's a great idea. It might even be technically feasible if there was some hard technical criteria for each patch that gets accepted, so that you don't have the burn-out problem.

So let's look at how we could set that up. We need:

Does this mean that some patches would never go into this tree? Yes. It would mean that patches that some people might feel very _strongly_ are good patches would never ever show up in this tree, but on the other hand, I can see this tree being useful regardless, and I think the lack of flexibility in this case is actually the whole _point_ of the tree. The lack of flexibility is the very thing that makes this be the kind of base that anybody else can then hang their own patches on top of. There should never be a situation where "I'd like that tree, but I think xxxx was done wrong".

Might something like this make people happier? (I wrote "happy" rather than "happier" at first, but let's face it, people are better at whining than they are at being happy ;)

Greg KH liked the challenge, and volunteered to be the 'sucker' Linus described. Theodore Ts'o said, "Linus's plan makes a lot of sense, as a scalable way of maintaining a 2.6.x.y release strategy." Chris Wright also volunteered to join Greg in maintaining the 'sucker' tree. Chris asked what would determine a new w.x.y.z release, and Linus replied, "The whole point of this tree is that there shouldn't be anything questionable in it. All the patches are independent, and they are all trivial and small. Which is not to say there couldn't be regressions even from trivial and small patches, and yes, there will be an outcry when there is, but we're talking minimizing the risk, not making it impossible." Elsewhere, Linus said:

We're not aiming for "perfect". Just _trying_ to be perfect is what would kill the whole scheme in the first place. We'd be aiming for "known rules".

Whether people _agree_ with those rules is then actually not a huge issue. There will _always_ be things that people don't agree with. Aiming for consistency is worthwhile in itself.

(Of course, the rules _do_ matter in the sense that there has to be some point to the consistency. You can have a consistent rule that "the ChangeLog entries must rhyme", and I think it's a great rule, and I encourage anybody who wants to to set up such a "rhyming kernel tree", but that doesn't mean that it makes a lot of difference to people ;).

So having strict rules that allow _one_ kind of consistency that people agree is good is a fine idea.

And Adrian, you can always have a different tree that has another set of rules - and if you use BK you can merge the two and have the "combination of the rules" tree. The reason I would _strongly_ urge very tight rules is that if they aren't tight, it ends up having all the problems we've always seen in other trees.

For example, if the "tight rules tree" allowed reverting an otherwise good patch because it had a bug (instead of trying to fix the bug), then I would never be able to pull that tree into mine. It would take development _backwards_, and thus it might be sensible for a vendor, but it would automatically mean that it's not a good base for the next kernel version.

And if I can't just say "ok, I'll always take the 'tight rules' tree", then we'd get into the forward-and-backward porting hell again, which would make the whole tree totally pointless. See my point?

The discussion apparently degenerated along the way, to the point where Linus remarked, "this discussion has degenerated into nothing but whining. Which is kind of expected, but let's hope that the only non-whining that came out of this (Greg & co's trials with 2.6.x.y) ends up being worthwhile."

Somewhere along the way, Alan Cox gave his prediction about the fate of the new w.x.y.z system:

Almost without exception maintainers will forget the backport (there are some notable exceptions). Almost without exception maintainers will not be aware that their backport fix clashes with another fix because that isn't their concern.

Linus will try and sneak stuff in that is security but not mentioned which has to be dug out (because the bad guys read the patches too).

And finally Linus throws the occasional gem into the backporting mix because he will (rightly) do the long term fix that rearranges a lot of code when the .x.y patch needs to be the ugly band aid.

So for example Linus will happily change remap_vm_area to fix a security bug by changing the API entirely and making it do some other things. Or in the case of the exec bug he did a fix that defaulted any missed fixes to unsafe. Fine for upstream where the goal is cleanness, bad for .x.y because the arch people hadn't caught up and did have remaining holes.

You also have to review the dependency tree for a backport and what was tested - so I skipped the NFS df fix as one example as it had never been tested standalone only on a pile of other NFS fixes.

Andrew remarked, "I think you're assuming that 2.6.x.y will have larger scope than is intended." Shortly thereafter Linus also said:

Alan, I think your problem is that you really think that the tree _I_ want is what _you_ want.

I look at this from a _layering_ standpoint. Not from a "stable tree" standpoint at all.

We've always had the "wild" kernels, and 90% of the time the point of the "wild" kernels has been to let people test out the experimental stuff, that's not always ready for merging. Like it or not, I've considered even the -ac kernel historically very much a "wild" thing, not a "bugfixes" thing.

What I'd like to set up is the reverse. The same way the "wild" kernels tend to layer on top of my standard kernel, I'd like to have a lower level, the "anti-wild" kernel. Something that is comprised of patches that _everybody_ can agree on, and that doesn't get anything else. AT ALL.

And that means that such a kernel would not get all patches that you'd want. That's fine. That was never the aim of it. The _only_ point of this kernel would be to have a baseline that nobody can disagree with.

In other words, it's not a "let's fix all serious bugs we can fix", but a "this is the least common denominator that is basically acceptable to everybody, regardless of what their objectives are".

So if you want to fix a security issue, and the fix is too big or invasive or ugly for the "least common denominator" thing, then it simply does not _go_ into that kernel. At that point, it goes into an -ac kernel, or into my kernel, or into a vendor kernel. See?

The point is not to make a perfect kernel. Two reasons:

So as long as you see this sucker-tree as a _replacement_ for the -ac kernels, you will _never_ be happy. But my whole point is that it's not a replacement at all. It's a starting point for others. It's something that should be fairly easy to set up, and exactly _because_ the aim is to be inoffensive, it's something where people can basically rely on the fact that they don't have to think about things that go into the sucker-tree: we'll set it up so that it's an acceptable base-line for everybody.

Is it an acceptable _solution_ for everybody? No. It's not even aiming to be. It's aiming to be a "let's get at 45% of the way, with 5% of the effort". And the way we make the effort low is by having the hard rules, and having the "single vote throws it out" approach. That's also what limits the tree, but hey, that's ok.

So how do you get to your solution? You could have a "slightly more wild" tree that takes the "other" patches. That "slightly more wild" tree would be for somebody _else_ to maintain (ie it might be you), and that would be the fixes that aren't acceptable to everybody.

In other words: I'm talking about scalability of development, not about fixing every single serious bug. I think this one will catch the embarrassing brown-paper-bag kinds of things, and maybe 90% of the "duh, we had this race forever, but we never even realized", but it wouldn't solve the ones where we had "damn, we did the locking wrong".

But let's face it - _most_ security bugs are of the "duh" variety. That's easy to overlook, because those are the ones you don't worry about, but the fact is, if we can get a tree that makes it possible for most people to just get those fixes without thinking about it, then that's a _good_ thing.

So think of it as a piece in the puzzle, not the whole picture.

He went on:

Btw, I also think that this means that the sucker-tree should never aim to be a "2.6.x.y" kind of release tree. If we do a "2.6.x.y" release, the sucker tree would be _included_ in that release (and it may indeed be all of it - most of the time it probably would be), but we should not assume that "2.6.x.y" _has_ to be just the sucker tree.

We might want to release a "2.6.x.y" that contains a patch that is too big or too intrusive (or otherwise controversial) to really be valid in the sucker-tree.

And I'd want that to be very much explicit in the "charter" for the sucker-tree. Exactly because the whole point (to me, at least) is to make it _easy_ to maintain. There should never be any discussion at all about patches: either they are universally loved, or they are not. And if the sucker-tree is seen as a 2.6.x.y release tree, then that will _inevitably_ mean that people will start discussing whether one patch or the other is supposed to go in.

My personal gut feeling is that 90% of the patches I _ever_ see are "obvious". If we also cut them down to "must fix an oops/hang/roothole", I think we'll actually get quite far with a sucker tree. We'll never get all the way, but exactly because the tree wouldn't _try_ to get all the way, it would be a lot easier to maintain.

And let's face it, just getting 50% of the way and having something that catches the brown-paper-bag stuff so that nobody else ever needs to worry about them is really worthwhile.
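The acceptance criteria Linus sketches in these quotes (patches must be "obvious", must fix an oops, hang or root hole, and a single objection throws a patch out) could be expressed as a simple filter. What follows is a minimal sketch under those assumptions only. The Patch class, its fields and the acceptable function are hypothetical names invented for illustration, and the full rule list from the thread is not reproduced here.

```python
from dataclasses import dataclass, field

# Categories taken from the quote: "must fix an oops/hang/roothole".
ACCEPTED_FIXES = {"oops", "hang", "roothole"}


@dataclass
class Patch:
    title: str
    fixes: str       # kind of problem the patch claims to fix
    obvious: bool    # reviewers consider the patch obviously correct
    objections: list = field(default_factory=list)


def acceptable(patch: Patch) -> bool:
    """Hypothetical filter: a single objection throws the patch out."""
    return (
        patch.obvious
        and patch.fixes in ACCEPTED_FIXES
        and not patch.objections
    )
```

The point of keeping the check this mechanical is the one made in the quotes: with hard rules there is nothing to argue about, so a rejected patch simply goes to some other tree.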

Somewhere in the midst of all this, Greg released 2.6.11.1, in a thread already covered in Issue #302, Section #10 (4 Mar 2005: Linux 2.6.11.1 Released; Some Discussion Of Protocol). Linus looked this over, and said, "I'm not at all unhappy with your 2.6.11.1 - I just think that there might be more automation involved in the long run. But automation takes time to build up and learn, and in the meantime doing it by hand and learning early is definitely the right thing to do. Maybe you doing it by hand just makes it clear that I was wrong about the need for some strict rules that are automatically enforced in the first place." Jeff also remarked, "So far, 2.6.11.1 was what I was hoping, and expecting, it would be."

3. Linux 2.6.11-mm1 Released

4 Mar 2005 - 10 Mar 2005 (23 posts) Archive Link: "2.6.11-mm1"

Topics: Kernel Release Announcement, Version Control

People: Andrew Morton, David Woodhouse

Andrew Morton announced Linux 2.6.11-mm1, saying:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm1/

4. New Open-iSCSI High-Performance iSCSI Initiator Project

6 Mar 2005 - 12 Mar 2005 (15 posts) Archive Link: "[ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux"

Topics: Disks: SCSI, FS: sysfs, Ioctls, Version Control

People: Alex Aizman, Matt Mackall, Christoph Hellwig

Alex Aizman (speaking for himself and Dmitry Yusupov) said:

This is to announce Open-iSCSI project: High-Performance iSCSI Initiator for Linux.

MOTIVATION

Our initial motivations for the project were: (1) implement the right user/kernel split, and (2) design iSCSI data path for performance. Recently we added (3): get accepted into the mainline kernel.

As far as user/kernel, the existing iSCSI initiators bloat the kernel with ever-growing control plane code, including but not limited to: iSCSI discovery, Login (Authentication and Operational), session and connection management, connection-level error processing, iSCSI Text, Nop-Out/In, Async Message, iSNS, SLP, Radius... Open-iSCSI puts the entire control plane in the user space. This control plane talks to the data plane via well defined interface over the netlink transport.

(Side note: prior to closing on the netlink we considered: sysfs, ioctl, and syscall. Because the entire control plane logic resides in the user space, we needed a real bi-directional transport that could support asynchronous API to transfer iSCSI control PDUs: Login, Logout, Nop-in, Nop-Out, Text, Async Message.)

Performance.
This is the major goal and motivation for this project. As it happens, iSCSI has to compete with Fibre Channel, which is a more entrenched technology in the storage space. In addition, the "soft" iSCSI implementations have to show good results in presence of specialized hardware offloads.

Our today's performance numbers are:

Prior to starting from-scratch the data path code we did evaluate the sfnet Initiator. And eventually decided against patching it. Instead, we reused its Discovery, Login, etc. control plane code. Technically, it was the shortest way to achieve the (1) and (2) goals stated above. We believe that it remains the easiest and the most practical thing on the larger scale of: iSCSI for Linux.

STATUS

There's a 100% working code that interoperates with all (count=5) iSCSI targets we could get our hands on.

The software was tested on AMD Opteron (TM) and Intel Xeon (TM).

Code is available online via either Subversion source control database or the latest development release (i.e., the tarball containing Open-iSCSI sources, including user space, that will build and run on kernels starting 2.6.10).

http://www.open-iscsi.org

Features:

TODO

The near term plan is: test, test, and test. We need to stabilize the existing code, after 5 months of development this seems to be the right thing to do.

Other short-term plans include:

a) process community feedback, implement comments and apply patches;

b) cleanup user side of the iSCSI open interface; use API calls (instead of directly constructing events);

c) eliminate runtime control path memory allocations (for Nop-In, Nop-Out, etc.);

d) implement Write path optimizations (delayed because of the self-imposed submission deadline);

e) oProfile the data path, use the reports for further optimization;

f) complete the readme.

Comments, code reviews, patches - are greatly appreciated!

THANKS

Special thanks to our first reviewers: Christoph Hellwig and Mike Christie.

Special thanks to Ming Zhang for help in testing and for insightful questions.

Matt Mackall asked about the size of the code, and Alex replied, "there's about 12,000 lines of user space code, and growing. In the kernel we have approx. 3,300 lines."
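To make the netlink transport mentioned in the announcement a little more concrete, here is a minimal sketch of how a user-space control plane might frame one message for the kernel. The header layout follows struct nlmsghdr from <linux/netlink.h> (length, type, flags, sequence number, port id) and netlink's 4-byte alignment. Everything else is an assumption made for illustration: the function names, the message type value and the payload are hypothetical and are not taken from the Open-iSCSI sources.

```python
import struct

# struct nlmsghdr: u32 length, u16 type, u16 flags, u32 sequence, u32 port id.
NLMSG_HDR = struct.Struct("=IHHII")
NLMSG_ALIGNTO = 4


def nlmsg_align(n: int) -> int:
    """Round n up to the netlink 4-byte alignment boundary."""
    return (n + NLMSG_ALIGNTO - 1) & ~(NLMSG_ALIGNTO - 1)


def build_message(msg_type: int, seq: int, payload: bytes) -> bytes:
    """Frame a payload as one netlink message (header + padded payload)."""
    # The length field counts header plus payload, not the trailing padding.
    length = NLMSG_HDR.size + len(payload)
    header = NLMSG_HDR.pack(length, msg_type, 0, seq, 0)
    padding = b"\0" * (nlmsg_align(length) - length)
    return header + payload + padding
```

A control PDU such as a Login request would travel as the payload, and the sequence number lets the user-space side match asynchronous replies to requests, which is the kind of bi-directional exchange the side note says sysfs, ioctl and syscall interfaces did not offer.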

5. Sticky Background Image With Framebuffer

7 Mar 2005 - 15 Mar 2005 (26 posts) Archive Link: "[announce 0/7] fbsplash - The Framebuffer Splash"

Topics: Bootsplash, Compression, Framebuffer

People: Michal Januszewski, Pavel Machek, James Simmons

Michal Januszewski said:

Fbsplash - The Framebuffer Splash - is a feature that allows displaying images in the background of consoles that use fbcon. The project is partially descended from bootsplash.

Unlike bootsplash, fbsplash has no in-kernel image decoder. Picture decompression is handled by a userspace helper which provides raw image data to the kernel. There is also no support for things like the silent mode and progress bars, as these are best handled by userspace programs.

Truecolor, directcolor and pseudocolor modes are supported. Fbsplash has no dependency on a specific framebuffer driver. It has been tested with at least vesafb, rivafb and radeonfb.

Technical details about the userspace<->kernelspace interface can be found in patch 07/07, which contains the documentation.

The userspace utilities that make use of fbsplash can be found on: http://dev.gentoo.org/~spock/projects/splashutils/

James Simmons saw no point to this, and found it pure eye-candy with no merit. But Pavel Machek remarked, "At least some Debians, Gentoo and SUSE each use some variant of this eye candy; each one with different bugs. It would be nice to at least do the splash right (so that it does not require vesafb and therefore allows working with suspend-to-RAM)."

6. Realtime LSM And rlimits

7 Mar 2005 - 10 Mar 2005 (36 posts) Archive Link: "Re: [PATCH] [request for inclusion] Realtime LSM"

Topics: Real-Time, Security

People: Christoph Hellwig, Andrew Morton, Matt Mackall, Lee Revell, Paul Davis, James Morris, Utz Lehmann, Jack O'Quin, Pavel Machek, Chris Wright, Ingo Molnar

Andrew Morton pointed out that the real-time LSM (Linux Security Module) had been floating around for a while, its proponents clamouring for inclusion in the main tree; and he asked if this would be OK, or if there were still objections from other quarters. Christoph Hellwig spoke up, saying, "It's still a really bad idea. You let the magic gid for oracle hugetlb patch go in with that reasoning, now we have realtime-lsm, next we $CAPABILITY for $FOO and we're headed straight to interface-hell." Andrew said that maybe 'interface Hell' was what we deserved if no one could come up with a better alternative to the patch. He said, "It solves a real problem and is well encapsulated. The world won't end if we merge it." And he invited folks to give him something better if they could.

Ingo Molnar asked Andrew to describe the 'real problem' in simple terms, and Andrew said that audio applications needed to run in real-time, without having to be root in order to use !SCHED_OTHER and mlockall capabilities. Matt Mackall added that this argument also applied to "video, data acquisition, motion control, CD burning, etc.." Christoph replied:

Which all fits very nicely with MEMLOCK rlimit and a tiny wrapper that sets !SCHED_OTHER and execs the audio app..

and as I mentioned a few times if we really want to go for a magic uid/gid-based approach we should at least have one that's useable for all capabilities so it can replace the oracle hack as well. But the proponents of the patch weren't interested to invest the tiniest bit of work over what they submitted.
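The "tiny wrapper" Christoph has in mind could look roughly like the following sketch, which switches the current process to a real-time policy and then execs the target program. It is an illustration only: the function names are hypothetical, SCHED_FIFO is just one possible non-SCHED_OTHER policy, and whether an unprivileged user may make this call depends on how the system is configured, which is exactly the question the thread is arguing about. Locking memory would additionally rely on the MEMLOCK rlimit having been raised for the user.

```python
import os


def clamp_fifo_priority(requested: int) -> int:
    """Clamp a requested priority to the range SCHED_FIFO allows."""
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    return max(lowest, min(highest, requested))


def run_realtime(priority: int, argv: list) -> None:
    """Switch this process to SCHED_FIFO, then exec the target program.

    Raises PermissionError if the caller is not allowed to use
    real-time scheduling.
    """
    param = os.sched_param(clamp_fifo_priority(priority))
    os.sched_setscheduler(0, os.SCHED_FIFO, param)
    # The scheduling policy is preserved across exec, so the target
    # program starts out already running with the real-time policy.
    os.execvp(argv[0], argv)
```

A call such as run_realtime(10, ["some-audio-app"]) (a made-up program name) would replace the wrapper process with the audio application, which is why the wrapper itself can stay this small.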

Lee Revell replied:

as I mentioned a few times, the authors have neither the inclination nor the ability to do that, because they are not kernel hackers. The realtime LSM was written by users (not developers) of the kernel, to solve a specific real world problem. No one ever claimed it was the correct solution from the kernel POV.

I know Jack disagrees but I for one am glad to see the max-RT-prio rlimit patch going in. This probably reflects my sysadmin background, PAM does not scare me at all. Anyway it solves the same problem and will be invisible to any user with a reasonable distro. If musicians end up having to tweak the PAM configuration, then I would say the distro has failed miserably.
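For readers unfamiliar with rlimits, the sketch below shows how a process can inspect the limits relevant to this debate. It assumes the interface as it exists on current Linux systems and in Python's resource module (RLIMIT_RTPRIO, RLIMIT_NICE, RLIMIT_MEMLOCK); the patch being discussed in the thread may have differed in its details. The function names are hypothetical, and the 20 - rlim_cur formula is the one documented for RLIMIT_NICE in getrlimit(2).

```python
import resource


def nice_ceiling(rlim_cur: int) -> int:
    """Lowest nice value permitted by an RLIMIT_NICE soft limit.

    getrlimit(2) documents the ceiling as 20 - rlim_cur.
    """
    return 20 - rlim_cur


def describe_limits() -> dict:
    """Return the soft limits relevant to unprivileged real-time work."""
    return {
        "rtprio": resource.getrlimit(resource.RLIMIT_RTPRIO)[0],
        "nice": resource.getrlimit(resource.RLIMIT_NICE)[0],
        "memlock": resource.getrlimit(resource.RLIMIT_MEMLOCK)[0],
    }
```

On a distribution that configures these limits for audio users, for example through PAM as Lee suggests, describe_limits() would report a non-zero "rtprio" value without the user having touched any configuration.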

And Paul Davis said, "i would just like to add that its very disappointing that the LSM, having been included in the kernel (apparently very much against Christoph's and others' advice) turns out to be so useless. from outside lkml, LSM appeared to be a mechanism to allow non-kernel-developers to create new security policies (perhaps even mechanisms) without trying to tackle the entire kernel. instead, we are now getting a fix which, while it solves the same problem, has required substantive analysis of its effect on the overall kernel, and will require continued vigilance to ensure that it doesn't now or later cause unintended side effects. LSM appeared to be the "right" way to do this in terms of modularity - it is disappointing to find it has so little support (close to zero to judge from this debate) on LKML despite being present in the kernel." Andrew added:

That, plus the fact that inherited capabilities could also be used here, except they don't work right. That's a nice, simple and long-standing kernel feature which I think we should have fixed up before piling in more security features.

But I've said that often enough. If nobody has a sufficient need for fixed-up-caps to actually put work into it, nothing happens. And it's a lot of work, because this is a scary feature.

Christoph continued to rail against the LSM authors for not putting in more work, and Lee replied, "Consider it a proof of concept. I'm satisfied if any solution gets merged, it doesn't have to be this one. I am still confused about why the LSM framework was merged in the first place." James Morris said:

The purpose of LSM is to allow different security models to be implemented. IMHO, a security model here meaning a complete or otherwise significantly enhancing system-wide framework, such as SELinux.

I don't think LSM is a suitable framework for upstream merging of trivial or experimental access control enhancements. They should either be made part of the core kernel under LSM control or incorporated directly into an existing LSM.

One of the reasons I would put forward for this is that it can be dangerous to allow the user to arbitrarily compose security modules.

Also, from an architectural point of view, it's better to think about security models at a high level with broadly defined components (e.g. "DAC" and "MAC"), not as a collection of miscellaneous features.

In the case of this code, I would suggest integrating it into the core kernel, and providing an LSM hook to allow other LSMs to mediate it.

As an example, see the vm_enough_memory hook.

Completely elsewhere, Matt said in response to Andrew's initial inquiry, "I think Chris Wright's last rlimit patch is more sensible and ready to go. And I think I may have even convinced Ingo on this point before the conversation died last time around. So here's that patch again, updated to 2.6.11. Compiles cleanly. Chris, please add a signed-off-by." He included a patch to "Add a pair of rlimits for allowing non-root tasks to raise nice and rt priorities. Defaults to traditional behavior." Ingo confirmed that he supported this approach, and Chris offered a couple of technical suggestions. Utz Lehmann also said this solution would be specifically useful to him, as "With it i can allow users to renice their previously niced jobs (eg. from 19 to 0). At the moment they need to call me and i do this as root." Andrew was perfectly happy to take Matt's (and Chris's) patch, especially because, as he said, "I like rlimits - very straightforward, although somewhat awkward to use from userspace due to shortsighted shell design." He asked if anyone had any serious objections to make. Jack O'Quin spoke up:

  1. is likely to introduce multiuser system security holes like the one created recently when the mlock() rlimits bug was fixed (DoS attacks)
  2. requires updates to all the shells
  3. forces Windows and Mac musicians to learn and understand PAM
  4. is undocumented and has never been tested in any real music studios

Pavel Machek said in response to Jack's first point, that the default would still be unchanged; to the requirement to update all shells, Pavel said no, the feature could be set during login. To the requirement that Windows and Mac users would have to learn PAM, Pavel said, "While you force them to mess with security modules. I'd say thats an improvement. And "understanding PAM" in this case means updating two files, adding one line to each." And to the point about missing documentation and testing, Pavel replied curtly, "So write the docs and test it." Jack staunchly opposed the change, pointing out that any testing done by list members would of necessity be little more than 'toy' testing. He said, "The RT-LSM has been used for over a year by hundreds (probably thousands) of musicians in studios making real music. That's what I mean by "real music studios". We won't be able to do that kind of testing for the rlimits solution until next year." However, he stood alone, and the discussion petered out, with Andrew still favoring Matt's and Chris's patch.
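To make the rlimit approach concrete, here is a minimal userspace sketch. It assumes the semantics that later mainline kernels document in getrlimit(2), where the two limits appear as RLIMIT_NICE and RLIMIT_RTPRIO and the nice ceiling is computed as 20 - rlim_cur; the patch as posted in this thread may have differed in detail. The helper names are hypothetical and exist only for illustration.

```python
import resource

# Assumption: the "20 - rlim_cur" rule documented in getrlimit(2) for
# later kernels; the patch discussed in the thread may differ.
NICE_MIN = -20


def lowest_allowed_nice(rlim_cur):
    """Lowest nice value an unprivileged task may request.

    A result of 20 lies outside the valid nice range (-20..19) and so
    means "no raising of priority allowed", the traditional default.
    """
    if rlim_cur == resource.RLIM_INFINITY:
        return NICE_MIN
    return max(NICE_MIN, 20 - rlim_cur)


def may_lower_nice_to(target_nice, rlim_cur):
    """True if the limit alone permits lowering the nice value to target_nice."""
    return target_nice >= lowest_allowed_nice(rlim_cur)


def report_limits():
    """Print the soft and hard limits if this platform exposes them."""
    for name in ("RLIMIT_NICE", "RLIMIT_RTPRIO"):
        which = getattr(resource, name, None)
        if which is None:
            print(f"{name}: not available on this platform")
            continue
        soft, hard = resource.getrlimit(which)
        print(f"{name}: soft={soft} hard={hard}")


if __name__ == "__main__":
    report_limits()
    # Utz Lehmann's case: renicing a job from 19 back to 0.
    print("limit 20 allows nice 0:", may_lower_nice_to(0, 20))
    print("limit 0 allows nice 0:", may_lower_nice_to(0, 0))
```

Under these assumptions a limit of 0 leaves the traditional behavior untouched, which matches the "Defaults to traditional behavior" note in the patch description, while a limit of 20 would cover the renice-to-0 case Utz described.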

7. Linux 2.6.11-mm2 Released

8 Mar 2005 - 14 Mar 2005 (29 posts) Archive Link: "2.6.11-mm2"

Topics: Framebuffer, Kernel Release Announcement, User-Mode Linux

People: Andrew Morton, Christoph Hellwig

Andrew Morton announced Linux 2.6.11-mm2, saying:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm2/

Christoph Hellwig implored Andrew not to include the iscsi driver, as "It's fairly experimental and just one of three iscsi initiators we're (scsi folks) currently evaluating for inclusion." Andrew mollified him with, "I'll frequently add things like this just so they get additional compile-coverage testing and to get wider reviewing. And someone might run sparse, checkstack, reference_discarded or reference_init on it."

8. No-Exec Support For PPC64

8 Mar 2005 - 16 Mar 2005 (19 posts) Archive Link: "[PATCH 0/2] No-exec support for ppc64"

Topics: Version Control

People: Jake Moilanen, Anton Blanchard, Paul Mackerras, Benjamin Herrenschmidt

Jake Moilanen said:

These patches add no execute support to PPC64. They prohibit executing code on the stack, or most any non-text segment for both user space, and kernel.

No execute is supported on Power4 processors and up. These processors support pages that have a no-execute permission bit.

The patches include a base fixup from Anton Blanchard. This includes a fix for the wrong bit being used for no-exec and for read/write on the hardware PTEs.

For distros that compile w/ pt_gnu_stacks, they depend on Ben Herrenschmidt's vDSO patches for signal trampoline. Without it, the application will hang on the first signal due to the return code being put on the signal context stack to return to the kernel on the completion of the signal handler. The changes should be in the latest BK tree.

Paul Mackerras, Benjamin Herrenschmidt, and Olof Johansson offered criticisms of the patch, and Jake released two additional versions in response to their suggestions.

9. Guidelines for the '-stable' w.x.y.z Tree

8 Mar 2005 - 11 Mar 2005 (35 posts) Archive Link: "[RFC] -stable, how it's going to work."

People: Greg KH, Lee Revell, Linus Torvalds, Alan Cox, Andi Kleen, Marcelo Tosatti, Chris Wright, Neil Brown, Arjan van de Ven

Regarding the new w.x.y.z versioning system, Greg KH said:

So here's a first cut at how this 2.6 -stable release process is going to work that Chris and I have come up with. Does anyone have any problems/issues/questions with this?

Everything you ever wanted to know about Linux 2.6 -stable releases.

Rules on what kind of patches are accepted, and what ones are not, into the "-stable" tree:

Procedure for submitting patches to the -stable tree:

Review cycle:

Review committee:

Neil Brown suggested that another rule for each patch could be that it had to fix a regression, i.e. something that had just broken in the most recent official release. Greg replied:

That, and a zillion other specific wordings that people suggested fall under the "or some 'oh, that's not good' issue" rule.

I didn't feel like being all lawyer-like and explicitly spelling out all of the different kinds of bugs that we would be accepting patches for :)

So yes, I don't have a problem with patches to fix regressions.

Lee Revell asked, "So just to be 100% clear, no sound with 2.6.N where the sound worked with 2.6.N-1 absolutely does qualify. Right?" And Linus Torvalds replied:

If you can send in a patch that fixes it in an obvious way and in less than 100 lines of context diff, hell yes.

Remember: all the other constraints still hold. Don't fall into the trap of believing that "if it fixes a regression, it's for -stable". It needs to be _obvious_, and it needs to be small enough that bugs are unlikely.

And that "small enough" is really important. Bugs do happen. Even in "obvious" patches. The whole _point_ of -stable is to try to make them less likely, and the strict constraints are very much a part of that.
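The size constraint Linus describes can be checked mechanically. The sketch below is one possible reading of it, not part of the -stable process itself: it assumes that "lines of context diff" means every line of a `diff -c` style diff, headers and context included, and it uses Python's `difflib.context_diff` to produce that diff. The function names and the `file.c` label are hypothetical.

```python
import difflib

# From the "less than 100 lines of context diff" remark quoted above.
STABLE_LINE_LIMIT = 100


def context_diff_lines(old, new, name="file.c"):
    """Return the lines of a context diff between two lists of lines."""
    return list(difflib.context_diff(old, new,
                                     fromfile="a/" + name,
                                     tofile="b/" + name))


def small_enough_for_stable(old, new):
    """True if the whole diff (headers and context included) is under the limit.

    Assumption: every emitted diff line counts toward the limit.
    """
    return len(context_diff_lines(old, new)) < STABLE_LINE_LIMIT


if __name__ == "__main__":
    original = [f"line {i}\n" for i in range(300)]

    one_line_fix = list(original)
    one_line_fix[150] = "line 150, fixed\n"
    print("one-line fix fits:", small_enough_for_stable(original, one_line_fix))

    rewrite = [line.upper() for line in original]
    print("full rewrite fits:", small_enough_for_stable(original, rewrite))
```

A size check like this covers only one of the constraints; whether a fix is "obvious" remains a human judgment, as the quoted reply stresses.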

Neil pointed out that in his original question, he hadn't meant that it should be OK to fix regressions, but that it be required that only regressions be fixed. But there was no further discussion on that point.

Andi Kleen had a suggestion of his own. He thought it should be required that everything going into the 'stable' w.x.y.z branch, should also go into the official tree. Arjan van de Ven absolutely agreed with this, while Alan Cox strongly disagreed. As Alan put it, "What if the mainline fix is a rewrite of the core API involved. Some times you need to put in the short term fix. What must never happen is people accepting that fix as long term." He suggested as an alternative, "It must be accepted to mainline, or the accepted mainline patch be deemed too complex or risky to backport and thus a simple obvious alternative fix applied to stable ONLY." Chris Wright and Greg both liked this, and Andi said this was what he'd really meant anyway.

But Andi also had another issue with one of Greg's items. He didn't like the idea that security patches would go through the kernel security team, instead of the normal review cycle. He said, "How come the security team has more competence to review patches than the subsystem maintainers? I can see the point of overruling maintainers on security issues when they are not responsive, but if they are I think the should be still the main point of contact." Arjan also supported this idea, but Marcelo Tosatti pointed out, "The security team is going to work with the subsystem maintainers, not overrule them. That would be indeed insane." Chris confirmed this, adding that the "Point here is, sometimes there's disclosure coordination happening as well." Andi was still not satisfied, and felt the rules were not clear enough on this point; but after some back-and-forth, Greg said, "let's stop arguing about the semantics of the rules, and see if what we have proposed actually works in real-life. If that doesn't work out, we can revisit it then."

10. Reviewing Patches For 2.6.11.3

10 Mar 2005 - 11 Mar 2005 (20 posts) Archive Link: "[00/11] -stable review"

Topics: I2C

People: Greg KH, Jean Delvare, Josh Boyer

Greg KH said:

This is the start of the stable review cycle for the 2.6.11.3 release. There are 11 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let us know. If anyone is a maintainer of the proper subsystem, and wants to add a signed-off-by: line to the patch, please respond with it.

These patches are sent out with a number of different people on the Bcc: line. If you wish to be a reviewer, please email stable@kernel.org to add your name to the list. If you want to be off the reviewer list, also email us.

Responses should be made by Sat, March 12, 23:00 UTC. Anything received after that time, might be too late.

thanks,

the -stable release team (i.e. the ones wearing the joker hat in the corner...)

He responded to himself with a patch under consideration from Jean Delvare. As Jean's changelog entry said, it was "a rewrite of the saa7110_write_block function, which was plain broken in the case where the underlying adapter supports I2C_FUNC_I2C. It also includes related fixes which ensure that different parts of the driver agree on the number of registers the chip has." Josh Boyer pointed out that parts of the patch were mere whitespace cleanup, and added, "Not that I really care, but isn't there a rule that a patch "... can not contain any "trivial" fixes in it (spelling changes, whitespace cleanups, etc.)"?" Greg agreed with this, and asked Jean to fix the patch up. Jean did so.

Greg posted a number of other patches for consideration, but there was no discussion about them. All were very small.
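Josh's objection concerns a rule that is easy to screen for before submitting. As a rough sketch (not a tool used by the -stable team), the function below compares the old and new versions of a file and reports lines whose replacement differs only in whitespace. It is deliberately crude: it drops all whitespace before comparing, so it also ignores whitespace that is significant inside string literals. The function names are hypothetical.

```python
import difflib


def strip_ws(line):
    """Remove every whitespace character so only the visible text remains."""
    return "".join(line.split())


def whitespace_only_changes(old, new):
    """Return 1-based line numbers in `old` whose replacement differs only in whitespace."""
    hits = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only compare blocks that were replaced line for line.
        if tag != "replace" or (i2 - i1) != (j2 - j1):
            continue
        for offset in range(i2 - i1):
            if strip_ws(old[i1 + offset]) == strip_ws(new[j1 + offset]):
                hits.append(i1 + offset + 1)
    return hits


if __name__ == "__main__":
    before = ["int a;\n", "\tx = 1;\n", "return  x;\n"]
    after = ["int a;\n", "        x = 1;\n", "return y;\n"]
    print(whitespace_only_changes(before, after))
```

In the example, only the second line is reported: its indentation changed from a tab to spaces, while the third line carries a real change.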

11. Microstate Accounting For 2.6.11

10 Mar 2005 - 11 Mar 2005 (5 posts) Archive Link: "Microstate Accounting for 2.6.11"

Topics: SMP, Version Control

People: Peter Chubb, Andi Kleen, Andrew Morton

Peter Chubb said:

Timing data on threads at present is pretty crude: when the timer interrupt occurs, a tick is added to either system time or user time for the currently running thread. Thus in an unpatched kernel one can distinguish three timed states: On-cpu in userspace, on-cpu in system space, and not running.

The actual number of states is much larger. A thread can be on a runqueue or the expired queue (i.e., ready to run but not running), sleeping on a semaphore or on a futex, having its time stolen to service an interrupt, etc., etc.

This patch adds timers per-state to each struct task_struct, so that time in all these states can be tracked. This patch contains the core code to do the timing, and to initialise the timers. Subsequent patches enable the code (by adding Kconfig options) and add hooks to track state changes.

Andrew Morton asked why the kernel needed this feature, and Peter replied, "I find that it's useful when trying to work out why a thread is going more slowly than it needs to. Userspace tools in the CVS repository at gelato.unsw.edu.au let you graph in real time the time spent in each state, so you get graphs like this: http://gelato.unsw.edu.au/patches/snapshot.png which shows mplay skipping because of a slow disk/filesystem." Andrew also asked what the overhead would be, and Peter replied, "Around 5% on LMbench context switch numbers for uniprocessor, negligible on SMP (but SMP context switch results are horrible at the moment according to LMbench2 -- almost 16usec); select on 10 fd goes from 1.665 usec to 1.701". Andi Kleen chimed in with his own impressions, saying, "It does RDTSC and lots of complicated stuff twice for each system call. On P4 this will be extremely slow (> 1000cycles combined) It is pretty unlikely that whatever it does justifies this extreme overhead in a critical fast path." Peter replied:

Not really `lots of complicated stuff'. Just swap a timer and set a flag on entry:

msp->timers[msp->laststate] += now - msp->lastchange
msp->lastchange = now
msp->laststate = ONCPU_SYS
msp->cflags |= MSA_SYS

And swap timers and clear the flag on exit. The flag's needed to force return to ONCPU_SYS rather than ONCPU_USR if the task preempted or interrupted while in a system call.

If there's a simpler, cheaper, faster way to track time spent in system calls (as opposed to time spent in interrupt handlers, or on the run queue) then I'd like to know what it is.

And I recognise there are lots of people who don't want this --- but there are some who do. I've maintained this patch since mid 2003, and have seen a steady trickle of downloads --- one or two a week.
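The timer swap Peter describes can be modelled outside the kernel to see why the flag is needed. The following is a toy model in Python, not the patch itself: `timers`, `laststate`, `lastchange`, `cflags`, `ONCPU_USR`, `ONCPU_SYS` and `MSA_SYS` follow the names in his snippet, while the `INTERRUPTED` state, the interrupt hooks, and `snapshot` are assumptions added for illustration. Timestamps are plain integers rather than cycle-counter readings.

```python
ONCPU_USR, ONCPU_SYS, INTERRUPTED = "oncpu_usr", "oncpu_sys", "interrupted"
MSA_SYS = 0x1  # flag: the task is currently inside a system call


class Microstate:
    """Toy per-task accounting: one accumulated timer per state."""

    def __init__(self, now):
        self.timers = {ONCPU_USR: 0, ONCPU_SYS: 0, INTERRUPTED: 0}
        self.laststate = ONCPU_USR
        self.lastchange = now
        self.cflags = 0

    def _switch(self, newstate, now):
        # Charge the elapsed time to the state being left, then move on.
        self.timers[self.laststate] += now - self.lastchange
        self.lastchange = now
        self.laststate = newstate

    def syscall_enter(self, now):
        self._switch(ONCPU_SYS, now)
        self.cflags |= MSA_SYS

    def syscall_exit(self, now):
        self._switch(ONCPU_USR, now)
        self.cflags &= ~MSA_SYS

    def interrupt_enter(self, now):
        self._switch(INTERRUPTED, now)

    def interrupt_exit(self, now):
        # The flag decides whether to resume system or user accounting.
        back = ONCPU_SYS if self.cflags & MSA_SYS else ONCPU_USR
        self._switch(back, now)

    def snapshot(self, now):
        """Return the totals including time spent in the current state so far."""
        totals = dict(self.timers)
        totals[self.laststate] += now - self.lastchange
        return totals


if __name__ == "__main__":
    ms = Microstate(now=0)
    ms.syscall_enter(10)    # 10 units of user time so far
    ms.interrupt_enter(15)  # interrupted 5 units into the system call
    ms.interrupt_exit(18)   # returns to ONCPU_SYS because MSA_SYS is set
    ms.syscall_exit(30)
    print(ms.snapshot(40))
```

Without the `MSA_SYS` check in `interrupt_exit`, the 12 units between the interrupt and the end of the system call would be charged to user time, which is the misattribution the flag is meant to prevent.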

12. NVidia Licensing Issue

12 Mar 2005 - 15 Mar 2005 (7 posts) Archive Link: "nvidia fb licensing issue."

Topics: Framebuffer, Version Control

People: Dave Jones, Andrew Morton, Arjan van de Ven, Jon Smirl

Dave Jones pointed out:

The nvidia framebuffer code added recently is marked as MODULE_LICENSE(GPL), but some things seem a little odd to me..

  1. The boilerplate at the top of drivers/video/nvidia/nv_dma.h, drivers/video/nvidia/nv_local.h, and drivers/video/nvidia/nv_hw.c doesn't seem to be a GPL-compatible license. It seems to be an nvidia specific license with an advertising clause, and something that adds restrictions on rights of U.S. Govt end users.
  2. Some of these files clearly came from XFree86 judging from the CVS idents in the source. Was this XFree86 code dual-licensed by its original authors? If so, it isn't clear.

Andrew Morton asked, "Does ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11/2.6.11-mm3/broken-out/fbdev-nvidia-licensing-clarification.patch clear things up?" Arjan van de Ven replied, "somewhat; it would even make sense to consider dual licensing that thing (like most other not-originally-gpl code in the kernel) to clarify the legal status for real. Otherwise if you merge it with GPL it sort of becomes GPL only.. (due to the freedom of MIT and the viral nature of GPL) and I suspect the intention of the author was to keep allowing MIT use..." Jon Smirl pointed out, "All of the files in drivers/char/drm really should have an explicit dual MIT/GPL license on them too. The DRM project has been taking patches back into DRM from LKML without making it clear that DRM is MIT licensed. It might be construed that doing this has made DRM GPL without that being the intention." Arjan replied, "without explicit dual licensing this is a trap yeah... it's far far nicer to just make it explicit that it's dual licensed and that you expect all patches are also dual licensed unless they also remove one of the licenses (several dual licensed parts of the kernel have such language if you're looking for example text). Otherwise it's very much an unclear situation and with licenses it's just better to be very explicit and clear."

13. Linux 2.6.11.3 Released

12 Mar 2005 - 13 Mar 2005 (4 posts) Archive Link: "Linux 2.6.11.3"

Topics: Version Control

People: Greg KH, Matthias Andree

Greg KH announced Linux 2.6.11.3, saying:

As there were no complaints about the patches posted a few days ago, I've released 2.6.11.3 with them in it.

It's available now in the normal kernel.org places:

kernel.org/pub/linux/kernel/v2.6/patch-2.6.11.3.gz (http://www.kernel.org/pub/linux/kernel/v2.6/patch-2.6.11.3.gz)

which is a patch against the 2.6.11 release (note, this is different than before, and should fix all of the previous complaints.)

I've also rediffed the 2.6.11.2 patch against the 2.6.11 release, instead of the 2.6.11.1 release, and updated it. There are incremental patches between the 2.6.11.y releases at:

kernel.org/pub/linux/kernel/v2.6/incr (http://www.kernel.org/pub/linux/kernel/v2.6/incr)

If anyone has any issues with the way the patches are diffed, please let me know.

A detailed changelog can be found at:

kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.11.3 (http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.11.3)

A bitkeeper tree for the 2.6.11.y releases can be found at:

bk://linux-release.bkbits.net/linux-2.6.11

The diffstat and short summary of the fixes are below.

I'll also be replying to this message with a copy of the patch between 2.6.11.2 and 2.6.11.3, as it is small enough to do so.

Matthias Andree asked, "Do we then start switching trees with every new minor release?" And Greg said yes, that was the protocol.
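Because each full patch is now diffed against the base 2.6.11 release rather than the previous 2.6.11.y, moving between two -stable releases with the full patches means reverting one and applying the other. The sketch below works out those steps. The patch names follow the `patch-2.6.11.3` pattern from the announcement; the naming of the incremental patches is not given there, so they are not modelled. The function names are hypothetical.

```python
def parse_stable(version):
    """Split '2.6.11.3' into ('2.6.11', 3); a bare '2.6.11' gives ('2.6.11', 0)."""
    parts = version.split(".")
    if len(parts) == 3:
        return version, 0
    if len(parts) == 4:
        return ".".join(parts[:3]), int(parts[3])
    raise ValueError(f"unexpected version string: {version!r}")


def patch_plan(current, target):
    """List the steps to move a source tree between two releases of one series.

    Assumption: every full patch is a diff against the base release,
    as described in the 2.6.11.3 announcement.
    """
    base, cur_y = parse_stable(current)
    target_base, tgt_y = parse_stable(target)
    if base != target_base:
        raise ValueError("both versions must share the same base release")
    steps = []
    if cur_y == tgt_y:
        return steps
    if cur_y:
        # Back out to the base release first (patch -R).
        steps.append(("revert", f"patch-{base}.{cur_y}"))
    if tgt_y:
        steps.append(("apply", f"patch-{base}.{tgt_y}"))
    return steps


if __name__ == "__main__":
    for start, end in (("2.6.11", "2.6.11.3"), ("2.6.11.2", "2.6.11.3")):
        print(start, "->", end, patch_plan(start, end))
```

The incremental patches Greg mentions avoid the revert step, at the cost of having to apply them in sequence.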

14. Driver Model Class Code Revamp

15 Mar 2005 - 16 Mar 2005 (27 posts) Archive Link: "[RFC] Changes to the driver model class code."

Topics: Hot-Plugging, USB, Version Control

People: Greg KH, Dmitry Torokhov, Dominik Brodowski

Greg KH said:

There are 4 patches being posted here in response to this message that start us on the way toward cleaning up the driver model code so that it's actually usable by mere kernel developers :)

The main problem with the class code, is that _everyone_ gets it wrong when trying to use it (and that includes me.) So, because of that, the class_simple wrapper was written. So almost everyone used that. That pretty much proved that the class_simple interface was the proper type of interface for the main class code itself.

Because of that, Kay wrote a first cut at adding the class_simple type of interface to the class core (he posted it to lkml a month or so ago.) I've finally taken that code, tweaked it a bit (fixing a module ownership issue that sprang up due to the class core changes, and changed the locking model) and added it to my bk-driver tree. I've also taken his tty and input patches that convert those subsystems over to the new functions (it's pretty much a simple search and replace for existing class_simple users.)

Then I moved the USB host controller code to use this new interface. That was a bit more complex as it used the struct class and struct class_device code directly. As you can see by the patch, the result is pretty much identical, and actually a bit smaller in the end.

So I'll be slowly converting the kernel over to using this new interface, and when finished, I can get rid of the old class apis (or actually, just make them static) so that no one can implement them improperly again...

Dmitry Torokhov replied:

I disagree with this last step. What I liked about the driver model is that once you convert (properly) subsystem to using it you automatically get your proper refcounting and memory gets released at proper time. The change as it proposed disconnects class device instance from the meat so separate refcounting implementation is needed. This increases maintenance costs.

I always viewed class_simple as a stop-gap measure to get hotplug events in place until proper implementation is done. Please leave the original interface in place so it can still be used if one wishes to do so.

And what about device_driver and device structure? Are they going to be changed over to be separately allocated linked objects? If not then it's another reason to keep original class interface - uniformity of driver model interface.

Greg replied that the problem with the existing interface was that it was so difficult to use that kernel hackers themselves often had to try multiple times to get it right. The current interface, he said, made it more difficult for kernel developers to write code, and therefore it had to go. But later in the thread he did add, "I'm also not saying that I'm going to go off and delete those functions from the kernel today, or tomorrow. Just that we need to slowly, over time, make this easier to use, as it's too hard to do so today. I will not be removing any functionality, don't worry :)"

Dominik Brodowski and Dmitry continued to argue against these changes; and Greg continued to invite them to help him improve them in as painless a way as possible.

15. Linux 2.6.11.4 Released

15 Mar 2005 - 16 Mar 2005 (6 posts) Archive Link: "Linux 2.6.11.4"

People: Greg KH

Greg KH released Linux 2.6.11.4, saying:

I've released 2.6.11.4 with two security fixes in it. It can be found at the normal kernel.org places.

The diffstat and short summary of the fixes are below.

I'll also be replying to this message with a copy of the patch between 2.6.11.3 and 2.6.11.4, as it is small enough to do so.

16. Module Incompatibility Between w.x.y.z Releases

16 Mar 2005 (2 posts) Archive Link: "2.6.11.x, EXTRAVERSION and module compatibility"

People: Michael Tokarev, Arjan van de Ven

Michael Tokarev said:

As far as I can see, the "super-stable" kernel releases should not affect module ABI in any way, that is, a module compiled for 2.6.11 or 2.6.11.2 should work with 2.6.11.4 and vice versa. Of course I'm talking about modules which are out of the main kernel tree.

But. EXTRAVERSION gets changed with every 2.6.11.x release, thus making out-of-tree modules incompatible just because they contain different kernel version tag.

The question is obvious: Is this a correct/intended behaviour? Maybe, just maybe, EXTRAVERSION should not be taken into account when deciding if a given module compiled for a given kernel?

Regarding the idea that the w.x.y.z releases should not affect the kernel's application binary interface (ABI), Arjan van de Ven replied, "that is an assumption that seems quite invalid to me in general at least." There was no further discussion.
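Michael's question comes down to two ways of comparing release strings. The sketch below contrasts them. It is an illustration only: the real module loader checks more than the release string, and the helper names here are hypothetical. It assumes, as Michael's message states, that the fourth component of a 2.6.11.y release is carried in EXTRAVERSION.

```python
import re


def split_release(release):
    """Split '2.6.11.4' into ('2.6.11', '.4'); '2.6.11' gives ('2.6.11', '')."""
    match = re.match(r"(\d+\.\d+\.\d+)(.*)$", release)
    if match is None:
        raise ValueError(f"unexpected release string: {release!r}")
    return match.group(1), match.group(2)


def strict_match(module_release, kernel_release):
    """Current behaviour as described: any EXTRAVERSION change is a mismatch."""
    return module_release == kernel_release


def match_ignoring_extraversion(module_release, kernel_release):
    """Michael's suggestion: compare only VERSION.PATCHLEVEL.SUBLEVEL."""
    return split_release(module_release)[0] == split_release(kernel_release)[0]


if __name__ == "__main__":
    built_for, running = "2.6.11.2", "2.6.11.4"
    print("strict:", strict_match(built_for, running))
    print("ignoring EXTRAVERSION:", match_ignoring_extraversion(built_for, running))
```

The relaxed comparison would be safe only if no 2.6.11.y release ever changed an interface that modules depend on, which is exactly the assumption Arjan called into question.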

17. Reviewing Patches For 2.6.11.5

16 Mar 2005 (15 posts) Archive Link: "[0/9] -stable review"

People: Chris Wright

Chris Wright said:

This is the start of the stable review cycle for the 2.6.11.5 release. There are 9 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let us know. If anyone is a maintainer of the proper subsystem, and wants to add a Signed-off-by: line to the patch, please respond with it.

If you wish to be added or removed from the reviewer list please email stable@kernel.org.

Responses should be made by Fri, March 18, 23:00 UTC. Anything received after that time, might be too late.

Chris posted his patches, but there was no real discussion.

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.