Kernel Traffic #227 For 11 Aug 2003

By Zack Brown

Mailing List Stats For This Week

We looked at 2201 posts in 10781K.

There were 589 different contributors. 301 posted more than once. 214 posted last week too.

1. Progress Toward 2.4.22; Problems With BitKeeper Gateway; Mysterious Kernel Lockups

5 Jul 2003 - 2 Aug 2003 (63 posts) Archive Link: "Linux 2.4.22-pre3"

Topics: FS: devfs, Version Control

People: Ben Collins, Larry McVoy, Adrian Bunk, Marcelo Tosatti, Jim Gifford, Andrea Arcangeli, Jeff Garzik

Marcelo Tosatti announced 2.4.22-pre3, and Ben Collins complained that the -pre version number was not shown in the sources. Neither -pre2 nor -pre3 had proper version tagging, he said, adding, "It just makes it easier when tracking down regressions to have known points of reference common across BK/CVS/SVN/tar+diff." Marcelo said the sources looked properly tagged to him, and Larry McVoy asked, "Hmm. Ben, look again in the CVS tree and make sure that the tags aren't there. Maybe the converter screwed up?" Ben replied, "Doesn't show up in linux-2.4/ChangeSet,v as a tag." Adrian Bunk also pointed out, "-pre2 and -pre3 are also missing at http://ftp.kernel.org/pub/linux/kernel/v2.4/testing/cset/." Jeff Garzik confirmed that whatever other problems there were, Marcelo had definitely tagged his own BitKeeper tree properly.

A couple days later, Larry said, "I think I've found the bug - it's in the code that collapses multiple changesets into one CVS checkin. It looks like we are picking up tags only if the tag was on the last changeset in the sequence instead of any changeset in the sequence. We're fixing it." Ben thanked him, and that was it for that subthread.
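The bug Larry describes is easy to picture with a toy model. The sketch below is not the actual BitKeeper-to-CVS converter code; the `Changeset` type, both functions, and the revision numbers are hypothetical and exist only to contrast "take tags from the last changeset" with "take tags from any changeset in the sequence".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Changeset:
    """Hypothetical stand-in for one changeset and the tags placed on it."""
    rev: str
    tags: List[str] = field(default_factory=list)


def collapse_buggy(group):
    """Collapse a run of changesets, keeping only the LAST one's tags."""
    return list(group[-1].tags)


def collapse_fixed(group):
    """Collapse a run of changesets, keeping tags from EVERY member."""
    tags = []
    for cs in group:
        tags.extend(cs.tags)
    return tags


# Made-up example: the tag sits on the middle changeset of the run.
group = [
    Changeset("1.1001"),
    Changeset("1.1002", tags=["v2.4.22-pre3"]),
    Changeset("1.1003"),
]

print(collapse_buggy(group))
print(collapse_fixed(group))
```

In this model the buggy version loses the tag whenever further changesets are folded in after the tagged one, which would be consistent with Ben finding no -pre2 or -pre3 tags in the CVS tree.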

Elsewhere, Jim Gifford reported on an ongoing problem he'd been having with the 2.4 series: lockups after about two days. 2.4.22-pre3 did not fix his problem, he said, although he could avoid the crash by running 'sync' every hour. Marcelo and Alan asked for more information, and together they eliminated compiler versions and bad memory as causes. Jim continued to reproduce the problem over the course of several new pre-releases. At one point Marcelo noticed that Jim's kernel was not based on pristine sources (Jim had added some netfilter and megaraid patches), and asked if Jim could reproduce the lockups with the official kernel. Jim could: the lockups persisted.

Andrea Arcangeli also got into the act, asking Jim to remove devfs and other modules that were not part of the main tree. Jim did this, and was unable to produce the lockup in -pre7. He asked if Marcelo wanted him to run other tests, but Marcelo replied, "I guess most of us is already convinced that the lockups were caused by the non-stock code." But Jim tried running the stock -pre6 kernel, and was able to reproduce the lockup. He said, "something in -pre7 seems to have fixed the problem." So he started adding the non-stock code back into -pre7, and later -pre8, piece by piece, to see if any of them would cause the problem. Eventually he felt he'd narrowed the problem down to some netfilter patches he'd applied; and Marc Heckmann, who also experienced similar lockups on his system, remarked that everyone who'd seen the problem had been using iptables.

The situation was quite complex, and there was no absolutely clear resolution on the list. As Jim put it toward the end of the discussion, "The wierd part is that people are having problems with different modules and it's hard to track down what is in common."

2. Scheduler Interactivity Improvements And Lingering Problems In 2.6-test

16 Jul 2003 - 25 Jul 2003 (44 posts) Archive Link: "[PATCH] O6int for interactivity"

People: Con Kolivas, Davide Libenzi, Mike Galbraith, Valdis Kletnieks, Marc-Christian Petersen

Con Kolivas posted a patch to improve interactivity of the 2.5 (and 2.6-test) scheduler. As far as he could tell, it made a massive improvement; and he asked folks to test it thoroughly. A lot of folks did exactly that, and there was much praise of Con's patch. Most of the testers were motivated by a desire for skipless sound and video, so there were a lot of reports of xmms (and mplayer) behavior. Not everyone experienced an improvement, however. Under heavy load, Marc-Christian Petersen's box had bad interactivity with or without Con's patch. Wiktor Wodecki reported an initial improvement with the patch that degenerated into choppier behavior after a couple of hours. Con replied that he'd noticed that bug as well, and would post a patch soon to fix it.

Davide Libenzi took a close look at Con's initial patch, and gave some suggestions. He was worried that Con's patch focused overly much on solving special cases for multimedia applications, and that this would create the illusion of improvement, without actually making the system better in the general case. But Con reassured him, "Please don't assume I'm writing an xmms scheduler. I've done a lot more testing than xmms." This relieved some of Davide's concerns, but he had others as well. Chief among them was the idea that interactivity was not the most important aspect of a scheduler. Using a test program, he was able to starve other processes completely, which he said was not an interactivity problem but a more general scheduler problem. He added, "Guys, I'm saying this not because I do not appreciate the time Con is spending on it. I just hate to see time spent in the wrong priorities." He said that whatever special cases anyone created for multimedia applications, "it can be exploited w/out a global throttle for the CPU time assigned to interactive and non interactive tasks. This is Unix guys and it is used in multi-user environments, we cannot ship with a flaw like this."

The problem, as he said elsewhere, was that of "uncontrolled unfairness" being applied to user processes. As long as that situation existed, it would be possible for certain user processes to be starved into oblivion. Valdis Kletnieks asked if controlled unfairness would be the way to go, and Davide replied, "I'm sorry to say that guys, but I'm afraid it's what we have to do. We did not think about it when this scheduler was dropped inside 2.5 sadly. The interactivity concept is based on the fact that a particular class of tasks characterized by certain sleep->burn patterns are never expired and eventually, only oscillate between two (pretty high) priorities. Without applying a global CPU throttle for interactive tasks, you can create a small set of processes (like irman does) that hit the coded sleep->burn pattern and that make everything is running with priority lower than the lower of the two of the oscillation range, to almost completely starve. Controlled unfairness would mean throttling the CPU time we reserve to interactive tasks so that we always reserve a minimum time to non interactive processes."

Mike Galbraith felt that there had to be some way to solve the starvation problem without a throttle, although he couldn't think of the right method at that time. Davide said, "Everything that will make the scheduler to say "ok, I gave enough time to interactive tasks, now I'm really going to spin one from the masses" will work. Having a clean solution would not be an option here." He added later, "the problem is not only the expired tasks starvation. Anything in the active array that reside underneath the lower priority value of the range irman2 tasks oscillate inbetween, will experience a "CPU time eclipse". And you do not even need a smoked glass to look at it :)"
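Davide's "controlled unfairness" can be illustrated with a deliberately crude model. The sketch below is not the 2.5/2.6 scheduler and is not taken from any patch in the thread; the `run` function and its `max_interactive_pct` parameter are assumptions made up for illustration. It assumes an interactive task is always ready to run, as in Davide's starvation test, and counts how many ticks each class of task receives.

```python
def run(ticks, max_interactive_pct=None):
    """Return (interactive_ticks, batch_ticks) after `ticks` timer ticks.

    max_interactive_pct=None models "uncontrolled unfairness": the
    interactive task wins every tick. An integer percentage models a
    global throttle that hands a tick to the non-interactive tasks once
    the interactive share has reached the cap.
    """
    interactive = batch = 0
    for t in range(1, ticks + 1):
        throttled = (
            max_interactive_pct is not None
            and interactive * 100 >= max_interactive_pct * t
        )
        if throttled:
            batch += 1        # "spin one from the masses"
        else:
            interactive += 1  # interactive task keeps the CPU
    return interactive, batch


print(run(100))                          # no throttle
print(run(100, max_interactive_pct=90))  # reserve 10% for everyone else
```

Without the cap the non-interactive tasks never run in this model; with a 90% cap they are guaranteed a tenth of the ticks. How such a cap would be chosen or enforced in the real scheduler was exactly the open question in the thread.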

3. Status Of Module Code

24 Jul 2003 - 31 Jul 2003 (34 posts) Archive Link: "[PATCH] Remove module reference counting."

People: Rusty Russell, Rahul Karnik, Linus Torvalds, Alan Cox, David S. Miller, Inaky Perez-Gonzalez, Greg KH

Rusty Russell explained:

When the initial module patch was submitted, it made modules start isolated, so they would not be accessible until (if) initialization had succeeded. This broke partition scanning, and was immediately reverted, leaving us with a module reference count scheme identical to the previous one (just a faster implementation): we still have cases where modules can be access on failed load.

Then Dave decided that the work of reference counting network driver modules everywhere is too invasive, so network driver modules now have zero reference counts always. The idea is that if you don't want the module removed, don't do it. ie. only remove the module if there's a bug, or you want to replace it.

If module removal is to be a rare and unusual event, it doesn't seem so sensible to go to great lengths in the code to handle just that case. In fact, it's easier to leave the module memory in place, and not have the concept of parts of the kernel text (and some types of kernel data) vanishing.

He posted a lengthy patch to accomplish this, and there was a lot of feedback. David S. Miller said he was fine with it, and offered some technical criticism. Greg KH also had some technical criticism and questions, but was also fundamentally in favor of it. But elsewhere, Rahul Karnik objected, "Module removal is *not* a rare event. One common case it is used is on laptops during suspend. A lot of drivers do not do proper PM and so must be unloaded before suspend and relaoaded after resume." For these cases, he said, it was important to be able to unload and reload modules. But Rusty replied, "that cuts both ways: noone fixes these broken drivers, but work around them using module removal, leaving newbies with broken laptops 8(". Inaky Perez-Gonzalez pointed out that laptop suspend was not the only case in which module unloading would be useful. During driver development, it would be much more time consuming, he said, to have to reboot in order to test each driver modification. The ability to unload and reload different versions would be much quicker, he concluded.

Elsewhere, Linus Torvalds put the brakes on the whole thing, saying, "First off - we're not changing fundamental module stuff any more." And Rusty replied:

OK. Who are you and what have you done with the real Linus?

I guess it's back to fixing up reference counting in the rest of the kernel. It's not hard, it's just not done. 8(

Linus replied, "Well, it's _never_ been done, so saying "we have to fix it for 2.6.x" is obviously not true. It's one of those "nobody ends up really caring" issues, since only root can unload anyway." And close by, Alan Cox suggested to Rusty, "If you want to be really paranoid add a MODULE_UNLOADABLE that people can add to their modules that do unload safely".

4. Developers Worry About The SCO Lawsuit And Plan For The Worst

25 Jul 2003 - 31 Jul 2003 (11 posts) Subject: "2.4 -> 2.2 differences?"

Topics: Big Memory Support, Networking, SMP

People: Mike Fedyk, J.W. Schultz, Bernd Eckenfels, Alan Cox, Robert L. Harris, Joe Pranevich

Robert L. Harris wanted to prepare for the worst. If SCO won its lawsuit, and parts of Linux from version 2.4 and forward did violate their copyright, he wanted to know what would be involved in rolling back to 2.2 and continuing development from there. Mike Fedyk replied:

No highmem support,

No journaled filesystems.

No netfilter. Fewer networking features. Period. (ethernet bridging, etc)

Slower SMP

I don't know if it's psycological, but whenever I booted 2.2 on my desktop, it felt slower.

J.W. Schultz also suggested, "You could start with Joe Pranevich's "Wonderful World of Linux 2.4" at http://linuxtoday.com/news_story.php3?ltsn=1999-10-03-001-05-NW-LF. There have been a number of improvements and features added since but any 2.2 -> 2.4 features summary should indicate much of what you would loose in a 2.4 -> 2.2 transition."

Elsewhere, Bernd Eckenfels also asked, "BTW: what will happen if there is some SMP code from IBM in the kernel which is owned by SCO? Isnt it a matter of days to remove that code? Does anybody have to pay for past usage of the code?" Alan Cox replied, "The core 2.2 SMP code is stuff I wrote. Caldera (aka SCO) even provided me the hardware and asked me to do it. The later table parser code is from Intel." Robert remarked, "Too bad you don't have anything they gave you or which they took back from you that could be used against them." And Alan replied, "Its a matter of archived public record since long ago." And Robert said, "Yeah but likely nothing along the line of correspondence of them offering you the stuff if you give them a copy of your work on SMP which would happen to be GPL'd...."

5. Status Of Serial ATA In 2.4

25 Jul 2003 - 1 Aug 2003 (3 posts) Archive Link: "SATA (Serial ATA) support in 2.4.x?"

Topics: Disks: IDE, Serial ATA

People: Erik Steffl, Jeff Garzik

Erik Steffl asked about the status of SATA (Serial ATA) support in Linux 2.4; he said, "it seems like the first kernel that supports SATA is 2.4.21-ac4. I found few messages on lkml but not much more info about status of development." Specifically, he wondered if SATA would support disks larger than 137G. Jeff Garzik said SATA did supposedly support disks that big, but that he still had to do more testing. Several days later Erik asked how the testing was going, but there was no reply.

6. Real-World FAT Improvement Preferred Over Abstract Elegance

27 Jul 2003 - 1 Aug 2003 (10 posts) Archive Link: "[PATCH] Inline vfat_strnicmp()"

Topics: FS: FAT

People: Denis Vlasenko, Hirofumi Ogawa, Andrew Morton

A constructive stylistic debate. Someone posted a patch to inline a function, resulting in a slight decrease in the size of the compiled vfat.o binary. Hirofumi Ogawa (the new FAT maintainer as of Issue #223, Section #7 (4 Jul 2003: FAT Filesystem Maintainership)) said this was fine and that he'd push the patch on to Linus, but Denis Vlasenko objected, "Come on, automatically inlining static functions with just one callsite is a compiler's job. Don't do it." Hirofumi pointed out that gcc version 3.2.3 20030415 (the Debian prerelease) didn't do it, and said that the patch saved 48 bytes. Denis said a future version of the compiler would take care of it, adding, "Since there is no substantial wins in hunting down such statics, and there is some risk of code bloat when big inlined statics get called from more that one callsite, and it will be automatically handled by smarter compiler someday, I think it makes perfect sense to avoid doing this." Hirofumi said there would be no problem undoing the patch in the future, once the compiler caught up. Denis admitted that the compiler might never successfully catch up, but he pointed out, "Andrew Morton kills extra large inlines, and you are creating them :(. That's not ok. Just leave those poor static functions alone until compiler will do them, all at once. There are lots of other stuff to do in the kernel source." But Hirofumi stood firm, saying the original patch resulted in a "real world" savings, and that unless Denis volunteered to fix the compiler, the patch would go in.

7. Linux 2.6.0-test2 Released

27 Jul 2003 - 31 Jul 2003 (32 posts) Archive Link: "Linux v2.6.0-test2"

Topics: Digital Video Broadcasting, Disks: IDE, Disks: SCSI, Forward Port, Networking, Power Management: ACPI, USB

People: Linus Torvalds

Linus Torvalds announced 2.6.0-test2, and said:

Lots of small updates and fixes all over the map (diffstat shows a flat profile, except for the DVB merge, the new wl3501 driver, and the new sound drivers from Alan).

An example: Alexander Atanasov fixed an UP APIC handling bug, which in turn explained the IDE problems with irq disabling, and allowed IDE to be fixed up. Yah!

Alan started doing forward-porting of 2.4.x driver updates, and Andrew is merging the fixes from his tree. And janitorial cleanups.

Various architectures are congealing: sparc64, ia64, alpha, m68k, ppc32, v850 and s390 all had updates.

Network (and network driver) fixes, ISDN slowly getting there, ACPI, DVB and USB updates.

And a number of people worked on (and fixed) SCSI queue handling issues with the anticipatory scheduler.

8. Finessing The NUMA Scheduler

28 Jul 2003 - 1 Aug 2003 (20 posts) Archive Link: "[patch] scheduler fix for 1cpu/node case"

Topics: Ottawa Linux Symposium, SMP

People: Erich Focht, Martin J. Bligh, Andrew Theurer, Davide Libenzi, Andi Kleen

Erich Focht made this proposal:

after talking to several people at OLS about the current NUMA scheduler the conclusion was:

  1. it sucks (for particular workloads),
  2. on x86_64 (embarassingly simple NUMA) it's useless, goto (1).

Fact is that the current separation of local and global balancing, where global balancing is done only in the timer interrupt at a fixed rate is way too unflexible. A CPU going idle inside a well balanced node will stay idle for a while even if there's a lot of work to do. Especially in the corner case of one CPU per node this is condemning that CPU to idleness for at least 5 ms. So x86_64 platforms (but not only those!) suffer and whish to switch off the NUMA scheduler while keeping NUMA memory management on.

The attached patch is a simple solution which

The timer interrupt based global rebalancing might appear to be a simple and good idea but it takes the scheduler a lot of flexibility. In the patch the global rebalancing is done after a certain number of failed attempts to locally balance. The number of attempts is proportional to the number of CPUs in the current node. For only 1 CPU in the current node the scheduler doesn't even try to balance locally, it wouldn't make sense anyway. Of course one could instead set IDLE_NODE_REBALANCE_TICK = IDLE_REBALANCE_TICK, but this is more ugly (IMHO) and only helps when all nodes have 1 CPU / node.
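The policy Erich describes can be sketched in a few lines. This is a toy model, not his patch: the `IdleBalancer` class, the `attempts_per_cpu` constant, and the choice to reset the counter after a global attempt are all assumptions made for illustration. The only parts taken from his description are that the threshold is proportional to the number of CPUs in the node, and that a node with a single CPU skips local balancing entirely.

```python
class IdleBalancer:
    """Toy model of 'go global after N failed local balance attempts'."""

    def __init__(self, cpus_in_node, attempts_per_cpu=1):
        # Threshold proportional to the node size; a one-CPU node has
        # nothing to balance locally, so it goes global immediately.
        if cpus_in_node <= 1:
            self.threshold = 0
        else:
            self.threshold = attempts_per_cpu * cpus_in_node
        self.failed = 0

    def attempt(self, found_work):
        """Return which kind of balancing this attempt uses."""
        scope = "global" if self.failed >= self.threshold else "local"
        if found_work or scope == "global":
            self.failed = 0   # start counting again (an assumption)
        else:
            self.failed += 1  # another failed local attempt
        return scope


single = IdleBalancer(cpus_in_node=1)
quad = IdleBalancer(cpus_in_node=4)
print([single.attempt(False) for _ in range(3)])
print([quad.attempt(False) for _ in range(6)])
```

The point of the design is that an idle CPU gets feedback from its own failed local attempts, rather than waiting for a fixed timer tick before looking at other nodes.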

Martin J. Bligh liked Erich's plan and the patch, but offered some technical criticism. In particular, he remarked, "I really feel there's no point in a NUMA scheduler for the Hammer style architectures. A config option to turn it off would seem like a simpler way to go, unless people can see some advantage of the full NUMA code?" Erich replied, "But the Hammer is a NUMA architecture and a working NUMA scheduler should be flexible enough to deal with it. And: the corner case of 1 CPU per node is possible also on any other NUMA platform, when in some of the nodes (or even just one) only one CPU is configured in. Solving that problem automatically gives the Hammer what it needs." But Martin argued:

what we have now is a "multi-cpu-per-node-numa-scheduler" if you really want to say all that ;-)

The question is "does Hammer benefit from the additional complexity"? I'm guessing not ... if so, then yes, it's worth fixing. If not, it would seem better to just leave it at SMP for the scheduler stuff. Simpler, more shared code with common systems.

Erich felt that in the long term, an SMP solution would not be up to the task, and that for any real future improvements, NUMA support was the way to go.

Close by, Andrew Theurer had seemingly the opposite viewpoint from Martin. He thought all architectures should use a NUMA scheduler, where single-processor machines would simply default to the single-node case. This would allow them to remove all the ugly NUMA #ifdefs in the code. Erich agreed with this, though he thought it was an optimistic goal. But Andrew said, "at some point we have to. We cannot have two different schedulers. Non numa should have the exact same scheduling policy as a numa system with one node. I don't know if that's acceptable for 2.6, but I really want to go for that in 2.7."

Getting back to the plight of the Hammer, and other NUMA architectures with only a single CPU per node, Andrew wondered if there were really a problem with supporting that particular type of hardware. He said, "I would think, even if we have an idle cpu, sometimes a little delay on task migration (on NUMA) may not be a bad thing. If it is too long, can we just make the rebalance ticks arch specific?" But Erich replied:

The fact that global rebalances are done only in the timer interrupt is simply bad! It complicates rebalance_tick() and wastes the opportunity to get feedback from the failed local balance attempts.

If you want data supporting my assumptions: Ted Ts'o's talk at OLS shows the necessity to rebalance ASAP (even in try_to_wake_up). There are plenty of arguments towards this, starting with the steal delay parameter scans from the early days of multi-queue schedulers (Davide Libenzi), over my experiments with NUMA schedulers and the observation of Andi Kleen that on Opteron you better run from the wrong CPU than wait (if the tasks returns to the right cpu, all's fine anyway).

But Martin objected, "That's a drastic oversimplification. It may be better in some circumstances, on some benchmarks. For now, let's just get your patch tested on Hammer, and see if it works better to have the NUMA scheduler on than off after your patch ..."

9. Configuration Options For Various Problem Cases

29 Jul 2003 - 2 Aug 2003 (13 posts) Archive Link: "[2.6 patch] let broken drivers depend on BROKEN{,ON_SMP}"

Topics: Disks: SCSI, SMP

People: Riley Williams, John Bradford, Adrian Bunk, Alan Cox

Adrian Bunk posted a patch to make all broken drivers depend on a CONFIG_BROKEN option in 2.6-test, and all drivers broken for SMP depend on a CONFIG_BROKEN_ON_SMP option. He said he might have missed a few, and added that Alan Cox preferred "CONFIG_OBSOLETE" over "CONFIG_BROKEN", and that Alan's choice was fine with him if the patch were otherwise acceptable. Riley Williams replied (slightly reformatted by KT author):

To me at least, BROKEN and OBSOLETE have different meanings, and choice of which to use should depend on the circumstances. Here's my choice of definitions for the cases that I can see:

Personally, I'd like to see CONFIG_ANTIQUE (defaulting to "n") as a dependency for all drivers matching the description above simply to cut down on the amount of irrelevant choices in the configuration process.

John Bradford agreed that "BROKEN" and "OBSOLETE" had significantly distinct meanings. He said "CONFIG_OBSOLETE" should only require that the feature in question be slated for removal at some future date. It wasn't necessary to have a replacement, as long as the feature was definitely going away. He felt "CONFIG_ANTIQUE" was overkill however. Either something worked or it didn't. If it would be leaving the kernel, then "CONFIG_OBSOLETE" was appropriate. If it worked and would not be leaving the kernel, there was no need to call it antique. Likewise, CONFIG_BROKEN seemed pointless to him for drivers that simply failed to compile or failed outright. Instead, CONFIG_BROKEN should be reserved, he said, for cases "where a driver such as a SCSI driver builds successfully, but it silently corrupts data under certain, (possibly rare), circumstances. In that case, it's important to warn people that it's broken, because it's not necessarily obvious, and could case significant data loss." But Adrian opined, "You forget one important thing: If a _user_ of a stable kernel notices "it doesn't even compile" this gives a very bad impression of the quality of the Linux kernel." John replied:

I don't agree. The stock kernel is a work in progress, and things get broken from time to time as a normal part of development. Experienced users will realise that, and I wouldn't encourage inexperienced users to compile their own kernel from the stock trees anyway, because they could easily miss bugfixes, including data corruption and security ones, simply because they assume that they are in the mainline kernel.

Compiling your own kernel from the stock kernel trees is still something that should be considered for experienced users only.

Besides, what's worse? Possible data corruption or a bad impression?

Adrian said that non-experts compiled their kernels all the time, and that there was a third alternative to data corruption and a bad impression, which was what his patch did: disable the driver in question so that no one would accidentally try to use it. At this point the thread petered out with no real conclusion.

10. Microtech CompactFlash ZiO! USB Support

29 Jul 2003 - 1 Aug 2003 (14 posts) Archive Link: "Zio! compactflash doesn't work"

Topics: USB

People: Greg KH, Matthew Dharm, Andries Brouwer, Linus Torvalds

Grant Miner had a Microtech CompactFlash ZiO! USB device, but it didn't seem to show up on his system, either under 2.6-test or 2.4. Greg KH replied somberly, "Linux doesn't currently support this device, sorry." Andries Brouwer replied that he seemed to remember seeing people using that device with no problems. He gave one link (http://www.scm-pc-card.de/service/linux/zio-cf.html) and another (http://usbat2.sourceforge.net/) for projects that seemed to be going well. Greg replied, "In looking at the kernel source, I don't see support for this device. I do see support for others like it, but with different product ids. Perhaps Grant can play with the settings in drivers/usb/storage/unusual_devs.h to try to tweak things to work for his device." Matthew Dharm said that some of ZiO!'s CF readers were supported in Linux and some weren't. He said, "this particular one is not, and likely never will be." Andries said that Grant's device was supported, and referenced the links he'd listed in his previous post. Greg took a look, and said the driver seemed OK to him. He said if Matthew agreed, he'd apply it to his set of patches intended for Linus Torvalds. Matthew replied:

Apparently these guys made more progess than I thought. Last time I talked to them there seemed to be the general opinion that a driver would never get done.

However, if you read the web page, it sounds like they're really not ready to have this merged into the mainstream kernel. I don't like to merge things before their authors want them merged.

I don't really have any objection to the 2.4 patch, but the 2.5 patch needs some serious cleanup before it gets applied.

Andries agreed the 2.5 patch did need more work; and the thread ended.

11. CCISS Authorship

30 Jul 2003 - 31 Jul 2003 (4 posts) Archive Link: "cciss updates for 2.4.22"

People: Jens Axboe, Marcelo Tosatti

Mike Miller posted a patch against 2.4.22, changing the author listed in the CCISS driver (the driver for Hewlett-Packard SA5xxx and SA6xxx controllers). Charles M. White III had been listed, but Mike's patch changed it to simply Hewlett-Packard. Jens Axboe replied, "Don't feel bad for Charles, he never was the original author of the driver anyways. So the label was misleading." He told Marcelo Tosatti to apply the patch.

12. Cleaning Up USB Configuration Options

31 Jul 2003 (10 posts) Archive Link: "[PATCH] reorganize USB submenu's"

Topics: USB

People: Stephen Hemminger

Stephen Hemminger complained, "The USB configuration menu's in 2.6 are a mismash of sub-menu's and comments. This patch tries to rationalize it so it comes out looking more like the current filesystems menus. I think it is easier to navigate, there should be no functional change from this. Though some elements may appear/disappear differently based on earlier choices." A bunch of people had criticism of changes in the configuration logic, and Stephen posted several improved versions of his patch before the thread petered out.

13. NFS Compatibility Breakage From 2.4 To 2.6

31 Jul 2003 - 3 Aug 2003 (8 posts) Archive Link: "nfs-utils-1.0.5 is not backwards compatible with 2.4"

Topics: Backward Compatibility, FS: NFS

People: Steve Dickson, Chip Salzenberg, Neil Brown, Andrew Morton

Steve Dickson complained that as of nfs-utils version 1.0.4, "the NFSEXP_CROSSMNT define was changed to 0x4000 and the NFSEXP_NOHIDE define (which is not supported in 2.4) took over the 0x0200 bit." This, he said, broke backward compatibility with the 2.4 kernel, and he asked if the values could be switched around to preserve compatibility. Chip Salzenberg replied, "This looks like an actual kernel incompatibility 2.4 <-> 2.6, as the 2.4 and 2.6 trees disagree about the value of NFSEXP_CROSSMNT." And Andrew Morton confirmed that this was indeed the case. Elsewhere, Neil Brown told Steve the whole story:

Once upon a time (2.2 era) there was this export flag called NFSEXP_CROSSMNT and "crossmnt" which was un-implemented. I guess it was a hang over from the user-space nfsd and was probably meant to say "mount points in this filesystem can be crossed". But as there was no code and no documentation, one couldn't be sure.

In the kernel nfs server at the time, the concept of "crossmnt" was effectively unimplementable (due the the way the export table was set up and the way file handles were managed). A closely related concept was implementable. This concept is given the name "nohide" in Irix and possibly others. This is a flag set on the child filesystem (rather than the parent) and says that the child should not be 'hiden' when the mountpoint in the parent is accessed.

So, I used the NFSEXP_CROSSMNT flag to implement nohide (it was one of my earliest nfsd patches I think) and told nfs-utils that it could use the name "nohide" to refer to this new flag.

So for sometime, NFSEXP_CROSSMNT, "nohide", 0x0200 meant "this child filesystem should be visible from the parent".

Possibly this was a mistake. Possibly I should have used a different flag or at least changed the name, but I didn't.

As part of the substatial rewrite that went into 2.6, it is possible to implement "crossmnt" type semantics sensibly. When a mountpoint is 'crossed' (by a LOOKUP operation) the kernel can ask user-space to provide export information for that filesystem and act according to the response. (This is not completely implemented in nfs-utils 1.0.5, though it should work to some extent. I hope to figure out the remaining details and get it working before 1.1.0).

So I needed a new flag, and chose 0x4000. This flag can be set on the parent and says that all mount points should be crossed (if possible).

The most obvious name for this flag was NFSEXP_CROSSMNT which was currently inuse as a misnomer for the nohide option. So I renamed the old NFSEXP_CROSSMNT to NFSEXP_NOHIDE, both in nfs-utils and in the kernel. I then added the new flags 0x4000 named NFSEXP_CROSSMNT with the textual representation "crossmnt".

As far as I can tell, the only incompatability that this will cause is if some code outside of the kernel and outside of nfs-utils uses the header files from either the kernel or nfs-utils. Such code will get a new value for NFSEXP_CROSSMNT if it changes it's header files. I don't know if there is any such code, but if there is I apoligise for breaking it and suggest that the best fix is to not use the header file it was using but it explicitly include the values for NFSEXP_* in that code.
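The incompatibility comes down to two bit values changing names. The sketch below uses only the values quoted in the thread (0x0200 and 0x4000); the dictionary names and the `names_for` helper are hypothetical and are not part of the kernel or nfs-utils. It shows how the same bit mask reads under the 2.4-era definitions and under the new ones.

```python
# Flag tables as described in the thread (names of the tables are made up).
OLD_FLAGS = {                   # 2.4-era headers
    "NFSEXP_CROSSMNT": 0x0200,  # actually implemented "nohide" semantics
}
NEW_FLAGS = {                   # 2.6 kernel and newer nfs-utils
    "NFSEXP_NOHIDE": 0x0200,    # same bit, renamed
    "NFSEXP_CROSSMNT": 0x4000,  # new bit, real "crossmnt" semantics
}


def names_for(mask, table):
    """Return the flag names from `table` whose bits are set in `mask`."""
    return sorted(name for name, bit in table.items() if mask & bit)


# Code built against the new headers that asks for NFSEXP_CROSSMNT ...
mask = NEW_FLAGS["NFSEXP_CROSSMNT"]
# ... sets a bit the old definitions do not know about at all.
print(names_for(mask, OLD_FLAGS))

# The old bit still means the same thing, only under a different name.
print(names_for(0x0200, OLD_FLAGS), names_for(0x0200, NEW_FLAGS))
```

This is the situation Neil describes: the meaning of bit 0x0200 did not change, but any out-of-tree code that spells it NFSEXP_CROSSMNT picks up 0x4000 as soon as it switches header files.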

Chip Salzenberg objected, "The only really bad thing about the current situation is that the name "NFSEXP_CROSSMNT" is poisoned by having had two historical definitions. So it that name should be dropped, IMO, and replaced by something textually different. "NFSEXP_XMOUNT", perhaps. Even "NFSEXP_CROSSMNT2" would work. Just as long as code that said "CROSSMNT" to mean "NOHIDE" wouldn't accidentally get CROSSMNT instead." And Neil replied, "If we were to change it, I would prefer NFSEXP_CROSSMOUNT. I might send such a patch to Linus and update nfs-utils if/when it gets included".

14. Including .config In Kernel Binary

2 Aug 2003 - 5 Aug 2003 (12 posts) Archive Link: ".config in bzImage ?"

Topics: Version Control

People: Alan Cox, Randy Dunlap, Diego Calleja Garcia, William Lee Irwin III

A quiet end to a long controversy...

Sean Estabrooks remembered some discussion of including the .config file inside the compiled kernel. He asked whether this was actually going to be done or not. Alan Cox said, "Randy Dunlap's ikconfig has been in 2.4-ac for a while and was just accepted for 2.6 proper. You can embed the config for /proc, attach it to the binary, or just say no." And Randy Dunlap himself said to Sean:

Alan sent my ikconfig patch to Linus a couple of days ago and it's in 2.6.0-test-current ... except for the Kconfig part of it, which Alan or I will send soon (if Alan hasn't already done so).

The full patch (which is partially merged) is at

http://developer.osdl.org/rddunlap/patches/ikconfig/ikconfig_260c.patch

It includes a script (extract-ikconfig) and a small C program (binoffsets.c) that are used to extract the saved .config image.

The .config file is also available in /proc as a CONFIG option.

Elsewhere, Diego Calleja Garcia quoted from the config text:

This option enables the complete Linux kernel ".config" file contents, information on compiler used to build the kernel, kernel running when this kernel was built and kernel version from Makefile to be saved in kernel. It provides documentation of which kernel options are used in a running kernel or in an on-disk kernel. This information can be extracted from the kernel image file with the script scripts/extract-ikconfig and used as input to rebuild the current kernel or to build another kernel. It can also be extracted from a running kernel by reading /proc/ikconfig/config and /proc/ikconfig/built_with, if enabled. /proc/ikconfig/config will list the configuration that was used to build the kernel and /proc/ikconfig/built_with will list information on the compiler and host machine that was used to build the kernel.

William Lee Irwin III confirmed that the patch was already in Linus' BitKeeper tree.
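For readers who want to use the /proc interface described in the config text above, here is a minimal sketch of reading the saved configuration from a running kernel. The /proc/ikconfig/config path is taken from the quoted help text; the function names are hypothetical, and the parser assumes the usual .config line format ("CONFIG_FOO=y" and "# CONFIG_FOO is not set").

```python
def parse_kernel_config(text):
    """Parse .config-style text into a dict of option name -> value.

    Options written as '# CONFIG_FOO is not set' are recorded as 'n'.
    """
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Drop the leading '# ' and the trailing ' is not set'.
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value.strip('"')  # string values are quoted
    return options


def read_running_config(path="/proc/ikconfig/config"):
    """Return the parsed config of the running kernel, or None if unavailable."""
    try:
        with open(path) as f:
            return parse_kernel_config(f.read())
    except OSError:
        # The file is absent when the /proc option was not enabled.
        return None
```

On a kernel built without the /proc option, read_running_config() simply returns None, and the extract-ikconfig script mentioned above remains the way to get the config out of the image file.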

15. Linus Discusses Mailing List Banning

3 Aug 2003 (1 post) Archive Link: "Re: your mail"

People: Linus Torvalds

H-Peter Recktenwald was banned from the linux-kernel mailing list, and complained privately to Linus Torvalds, asking why he'd been banned. Linus replied on the list:

Maybe because this has nothing to do with the kernel?

It's ok to discuss kernel issues on the kernel mailing list, but we've had tons of totally off-topic flames, rants and general noise.

To the point that a lot of people don't even have time to follow linux-kernel any more, since a lot of the discussion has nothing to do with the technical kernel work.

Since some of these rants are started (and kept going) by people who don't ever seem to actually get involved in _real_ kernel-related technical discussions, David felt that one way to curb it was to just blacklist people who repeatedly post things that aren't related to the kernel.

It's ok to be off-topic every once in a while, but it's not ok to consistently be so.

That said, David is also not the most politic person I know, and I suspect this could have been handled slightly more gracefully. One potentially less annoying approach is to not block posting from people, but rewrite the subject line for such posters with a prepended "[OFF-TOPIC]", and just let people filter those out on the receiving end. Or just automatically shunt them off to another list.

I dunno. I don't personally much care - but I've never been the maintainer of the mailing list, and I sure as hell don't ever want to be. Whoever is the maintainer gets to set the rules.
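The subject-rewriting alternative Linus floats could be sketched roughly as follows. This is only an illustration of the idea, not how the list software actually works; the tag_subject function and the flagged_senders set are hypothetical names.

```python
OFF_TOPIC_TAG = "[OFF-TOPIC]"


def tag_subject(sender, subject, flagged_senders):
    """Return the subject, prefixed with the tag if the sender is flagged.

    flagged_senders is assumed to be a set of lowercase addresses.
    """
    if sender.lower() not in flagged_senders:
        return subject
    if subject.startswith(OFF_TOPIC_TAG):
        return subject  # already tagged, do not stack prefixes
    return f"{OFF_TOPIC_TAG} {subject}"
```

Subscribers could then filter on the prefix at the receiving end, as the post suggests, instead of the list refusing the mail outright.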

16. ia64 Architecture Successfully Builds On Official 2.6-test Tree

4 Aug 2003 (17 posts) Archive Link: "milstone reached: ia64 linux builds out of Linus' tree"

Topics: Power Management: ACPI, Version Control

People: David Mosberger, Andy Grover, Christoph Hellwig, Andrew Morton

An exuberant David Mosberger reported:

As of this morning, Linus's current bk tree (http://linux.bkbits.net:8080/linux-2.5) builds and works out of the box for ia64!

Thanks to everybody who helped make this happen. In particular, thanks to Andrew Morton and Christoph Hellwig for their efforts, and Andy Grover for a last-minute ACPI patch!

For maximum performance/stability, I'd still recommend to use the ia64-specific patches, but for someone who needs to build bleeding edge kernels for multiple architectures, being able to use Linus' tree should make it a lot easier to include ia64 in their regular testing etc.

Now that Linus' tree works for ia64, the next question is how we can keep it that way. I think it would be useful to have someone set up a cron job which does daily builds/automated tests off of Linus' tree. If something breaks, the person doing this could perhaps come up with a minimal patch which gets Linus' tree to build again (and submit a patch to the appropriate maintainer, with cc to the linux-ia64 list). I plan on continuing to put out roughly monthly ia64-specific patches and during those normal cycles, I'd then integrate the "quick fix up" patches as needed. Does this sound reasonable? Anybody want to volunteer for this "Linus watchdog" role?

17. RSBAC 1.2.2 Released

4 Aug 2003 (1 post) Archive Link: "Announce: RSBAC v1.2.2 released"

Topics: Access Control Lists, Version Control

People: Amon Ott

Amon Ott announced:

Rule Set Based Access Control (RSBAC) version 1.2.2 has been released. Full information and downloads are available from http://www.rsbac.org

RSBAC is a flexible, powerful and fast open source access control framework for current Linux kernels, which has been in stable production use since January 2000 (version 1.0.9a). All development is independent of governments and big companies, and no existing access control code has been reused.

The system includes a big range of decision modules, some of which implement professional access control models like ACL, MAC or Role Compatibility. It supports both 2.4 and 2.2 kernel series. Now that 2.6 seems to stabilize, the port to 2.6.0-test is in progress.

New features compared to version 1.2.1:

18. gmodconfig 0.4 Released

6 Aug 2003 (1 post) Archive Link: "[ANNOUNCE] gmodconfig 0.4 is released"

People: Cyril Bortolato

Cyril Bortolato announced:

gmodconfig is a tool to manage Linux kernel modules. It features a GNOME graphic interface which enables users to:

Some of this data is not found in modinfo's output and has to be supplied to gmodconfig in XML files. A companion tool to gmodconfig called gmodconfigedit can help module authors create and update those XML files.

For details please visit: http://gmodconfig.sourceforge.net/

This tool can help inexperienced Linux users to configure modules (as is sometimes required with consumer devices like webcams) without having to learn about /etc/modules.conf, and install or upgrade modules with the click of a mouse, thanks to DKMS. Experienced users might find it helpful, too.

It still has some rough edges and I welcome feedback from the community. Please Cc me your comments as I'm not subscribed to linux-kernel.

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.