Kernel Traffic #202 For 24 Jan 2003

By Zack Brown

Table Of Contents

* Mailing List Stats For This Week
* Threads Covered

1. 10 Jan 2003 - 19 Jan 2003 (202 posts) Status Of 2.5
2. 12 Jan 2003 - 19 Jan 2003 (6 posts) Support For via686a Sensors In 2.5
3. 12 Jan 2003 - 14 Jan 2003 (9 posts) Secure User NFS Authentication Using RPCSEC_GSS
4. 13 Jan 2003 - 16 Jan 2003 (8 posts) Complaints About The New Configuration Process
5. 13 Jan 2003 - 14 Jan 2003 (4 posts) Support For AMD Processors
6. 13 Jan 2003 - 15 Jan 2003 (8 posts) Linux 2.5.57 Released
7. 13 Jan 2003 - 16 Jan 2003 (3 posts) sysfs Interface To cpufreq In 2.5, And Deprecation Of /proc/cpufreq
8. 14 Jan 2003 - 15 Jan 2003 (14 posts) Confusion Over IPMI Documentation
9. 14 Jan 2003 - 15 Jan 2003 (4 posts) Support For 'Pending Break Enable' Bit In CPUID Processor Info
10. 14 Jan 2003 - 15 Jan 2003 (5 posts) Looking For Archives Of linux-kernel
11. 14 Jan 2003 (1 post) TTY Subsystem Unmaintained
12. 15 Jan 2003 (1 post) Status Of 2.5
13. 16 Jan 2003 - 17 Jan 2003 (7 posts) Open Source Hardware
14. 16 Jan 2003 - 19 Jan 2003 (6 posts) NUMA-Aware Scheduler; Hyperthreading
15. 16 Jan 2003 - 20 Jan 2003 (28 posts) Linux 2.5.59 Released
16. 17 Jan 2003 - 22 Jan 2003 (27 posts) New Module Builder Project; Complaints About Module Standards
17. 17 Jan 2003 (1 post) User-Mode Linux 2.5.58-1 Released
18. 17 Jan 2003 (1 post) Kernel Bug Database Version 2.0 Released
19. 18 Jan 2003 - 20 Jan 2003 (8 posts) ntfsprogs 1.7.0beta Released
20. 20 Jan 2003 (7 posts) Compiling The Kernel With Non-GCC Compiler
21. 20 Jan 2003 - 22 Jan 2003 (10 posts) Rewriting The SMP Parsing Code
22. 21 Jan 2003 (1 post) Virtual Memory Documentation
23. 21 Jan 2003 (1 post) Linux Security Module 2.5.59-lsm1 Released

Mailing List Stats For This Week

We looked at 2320 posts in 11186K. There were 606 different contributors. 308 posted more than once. 217 posted last week too.
The top posters of the week were:

* 60 posts in 302K by "Martin J. Bligh"
* 57 posts in 383K by William Lee Irwin III
* 46 posts in 179K by Andrew Morton
* 43 posts in 199K by Zwane Mwaikambo
* 34 posts in 128K by Rob Wilkens

1. Status Of 2.5

10 Jan 2003 - 19 Jan 2003 (202 posts)
Archive Link: "any chance of 2.6.0-test*?"
Topics: Code Freeze, Disks: IDE, FS: sysfs, Framebuffer, Ioctls, Networking, PCI, Power Management: ACPI, Real-Time, USB
People: Dave Jones, Alan Cox, Greg KH, Andi Kleen, Linus Torvalds, William Lee Irwin III

William Lee Irwin III thought 2.5 was running really well, and that it was time to shift into high gear to get 2.6 out the door. He asked what the issues were that were holding this up. Dave Jones replied, "There's still a boatload of drivers that don't compile, a metric shitload of bits that never came over from 2.4 after I stopped doing it circa 2.4.18, a lot of little 'trivial' patches that got left by the wayside, and a load of 'strange' bits that still need nailing down" [...] "I think we're a way off from a '2.6-test' phase personally, but instigating a harder 'code freeze' would probably be a good thing to do."

Elsewhere, Alan Cox gave his own assessment of 2.5:

IDE is all broken still and will take at least another three months to fix - before we get to 'improve'.

The entire tty layer locking is terminally broken and nobody has even started fixing it. Just try a mass of parallel tty/pty activity. It was problematic before, pre-empt has taken it to dead, defunct and buried.

Most of the drivers still don't build either.
I think its important that we get to the stage that we can actually say

* It compiles (as close to all the mainstream bits of it as possible)
* The stuff that is destined for the bitbucket is marked in Config and people agree it should go
* It works (certainly the common stuff)
* Its statistically unlikely to eat your computer
* It passes Cerberus uniprocessor and smp with/without pre-empt

Otherwise everyone will rapidly decide that ".0-pre" means ".0 as in Windows" at which point you've just destroyed your testing base.

Given all the new stuff should be in, I'd like to see a Linus the meanie round of updating for a while which is simply about getting all the 2.4 fixes and the 2.5 driver compile bugs nailed, and if it doesn't fix a compile bug or a logic bug it doesn't go in.

No more "ISAPnP TNG" and module rewrites please

There was a round of agreement on this last point, and Jochen Friedrich added that the Framebuffer code was also a mess, as was ISDN and (to a lesser extent) USB.

Elsewhere, Andi Kleen asked what specifically was wrong with the IDE code, and Alan replied:

Low level drivers are basically sorted. The main problems are

* Incorrect locking all over the place
* Incorrect timings on some phases
* Some ioctls can cause crashes due to locking
* ISAPnP IDE doesn't work right now
* Flaws in error recovery paths in certain situations
* Lots of random oopses on boot/remove that were apparently introduced by the kobject/sysfs people and need chasing down.
(There are some non sysfs ones mostly fixed)
* ide-scsi needs some cleanup to fix switchover ide-cd/scsi (We can't dump ide-scsi)
* Unregister path has races which cause all the long standing problems with pcmcia and prevents pci unreg
* PCI IDE driver registration needs busy checks
* PCI layer needs some stuff from 2.4
* PCI layer in 2.4/2.5 needs an IRQ bug fixing
* ACPI doesn't seem to handle compatibility IRQ mode
* We don't handle a few errata (MWDMA on 450NX for example)
* IDE raid hasn't been ported to 2.5 at all yet

Thats off the top of my head right now.

Elsewhere, regarding the TTY layer, Greg KH said:

I've looked into this, and wow, it's not a simple fix :(

But this is really the first it's been mentioned, I can't see holding up 2.6 for this. It's a 2.7 job at the earliest, unless someone wants to step up and do it right now...

Alan remarked, "2.5.x crashes erratically and randomly under high tty/pty load. At the moment I'm assuming this is the tty code. That means we can't decide not to fix it since its already fatally broken."

Close by, Linus Torvalds said he didn't think the TTY code was in such bad shape. He guessed there were just a few locking problems that had crept in, coupled with the preemption patches' tendency to expose existing locking bugs. He, Andi, Greg and others began working up some fixes, but someone happened to mention that they didn't like 'goto's in C code, and that blossomed into a large debate that left the TTY problem in the dust.

2. Support For via686a Sensors In 2.5

12 Jan 2003 - 19 Jan 2003 (6 posts)
Archive Link: "[PATCH] via686a sensors support"
People: GertJan Spoelman, Pavel Machek, Christoph Hellwig

GertJan Spoelman announced, "This patch adds via686a sensors support to 2.5.56 (tested with bk1). Christoph this patch applies against the patch you sent me yesterday. and again, please check it."
Christoph Hellwig first of all had an objection, namely that GertJan's patch tried to do several things at once, and so should have been submitted as several separate patches. In particular, some of GertJan's documentation fixes should have been peeled off. Pavel Machek added, "Please submit that documentation fixes through trivial patch monkey. That should make sure they are not lost." Christoph also had some technical objections to the patch, and GertJan submitted an updated patch. Christoph replied, "This patch looks really good to me. Please try to submit and your other stuff to Linus." GertJan thanked him and said he would.

3. Secure User NFS Authentication Using RPCSEC_GSS

12 Jan 2003 - 14 Jan 2003 (9 posts)
Archive Link: "[PATCH] Secure user authentication for NFS using RPCSEC_GSS [0/6]"
Topics: FS: NFS, FS: ramfs, Feature Freeze
People: Trond Myklebust, Dax Kelson, Paul Jakma

Trond Myklebust implemented portions of RFC 2203 (http://www.faqs.org/rfcs/rfc2203.html) and announced:

The following set of 6 patches implements support for the RPCSEC_GSS security protocol (authentication only) and the Kerberos V5 security mechanism.

These patches constitute a resend (modulo some bugfixes) of a set that was originally sent to you and the L-K list on 31/10/2002. I received no comment on them at the time (and they were not immediately applied), and so I've been waiting for the general hubbub after the feature freeze to die down before.

RPCSEC_GSS is the security mechanism that is mandated for all compliant NFSv4 implementations by RFC3010. It provides a protocol for negotiating secure authentication and data transfers on a per-user basis. It does so in a manner that does not depend on the actual security mechanism that is used, and so can support a variety of such mechanisms. The mechanisms that are mandated for NFSv4 by RFC3010 are Kerberos V5 (see RFC1964), SPKM-3 (RFC2025), and LIPKEY (RFC2847).
The actual security negotiation can be done out of band, so it makes sense to delegate as much of this as possible to a userland daemon. The result of negotiation is a security 'context' which is cached in the kernel, and is subsequently used for authentication (as part of the credential in the RPC header) and/or for data integrity/privacy protection (using whatever crypto mechanism your chosen security mechanisms support).

Our wish is to provide basic kernel RPC client support for the generic RPCSEC_GSS protocol, and for communicating with a userland daemon that does the actual security context negotiation with the RPC server. Communication between kernel and userland is done over a set of named pipes (in much the same way as the CODA upcall/downcall is done) in a private ramfs-like filesystem.

Dax Kelson replied:

As a user and sysadmin, I've been waiting for this for a LONG time. Standard NFS security/authentication sucks rocks. Without this NFS home directory servers are just waiting to be ransacked by a rogue (or compromised) root user on a client machine.

NFSv4 w/RPCSEC_GSS is finally a native UNIX filesharing solution that I don't have to be ashamed of when hanging with admins of those "other OSes".

Paul Jakma pointed out that a root user on a client machine could still ransack the server. Dax replied:

Yes, if you login to a compromised machine, and then obtain krbv5 credentials the evil root user can access/delete/modify your files stored on a RPCSEC_GSS NFS server.

With RPCSEC_GSS, a compromised machine, on it's own (no logged in users except evil root), can not access/delete/modify files stored on the NFS home directory server, which is quite different than the normal case. This helps when the exploit-of-the-day hits at 4am Saturday morning.

As a matter of practice you shouldn't leave cached credentials lying around when you not logged in.
Unless you have a very strong reason not to, kill your ssh-agent and run kdestroy on logout (.bash_logout and friends).

Trond took a harder line: once a root account had been cracked, the game was already over, so it was pointless to worry about the mischief that root user could do. He said, "The RPCSEC_GSS security model is not meant to protect you against root monitoring. It is meant to prevent some third party (on another machine for instance) from spoofing RPC requests in your name (== strong authentication), intercepting valid RPC requests and modifying the payload (== cryptographic data integrity checking), or listening in on the client/server communication (== data privacy)."

4. Complaints About The New Configuration Process

13 Jan 2003 - 16 Jan 2003 (8 posts)
Archive Link: "why the new config process is a *big* step backwards"
Topics: Disks: SCSI, Networking, Power Management: ACPI
People: Robert P. J. Day, Tomas Szepe

Robert P. J. Day vented:

(apologies to those who are thoroughly sick of this topic, but i'm now firmly convinced that i don't much care for the new config process, and i'm curious as to whether it's just me. Answer: probably.)

IMHO, the new config process (and i'll restrict myself to talking about the graphical "make xconfig" process here) not only doesn't improve substantially over the old one, but is actually worse in a number of places. where to start?

first, the hierarchical structure of the options in the left window (i'm going to make up names and call these the "menu window", "option window" and "help window") is non-intuitive, in that the top-level selection will bring up a set of selectable options, while submenus will *also* bring up options.

example: Power management options. if i select that menu option explicitly, i get options including APM in the option window. but if i expand that option, i can select the submenu "ACPI Support", for further options.
this is confusing -- it's analogous to a directory having files both directly inside it *and* within a sub-structure. this is inconsistent with other common things people are familiar with -- in the pine mailer, for example, you can't use a folder both for storing files *and* for having subfolders. and think about bookmarks in a browser (a model i wish the new config process had followed).

the current design is messy since it suggests that some options belong strictly to the top level, while others belong to more specialized sub levels. if that's the case, then the menu window should contain something like:

[+] Power management options (APM, ACPI)
    Basic APM options
    ACPI Support

(obviously, this would apply to *all* entries in the menu window that have submenus.)

but wait, you say, there's an advantage to this approach. it means i can, with one click, get to the more common settable options, rather than needing to expand the top level menu. so we get to my second complaint.

there's no reason to not have checkboxes *right* *in* the menu window, so i can see *immediately* whether i have entire submenu options selected. consider "IrDA (infrared) support". from the menu window, there's no way to tell if i have this selected. instead, i must select that option, get it's option window displayed, and only then can i see/select/deselect *all* of IrDA in one fell swoop. (of course, the same is true of submenus where, e.g., under Networking support, i can only deselect all of "Ethernet 1000 Mbit" by first selecting that option, getting its menu, then turning it off at the top.)

this is hideously uninformative, since it's impossible to tell at a glance what entire submenus are selected or not. why *shouldn't* i be able to see, with one look, that my current configuration is not selecting Plug and Play, SCSI, Amateur Radio, IrDA, ISDN, Power Management and Bluetooth?
adding selection checkboxes to top-level entries in the menu window would make this trivial, and it's one area that the previous configuration program fell down as well. it's disappointing that this was not addressed.

my third complaint represents where the new config process is actually *worse* than the previous. the fact that there is a single menu window and a single option window makes it impossible to work in detail in more than one part of the main menu at a time (assuming i haven't overlooked some neat feature of this new process).

at least in the old "make xconfig", i could bring up two children dialogs at a time. perhaps i want to examine/configure both "Block devices" and "Filesystems" at the same time, since there are some related features (loopback device support under Block devices lets me mount filesystem images). under the new scheme, this is impossible (unless there's a trick or feature i haven't found).

and that option window is just confusing. given that we already have +/- expand/collapse icons, and checkboxes for selection, it just makes things messier to have these submenu boxes with the internal triangle. and once it takes you to that submenu, is it really painfully obvious how you back up one level? (the arrow icon in the tool bar?)

frankly, i would like to see the option window disappear entirely. i see no need to have more than two frames -- a menu window with expandable/collapsible choices, where i can select/deselect entire chunks with a click, where it's obvious at a glance which parts are deselected, and where i can expand more than one part of the top-level menu to configure more than one set of options at a time.

(this would be even more practical if the number of top-level entries in the menu window was reduced. i mean, is it really necessary to have separate top-level entries for MTD, Fusion MPT and related selections? why not just a top-level entry for some kind of all-encompassing "Device support"?
i know, that's a bad name, but you get the idea.)

Tomas Szepe replied:

please study scripts/kconfig/*, not how one particular frontend is.

The new kernel configurator is actually a big improvement over the traditional stuff we used to have up to 2.4. Okay, it is a fact that xconfig is far from great, but that doesn't matter -- the important thing is Kconfig provides a clean, generic system for the actual kernel configuration.

As I already pointed out a fortnight ago or so, the only config frontend likely to stay in linux.tar in the long run is menuconfig, serving as a reference to userland people who are certain to come up with heaps of different Kconfig frontends (that is when 2.6 ships I guess).

If you need a nifty graphical frontend right away, I suggest you go ahead and write the first off-tree xconfig.

This all made sense to Robert, and he said he'd check out scripts/kconfig/* for a true understanding of the system.

5. Support For AMD Processors

13 Jan 2003 - 14 Jan 2003 (4 posts)
Archive Link: "Is linux kernel is available for any AMD processors?"
People: Geert Uytterhoeven, Dave Jones

Vadlapudi Madhu asked if Linux would work on any AMD processors. Dave Jones replied it probably worked on all of them. Either a generic kernel, compiled for 386, or a kernel configured specifically for Athlon/Duron, should work fine, he said. But Geert Uytterhoeven interjected, "Don't know which AMD CPUs the original poster intended, but i386 kernels don't boot on Am29000. Yes, AMD produced non-i386 compatible CPUs as well." Dave Jones slapped his own forehead, saying, "Ach, yes. Easy mistake to make when 'all your worlds an x86'"

6.
Linux 2.5.57 Released

13 Jan 2003 - 15 Jan 2003 (8 posts)
Archive Link: "Linux v2.5.57"
Topics: Disks: IDE, FS: NFS, FS: sysfs, Networking, Virtual Memory
People: Linus Torvalds, Adam Belay, Derek Atkins, Brian Gerst, Mikael Pettersson, Andrew Morton, Jaroslav Kysela

Linus Torvalds announced Linux 2.5.57 and said:

Ok, Alan worked on fixing the network packet padding thing (small changes to a _lot_ of network drivers), and merged some more of his IDE work.

And latency fixes and some VM updates from Andrew Morton. Ppc, ppc64, ISDN and sparc updates. NFSd and sysfs updates.

And special mention for Brian Gerst, who figured out and fixed a x86 page table initialization fix that would leave old machines unable to boot 2.5.x. That might explain a number of the "I can't run 2.5.x" that weren't seen by developers (most developers tend to have hardware studly enough that they'd never see the problem).

He replied to himself:

Actually, I should also mention Mikael Pettersson, who actually debugged and chased the problem down to the initialization. Sometimes finding where the problem happens is harder than fixing it once found.

(On that same vein, kudos to Derek Atkins for chasing down where the problems he saw with init started happening.)

Adam Belay noticed that, in the ChangeLog, Jaroslav Kysela was credited with PnP Support 0.94; Adam said, "The Linux PnP Support 0.94 update was from me, not Jaroslav. I'd appreciate if you would change this in the changelogs."

7. sysfs Interface To cpufreq In 2.5, And Deprecation Of /proc/cpufreq

13 Jan 2003 - 16 Jan 2003 (3 posts)
Archive Link: "[PATCH 2.5.57] cpufreq: add sysfs interface"
Topics: FS: sysfs, Version Control
People: Dominik Brodowski, Patrick Mochel

Dominik Brodowski announced, "This patch adds a sysfs interface to the cpufreq core, and marks the previous /proc/cpufreq interface as deprecated."
Patrick Mochel did some work of his own, posted a patch, and said, "The following updates the patch to reflect the sysfs changes currently in Linus's BK tree (reinstating the count parameter to sysfs store() methods)."

8. Confusion Over IPMI Documentation

14 Jan 2003 - 15 Jan 2003 (14 posts)
Archive Link: "IPMI"
People: Paul Mackerras, Corey Minyard, Rusty Russell

Rusty Russell noticed that the configuration text for IPMI didn't mention what it actually was or why a user might want it. Paul Mackerras remarked, "There is a Documentation/IPMI.txt, which would serve as an excellent example of how _not_ to write a documentation file, should you ever decide to write a "Rusty's Unreliable Guide to Writing Kernel Documentation" and need an example to pillory." He pointed out that the documentation didn't say anything about what IPMI actually was. Corey Minyard added the following explanation to the doc:

The Intelligent Platform Management Interface, or IPMI, is a standard for controlling intelligent devices that monitor a system. It provides for dynamic discovery of sensors in the system and the ability to monitor the sensors and be informed when the sensor's values change or go outside certain boundaries. It also has a standardized database for field-replaceable units (FRUs) and a watchdog timer.

To use this, you need an interface to an IPMI controller in your system (called a Baseboard Management Controller, or BMC) and management software that can use the IPMI system.

This satisfied Paul.

9. Support For 'Pending Break Enable' Bit In CPUID Processor Info

14 Jan 2003 - 15 Jan 2003 (4 posts)
Archive Link: "new CPUID bit"
Topics: Version Control
People: Ulrich Drepper, Mikael Pettersson, Dave Jones, James H. Cloos Jr.

Ulrich Drepper said, "Northwood P4's have one more bit in the CPUID processor info set: bit 31. Intel calls the feature PBE (Pending Break Enable). The attached patch for the current BK kernel adds the necessary entry."

James H. Cloos Jr.
replied:

For the curious:

Adrian> Bit 31 is PBE (Pending Break Enable) which you can find in the
Adrian> latest P4 instruction manual (document 24547106, page
Adrian> 159-162). To quote:

24547106> Pending Break Enable. The processor supports the use of the
24547106> FERR#/PBE# pin when the processor is in the stop-clock state
24547106> (STPCLK# is asserted) to signal the processor that an
24547106> interrupt is pending and that the processor should return to
24547106> normal operation to handle the interrupt. Bit 10 (PBE
24547106> enable) in the IA32_MISC_ENABLE MSR enables this capability.

Mikael Pettersson replied:

A better reference for this stuff is (IMHO) AP-485, the "Intel Processor Identification and the CPUID Instruction" application note. It's regularly updated, and in this particular case, its description of CPUID with EAX=1 differs from the IA32 Volume 2 manual (245471xx) in two ways:

* EBX bit 31 is called "SBF", Signal Break on FERR.
* ECX is defined to contain additional feature flags. Currently only one is defined: ECX bit 10 is the "Context ID" feature for putting the L1 D-cache in adaptive or shared mode, which matters for hyper-threaded CPUs.

Supporting the new ECX feature flags in the kernel will require some surgery, since the current code assumes x86_capability[0] is Intel, [1] is AMD, [2] is Transmeta, and [3] is for conflicting or synthesized feature flags. We either shift AMD etc down one index and put ECX in [1], or add a new index [4] for ECX, or kludge the few ECX-defined features in [3].

And Dave Jones suggested, "Or we change it so we end up with something like.. x86_capability[0].standard and x86_capability[0].extended"

10. Looking For Archives Of linux-kernel

14 Jan 2003 - 15 Jan 2003 (5 posts)
Archive Link: "mbox archive of linux-kernel ?"
Topics: Mailing List Administration
People: Matti Aarnio, Stefan Gorling, Chris Funderburg

Scott McDermott wanted to find all the archives of the linux-kernel mailing list going back to its inception. Matti Aarnio said, "See the pointers at: http://vger.kernel.org/vger-lists.html#linux-kernel" And Stefan Gorling also said:

ftp.uwsg.indiana.edu (ftp://ftp.uwsg.indiana.edu) carries most of the mbox-files, except for Apr-June 2000 which are missing for some odd reason. So if anyone have any idea where I might find it in a convenient format I'd appreciate it.

If you just need senders and subject I could mail you a sql-dump. If you're going to parse them, I've found perl and Mail::Box very convenient.

Chris Funderburg also said:

mbox files are here:

ftp://ftp.uwsg.iu.edu/pub/mail.archive/kernel

It only goes back to March, 1996, and the files aren't compressed, so they're rather large.

11. TTY Subsystem Unmaintained

14 Jan 2003 (1 post)
Archive Link: "TTY subsystem maintainership"
People: Russell King

Russell King had apparently been getting a lot of mail about the TTY subsystem, and had had enough. He explained:

Before anyone gets any smart ideas, I'd like to make the following point completely crystal clear.

I am _not_ repeat _not_ going to take over maintainership for the TTY subsystem. I have too much other stuff to look after to take on that job.

However, I will review patches to the TTY layer from time to time, and provide (hopefully) useful feedback. The patches I review will be decided by myself, and will depend on what they touch, how complex they are and how busy I am.

And, just for completeness of the message, please do not send TTY layer, TTY line discipline layer, nor random TTY driver patches to me either.

There was no reply.

12.
Status Of 2.5

15 Jan 2003 (1 post)
Archive Link: "[STATUS 2.5] January 15, 2003"
Topics: Bug Tracking
People: Guillaume Boissiere, John Cherry, Andrew Morton

Guillaume Boissiere updated the 2.5 Status For January 15, 2003 (http://kernelnewbies.org/status/Status-15-Jan-2003.html) on his status page (http://kernelnewbies.org/status), and said:

Probably the most important thing since last week is the merge of Andrew Morton's work to remove the last remaining sources of high scheduling latency aka "Linux 2.6, multimedia OS".

On the bugzilla side, 155 bugs and counting. And there are lots and lots of compile warnings since the introduction of the deprecated keyword (see John Cherry's excellent page (http://www.osdl.org/archive/cherry/stability/) for details).

13. Open Source Hardware

16 Jan 2003 - 17 Jan 2003 (7 posts)
Archive Link: "Open source hardware"
Topics: SMP
People: John Bradford, Jeff Garzik, Eric W. Biederman, Herman Oosthuysen

John Bradford remarked:

I've been reading some of the threads about the GPL, and binary-only drivers, and I'm surprised that nobody has brought up open source hardware, (or rather, the lack of it).

Open source hardware more or less sidesteps the whole issue of closed-source drivers - an open source driver would be so easy to write with all the specifications available that there would be very little point in writing a closed-source driver.

At the moment there is not very much open source hardware, and what does exist is generally peripherals, and not things like CPUs, but I expect this will change soon, mainly because it would be easy to develop a cheap, and simple CPU that is designed for multi-processor use from the beginning. This means that each CPU would be cheap and easy to produce, (simple design = high yield from each wafer, and mass production = low cost per unit). Typical machines would have several orders of magnitude more processors than those of conventional design, (E.G.
4 to 16 for a desktop), but they would be far cheaper, because anybody would be free to fabricate the CPUs.

So, basically, the idea is to design a low-cost, low-computational-power CPU, which works well in multi-processor configurations, and make the specification open source. Anybody could make the processors, and building a machine of a given computational power would be cheaper using them than using conventional CPUs.

I personally expect to see this within 10 years.

Jeff Garzik and Herman Oosthuysen pointed him to http://www.opencores.org/, and Jeff said with a smirk, "You're behind the times :)". John replied:

Interesting - I'd only seen open source CPU projects which were at the planning stage.

It seems that most of the components necessary to build a usable machine are at least well-advanced, although most of the non-CPU parts are based around the WISHBONE interface, whereas most of the CPUs are not, so maybe the goal is further away than it first appears, but still, progress is being made.

Do you know of anybody who has actually made a prototype board from any of these CPU designs?

Is my idea of running a lot of simple CPUs together fundamentally flawed, or is it possible to overcome the inefficiencies of SMP, if the CPUs are designed for it from the ground up?

Eric W. Biederman replied:

The fundamental problem is not inefficiencies of SMP. But rather there are some tasks that simply do not parallelize well. Big supercomputer kinds of applications that require a lot of number crunching usually benefit from multiple cpus. But small every day applications don't.

The only applications that scale perfectly with the number of cpus are the embarrassingly parallel ones, in which no communication is involved between the various subtasks.

This is not to say an elegant design might not get there, AMD is trying for that. But simple brute force will certainly not get you there.

14.
NUMA-Aware Scheduler; Hyperthreading

16 Jan 2003 - 19 Jan 2003 (6 posts)
Archive Link: "[PATCH] (0/3) NUMA aware scheduler"
Topics: Hyperthreading, SMP
People: Martin J. Bligh, Linus Torvalds, Andrew Theurer, Pavel Machek, Michael Hohnbaum, Robert Love, Erich Focht

Martin J. Bligh said to Linus Torvalds:

Following is a sequence of patches to add NUMA awareness to the scheduler. These have been submitted to you several times before, but in my opinion were structured in such a way to make them too invasive to non-NUMA machines. I proposed a new scheme of working in "concentric circles" which this set follows (Erich did most of the hard work of restructuring), and is now completely non-invasive to non-NUMA systems. It has no effect whatsoever on standard machines. This can be seen by code inspection, and has been checked by benchmarking.

These patches are the culmination of work by Erich Focht, Michael Hohnbaum and myself. We've also incorporated feedback from Christoph and Robert Love. I believe these are now ready for mainline acceptance.

I've tested them on NUMA-Q, standard SMP and UP. Erich has run them on the NEC ia64 NUMA machine.

Linus replied:

Applied.

I also have to say that I hope this means that the HT-specific scheduler stuff will go away. HT _should_ be just another NuMA issue, and right now the two seem to be just slightly different ways of covering the same needs.

However, I'm going away for two weeks starting tomorrow, so even if there is some experimental HT/NUMA patch, I don't want it at this point. The NUMA scheduler merge is more of a "get the infrastructure in place" thing for me right now.

Martin was thrilled to see the patch go into the tree, and agreed that infrastructure was the best focus in the immediate future. Regarding the hyperthreading issues, he agreed, "Yup, Andrew Theurer from our performance team has been working on this. Initial results look encouraging."
Elsewhere, Andrew Theurer replied to Martin's initial post, saying:

FYI, I have used a topology to map HT aware processors (in this case P4) to a NUMA topology while using this scheduler. This was done to help address the same problems that Ingo's shared runqueue implementation fixed.

The topology is quite simple. Sibling logical procs are members of a node. Number of nodes = number of physical procs. This primarily avoids sharing cpu cores (and avoiding resource contention) on low loads. In my case, 4 tasks on 8 logical proc system, we want to load balance the tasks across nodes/cores for better performance.

For my test, I did a make -j4 on a 2.4.18 kernel. Results are:

stock sched, no numa: 56.523 elapsed 202.899 user, 18.266 sys, 390.6%
numa sched, ht topo: 53.088 elapsed 189.424 user, 18.36 sys, 391%

~6.5% better.

These results are the average of 10 kernel compiles. I did make one minor change to sched_best_cpu(). The first test case was eliminated, and that change is currently under discussion.

I did this mainly to demonstrate that a numa scheduler's policies may be able to help HT systems and to capture a wider interest in numa scheduler. By no means is P4 HT required to use this. This is simply a numa topology implementation.

I would like some feedback on any interest in this. One of the reasons we probably have not had much interest in numa patches is that numa systems are not that prevalent. However, numa-like qualities are showing up in commonly available systems, and I believe we can take advantage of policies that these patches, such as numa scheduler provide. Does anyone have any other ideas where numa like qualities lie? x86-64?

Pavel Machek replied, "Yep, x86-64 SMP systems are in fact NUMA systems that don't penalize remote memory *that* badly."

15.
Linux 2.5.59 Released 16 Jan 2003 - 20 Jan 2003 (28 posts) Archive Link: "Linux 2.5.59" Topics: FS: XFS, Framebuffer, Kernel Build System People: Linus Torvalds Linus Torvalds announced 2.5.59 (http://www.kernel.org/pub/linux/kernel/v2.5/ChangeLog-2.5.59) and said: Updates to sparc, alpha, ppc64, fbdev, XFS, AGP, kbuild, arm... Likely the last release by me in a while, but Andrew & co can hold the fort.. 16. New Module Builder Project; Complaints About Module Standards 17 Jan 2003 - 22 Jan 2003 (27 posts) Archive Link: "ANN: LKMB (Linux Kernel Module Builder) version 0.1.16" Topics: Kernel Build System, Networking, Sound: ALSA People: Shlomi Fish, David Woodhouse, Sam Ravnborg, Olaf Titz, Arjan van de Ven, John Levon, Linus Torvalds Shlomi Fish gave a URL to some sources (http://fc-solve.berlios.de/CLAN/download/arcs/Linux-Kernel-Modules-Builder-0.1.16.tar.gz) and a URL to the CLAN temporary homepage (http://fc-solve.berlios.de/CLAN/) and announced: LKMB version 0.1.16 is the humble codeware beginning of the CLAN project. It is essentially a Perl package (proper with Makefile.PL and all, but not CPANed yet), which enables one to process LKMB packages. The latter ones are packages that LKMB can create installation and compilation packages for kernel modules that can run in any environment the Linux kernel can be compiled and installed on. (a GNU environment). It contains an example module for the Ethernet DMFE module. Currently, the makefile for the kernel module's package supports only the "all" and "install" targets. I will upload it to CPAN soon, but would like to get some initial feedback beforehand. There was no discussion of the project itself, but David Woodhouse caught Shlomi making use of kernel headers under /usr/src/linux, which was a no-no. He said: you need to get the proper kernel CFLAGS, and you shouldn't assume there's anything useful in /usr/src/linux. 
Use "/lib/modules/`uname -r`/build" as a default kernel directory, but allow it to be overridden somehow from the command line. Then do something like... make -C $(LINUXDIR) SUBDIRS=`pwd` modules ... to build your module. That way, all the kernel build stuff will be correct; it'll be just as if you were in a normal subdirectory of the kernel tree during a 'make modules' run. Shlomi asked, "Do you mean I'll need a live Linux kernel to build the kernel module package?" He added, "The LKMB package needs to compile on every system it was intended to. It's still a source package that has to compile on any GNU system that has the Linux kernel headers." Sam Ravnborg replied, "Yes, you fundamentally need the full kernel to compile a module. Modules may refer to different headers, and some may even be arch specific. The trick dwmw2 gave you is the only _sane_ way to build a module." Sympathizing with Shlomi, Olaf Titz said to him: Whoever invented this /lib/modules/... scheme should have known that it provokes this sort of misunderstandings, not to mention is broken in other ways too. You need the _source_ of the kernel the module will run on to compile modules. You don't need to _run_ this kernel while compiling. Putting build infrastructure into a deployment directory at the least causes confusion, not to mention that the deployment directory might not even exist on the development machine. (I routinely compile kernels and modules of different configurations for three boxes on one of them, the other two don't even have a complete development toolset.) Compiling modules is one of the things which always have been among the most broken things in the kernel build systems, can this please be fixed and properly documented? Arjan van de Ven pointed out that Linus Torvalds was the one who decided that the current situation would be the standard; but Olaf pointed out that yes, even Linus could make a mistake. 
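As an illustration only, David Woodhouse's suggestion earlier in this thread (default to the build directory under /lib/modules for the running kernel, allow an override, then run make against that tree with SUBDIRS pointing at the module source) might be sketched roughly as follows. This is not code from the thread: the function names default_kernel_dir and module_build_command are hypothetical, and the sketch only assembles the command line rather than running it.

```python
import os
import platform


def default_kernel_dir(release=None):
    """Default build tree named in the thread: /lib/modules/<release>/build."""
    # platform.release() reports the running kernel's release, like `uname -r`.
    if release is None:
        release = platform.release()
    return "/lib/modules/" + release + "/build"


def module_build_command(module_dir, linuxdir=None):
    """Return the argv for: make -C $(LINUXDIR) SUBDIRS=<module_dir> modules."""
    # An explicit linuxdir overrides the default, as the thread asks for.
    # (The parameter name is an assumption made for this sketch.)
    if linuxdir is None:
        linuxdir = default_kernel_dir()
    return ["make", "-C", linuxdir,
            "SUBDIRS=" + os.path.abspath(module_dir), "modules"]


if __name__ == "__main__":
    # Print the command instead of running it; hand the list to
    # subprocess.run() to start an actual build.
    print(" ".join(module_build_command(".")))
```

Passing linuxdir explicitly covers the case Olaf Titz raises, where the kernel the module is meant for is not the one running on the development machine.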
Arjan had said the current situation was 99% correct, and Olaf said in his reply: what's exactly wrong with the other 99% solution of putting it in /usr/src/linux-`uname -r` ? This has exactly the same advantages but doesn't mix up between development and runtime environment; /usr/src is clearly where source belongs and /lib/modules is an install target. Even Linus has finally accepted that the root of the source tree is best called linux-$VERSION rather than just linux, so this is not an obstacle either. Arjan said, "back then the argument (not mine btw) was that /usr on a lot of machines is RO (I think debian has an option for that) so that sysadmins there compile stuff in /root. /lib/modules however IS standardized and needs to be writable to install a new kernel so making a symlink to the real place there isn't too bad. In addition it already is the only directory with per kernel files.. adding a second one was judged not needed. It has to be somewhere. /lib/modules/ or /usr/src.. who cares. Linus made the final call and everybody complies with it since then, just because it doesn't matter THAT much. It just needs to be SOMEWHERE standard and /lib/modules suffices so far it seems." Olaf replied, "Frankly, I think the main reason is that Linus doesn't care at all about the kernel build process. We've had a _much_ better solution already in the 2.5 cycle which was rejected for completely bogus formal reasons coupled with an explicit 'why do we need this at all', even though it was pointed out over and over again what is broken currently (or was back then, granted it has improved but not as much as is desirable and possible)." Elsewhere, John Levon asked Olaf for a more complete bug report, and Olaf replied: The general bug is that there is incomplete infrastructure for building modules outside of the kernel. You see the problem when looking at the CIPE configure scripts, or the ALSA configure scripts. 
Up to kernel version 2.4, it takes considerable effort to find out just what compiler options to use. This is information which belongs in some easily accessed location. The desirable situation for module developers would be that a kernel tree after configure run contains a Makefile (or equivalent) with all necessary definitions which can be called from an outside module source tree and just DTRT. The 2.5 kbuild stuff is close, but not complete. It is a bug that Documentation/modules.txt is so outdated that it contains little useful information any more. It is a bug that Documentation/kbuild/makefiles.txt is at least a bit outdated. It is a bug that the build process outside of the kernel tree changes files inside the kernel tree when MODVERSIONS is enabled. (At least this was the case last time I checked.) This means the kernel tree can't be mounted read-only, or at least you would have to do dirty tricks with symlinks. It is a bug that the current Makefile can't compile modules in an object directory different from the source directory. This means the module source tree can't be mounted read-only (again, without resorting to symlinks). It is also a bug that parts of the development infrastructure are installed in /lib/modules/ and it's somewhat documented that compiling modules needs this /lib/modules/ stuff. That may be true for the ideal, simplified Red Hat world but in reality the machine and running OS version of the development machine is likely different from the box it will run on. Mixing development environment and install target only causes confusion. I don't know if real cross-compilation (i.e. for a different architecture than the compiler runs on) of modules is possible yet. If not, that's a bug too. 17. 
User-Mode Linux 2.5.58-1 Released 17 Jan 2003 (1 post) Archive Link: "uml-patch-2.5.58-1" Topics: User-Mode Linux People: Jeff Dike, Oleg Drokin Jeff Dike announced: This patch brings UML up to date with Linus (at least until he released 2.5.59 last night :-). There's nothing new here except for the updates to .58 - in large part, these are thanks to Oleg Drokin. The 2.5.58-1 UML patch is available at http://uml-pub.ists.dartmouth.edu/uml/uml-patch-2.5.58-1.bz2 For the other UML mirrors and other downloads, see http://user-mode-linux.sourceforge.net/dl-sf.html Other links of interest: The UML project home page : http://user-mode-linux.sourceforge.net The UML Community site : http://usermodelinux.org 18. Kernel Bug Database Version 2.0 Released 17 Jan 2003 (1 post) Archive Link: "[ANNOUNCE] Kernel Bug Database 2.0" People: John Bradford, Larry McVoy John Bradford announced: I've been working on it all day, and I've finally got version 2.0 ready and working, and put it on-line: http://grabjohn.com/kernelbugdatabase I've added a major new concept in this version - bug reports and confirmed bugs, (I.E. bugs which are being actively investigated by somebody with administrative access to the Kernel Bug Database), are now separate things. In other words, anybody can submit a bug report, but only designated people can collect those reports together into a confirmed bug. A bug report can be related to several confirmed bugs. Confirmed bugs can be read and commented on by all users. Thanks to Fergal Daly for submitting this idea, and I'd also like to point out that I read more or less the same idea in this email by Larry McVoy, where Jens mentions the idea of sorting a queue of new bugs into the real database. http://www.cs.helsinki.fi/linux/linux-kernel/2001-13/0084.html Hopefully this system also makes Alan's idea of keeping all bug reports for data mining, (also mentioned above), more practical. 
I haven't written any documentation for it yet, but hopefully it's fairly self-explanatory anyway. I've added a single confirmed bug, which relates to two vaguely related bug reports, (missing help text, and a missing comment), but anybody who wants administrative access to add and modify confirmed bugs, just drop me an E-Mail. 19. ntfsprogs 1.7.0beta Released 18 Jan 2003 - 20 Jan 2003 (8 posts) Archive Link: "[ANN] ntfsprogs (formerly Linux-NTFS) 1.7.0beta released" Topics: FS: NTFS, FS: ext2, Version Control People: Anton Altaparmakov, Pawel Kot, Jim Nance, Joshua Kwan Anton Altaparmakov announced: This is to announce the new release of the ntfsprogs package (formerly Linux-NTFS). This is a massive update featuring an almost complete rewrite of the ntfs library (the API should hopefully remain stable from now on) as well as several new utilities: ntfslabel, ntfsresize, and ntfsundelete. Note this is a beta release and can contain bugs. Please backup your data before using any of the utilities in write mode. You can download the source code as a tar ball or source rpm or binary rpms for intel 386 architecture from our website: http://linux-ntfs.sourceforge.net/downloads.html Or you can get the latest source from our bitkeeper repository by doing a: bk clone http://linux-ntfs.bkbits.net/ntfsprogs You can also browse the source code online here: http://linux-ntfs.bkbits.net:8080/ntfsprogs Joshua Kwan asked how well NTFS writing was supported, and Pawel Kot replied: Ntfsprogs is a library and set of utilities to do various things with the ntfs filesystem. It is not the kernel driver. And the kernel driver is what you give the write ability to the ntfs filesystem. And you are right -- the old driver in fact does not support writing (yeah, DANGEROUS means your filesystem will get damaged with very high probability). 
There exists a new ntfs driver called NTFS-TNG, which is present already in 2.5.x kernel series and it has its backport to the 2.4.x kernel series (you'll find it at http://linux-ntfs.sf.net/). This driver has no write support yet, but it allows you to overwrite the files, without changing their attributes and size (ie. mmap() the file, change the contents, write() the file). And the overwrite is considered safe. Jim Nance asked, "Is this stable enough to allow you to put an ext2 image on an NTFS partition and then mount that image as a r/w loopback mount from Linux?" Pawel and Anton both said yes it was; and Anton also said: This was the most desired item by people which is why I made sure it was the first thing to be implemented (it also happens to be the easiest thing to implement as it doesn't involve any changes to metadata at all). I consider that completely stable although there have been some reports of hangs but I have never seen one and everyone who has filed a bug report wasn't able to reproduce the hang on request so I am not really sure where the hangs come from... It might not even be the ntfs driver per se but a bad interaction between ntfs and some other kernel subsystem like the mm layer or the block layer. But I have only seen three reports of a system freeze so far and Mandrake who ship the new driver I would assume have more users than that and either they are not complaining or they are not having problems. I hope the latter. (-; In any case, even if there is a bug somewhere which causes the kernel to hang, no damage to the ntfs partition will occur from the new driver as it is now. It simply doesn't modify any metadata at all so it can't cause any damage. Pawel asked if the reported system crashes were with 2.4 or 2.5 kernels, and Anton said 2.4. 20. Compiling The Kernel With Non-GCC Compiler 20 Jan 2003 (7 posts) Archive Link: "Intel C++ compiler?" 
People: John Bradford, Ville Herva, Jun Nakajima, Jeff Garzik Henrik Andersen asked if Intel's C++ compiler would compile the Linux sources. John Bradford said he doubted it very much, since Linux made extensive use of GCC extensions. Jeff Garzik, however, said Intel's C++ compiler worked just fine on the Linux tree. Jun Nakajima confirmed this, and John said, "I'm surprised. Sorry once again for the mis-information, (heh, but at least it was on-topic, which is somewhat amazing for this mailing list :-) ). Is there a conscious effort to make it compile the kernel, or are they aiming for general GCC compliance?" Ville Herva replied: I guess both. See http://lists.insecure.org/lists/linux-kernel/2002/Oct/6450.html Also, Intel has for long aimed to make icc on Linux to be as gcc compliant as possible. 21. Rewriting The SMP Parsing Code 20 Jan 2003 - 22 Jan 2003 (10 posts) Archive Link: "[PATCH] SMP parsing rewrite, phase 1" Topics: Power Management: ACPI, SMP, Version Control People: Andy Grover Andy Grover announced: The below patch against 2.5.59 is also available from ftp://ftp.kernel.org/pub/linux/kernel/people/grover/, or bk pull http://linux-acpi.bkbits.net/linux-smp-init . Before I spent any more time carving up mpparse.c, I just wanted to have the chance for feedback from others. This patch begins to draw a distinction between the structure of the MPS table's items, and the kernel's internal data structures. Previously, it made sense to just use MPS format throughout, but with the introduction of a second method to enumerate CPUs, IOAPICs etc. on x86 (i.e. ACPI), this really is no longer ideal. A clean, minimal interface for ACPI and MPS to report discovered resources will cut down on cross-module dependencies, shared global arrays, and will probably even reduce the kernel image somewhat. See below for more detail, but to sum up, I: 1. Renamed MPS-specific structs starting with "mpc_" to "mps_", to reflect their actual purpose. 2. 
Do the same thing for variables. 3. Created arch/i386/kernel/smpenum.c, for the enum-method-neutral APIs. To begin with, I have only implemented the new interface to replace MP_processor_info - the others will be done in a similar manner. 4. An unrelated ACPI init changeset sneaked in, sorry :) It's been tested on my machine in ACPI and MPS mode - obviously some more testing coverage would be nice. 22. Virtual Memory Documentation 21 Jan 2003 (1 post) Archive Link: "Linux 2.4 VM Documentation - Take 3" Topics: Virtual Memory People: Mel Gorman, Ingo Oeser Mel Gorman announced: This is the third draft at a pair of papers aimed at documenting fully how the 2.4 VM functions. I have made a large number of additions and corrections so I felt another release would not hurt even if I still have a few chapters to go. The most notable change is the introduction of a chapter on the boot memory allocator. The full list of changes as best as I can remember is listed at the end of this mail. It can be found in the various formats at Understanding the Linux Virtual Memory Manager PDF: http://www.csn.ul.ie/~mel/projects/vm/guide/pdf/understand.pdf HTML: http://www.csn.ul.ie/~mel/projects/vm/guide/html/understand/ Text: http://www.csn.ul.ie/~mel/projects/vm/guide/text/code.txt Code Commentary on the Linux Virtual Memory Manager PDF: http://www.csn.ul.ie/~mel/projects/vm/guide/pdf/code.pdf HTML: http://www.csn.ul.ie/~mel/projects/vm/guide/html/code Text: http://www.csn.ul.ie/~mel/projects/vm/guide/text/code.txt Any and all comments and corrections, especially on the bootmem allocator, are welcome. If there is some section that you feel is not covered in adequate detail or is omitted entirely, email me and I'll see what can be done. Fullish list of changes, can't remember them all :-/ * Added a chapter describing how the boot memory allocator works * Added an explanation on the difference between mm_users and mm_count * Fixed the explanation on pages_min, pages_low and pages_high. 
The language was quite confusing the way it was and open to misinterpretation * Added sections on exception handling and how it applies to copying to/from userspace. Thanks go to Ingo Oeser for highlighting the importance and clarifying exactly how it worked to me (Thanks Ingo!) * Large number of grammar and spelling mistakes, thanks to all who sent corrections as I am useless at proof reading this document now, the list of people is too large to list * Corrected a part of the buddy allocator code commentary where a typo reversed the meaning of __GFP_WAIT * Fixed a section where it is explained why 64GiB is an impractical amount of memory because of ZONE_NORMAL pressure. I calculated the amount of memory needed for mem_map wrong (Thank you Jean Francois Martinez) * Fixed some call graphs where the order when traversed depth-first did not match what was in the code due to a bug in gengraph. New release of gengraph is out which works with recent 2.5 kernels and fixes the traversals * Various other bits and pieces I can't recall 23. Linux Security Module 2.5.59-lsm1 Released 21 Jan 2003 (1 post) Archive Link: "[ANNOUNCE] 2.5.59-lsm1" Topics: Version Control People: Chris Wright, Stephen Smalley Chris Wright announced: The Linux Security Modules project provides a lightweight, general purpose framework for access control. The LSM interface enables security policies to be developed as loadable kernel modules. See http://lsm.immunix.org for more information. 2.5.59-lsm1 patch released. This is a rebase up to 2.5.59 as well as some minor interface and module updates. Out of tree projects will want to resync with interface changes. 
Full lsm-2.5 patch (LSM + all modules) is available at: http://lsm.immunix.org/patches/2.5/2.5.59/patch-2.5.59-lsm1.gz The whole ChangeLog for this release is at: http://lsm.immunix.org/patches/2.5/2.5.59/ChangeLog-2.5.59-lsm1 The LSM 2.5 BK tree can be pulled from: bk://lsm.bkbits.net/lsm-2.5 2.5.59-lsm1 * merge with 2.5.53-59 (GregKH and me) * remove inode_post_lookup hook, add d_instantiate hook (Stephen Smalley) * email addr updates (Stephen Smalley) * merge with mainline ipc updates (me) * Fix ipc merge whitespace diffs (Stephen Smalley) * DTE: fix compilation errors (Stephen Smalley) * SELinux: restore sem_semop (Stephen Smalley) Sharon And Joy Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.