Latest | Archives | People | Topics
Table Of Contents
|1.||17 Jul 2000 - 24 Jul 2000||(4 posts)||Approaching 2.0.39|
|2.||17 Jul 2000 - 25 Jul 2000||(6 posts)||Status Of Asynchronous I/O|
|3.||18 Jul 2000 - 26 Jul 2000||(3 posts)||DAC960 RAID Problems With Latest Development Kernels|
|4.||19 Jul 2000 - 28 Jul 2000||(389 posts)||IDE Flamewar|
|5.||20 Jul 2000 - 26 Jul 2000||(9 posts)||Making General Use Of A PCI Card's On-Board RAM|
|6.||20 Jul 2000 - 27 Jul 2000||(16 posts)||Linus On Unions|
|7.||20 Jul 2000 - 26 Jul 2000||(12 posts)||'Virgin Connect' Violates GPL|
|8.||23 Jul 2000 - 28 Jul 2000||(5 posts)||Developer Interaction|
|9.||23 Jul 2000 - 30 Jul 2000||(20 posts)||Status Of 2.4 To Do List; Kernel Bug Tracking System|
|10.||23 Jul 2000 - 26 Jul 2000||(5 posts)||Draft Press Release For 2.4|
|11.||25 Jul 2000 - 26 Jul 2000||(7 posts)||Simulating SMP Under UP Systems|
|12.||26 Jul 2000 - 27 Jul 2000||(7 posts)||Some Explanation Of Hard IRQ Atomicity|
|13.||26 Jul 2000 - 28 Jul 2000||(10 posts)||Reading Files From Device Drivers|
|14.||26 Jul 2000 - 29 Jul 2000||(2 posts)||Status Of CML2|
|15.||27 Jul 2000||(1 post)||Linus Announces 2.4.0-test5|
|16.||30 Jul 2000 - 31 Jul 2000||(15 posts)||Status Of zImage|
Thanks go to Leena Heino and Tom Davey for catching an HTML bug that caused an end-comment tag to show up where it wasn't wanted. Thanks, folks!
Some folks have written me asking how to find the latest KC news now that the right nav bar is gone. The news can all still be found on the Linuxcare Homepage or the Kernel Cousin News Page.
Mailing List Stats For This Week
We looked at 1441 posts in 6470K.
There were 432 different contributors. 194 posted more than once. 150 posted last week too.
The top posters of the week were:
1. Approaching 2.0.39
17 Jul 2000 - 24 Jul 2000 (4 posts) Archive Link: "pre-patch-2.0.39-6"
Topics: Disks: IDE, FS: ext2
People: Alan Cox, Ville Herva, David Weinehall, Rask Ingemann Lambertsen
David Weinehall initially took over the 2.0 series with Alan Cox's blessing in Issue #48, Section #5 (13 Dec 1999: Major Security Hole In 2.0.x!! Alan Hands Off The 2.0 Tree To David Weinehall!!). He gave a status report in Issue #53, Section #1 (4 Jan 2000: Discussion Of The Development Process), released 2.0.39-pre4 in Issue #66, Section #8 (30 Apr 2000: 2.0.x Development Continues), and in Issue #71, Section #7 (27 May 2000: Backporting Filesystem Fixes To 2.2/2.0) revealed that he was willing to backport fixes from 2.2 and 2.4 if it seemed worthwhile.
This week he announced 2.0.39-pre6; apparently -pre5 had some IDE breakage. He intended -pre6 to fix only that breakage, deferring other fixes until -pre7 and later. There were two replies. Ville Herva reported an oops with the old 2.0.37, to which there was no reply; and Rask Ingemann Lambertsen reported that he couldn't find 2.0.39-pre6 in any of David's directories in the kernel archive repositories. David replied that he'd messed up somehow, but that the patch should be available now. End Of Thread.
2. Status Of Asynchronous I/O
17 Jul 2000 - 25 Jul 2000 (6 posts) Archive Link: "Async disk i/o in Linux"
People: Ramon Garcia Fernandez, Stephen C. Tweedie, Jeff V. Merkey, Ingo Molnar
This previously came up in Issue #31, Section #1 (11 Jul 1999: Operating System Ideas Discussed).
This week, Ramon Garcia Fernandez said, "Traditionally, Linux only supports async i/o with sockets and terminals. You can use select on several handles to wait until pending input is available on any of them (not exactly async); or you can use fcntl/sigio and the kernel will tell you when input is available." But he added that a remark from Ingo Molnar on Slashdot had seemed to suggest that asynchronous disk I/O with these tools ('select' and 'fcntl/sigio') was also possible in 2.4; he asked if this were true, and Stephen C. Tweedie replied, "The kernel has async block device and file IO support already --- it uses that sort of thing for file readahead, for example. There isn't an exported async IO api for user applications yet, although we're working on one. Kernel internal modules such as TUX can access the async properties of the page cache if they want to." And Jeff V. Merkey added, "NWFS has a very good Asynch IO API for Linux AIO. You can get the code at 22.214.171.124. I already handle all the buffer head allocations and returns and locking issues with the Linux kernel. The file block.c has generic functions for all Linux versions 2.0/2.2/2.4 with a consistent asynch IO interface that runs across all Linuxes. The code is all GPL, so you can pull it apart and use it anyway you want."
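The select-style waiting that Ramon describes can be sketched from user space as follows. This is only a minimal illustration in Python, assuming a Unix-like system; the helper name `ready_handles` and the use of pipes as stand-ins for sockets or terminals are assumptions made for the example and do not come from the thread. As Ramon points out, this is readiness notification on several handles rather than true asynchronous I/O.

```python
import os
import select


def ready_handles(read_fds, timeout=0.0):
    """Return the descriptors in read_fds that have input pending.

    Hypothetical helper around select.select(), which waits until at
    least one descriptor is readable or the timeout (seconds) expires.
    """
    readable, _, _ = select.select(read_fds, [], [], timeout)
    return readable


if __name__ == "__main__":
    # Two pipes stand in for the sockets or terminals discussed above.
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    try:
        os.write(w2, b"hello")  # only the second pipe receives data
        print("readable descriptors:", ready_handles([r1, r2]))
    finally:
        for fd in (r1, w1, r2, w2):
            os.close(fd)
```

The caller still has to issue the actual read once a handle is reported ready, which is why this differs from the kernel-internal async block I/O that Stephen mentions.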
3. DAC960 RAID Problems With Latest Development Kernels
18 Jul 2000 - 26 Jul 2000 (3 posts) Archive Link: "2.4.0-test4 and DAC960"
Topics: Disk Arrays: RAID, Ottawa Linux Symposium
People: Jens Axboe, Leonard N. Zubkoff, Matthew Dharm
Matthew Dharm reported that his DAC960 RAID controller had stopped working somewhere between 2.4.0-test1 and 2.4.0-test4. During boot, as soon as the kernel began to initialize the controller the machine would oops with a null pointer dereferenced in the swapper. Matthew couldn't capture the oops, but he explained that it happened every time the kernel tried to check the partition on boot before mounting it, and it also happened regardless of whether the driver was compiled directly into the kernel or only as a binary. Jens Axboe replied, "This is a known problem. It happens because the DAC960 driver tries to use the block queue _before_ calling blk_init_queue(). I have informed Leonard about the problem, so he is aware of it and will fix it. I saw him at OLS today, so it probably won't get fixed within the next couple of days ;)" and Leonard N. Zubkoff posted a patch and explained, "The DAC960 driver in 2.4.0-testN is currently out of date and I will be sending an update as soon as I verify that the next release works. The above problem was actually already fixed in the 2.4.6 driver on my web page, but changes to 2.4.0-test4 broke compilation. If you need a temporary fix for the moment, please apply the following patch to the 2.4.6 driver from my web page." End Of Thread.
4. IDE Flamewar
19 Jul 2000 - 28 Jul 2000 (389 posts) Archive Link: "Wide spread test needed before 2.4.0 is released."
Topics: Disks: IDE, Disks: SCSI, Microsoft, Security
People: Andre Hedrick, Chris Kloiber, Horst von Brand, Oliver Xymoron, Stephen Frost, Bob Taylor, James Sutherland, Linus Torvalds, Victor Khimenko, Bartlomiej Zolnierkiewicz
This was a long and bitter series of threads, and Andre Hedrick was upset and agitated through pretty much all of it. He had noticed that a benevolent root user writing to an IDE drive (something one might not generally expect to be dangerous to hardware) could wipe or even physically break the drive if a small miscalculation were made in the data sent to the drive. He reported that this was true for all versions of Linux, and over the course of the threads added that since the data in question was only a few bytes long, any simple exploit, such as a small buffer overrun, could be enough for "Joe six-pack hacker" to take advantage of the weakness and destroy the drive.
However, in order to avoid giving away the details of how to do this, Andre felt the need to be somewhat oblique in his initial descriptions of the problem. As a result, many folks got the idea that he was simply concerned with security, and replied that there were already so many ways for root to trash a system that it wasn't necessary to include Andre's patch in 2.4; 2.5 would be soon enough.
Linus also rejected the patch on the grounds that the changes it made to the guts of the driver were too extensive for a code-freeze. Andre felt that his patch was not only a true fix, but a very important one, since the bug could significantly embarrass Linux as a 'trash your hardware' OS. At one point he argued:
even priviledged users can not be allowed to send disk level distructive commands.
The subsystem needs to protect itself and the hardware from general abuse. In order to do that you must construct a superset (or complete set) of commands that to get blanket access must be a complied option and user is at their own risk. The default should be able to accept all commands and reject the harmful ones. Currently you could issue a modified setfeature used in configuring drive speed and destroy the drive or device.
The object is to protect this from happening even if you are ROOT or have stolen ROOT priviledges.
Does this help explain the issue or should I provide a "disk2brick.c" program to make the point clearer? This will vaporize a drive to the replacement level. Yes you can to that today!
Later he amended, "I used a bad word of choice. I can not DISKTOBRICK. But can genrate code that will attempt and may have success. Regardless do you want access to such attempts available in the kernel? Unchecked?"
At one point Chris Kloiber confirmed Andre's report and the patch to fix it, saying, "I know it's true. I have run the disk-destroyer program. Twice. I compiled a 2.4.0-test5-pre2 kernel with an earlier version of Andre's patch and actually ran the disk-destroyer program as a test. Andre specifically did not include the fry-your-drive codes in the test program, but on the first try it sucessfully hosed the MBR and partition table (my /dev/hda1 was/is swap, so I don't know how much of that went bye-bye too). After a fresh install of Red Hat 6.2, I (glutton for punishment that I am) recompiled 2.4.0-test5-pre2 with the latest patch (has it been restored to www.kernel.org?) and now my system can survive the disk-destroyer (The drive still makes god awful noise when run, but no permanant damage occurs). I captured the output of the disk-destroyer and sent it to Andre with a list of my hardware (mobo & drive models) so he could further refine the patch."
This and other reports were brushed aside by various folks, who reiterated that if someone had root access, they could damage the system however they pleased. Horst von Brand replied to Chris, "That "disk-destroyer" just overwrites the partition table, something that can be done in many easier ways. No real proof of the existence of the mythical DISK2BRICK ATA command (that would in any case work only for certain brands/models) that would make all this a real threat has been put forward. No Windows/DOS virus/troyan has made use of this mythology either (and there _have_ been instances of destroyed hardware, or close enough, in recent history)."
Eventually Andre started replying "Thanks for the vote of no confidence" to his patch's detractors, and began calling for the discussion to end. The discussion continued however, and various other arguments were put forward. Some folks pointed out that it was really the hardware that was broken already, if it allowed a simple out-of-spec command to destroy it. Andre countered this by saying that even if the hardware was poorly designed, the kernel should prevent itself from accidentally taking advantage of that.
The discussion raged back and forth for a long time, with neither side giving an inch, and in many cases talking directly past each other. Many people failed to grasp the scope of Andre's report, while Andre in turn did not appreciate the counter arguments folks gave him. To give an example of the kind of misunderstandings that ran rampant throughout the entire series of threads, at one point Andre, extremely agitated by this time and very frustrated, asked, "Why is this such a big fight? Does everybody want the kernel to be able to screw itself just by blinking?" To which Oliver Xymoron replied, "No, of course not, but we also don't want to make large changes to the kernel to paper over a hole that we can't cement closed." I interpret Oliver as referring to the fact that root will always be able to do lots of damage, and that to take away just one of those possibilities was to "paper over" the problem, since to truly "cement" it would involve putting many more restrictions on root. But as far as I can see, Andre misinterpreted Oliver as saying that Andre's initial bug report was the one that could not be cemented over. And to that idea, Andre replied, "Here is you damn steel-plate-of-armor! I can close this hole completely and you will never know the difference."
Eventually Andre boiled over completely, giving this one-line reply to several different posts: "Here is your SECURITY HOLE! JOE-SIX-PACK-HACKER can fry your butt." To one of these, Stephen Frost replied, "Will you *please* wake up and realize that it makes no difference if it can fit into a buffer overrun or not? That does not mean *anything* since you can almost always fit the code neccessary to gain a root shell to the machine in the same space."
At one point, Andre threw up his hands and said:
At this point I have no interest in publishing a finished patch to TASK and allow people to have fuller and more robust control over the kernel.
Basically screw the project!
But would you like to know the predictive write caching of a given drive to optimize things like the BLOCK_ELEVATOR, basically make it intellegent. You know I can do this for SCSI but screw that project too!
At another point, Andre said:
Screw your hardware! Since all of you are so damn smart and know the ATA-ATAPI specification, you are not sensible enough to see that I showed you that it can be violated. This violation of the standard, which none of you have the membership to vote on, is what I was trying to point out.
The reason why you can not do this level of destruction in Microsoft is that it does not have a HOLE the size of a MAC TRUCK! There is no one to blame for this, but I should have found it earlier. Now that it is found something needs to be done.
Since most of you POMPUS SCSI users are down playing the point you have no room or business to comment on a subsystem that you do not use. I base this because if you were using ATA you might think twice before calling NIMBY. It has been clearly pointed out that you have a problem of the same level, but the hole is smaller and harder to work through.
Since 90% of Linux depends on the ATA subsystem being stable and safe, all I will claim is that it is stable.
Nice to see that those with the biggest mouths fighting me on the issue are as big an ASS as me pushing the issue. I am glad to have the company of ASSHOLES as big as me debating the issue.
A lot of people told Andre at various points throughout the thread to calm down. Bob Taylor was one, and advised, "BTW, you seem to lose your temper all to easily. Count slowly to (some number) while taking a deep breath between each count. :-)"
At one point Victor Khimenko, after being entirely opposed to the patch, admitted that for some extremely rare configurations, the patch might be valuable, though he doubted that anyone actually used systems with that configuration. Bartlomiej Zolnierkiewicz felt there might be some out there somewhere, but Andre replied:
Let them eat cake ............
The patch has been pulled .........
Linus Torvalds got into the debate toward the end, when James Sutherland observed, "IF capabilities can be used to block this (and similar), and Andre's "sanity checking" for ATA is added, then surely it *IS* possible to prevent root screwing the HDD (without replacing the kernel, at which point all bets are off, of course)." Linus replied:
What's the point?
If the system is secure, then adding sanity checking to the ATA code makes no difference: nobody gets to do anything improper anyway.
If the system is not secure, then adding sanity checking to the ATA code makes no difference: people who could use the ATA thing can use other things that are much more insidious.
The mechanism that everybody wants is _already_ there. It's called "permissions". No new driver code necessary.
If those permissions do not work, then they don't work, and adding last-minute band-aids makes no difference.
Just as a comparison, look at Windows. It takes the opposite approach: it has no real seurity, but a LOT of band-aids to avoid the "obvious" holes. Leaving it wide open.
To this, James replied, "The "security" aspect of this is largely a red herring, I suspect; at best, fixing this will make a malicious root marginally less damaging. The real issue is just that the kernel is accepting unvalidated parameters from userspace, and shooting itself in the foot with them. MS took a fair bit of flak, IIRC, for doing this with WIN32K.SYS in NT4. Do we now expect higher standards of design from NT than Linux? :-)" To which Linus replied:
The OS doesn't even know what the commands do. They are undocumented. And they vary from drive to drive.
How do you expect the OS to validate the drive firmware update commands for every drive manufacturer?
In short, should we
- know every single drive, know every command it can take, and do all of this inside the OS
OR should we
- move this policy into user space, and potentially have programs that know what different drives can do, and upgrade them the proper way
I think the thing is fairly clear. If you want "the OS" to validate the parameters, then you should create a user-mode program that validates the thing.
Basically, Linux already validates everything it _can_ validate. Sure, it could also verify that only "approved" commands are sent, but what about the undocumented yet potentially useful ones?
Let's take a hypothetial example (you judge on just _how_ hypothetical it actually is): imagine that you have a drive that can be made to refuse to read certain removable media based on where the drive was purchased. Imagine that this was actually done in firmware, and that there was a way of overriding it. Imagine further that you moved, and you wanted to make the drive read certain removable media in the new location, using undocumented commands..
Should the kernel block those commands because it doesn't know what they do?
Or should the kernel assume that "Oh, he has the permission to do this, then sure, I'll let him do it..".
Note that everybody has gotten very lathered up about the fact that you can kill certain hardware. Guess what? This is neither new nor very exciting. Look at your XFree86 configuration file some time, and read the warnings in the documentation. And ruminate on it. A monitor can be quite a bit more expensive than your harddisk.
Firthermore, destroying your harddisk may be the most _polite_ thing that can be done to you. Quite frankly, I'd personally rather have a dead harddisk than have all my data on that harddisk be siphoned back to an intruder.
A dead harddisk you might get a refund for under the warranty. Your credit card information (or your browsing habits or copies of your personal emails) made available all over the place might be more of a bother.
"There are worse things than death". Even with harddisks.
Andre replied to this with a third choice in addition to Linus' two above:
Or validate the commands listed in the SPEC for the default enable interface, and grant unrestriced access to all command with a compile option plus a sysctl flag that defaults in the off.
You will then no longer violate the usage of the SPEC by default. You will preserve the drive warrenties, and protect the commerial distribution market place for being liable for intent.
Specifically, it is public record that a distribution shipping without a taskfile filter in ATA and SCSI (I have this now) to carry copablity of knowingly with intent to sell product that allows no protection for improper calls to the hardware.
All you have to do is provide a default approved filter, and include the non-default option to disable the filter to not violate unix policies of no-policies and cover linux and distrubtors from issues that can be deemed actionable.
Upon the user disabling the command-filter, everyone is cleared of any and all issues that can be deemed actionable, and make the individual responsible for their actions and not you or the maintainers copablity.
This is all I want.........not to be sued in the future for decisions that are not mine to be made.
Linus replied:
I can live with that. The simple switch-statement that you posted as an alternative "minimal" approach would work for me. Preferably with an opt-out (sysctl-like thing).
What I do NOT want to see is big re-organizations around this, especially considering that this is not something new. Simple.
This satisfied Andre.
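The compromise reached at the end of the thread, a default filter of approved commands plus an opt-out switch, can be sketched roughly as below. This is a user-space Python illustration only; it is not kernel code and not Andre's patch. The opcodes are standard ATA command codes, but the choice of which ones to approve, the function name `command_allowed`, and the `filter_enabled` flag (standing in for the sysctl-like opt-out) are all assumptions made for the example.

```python
# Standard ATA opcodes; which ones are "approved" here is illustrative
# only and is NOT the list from the patch discussed in the thread.
APPROVED_COMMANDS = {
    0x20: "READ SECTORS",
    0x30: "WRITE SECTORS",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
    0xEC: "IDENTIFY DEVICE",
}


def command_allowed(opcode, filter_enabled=True):
    """Decide whether a drive command may be passed to the hardware.

    filter_enabled models the hypothetical opt-out switch: when it is
    False, every command is passed through unchecked and responsibility
    moves to the user who disabled the filter.
    """
    if not filter_enabled:
        return True
    return opcode in APPROVED_COMMANDS


if __name__ == "__main__":
    for opcode in (0xEC, 0xFF):
        print(hex(opcode), command_allowed(opcode),
              command_allowed(opcode, filter_enabled=False))
```

A lookup of this kind keeps the change small, in the spirit of the "simple switch-statement" Linus asks for, while leaving undocumented commands reachable for users who explicitly turn the filter off.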
5. Making General Use Of A PCI Card's On-Board RAM
20 Jul 2000 - 26 Jul 2000 (9 posts) Archive Link: "adding physical memory-pages to Linux' pool-of-pages"
People: Mads Bondo Dydensborg, Folkert van Heusden, Jamie Lokier, Timur Tabi, Jeff Garzik
Folkert van Heusden wanted to use the on-board RAM from his PCI card as regular RAM. Jeff Garzik remarked that this was likely to be very slow compared to regular RAM; Timur Tabi agreed, and added, "if you insist, ... the easiest way is to create another Zone, and modify the zonelist structures to use that Zone as a backup zone for ZONE_NORMAL. Then, whenever you run out of normal RAM, it will allocate your PCI RAM." He also suggested moving the discussion over to the 'linux-mm' mailing list.
Lenart Gabor also replied to Jeff, suggesting that the PCI RAM might make good swap space, and Mads Bondo Dydensborg replied:
I know that SCIos (http://sci-serv.inrialpes.fr/) does something like this: They use SCI networks to do distributed shared memory in a cluster. Then, if a node in the cluster wants to swap, it first checks if there are free memory in the distributed shared memory. If there is, it swaps to these pages. AFAIK they have found that it is quite a bit faster then disk. The SCI controllers are on the PCI bus, but the memory is main memory of a remote node. This means that they will have to travel 2 pci busses, the network and the main memory bus of the remote node. Only going through the local pci bus should be quite fast compared to this. (Slow compared to main memory, yes).
SCIos is based on Linux (is a set of patches) - maybe there are some code in there that could inspire someone.
Folkert also replied to Jeff, saying, "it's less slower than swapping to harddisk. Especially for the system I want to implement this. That one has harddisks doing 1MB/sec. I think access trough PCI can do better then that." Jeff replied that using the PCI RAM as swap was perfectly feasible, and mentioned that patches had appeared on 'linux-kernel' at one time or another, though he couldn't give an exact reference.
Folkert mentioned that Linux was aware of several different kinds of RAM, and suggested that he might try to set the PCI RAM to the "slow" variety (he also mentioned that the patches Jeff referred to might be Folkert's own previous attempt at this). Several days later, Jamie Lokier explained:
The kernel has "zones" for different kinds of RAM that can be accessed in different ways, but it still assumes they are more or less the same speed. This means that if the PCI RAM were allocated for a user space page, it would stay continue using the PCI RAM even if it's used heavily.
If instead you use the PCI RAM as a ramdisk and swap to it, any program which makes heavy use of a page will cause the page to stay in main RAM where it should remain.
The thread ended there.
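Timur's suggestion of a backup zone, and Jamie's caveat about it, can be illustrated with a toy model. The sketch below is plain Python and only mimics the idea of walking a zonelist in order; the `Zone` class, the `alloc_page` function and the zone names are hypothetical and are not the kernel's data structures.

```python
class Zone:
    """Toy stand-in for a memory zone: a name and a count of free pages."""

    def __init__(self, name, free_pages):
        self.name = name
        self.free_pages = free_pages


def alloc_page(zonelist):
    """Take one page from the first zone in zonelist that has one free.

    Returns the name of the zone that satisfied the request, or None if
    every zone in the list is exhausted.
    """
    for zone in zonelist:
        if zone.free_pages > 0:
            zone.free_pages -= 1
            return zone.name
    return None


if __name__ == "__main__":
    normal = Zone("normal", 2)
    pci = Zone("pci", 1)  # backup zone, only tried after "normal"
    for _ in range(4):
        print(alloc_page([normal, pci]))
```

Nothing in this model ever moves a page back from the backup zone to the normal one, which is Jamie's point: a heavily used page that lands in PCI RAM stays there, whereas swapping to a ramdisk built on the PCI RAM lets hot pages return to main memory.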
6. Linus On Unions
20 Jul 2000 - 27 Jul 2000 (16 posts) Archive Link: "[patch-2.4.0-test5-pre3] struct inode shortened"
People: Linus Torvalds, Tigran Aivazian
Tigran Aivazian felt that by putting some inode data in a union, 4 bytes could be saved from the inode structure on 32-bit machines, and 8 on 64-bit machines. Linus Torvalds didn't think the saved space would be noticeable enough to warrant the change, and remarked, "You'd save more by making the quota stuff go away when quotas aren't enabled.." Later he explained his reluctance more fully:
Unions are imho only acceptable when they implicitly know their own type. The in-kernel example of this is the inode per-filesystem thing. An inode has a filesystem-specific part, and there is no way any other filesystem can access it except by a major bug somewhere.
Any union that needs code like
if (xxx->type == yyy).. ....
is a design mistake.
I don't think you'll find all that many unions in Linux. And I don't think we should add new ones..
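The size argument behind Tigran's patch can be reproduced with a small experiment. The sketch below uses Python's `ctypes` module to lay out C-compatible structures; the field names `a` and `b` are placeholders for two pointer-sized members that are assumed never to be needed at the same time, and are not the actual inode fields from the patch.

```python
import ctypes


class SeparateFields(ctypes.Structure):
    # Two pointer-sized members stored side by side.
    _fields_ = [("a", ctypes.c_void_p), ("b", ctypes.c_void_p)]


class Overlay(ctypes.Union):
    # The same two members sharing one pointer-sized slot.
    _fields_ = [("a", ctypes.c_void_p), ("b", ctypes.c_void_p)]


class SharedFields(ctypes.Structure):
    _fields_ = [("u", Overlay)]


def bytes_saved():
    """Bytes saved by overlaying the two members in a union."""
    return ctypes.sizeof(SeparateFields) - ctypes.sizeof(SharedFields)


if __name__ == "__main__":
    print("pointer size:", ctypes.sizeof(ctypes.c_void_p))
    print("bytes saved :", bytes_saved())
    s = SharedFields()
    s.u.a = 0x1234
    # Both names refer to the same storage, so b reads back a's value.
    print("b after writing a:", hex(s.u.b))
```

The saving equals one pointer, which matches the 4 and 8 bytes quoted for 32-bit and 64-bit machines. The last lines show the cost Linus objects to: nothing in the union itself says which member is valid, so the surrounding code must already know.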
7. 'Virgin Connect' Violates GPL
20 Jul 2000 - 26 Jul 2000 (12 posts) Archive Link: "Linux GPL violations."
Topics: BSD, Clustering: Mosix
People: Joseph Elwell, Mike A. Harris
Joseph Elwell asked which was the best forum for reporting GPL violations regarding the Linux kernel. He reported that Virgin Connect sold a web device that shipped with Linux preinstalled, but that there was no mention of this fact, the GPL, or any location of sources in the docs. He gave a link to a discussion board where the problem was discussed, and the VirginConnectMe home page; as well as the Merinta home page, the company selling the machine to Virgin. From the Merinta press release he verified that the machines were running Linux. Finally, he concluded, "I like the idea of all these new Internet devices coming out, running Linux. But it worries me that they'll all ignore the GPL as they go. Making it more difficult for fututre improvements in the kernel code."
Mike A. Harris replied that the best place to report that kind of thing was linux-kernel. He went on:
It's the best place indeed. It might not be the most appropriate place, but if you are looking for results, it generally reaches the widest audience of insane bigmouths (and I don't exclude myself) that will fight to no end arguing about it and causing a big public stir. This causes it to get seen on Slashdot and the rest is history. If the FSF doesn't jump in, they likely will eventually, or someone will try and establish communications with the offending vendor.
It usually results in one of:
Either way, a huge pissing match occurs that goes on for at least a month or so, but it usually ends up getting the job done, so it is worth it.
The pissing match did not occur in this thread, which petered out fairly quickly. Other discussions of GPL violations were covered in Issue #8, Section #13 (27 Feb 1999: Possible GPL Violation By Mosix), Issue #50, Section #3 (23 Dec 1999: Kernel-Based Windowing For Embedded Systems; License Debate), Issue #59, Section #6 (8 Mar 2000: Some Kernel Files Use BSD License), Issue #72, Section #13 (6 Jun 2000: ABIT Violates The GPL -- Again), Issue #74, Section #14 (23 Jun 2000: Anonymous Poster Claims GPL Violations Are Accepted By Linux Community), and Issue #76, Section #1 (21 Jun 2000: Lucent Violates The GPL).
8. Developer Interaction
23 Jul 2000 - 28 Jul 2000 (5 posts) Archive Link: "[PATCH] Remove extra shift in __SI_CODE macro"
People: Robert H. de Vries, Linus Torvalds
Robert H. de Vries posted a patch and reported, "The __SI_CODE macro shifts its argument 16 bits to the left while the only argument used is already shifted 16 bits to the left. In this way no bits are left on a 32 bit architecture. Hence this patch, which removes the superfluous shift." Linus Torvalds replied, "Sounds like the _users_ of __SI_CODE should be fixed, rather than __SI_CODE. The new definition of __SI_CODE doesn't make much sense, in my opinion." But Robert explained, "If I would change the user, that would trigger an avalanche of changes, while my proposition is limited to just one definition." He gave a single (lengthy) example, and concluded, "I thought I was being clever in finding the simplest patch. But then again you decide." Linus replied, "After having looked at that mess, you're right. Let's just change __SI_CODE."
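The arithmetic behind Robert's report is easy to check. The sketch below emulates 32-bit unsigned arithmetic in Python with a mask; the two functions are modelled on the description in the thread (a macro that shifts its first argument by 16 bits, and a patched one that does not), and their exact bodies, along with the constant `SI_TYPE_EXAMPLE`, are assumptions rather than the kernel's actual definitions.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic


def si_code_double_shift(type_bits, n):
    """Macro as described before the patch: shifts its first argument."""
    return ((type_bits << 16) | (n & 0xFFFF)) & MASK32


def si_code_no_shift(type_bits, n):
    """Macro as described after the patch: argument is used as passed."""
    return (type_bits | (n & 0xFFFF)) & MASK32


# Hypothetical caller-side constant that is already shifted by 16 bits.
SI_TYPE_EXAMPLE = 1 << 16

if __name__ == "__main__":
    print(hex(si_code_double_shift(SI_TYPE_EXAMPLE, 5)))
    print(hex(si_code_no_shift(SI_TYPE_EXAMPLE, 5)))
```

With the extra shift, the type bits are pushed past bit 31 and vanish, leaving only the low 16 bits of the second argument; without it, both parts survive in the result.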
9. Status Of 2.4 To Do List; Kernel Bug Tracking System
23 Jul 2000 - 30 Jul 2000 (20 posts) Archive Link: "status page?"
Topics: Bug Tracking, Compression, Disk Arrays: RAID, Disks: IDE, Disks: SCSI, FS: FAT, FS: NFS, FS: NTFS, FS: UMSDOS, FS: devfs, FS: ext2, FS: procfs, FS: ramfs, I2C, I2O, Networking, PCI, Power Management: ACPI, Real-Time, SMP, Samba, Security, USB, Virtual Memory, VisWS
People: Alan Cox, Linus Torvalds, Theodore Y. Ts'o, Peter Enderborg, Steve Dodd, Andi Kleen, Alexander Viro, Kai Henningsen, Steven Walter, Derek Martin, Rogier Wolff, Andrew McNabb, Mikael Pettersson
Derek Martin asked if there were a web page tracking the list of things needed before 2.4 could come out. Alan Cox posted his latest list of items, which he said was out of date but still useful. The last time he posted his list was in Issue #74, Section #1 (25 May 2000: Latest List Of Things To Do Before 2.4 Can Come Out). This week Alan added, "Dont mail me updates, find someone else to maintain it." Linus Torvalds replied:
Note that this part of the status page really is worth some attention.
Alan doesn't have the time to maintain 2.4.x issues. Which means that right now nobody is really doing it. I have a kind of general "map" in my head, but that isn't nearly good enough..
Would somebody be interested in taking this over?
Note! It's more than just maintaining a list. It's realizing what are show-stoppers, and what are just problems that should be fixed. It's trying to work with people in trying to get enough information to get a good list of problems, but it's also trying to confirm that the problem still exists or is fixed.
It's also possibly maintaining potential fixes: keeping track of who has suggested what fix (possibly with an actual patch). Sometimes the fixes aren't actually usable as-is due to other issues, but even when they are not usable they can be supremely important for people who try to fix the same thing in a acceptable manner.
It's also good if you're respected by people already, or at least think you can work with people without making them hate you.
In short: it's important. And it's something that Alan did for both 2.0 and 2.2. Nobody can quite live up to "being Alan", but hey, if you don't like long beards you may be just as heppy knowing that.
Theodore Y. Ts'o volunteered, with, "I'm willing to take this over. Note that unless Alan can give me his secret of making clones of himself, I can't necessarily read every single message on linux-kernel; so folks will need to send me mail explicitly if they want to make sure I'm going to read it. I'll try to skim postings on linux-kernel, but like most kernel developers, that isn't necessarily the most reliable way of making sure I'll read something." Kai Henningsen suggested putting the list up on a web page for easy reference. At this point, Linus said:
A number of people have suggested to me that using one of the automated bug-tracking systems would be nice.
I don't like using them myself, but some people do. Whoever would be the maintainer (and hey, Ted has been around since pretty much the beginning of Linux, so I'd certainly be inclined to trust him) would make his own choice on _how_ he does it. A web-interface sounds like a good idea. It boils down to how much work/help it is, and personal taste of the developer in question.
There was no reply here, but under the Subject: 2.4 status page, lightly updated, Theodore reported:
OK, this is my first attempt at publishing Alan's 2.4 todo list. It's been lightly updated so far, but I *know* that there's large parts of this list which are out of date.
So if you know that a particular problem has been dealt with, could you please let me know? Especially if your name appears next to the item as the one who reported the problem, or the kernel subsystem maintainer who's working on the issue.
I'll be publishing another version which will hopefully be a bit more up to date, and better at reflecting reality.
He posted the list:
Linus replied to item 2.1 (Capable Of Corrupting Your FS: "E820 memory setup causes crashes/corruption on some laptops[**VERY NASTY**]"), saying:
This should be fixed as of -pre5. I finally got a hold of a laptop that did this.
Please, anybody that had random hangs/oopses that went away if you specified the memory size by hand with the "mem=xxx" boot option, test if 2.4.0-test5 works without the workaround..
Peter Enderborg replied to item 14.70 (Fixed: "Loopback fs hangs"), giving an exploit:
Sorry to say this. But the fs loopback is not fixed, or this some other bug fixed but than i have shown before. This an example. This kills my 2.4.0test5 system, an Intel PII SMP box. Im out of the office so I can only verfiy that i still exist on my machine.
This work is 2.2.14, but test[2-5] systems stops running without any messages.
Create an big image.
dd if=/dev/zero of=/somewhere/dosfs.img bs=64k count=10000
Create an msdos filesystem on the image. (Im using dosfstools-2.4) I have also done this with ext2 and get the same results.
mount -o loop /somewhere/dosfs.img /mnt/mymountpoint
Do the bad thing...
dd if=/dev/zero of=/mnt/mymountpoint/bigfile.foo bs=64k
After a while the system can not create process and dies. A difference with 2.4.0test4 is that the filesystem is less corrupted with test5. And sync command not complete during the dd on the loopback device.
Steve Dodd replied, "There are several deadlock issues, and only one (the tq_disk problem) has been fixed so far (by disabling plugging). I spent sometime looking at this, and I can't see any obvious 'quick fix' solutions."
Elsewhere, Andi Kleen replied to item 3.1 (Security: "Fix module remove race bug (mostly done - Al Viro)"), saying, "It is not at all done, there were some quick hacks for some file system objects, but the general problem for zillion of other modules is still unsolved."
Alexander Viro didn't reply to this directly, but he replied to the same item (and a number of others) in Theodore's list. To item 3.1, he said, "Already done: everything that exports file_operations, fb stuff, procfs stuff. Not yet: TTY, ldisc, I2C, video_device. Hell knows: network devices."
Alexander also reported that item 5.1 (In Progress: Dcache threading (Al Viro)) was done. He also reported for item 8.4 (To Do: Devfs races, Sockfs (removing NULL ->i_sb stuf) (Al Viro)), "Sockfs - done. Removing bogus checks still isn't. Devfs races... somewhat went down, but there's still a lot of them."
To item 8.8 (To Do: Audit all char and block drivers to ensure they are safe with the 2.3 locking - a lot of them are not especially on the open() path.), he remarked, "What about the open() path? They have (or grab) BKL there, so it should not be something special. Now, read() and write()..." Alan replied, "They dont all get open right either ;) And yes a lot get read/write wrong or assume incorrectly that single opener == single threaded read/write." Alexander was surprised by that, and wondered aloud where he had "fscked up", but Alan replied, "You didnt. These have been broken since 2.0 or earlier. Especially the read/write assumptions. These arent things Al broke but things that never worked right ever."
Elsewhere, under the Subject: Updated 2.4 status/todo list, Theodore posted the latest list, and gave a URL to its location on Sourceforge. He also said:
everyone and his brother has approached me suggesting that I use their favorite bug tracking system. The answer which I'm giving folks is that for now, trying to get the information into a sane state is far more important than the mechanism of which bug tracking system to use. At the moment, the hardest parts of the problem are:
There was a bit more discussion.
10. Draft Press Release For 2.4
23 Jul 2000 - 26 Jul 2000 (5 posts) Archive Link: "Linux 2.4 - press release?"
Topics: FS: NFS, Networking, Power Management: ACPI, SMP, Small Systems, USB
People: Joe Pranevich, Daniel Stone, Jim Dennis, Albert Cahalan, Linus Torvalds
Joe Pranevich announced:
I'm not sure if this has been discussed yet, I'm not completely caught up with the current kernel list. However with the pending release of Linux 2.4, it may be time to bring this up again.
I've created a rough draft (intended for discussion only!) of a press release for Linux 2.4, if we decide to go with one. Greg Smart and Albert Cahalan attempted this for Linux 2.2 but I don't think anything came of that work -- now I've decided to step up and make a fool of myself for Linux 2.4. :)
Please take a look at this and let me know what you think. There's probably someone on this list who actually knows what to do with a press release, it's quite a bit beyond my experience. Feel free to edit this as appropiate.
Oh, and I stuck a small plug for my "Wonderful World" document at the end. For a final version, we would want to point that at a good resource for more information about the release-- I'm a bit biased in this respect.
The 2.2 press release was covered in Issue #2, Section #4 (14 Jan 1999: Future Press Release For 2.2), where it had a fairly bad initial reception, and by Issue #3, Section #11 (20 Jan 1999: 2.2 Press Release Seen Early On Slashdot) the working draft had been unexpectedly posted to Slashdot, without having gone through Linus first.
Not discouraged, this week Joe posted his draft:
LINUX KERNEL 2.4 (DRAFT!)
The Internet, August XX, 2000
Linus Torvalds and the kernel development team would like to announce the immediate availability of Linux 2.4, the latest revision of the popular open source operating system kernel. This update brings increased scalability and performance to all Linux users, in addition to new hardware support.
This update is already available to advanced users through multiple kernel mirrors (ftp.kernel.org) Although no timeframes have been announced at this time, all current Linux distributors will be updating to this version of the kernel within the next several months.
Linux is developed by a online team of programmers headed by Linus Torvalds, a resident of Santa Clara, CA. Linux has been developed using the open source methodology which provides for source code and peer review at all stages of development. It is because of this system of openness that Linux has grown to be the most successful non-corporate operating system to date.
For more information, please consult www.linux.org for a list of other Linux-related websites. More information on the new features in Linux 2.4 can be found in the "Wonderful World of Linux 2.4" document, available ...
Linux is a trademark of Linus Torvalds.
Daniel Stone also replied to Joe:
I think you should mention:
USB and FireWire - possibly appealing to the multimedia market, etc? Note the high proliferation of USB Zip drives, etc, so you could tout this as how it's great that we've got all the k-whrad standards.
Netfilter - I think I mention this to everyone I talk to about Linux 2.4. Stateful inspection, totally extensible, separate conntrack/NAT.
Jim Dennis also replied to Joe with lengthy commentary on the press release. To item 1, Jim objected to the phrase "largely rewritten" as being a bit of an overstatement. He explained, "The SMP from 2.0 to 2.2 was "largely re-written." Most of the SMP locking code through 2.3.x has been refined. I gather that the network devices and code path have been "unserialized.""
Jim also moved item 4 to be just below item 1, since "unserialized net is more interesting in the server/enterprise context." He then added several items under that one:
To Joe's item 2, Jim added, "Let's not forget StrongARM/ARM, MIPS, 68k (including the Coldfire and Dragonball processors in that family). These are used in many embedded systems (including many PDAs). (Palm Pilot is Dragonball, moving to StrongARM?)"
After Joe's item 3, Jim added several more bullet points:
To Joe's item 6, Jim remarked:
The Linux developers sarcastically acknowlege Mindcraft and thank them for their efforts in helping us identify the need for these improvements.
Also the "kernel-level web server" is better described as a kernel level web "accelerator." That is more accurate, and it is less likely to scare off business users who have already chosen a web server/daemon.
Finally, after Joe's item 7, Jim added a final bullet point (though he acknowledged it to be an oversimplification):
There was no reply.
11. Simulating SMP Under UP Systems
25 Jul 2000 - 26 Jul 2000 (7 posts) Archive Link: "synthetic parallel processing"
People: Jeff Dike, Richard B. Johnson, Borislav Deianov, Rik van Riel, Bartlomiej Zolnierkiewicz
Aaron Macks asked if it were possible to fool a UP kernel into behaving like an SMP system; in other words, he asked, was it possible to have the scheduler keep two queues and alternate them? Jeff Dike replied, "There isn't now, but there will be in the not-too-distant future. The user-mode port (see http://user-mode-linux.sourceforge.net) can do this. I had SMP turned on and more or less working around 2.3.26. It shouldn't be a big deal to turn it back on, fix the bugs, and get it working. Once that's done, you can start it up with "ncpus=2" or "ncpus=32", and you get that many virtual processors." Richard B. Johnson also replied to Aaron, saying, "You could have two (or more) schedulers. However, this would not replicate nor even emulate two CPUs. A single CPU can only do one thing at any instant. Two or more CPUs can do as many things as there are CPUs. Some shared resources have to be single-threaded with SMP, but otherwise a lot of activity can be done in parallel."
Borislav Deianov also replied to Aaron, and suggested the fair scheduler. He gave what he called his standard pitch, "Fairsched is a hierarchical fair CPU scheduler. Processes are divided into groups and each group receives guaranteed CPU time allocation proportional to its weight. The standard scheduler is used to schedule processes within a group. It can be used to divide CPU time fairly among users or for more flexible CPU time allocation on busy compute servers." He added that Rik van Riel had also written a variant that was less flexible but significantly simpler. Bartlomiej Zolnierkiewicz asked if 'fairsched' would one day allow users to choose a scheduling algorithm for each group of processes, and Borislav replied, "Actually this feature was present in fairsched's predecessor QLinux (http://www.cs.umass.edu/~lass/software/qlinux/). Dropping it simplified the code quite a bit and I didn't think it's that useful."
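To make the "proportional to its weight" idea concrete, here is a minimal sketch of the arithmetic only. It is not taken from fairsched; the structure, the function name and the tick-based accounting are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical group descriptor: only the weight matters for this sketch. */
struct sched_group {
    const char *name;
    unsigned int weight;
};

/*
 * Return how many ticks out of total_ticks groups[idx] is entitled to when
 * every group receives CPU time in proportion to its weight.  Integer
 * division rounds down, so the shares may sum to slightly less than
 * total_ticks.
 */
unsigned long group_share(const struct sched_group *groups, size_t ngroups,
                          size_t idx, unsigned long total_ticks)
{
    unsigned long weight_sum = 0;
    size_t i;

    assert(idx < ngroups);
    for (i = 0; i < ngroups; i++)
        weight_sum += groups[i].weight;
    if (weight_sum == 0)
        return 0;               /* no weights assigned: nothing to hand out */
    return total_ticks * groups[idx].weight / weight_sum;
}
```

With two groups weighted 1 and 3, the first would be entitled to a quarter of the ticks and the second to three quarters, however many processes each group contains.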
12. Some Explanation Of Hard IRQ Atomicity
26 Jul 2000 - 27 Jul 2000 (7 posts) Archive Link: "[PATCH] removal of unnecessary irq save/restore in tasklet_hi_schedule"
Topics: Disks: SCSI
People: Linus Torvalds, Tigran Aivazian, Stuart MacDonald
Daniel Marmier posted a small patch to take out some code from 'include/linux/interrupt.h' that saved and restored flags. He felt the code wasn't necessary because "hard interrupt requests" (hard IRQs) could not be re-entered by other IRQs. Since the code was not in danger of being re-entered, the flags were not in danger of changing and thus did not need to be saved and restored. But Linus Torvalds explained, "No, hard-irq's _can_ be re-entered. One hard-irq cannot re-enter itself, but you can have _different_ irq's enter each other. As such, to avoid deadlock in on the local CPU the current code is needed.." Stuart MacDonald asked whether, if hard IRQ A entered hard IRQ B, which in turn entered hard IRQ A again, hard IRQ A would not then have re-entered itself. But Linus replied that while it was possible for A and B to enter each other, it was not possible for them to subsequently re-enter themselves, because "We use the interrupt blocking hardware to make sure that as long as irq A is running, further instances of irq A are blocked. OTHER interrupts can still happen." Tigran Aivazian replied that he thought this was only true if SA_INTERRUPT was passed with the interrupt (in which case the IRQ would be atomic with regard not only to itself but also to other interrupt handlers), as was the case with 2.2; but Linus explained:
SA_INTERRUPT is deprecated, as it fundamentally cannot work with shared interrupts (or at least they all have to _agree_ on SA_INTERRUPT, which basically means that you have to know exactly the type of sharing. Ugh).
And non-shared interrupts are _definitely_ deprecated, unless the hardware really forces you to use them. That kind of hardware is getting quite rare. Thank God.
That said, SA_INTERRUPT still works, but it's really only meant to be used for the timer handler (you don't want the timer to be interruptible: it's fast, and at least the SCSI layer historically wanted the timer to tick _while_ a SCSI interrupt is active, so you must not have a SCSI interrupt interrupt the timer, because if that happens..)
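The rule Linus describes (a running IRQ blocks further instances of itself, while other IRQs can still come in) can be illustrated with a toy user-space model. This is only a sketch: the names and the bitmask representation are invented here and do not correspond to kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_IRQS 16        /* arbitrary size chosen for the model */

/* Bit n set means the handler for IRQ line n is currently running. */
static unsigned int model_in_progress;

/*
 * Try to start the handler for `irq`.  Returns false when that same line is
 * already being serviced (the "blocked" case), true otherwise.
 */
bool model_irq_enter(unsigned int irq)
{
    assert(irq < MODEL_NR_IRQS);
    if (model_in_progress & (1u << irq))
        return false;
    model_in_progress |= 1u << irq;
    return true;
}

/* Mark the handler for `irq` as finished, unblocking that line. */
void model_irq_exit(unsigned int irq)
{
    assert(irq < MODEL_NR_IRQS);
    model_in_progress &= ~(1u << irq);
}
```

In Stuart's scenario, A enters, B interrupts A and enters, and the second attempt to enter A is refused until the first instance of A has exited.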
13. Reading Files From Device Drivers
26 Jul 2000 - 28 Jul 2000 (10 posts) Archive Link: "Reading a file inside the device driver."
Topics: Ioctls, Modems, Web Servers
People: Theodore Y. Ts'o, Richard B. Johnson, Jeff Garzik, Ashok Raj
Vinay Vernekar asked how to open and read a file from within a driver. As far as he knew, normal file operations like fopen() couldn't be used within drivers. Theodore Y. Ts'o gave a link to the rocketport driver and replied:
The best way to do this is to do it in a user-mode program which is run out of an /etc/rc.d script, which uses ioctl()'s to download the firmware into kernel. This allows you to have much better error recovery and timeout handling than if you try to do this in the kernel.
For an example program of how to do this, see the rocketport driver, especially how the loadrm2 program loads the Rocketmodem II firmware. There's some clever code there which allows multiple modems to be downloaded in parallel to speed things up. Feel free to steal it for your project; it's under the GPL.
Ashok Raj also suggested looking in the 'khttpd' code, which he said had plenty of examples. But Richard B. Johnson expounded at length:
The easiest way, and the way which will always work in the future, is to provide open() close() and ioctl() in your driver.
After the driver is installed (insmod) upon bootup, you execute a user-mode program that opens the device and sends file-data to the device via ioctl(). Your ioctl() routine in your driver puts this stuff in the right place inside your hardware and finishes initializing the device. The sysV-init scripts can be modified to do this automatically. You could, of course do the same thing using write(), but ioctl() allows you to pass function parameters so your startup code might do something like:
fd = open("MyDevice", O_RDWR);
file = open("MyBinaryJunk", O_RDONLY);
read(file, buf, len);
ioctl(fd, MYDEV_UPLOAD_WINDOW0, buf);
read(file, buf, len);
ioctl(fd, MYDEV_UPLOAD_WINDOW1, buf);
ioctl(fd, MYDEV_.... etc.
Another way, is to make a 'C' array of the data that you need to put into your hardware. You can make a simple 'C' development tool that reads data from your file, converts it to text the compiler understands, and you include this during compilation. This has at least two major problems. The first is that you have to recompile to update your hardware-code. The second is that some code may be very large. This stays within the kernel when the module is installed and wastes kernel space.
You can't deallocate an array that the compiler initializes.
The last way, and the way I don't recommend, is to actually open and read the file from kernel space. This is complex because your driver needs to borrow a "process context" in order for file I/O to work. The file-system was designed to run from a user's data area (segment) and, for a file-descriptor to actually mean something, it has to be associated with a process. This can be done and I am sure that others will point you in that direction. However, what works today might not work for the next kernel version.
I wrote such a driver about a year ago. It seemed to work after I got a lot of help. However, even though the file was closed after the read, I was never able to unmount '/'. This means the next boot was a long fscking wait.
Ripping that up, and using user-mode access via ioctl() created a stable driver with the minimum of resident code.
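As a rough illustration of the user-mode approach recommended above, the loader's main loop might look something like the following. This is a sketch under stated assumptions, not code from the thread or from the rocketport driver: the chunk size, the callback type and every name in it are hypothetical. A real sender callback would wrap ioctl() with whatever request codes the driver defines; the dry-run sender here only counts bytes so the loop can be exercised without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE 4096         /* assumed size of one upload window */

/*
 * Callback that hands one chunk to the device.  A real loader would
 * implement this with ioctl() and a driver-specific request code.
 */
typedef int (*upload_fn)(int dev_fd, unsigned int window,
                         const void *buf, size_t len);

/* Dry-run sender: touches no device, just records how much it was given. */
static size_t dry_run_bytes;

int dry_run_send(int dev_fd, unsigned int window, const void *buf, size_t len)
{
    (void)dev_fd;
    (void)window;
    (void)buf;
    dry_run_bytes += len;
    return 0;
}

/*
 * Read fw_fd until end of file and pass each chunk to send().
 * Returns the number of chunks sent, or -1 on a read or send error.
 */
int upload_firmware(int dev_fd, int fw_fd, upload_fn send)
{
    char buf[CHUNK_SIZE];
    unsigned int window = 0;
    ssize_t n;

    assert(send != NULL);
    while ((n = read(fw_fd, buf, sizeof(buf))) > 0) {
        if (send(dev_fd, window, buf, (size_t)n) < 0)
            return -1;
        window++;
    }
    return n < 0 ? -1 : (int)window;
}
```

Keeping the device access behind a callback keeps the file handling, error checks and any timeouts in user space, which is the point both Theodore and Richard make.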
Elsewhere, Jeff Garzik also recommended finding a user-space solution, though he added, "If you must do it in the kernel, just use normal Unix syscalls... Inside the kernel, they are called sys_open, sys_read, etc. Make sure the code calling those functions doesn't mind if the called functions sleep." Theodore objected, "I wouldn't recommend that approach at all. First of all, sys_open means you're messing with a process's file descriptor table, and if another process can get at the fd table (i.e., because of a clone system call), they may be able to cause mischief for the kernel (the fd can get changed to be something else, etc.). Secondly, sys_read()/sys_write() assumes that it will be reading and writing into user memory, not kernel memory. There are games you can play to get around this, but such approaches are fragile and complicated, and not what I'd recommend."
14. Status Of CML2
26 Jul 2000 - 29 Jul 2000 (2 posts) Archive Link: "CML2 0.7.4 is available"
Topics: Kernel Build System, USB
People: Eric S. Raymond, Randy Dunlap
For CML2's initial announcement and discussion, see Issue #71, Section #4 (24 May 2000: CML2 Replacement For The 'kbuild' System; Language Dispute). Shortly thereafter in Issue #72, Section #8 (5 Jun 2000: Pushing For Sign-off On CML2), Eric was trying to round up config-file maintainers to coordinate agreement on the switchover.
This week, Eric S. Raymond announced:
The latest version is always available at http://www.tuxedo.org/~esr/kbuild/.
Release 0.7.4: Wed Jul 26 12:07:47 EDT 2000
That third bullet item means that CML2 will not be able to do the right thing and support passing around partial configs until about 300 lines in various Makefiles are changed. Sigh...this may mean I put off issuing 1.0.0 until after CML2 has been accepted into the tree and those changes get made.
There are currently a couple of minor look and feel issues in the Tk interface, but no known bugs.
There was no reply, but elsewhere under the Subject: CML2 0-7-5 is available, he announced:
Release 0.7.5: Sat Jul 29 05:34:59 EDT 2000
There was no reply.
15. Linus Announces 2.4.0-test5
27 Jul 2000 (1 post) Archive Link: "Linux-2.4.0-test5"
Topics: Disk Arrays: MD, Forward Port, Kernel Release Announcement, SMP, USB
People: Linus Torvalds, Manfred Spraul
Linus Torvalds announced Linux 2.4.0-test5, saying:
There's a 2.4.0-test5 kernel out there. The diff is pretty huge, to a large degree due to a bttv driver syntactic split-up and due to the NLS forward-port from 2.2.x.
Other notable bugfixes:
There was no reply.
16. Status Of zImage
30 Jul 2000 - 31 Jul 2000 (15 posts) Archive Link: "zImage support in test3"
Topics: Compression, Small Systems
People: Kai Schulte, H. Peter Anvin, Stephen Frost, Mike Castle, Alan Cox
This was first covered in Issue #1, Section #7 (8 Jan 1999: zImage Vs. bzImage), where folks were already discussing dropping 'zImage' support, although bzImage was not yet supported on all systems. Recently it came up again in Issue #74, Section #2 (13 Jun 2000: Getting Rid Of zImage), where the consensus was that no one needed 'zImage' anymore, and that it should be deprecated in 2.4 and obliterated in 2.5.
Now, Kai Schulte asked for a variable to be returned to the code, because 'zImage' required it. He added, "my good old 2MB 386SX terminal won't be amused if I ask it to load a bzImage ;)" Stephen Frost asked if Kai would give 'bzImage' a try, and Kai replied, "Why would you try? The compressed image will load fine but there won't be any memory to decompress into, so it'll grind to a halt." H. Peter Anvin (who had led the earlier charge to get rid of 'zImage') replied, "No, it will grind to a halt because of a hard-coded limit (in setup.S, I believe.) It is currently set to 2MB for zImage and 4MB for bzImage. I suspect those values are completely arbitrary; trying without it would be interesting." Kai agreed, there was no reason bzImage couldn't work in 2MB, and said he'd give it a try. A couple hours later he replied to himself, "It does :) There must be some other memory location that isn't being initialized, though. Rebooting without hardware reset causes it to reboot continuously right after the decompression - not quite what you'd expect from an embedded system ;)"
Elsewhere and a bit earlier in the discussion, Stephen remarked, "It was my understanding that anything that zImage worked on bzImage would work on." Mike Castle explained:
They're HOPING for that.
There is currently a call out to find all machines that work with zImage but not bzImage so they can make bzImage work on them and then get rid of zImage.
Apparently one was found here.
They really really want zImage to go away. Rather than trying to get zImage to work on this particular computer again, I'd suggest concentrate on trying to get bzImage to work.
Here, Alan Cox brought the hammer down, mentioning, "zImage wont go away. You cant boot a bzImage kernel on a 2Mb system such as a low end embedded AMD board." A couple posts down the line, H. Peter concluded the thread with, "Saying "zImage won't go away" may be true (although I definitely wish it wasn't so), but boot loaders probably will start to drop zImage support, because of the hideous pain it takes to support them cleanly in a way that makes sense at all. SYSLINUX simply load them high and copy them down before starting the kernel -- anything else is too painful."
Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.