Table Of Contents
|1.||7 Jun 2000 - 13 Jun 2000||(10 posts)||SPX Unfinished In Stable Series|
|2.||8 Jun 2000 - 14 Jun 2000||(10 posts)||Linux 2.5 To Do List Looks For Web Server Space|
|3.||8 Jun 2000 - 18 Jun 2000||(64 posts)||Stallman Advocates "MSDOS-Style Floppy Handling"|
|4.||9 Jun 2000 - 14 Jun 2000||(39 posts)||Linus On Micro-Kernels|
|5.||9 Jun 2000 - 16 Jun 2000||(18 posts)||Putting Restrictions On Untrusted Code|
|6.||11 Jun 2000 - 19 Jun 2000||(29 posts)||Some Debate Over POSIX And Symlinks|
|7.||11 Jun 2000 - 15 Jun 2000||(5 posts)||Attempt At New Slab Allocator|
|8.||11 Jun 2000 - 14 Jun 2000||(12 posts)||NFSv3 In The Stable Series: The Saga Continues|
|9.||11 Jun 2000 - 16 Jun 2000||(8 posts)||Developers Discuss Microsoft|
|10.||12 Jun 2000 - 16 Jun 2000||(18 posts)||Developer Philosophy: Quietly Breaking Hardware Ports In Unstable Series|
|11.||12 Jun 2000 - 15 Jun 2000||(31 posts)||Developers Argue Over Virtual Memory: 'classzone' Vs. 'strict zone'|
|12.||13 Jun 2000 - 19 Jun 2000||(9 posts)||Alan Cox Not Updating EXTRAVERSION In -ac Patches|
|13.||13 Jun 2000 - 17 Jun 2000||(64 posts)||Alan's Latest List Of Things To Do Before 2.4 Can Come Out|
|14.||14 Jun 2000 - 15 Jun 2000||(8 posts)||Dell Binary-Only Drivers May Go Open Source|
|15.||15 Jun 2000 - 18 Jun 2000||(14 posts)||Possible Solution For Recent VM CPU Hogging|
Introduction
Thanks go out to Marius Gedminas for finding some serious HTML errors in last week's issue, and a bug in the compiler that produces these pages! Good eye, Marius! Thanks!!
Mailing List Stats For This Week
We looked at 1363 posts in 5519K.
There were 463 different contributors. 217 posted more than once. 173 posted last week too.
The top posters of the week were:
1. SPX Unfinished In Stable Series
7 Jun 2000 - 13 Jun 2000 (10 posts) Subject: "how about SPX ?"
People: Jay Schulist, Jeff V. Merkey, Alan Cox, H. Peter Anvin
Lorinczy Zsigmond got an oops trying to use Sequenced Packet Exchange (SPX) (http://developer.novell.com/research/appnotes/1995/december/03/04.htm) in 2.2.15. He posted the error and asked who the SPX maintainer was. As far as he could tell, it hadn't been touched since January 1999. Jay Schulist replied:
I am the maintainer, but I hardly have any time for SPX at the moment. I would be more than happy to work with you to get it working again.
Let me know if you are available to debug and test it with me.
There was no reply to this, but Alan Cox replied to Lorinczy (apparently without having seen Jay's post), confirming that SPX was broken and adding that it didn't seem to have a maintainer. Jeff V. Merkey volunteered, saying, "To whomever is attempting to use SPX, please forward me the Ooops info, and I'll attempt to take a look at it and fix it. The current SPX engine in Linux is incomplete, but I am willing to take a stab at looking at the bugs. Alan -- I'll try to help keep it current." Alan replied with his assessment, saying, "Finish writing it is closer. The connection accept logic for example is entirely ficticious ;)"
Jeff replied that he'd be happy to do a full SPX implementation, with NetWare-compatible APIs; and added that he'd done it on three other OSes as well. But H. Peter Anvin warned, "Note that we already have a socket-compatible interface, which is the interface of choice. A NetWare-compatible API should be done as a user-space library." He went on, though:
However, would certainly appreciate your help in getting the IPX/SPX implementation cleaned up and improved -- it clearly has been suffering from neglegt/disuse -- and to get that user-space library written!
This, combined with your NWFS implementation, should make Linux pretty much a drop-in replacement for NetWare, I'm guessing.
There was no reply, but elsewhere and a week later Jeff reported briefly, "I'm going through this code in detail now. Packet burst isn't working correctly either."
2. Linux 2.5 To Do List Looks For Web Server Space
8 Jun 2000 - 14 Jun 2000 (10 posts) Subject: "Linux 2.5 TODO on the web"
People: Alan Cox, James Sutherland, Matthias Andree, Bartlomiej Zolnierkiewicz, Gary Lawrence Murphy, Kenneth C. Arnold
Kenneth C. Arnold posted a URL to the 2.5/2.6 To Do List (http://kena.8k.com/linux-kernel/). But he replied to himself about an hour and a half later, to explain the "404" error some folks had been getting from the server. Apparently the host company had HTML requirements, so that banner ads could be displayed automatically on the page. Since the To Do list was in plain text, several different browsers were running up against this wall. Bartlomiej Zolnierkiewicz commented unhappily that the wrapper page could list all "broken" browsers, but Alan Cox replied, "It isnt a 'broken' browser. Its a correctly implemented browser. Referrer is an unacceptably flawed privacy problem. Good tools do not send referrer entries." James Sutherland replied with some history:
Mozilla dropped this entry from their request headers for a while, until they discovered that broke too many WWW sites, when they had to put it back in. (According to one of the developers, they had been trying to make requests as small (=> fast) as possible, but been a little overzealous.)
One of the privacy settings available in Squid is to strip out these header fields. They do make special mention of User-Agent, since enough sites modify the results based on which browser you are using, but last time I looked, Referer wasn't mentioned.
Maybe if enough people report this to FreeServers as being a problem, they'll fix it?
Elsewhere, Matthias Andree interjected, "Fuck it! If they want to give me their information only if I sell my privacy for marketing or advertising sake, I give a shit. Please move the information elsewhere. Espionage as industry standard? How deep have we fallen? The marketing dudes everywhere plug the fingers in your asses and you are considering which browser works and which fails. That's the wrong approach. Unregister that stuff and place it elsewhere." He offered to host it on his own web site, so long as it stayed smaller than 64K. James also offered 10M of web space. Gary Lawrence Murphy also offered to host the To Do list, if Kenneth didn't mind banner ads for other GPLed projects.
3. Stallman Advocates "MSDOS-Style Floppy Handling"
8 Jun 2000 - 18 Jun 2000 (64 posts) Subject: "Floppy handling"
People: Richard M. Stallman, Ian McKellar, Alex Buell, Alan Cox, Jeff Garzik, Adam Sampson
Richard M. Stallman requested:
Is there any possibility of making Linux handle file systems on floppies like MSDOS, so that there is no need to explicitly mount and unmount a floppy drive in order to access floppies through the file system?
Because of the inconvenience of mounting and unmounting floppies, I never do that; I only use mcopy. mcopy is covenient enough, for people who don't mind learning another few commands. But that isn't very easy for non-hackers, the kind of people who now prefer Windows. A complaint from a non-hacker about this is what inspired me to write this message.
We want to make GNU/Linux appeal to Windows users, and this is one of the things necessary to do that. And if MSDOS could do this, surely we can.
There were a number of replies, and a couple of longer threads. Adam Sampson suggested that Stallman's feature could be implemented via a user-space filesystem. But he acknowledged that Linux currently had no interface for such a thing, although 'podfuk' worked by emulating the userspace portion of a networked filesystem.
Elsewhere, Ian McKellar mentioned he'd been working on implementing MSDOS filesystem support in the GNOME Virtual Filesystem layer. But Richard replied, "Despite my loyalty to GNOME (as a part of the GNU Project), on technical grounds I have to disagree. File access needs to be available to all the programs on the system, and the only natural way to do that is through the kernel and the C library. Doing this in GNOME will produce a feature that half-works, which means it will appear to the users as an unreliable feature." Ian ended the subthread with, "I sort of agree with you on this, but you can extend this argument to just about any feature of a non-core library. In GTK we implement widgets which the X libraries don't. This leads to user interface inconsistencies. In gnome-vfs we support remote filesystems which libc doesn't. This leads to inconsistent filesystem handling between applications."
Alex Buell also replied to Richard's initial request, saying, "Over my dead body, mate. There's the mtools package to do this anyway, and RH 6.x defaults to non-root users mounting /dev/fd0. I don't see why we have to bend over backwards when it's better for them to learn to get to grips with UNIX rather than be cosseted with historical MS crud. Even CP/M needed to mount floppies." Richard replied in all seriousness, "I do plan to try to find someone to implement this fully, but I think it would be unfortunate if your death were the result. I hope you will find the courage to keep on living despite the existence of this feature."
Alan Cox also replied to Richard's initial request, suggesting 'supermount' as a possibility. But he added, "It needs more hands to port it to 2.3.x and it needs a very good review of the code to fix remaining questionable habits (or rewriting as a stackable fs). The stuff needed is out there however." Jeff Garzik gave a historical overview:
Porting supermount from 2.0.x to 2.2.x was a cosource.com project, undertaken by Alexis Milhailov. He also ported it to 2.3.x. Unfortunately the web page where supermount is supposed to be located doesn't resolve (http://supermount.cornpops.cx/)
I uploaded the latest I have, against 2.3.99-pre5, at http://gtf.org/garzik/kernel/files/supermount-0.3.1-2.3.99-pre5.diff.gz
It definitely needs review, especially by the original SuperMount author (sct?) Al Viro looked at one version of the patch and worried about races.
There were several other threads in which 'supermount' came up as the solution to Richard's problem, and Richard eventually said, while discussing a different possible solution, "Supermount seems to come closer to what I had in mind, and maybe could do the job. But some people say it is not reliable. If it is made reliable and gets included as a standard part of Linux, maybe it will do the job."
4. Linus On Micro-Kernels
9 Jun 2000 - 14 Jun 2000 (39 posts) Subject: "linux and micro kernel"
People: Linus Torvalds, Matthew Wilcox, Val Henson
Tonglu Yi asked if Linux might become micro-kernel-based in the future. Val Henson gave a pointer to the FAQ (http://www.tux.org/lkml/#s1-5), and to Liedtke's L4Linux paper (http://os.inf.tu-dresden.de/pubs/sosp97). Most replies indicated very slim chances for this, and at some point in the discussion, Matthew Wilcox tried to remember a quote by Linus Torvalds on the subject, and someone else found it in the archives. Linus had said, "message passing as the fundamental operation of the OS is just an excercise in computer science masturbation. It may feel good, but you don't actually get anything DONE." For Linus' full quote (it's long) and a summary of the thread in which it came up, see Issue #25, Section #4 (20 Jun 1999: The Future Of OS Design).
5. Putting Restrictions On Untrusted Code
9 Jun 2000 - 16 Jun 2000 (18 posts) Subject: "Running Untrusted Code in a Restricted Process"
Topics: User-Mode Linux
People: Jeff Dike, Jesse Hammons, Brian Gerst, David A. Wagner, Alan Cox, Pavel Machek
Jesse Hammons (working off the 2.2.12 kernel) made his first ever post to linux-kernel, in which he proposed a way to restrict the system calls an untrusted binary could make. Someone gave a link to an overview of various access control systems (http://research-cistw.saic.com/cace/), and someone else gave a pointer to the Medusa (http://medusa.fornax.sk/) project.
Jeff Dike said that a dedicated sandbox arrangement would probably be better than Jesse's proposal; and gave a pointer to his user-mode linux (http://user-mode-linux.sourceforge.net) kernel port, adding, "It gives you a virtual machine whose disk space consumption, cpu consumption, memory comsumption, and network traffic can be completely controlled. Plus, it's all in user-space. Nothing needs to be added to the kernel at all." Jesse agreed that uml did sound better than what he'd envisioned, and asked if any resource stats were available. Jeff replied:
No good ones at this point. The user-space kernel is larger than a native one, but I haven't added much code to it, so I imagine that I've done some stupid code-bloating things. After I look for them and fix them, I imagine that it will be comparable to a native kernel. So, I would look at the size of a native kernel, and I think that will in the ballpark of what you can expect to see.
Also, if you care enough about something to stick it in a virtual machine, a couple of megs is probably not a big deal. If it is, and you have a bunch of things that need to live in virtual machines, you can make them all live in the same one, where they can all infect each other with viruses and send love notes to each other :-)
In the same post, he added, "this is somewhat more resource-intensive than other sandboxes, but it's also more secure."
In the midst of replying to Jeff on other levels, Jesse suggested:
I hope it doesn't sound silly to say this but assuming just for a moment that I could compile this sucker on say, windows (maybe using cygwin32), and reimplement the part that does system calls in terms of windows system calls, could this be used to run sandboxed linux (elf) plugins on windows as well? *That* would be cool.
I guess you would need a way to trap system calls from the windows OS. I don't know if they provide that facility.
Jeff replied that he'd heard the Windows-port suggestion before, and added, "You could do that. Apparently, 95 doesn't have the capabilities needed, 98 is iffy, and NT seems to be ok. If you're (or anyone else is) interested in this, let me know, and I'll point you to the (scanty) information I have on doing a Windows port." As far as the specific ability to trap system calls, he repeated, "NT supposedly has the ability to do that, as well as the mmap stuff that's needed." The subthread ended there.
Elsewhere, Brian Gerst pointed out that Jesse's original proposal could already be done under Linux using the ptrace() system call. He explained, "Ptrace can intercept system calls made by the traced process (strace uses this) and can modify or deny them." Jesse checked this out, and found that this was true for 2.3.x, but not for the kernel he was using (2.2.12). Alan Cox replied that the functionality was indeed in 2.2.x, but Jeff clarified, "For the record, this was added in 2.3.22 and 2.2.15 for i386."
Elsewhere but still on 'ptrace', David A. Wagner said, "As others have noted, you can use ptrace() to selectively deny syscalls. See http://www.cs.berkeley.edu/~daw/janus/ for an implementation that used this idea in a more general context." And Jeff replied, "And see Pavel Machek's site (http://atrey.karlin.mff.cuni.cz/~pavel/dipl/eng.html) for how Janus (and any other ptrace syscall filterer) can be faked out. Plus a bunch of other sandbox possibilities."
6. Some Debate Over POSIX And Symlinks
11 Jun 2000 - 19 Jun 2000 (29 posts) Subject: "[BUG] Kernel 2.4.0-test1-ac10 changes open of symlink behavior."
Topics: BSD, POSIX
People: Alexander Viro, Alan Cox, Ulrich Drepper, Andries Brouwer, Daniel Pittman
Under the most recent 2.4.0test-* kernels, Daniel Pittman found that trying to edit files through a symlink would give an ENOENT error when he tried to save to disk. Under 2.2.15, he could save just fine. He explained that at the time of editing, the symlink pointed to a nonexistent file; but his understanding was that the referenced file should be created automatically in that circumstance. Andries Brouwer agreed, and quoted the new POSIX draft, "In general the open() function follows the symbolic link if path names a symbolic link." But Alexander Viro replied, "Excuse me, but I'll take difference from POSIX over a bunch of very real races, thank you very much. Again, feel free to propose race-free implementation if you want that thing back. Until then O_CREAT without O_EXCL will return -ENOENT on broken symlinks. Userland should not rely on objects' creation/removal following symlinks. Period." But Alan Cox replied in turn, "POSIX says otherwise. Period ;)" Alexander replied that for this particular case, "POSIX draft is broken and we would be better off fixing that bug instead of casting it in stone. Behaviour in question is inconsistent with every other case when links are created/removed/renamed." Alan suggested, "You might want to take that up with the relevant posix committee then."
Ulrich Drepper replied, "Symlinks are not in the current POSIX standard and therefore the existing POSIX standard is OK. Symlinks are coming in the next revision of the standard as an option (due to the merging with the Unix specs). If you find something inadequate in the Austin group draft tell me what it is and a solution for it. I'll make sure it gets handled in the next meeting." Alexander proposed:
Case in question: foo/bar/baz being a dangling symlink, open() with O_CREATE applied to it. Behaviour mandated by the draft: create a file in place where symlink points to. Problem: it is wildly inconsistent with every other case when we create/remove/rename objects. In every other case it's "you've got foo/bar/baz, you either create/remove/rename the entry 'baz' in directory 'foo/bar' or fail". Here the operation is applied to directory potentially different from the foo/bar.
Proposed (minimal) change: "Portable programs can not rely on open()/create() following dangling links. It should not be confused with O_EXCL semantics - there we refuse to follow _any_ symlinks, as well as open existing files."
In other words, proposed semantics looks so:
Andries said that the behavior Alexander was complaining about was the standard path resolution for symlinks; and guessed Alexander hadn't read the POSIX drafts. He quoted POSIX at length, and Alexander replied, "Lovely. IOW, draft sucks badly for mkdir() too - AFAICS with security consequences. And makes 4.4BSD, Solaris and Linux non-compliant in bargain. mkdir() on a dangling symlinks does not follow links on any of these systems." Andries replied, "Why don't you get the text yourself and read it, before shouting wildly? Concerning mkdir() I can reassure you. Quoting from the mkdir() section: "If path names a symbolic link, mkdir() shall fail and set errno to EEXIST"."
The subthread ended there, but elsewhere Andries gave pointers to The Austin Common Standards Revision Group (http://www.opengroup.org/austin/), also to a Long Description of the Proposed Common Standards Revision Project (http://www.opengroup.org/austin/docs/austin_9r6.txt), and finally the Austin Group's mailing list page (http://www.opengroup.org/austin/lists.html).
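For readers who want to see which behavior their own system implements, the three cases argued over above (mkdir() on a dangling symlink, open() with O_CREAT and O_EXCL, and open() with plain O_CREAT) can be probed from user space. What follows is only a rough sketch, not anything posted in the thread: the function name probe_dangling_symlink and the file names are made up for illustration, and it assumes a POSIX-like system where unprivileged users can create symlinks. The result of the plain O_CREAT case is exactly the point that was in dispute, so the sketch reports it rather than assuming an answer.

```python
import errno
import os
import tempfile


def probe_dangling_symlink():
    """Report what mkdir() and open() do when handed a dangling symlink.

    Hypothetical helper for illustration; returns a dict of outcomes.
    """
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        os.symlink(target, link)  # target does not exist: the link dangles

        # mkdir() on the link itself (the draft text Andries quoted says EEXIST).
        try:
            os.mkdir(link)
            results["mkdir"] = "created"
        except OSError as exc:
            results["mkdir"] = errno.errorcode[exc.errno]

        # O_CREAT | O_EXCL: expected to refuse to follow any symlink.
        try:
            os.close(os.open(link, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600))
            results["creat_excl"] = "created"
        except OSError as exc:
            results["creat_excl"] = errno.errorcode[exc.errno]

        # Plain O_CREAT: the contested case. Either the target gets created
        # through the link, or the call fails (ENOENT in Alexander's version).
        try:
            os.close(os.open(link, os.O_WRONLY | os.O_CREAT, 0o600))
            if os.path.exists(target):
                results["creat"] = "created target"
            else:
                results["creat"] = "created elsewhere"
        except OSError as exc:
            results["creat"] = errno.errorcode[exc.errno]
    return results


if __name__ == "__main__":
    for case, outcome in probe_dangling_symlink().items():
        print(case, "->", outcome)
```

A "creat" result of "created target" corresponds to the draft behavior Andries quoted; "ENOENT" corresponds to the behavior Alexander defended.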
7. Attempt At New Slab Allocator
11 Jun 2000 - 15 Jun 2000 (5 posts) Subject: "New slab allocation (pre-alpha)"
People: Mark Hemment
Mark Hemment announced:
For the last few months, I've been working (on and off) on a replacement for my implementation of the Slab allocator.
The new allocator has the features;
At present, the code is for 2.2.15 only and is pre-alpha quality. I might decide to re-design it tomorrow if I think of something better.
I will soon be moving to 2.4.x - infact, I will need to when finalising this. The page structures aren't cached aligned in 2.2.x, so the slab's internal usage of this structs' members may well cross L1 cache lines (not good for performance).
This code has only been run on an UP box.
I don't have access to an MP box, but hope to by next weekend. Note; the MP code uses different methods to add general size caches, and to empty clips - none of which can be easily tested on UP (even running an MP kernel on UP doesn't help much). MP owners, bewarned.
A patch is available at;
Or to just view the allocator's source;
Hopefully, I've loaded up the latest version. :(
There were a couple replies, but no real discussion.
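Mark's remark about page structures that aren't cache aligned comes down to simple arithmetic: if the size of an array element is not a multiple of the cache line size, some elements will straddle two lines. Here is a small sketch of that arithmetic. The sizes used (24-byte elements, 32-byte lines) are purely hypothetical and are not the real size of the 2.2.x page structure or of any particular CPU's L1 line; the function name is also invented for illustration.

```python
def straddling_elements(struct_size, line_size, count, base=0):
    """Return indices of array elements whose bytes span two or more cache lines.

    struct_size, line_size and base are in bytes; all values are hypothetical.
    """
    crossing = []
    for i in range(count):
        start = base + i * struct_size      # first byte of element i
        end = start + struct_size - 1       # last byte of element i
        if start // line_size != end // line_size:
            crossing.append(i)
    return crossing


if __name__ == "__main__":
    # Unpadded 24-byte elements on 32-byte lines: elements 1 and 2 straddle.
    print(straddling_elements(24, 32, 4))
    # Padding each element to a full line removes the straddling.
    print(straddling_elements(32, 32, 4))
```

Padding or aligning the structure to the line size trades some memory for keeping each element's members on a single line.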
8. NFSv3 In The Stable Series: The Saga Continues
11 Jun 2000 - 14 Jun 2000 (12 posts) Subject: "Linux 2.2.17pre1"
Topics: FS: NFS, Kernel Release Announcement
People: Alan Cox, Matthias Andree, Paul Jakma
Alan Cox announced 2.2.17pre1, adding, "I'm going for stabilising the oddments 2.2.16 got a bit wrong before we move onwards. This even though a pre patch should be somewhat more solid than 2.2.16." Matthias Andree asked if NFSv3 might get into 2.2.18, and Paul Jakma was also hopeful. For more on the ongoing saga of NFSv3 in the stable series, see Issue #61, Section #14 (21 Mar 2000: Status Of NFSv3).
Alan replied, "Various bits are going on. It may be in .17. Just for the moment .17 is the bug fixes." Matthias said, "I got the imagination you were planning 2.2.17 as bugfix-2.2.16-only release." And Alan corrected, "For the moment Im putting in the bug fix patches. What happens then depends how many and how serious."
9. Developers Discuss Microsoft
11 Jun 2000 - 16 Jun 2000 (8 posts) Subject: "Ballmer speaks a truth"
Topics: Microsoft, Patents
People: Rick Hohensee, Brandon S. Allbery, Rik van Riel, James Simmons, David Ford, Richard Torkar, Andrew Sharp, Ricky Beam
There was a bit of revelling this week, starting when Rick Hohensee posted this quote:
"So far, Linux doesn't have a lot of traction on the client [Microsoft-ese for desktop computers], except in some university environments."
Steve Ballmer of Microsoft, as quoted and remarked by John Schwartz in the Washington Post, June 11 2000
Brandon S. Allbery replied with a smirk, "Heh. We *are* pissing them off. Good. :)" And Rik van Riel said with a grin:
It must be worrying for MS how much Linux has increased in popularity. I believe we've gone from "Linux will never be successful" to the above "Linux isn't successful yet" in just about one year ... :)
Kind of worrysome that even MS is admitting the success of Linux :) (outside of their legal arguments)
James Simmons mentioned hearing that, "M$ in retaliation to the DJ will release advance technology they have been keeping from the people to wipe out their competitors. According to M$ linux will be gone in a year if they do this." David Ford replied, "Considering the past history of 'innovation', we already have this 'new technology'. And to m$ wiping out Linux.. a) can't happen and b) is pretty darn funny thinking 'bout it." Richard Torkar let his fiendish laugh be heard, "*muahahahahahahahaha* Since when has MS come up with an innovation?" And Andrew Sharp added, "Reality check...since when would M$ have EVER held back a technology that would do harm to their competitors? If the 2000 programmers Microserf has assigned to writing a Linux virus had anything ready to go, they wouldn't be "holding it back."" Ricky Beam answered the question of Microsoft's most recent innovations, estimating:
About 20 years ago. The wave they think they are riding evaporated long ago. Let's see, they bitch about linux being "based on 30yr old technology" -- they seem to forget how old MS-DOS, windows, and NT are and how much of the same "30yr old tech" they've adopted -- but very recently Microsoft gave windows the same capabilty UNIX has had for "ever" -- probablly only because 3rd parties are making money selling such capabilities.
I am, of course, referring to "Terminal Server" and "X"... X has had the ability to have it's API calls sent to any display from the first line of code.
Billy boy can say what ever he wants. There's two decades of evidence of his greed -- you should buy everything from Microsoft... There are too many cases to count of Microsoft ignoring various laws and IP -- patents, copyrights, out-right theft of technology...
10. Developer Philosophy: Quietly Breaking Hardware Ports In Unstable Series
12 Jun 2000 - 16 Jun 2000 (18 posts) Subject: "Linux 2.4.0-test1-ac16"
Topics: Assembly, I2O, Kernel Release Announcement, SMP
People: Alan Cox, Jens Axboe, Ed Carp, Matthew Wilcox, Peter Rival, Tigran Aivazian, Arnaldo Carvalho de Melo, Mike Phillips, Ingo Molnar, David Woodhouse, Russell King, Richard Torkar, Ben LaHaise, Chris Ricker
Alan Cox announced 2.4.0-test1-ac16:
These patches are versus 2.4.0-test1 from your favourite kernel mirror (ftp.us.kernel.org:/pub/linux/kernel/v2.4.0 (ftp://ftp.us.kernel.org/pub/linux/kernel/v2.4.0) ). The patches are in /pub/linux/kernel/people/alan (ftp://ftp.us.kernel.org/pub/linux/kernel/people/alan) .
Mike Phillips replied that item 9 (Fix a couple of CD driver bugs) broke the compile by removing the CDROM_CAN macro definition, and Richard Torkar confirmed the breakage using 'gcc' 2.95.1; Jens Axboe replied, "Yup, pretty stupid error on my part, don't know how that macro disappeared. Just reverse that part of the patch, Alan already has a fix."
Peter Rival reported that both ac15 and ac16 would hang his Alpha ES40, right after enabling swap. David Woodhouse confirmed a similar problem on his ia32 SMP, though it was not repeatable, and happened on 2.3.99pre8 and 2.4.0test1-ac12.
Jim Barriault reported that ac16 and ac17 (apparently available by the time he posted) would both give compiler errors on his DS20, and Alan replied, "All the non x86 platforms got broken by the ptrace change. This fixes an ugly kernel race and needs asm level changes for the other platforms. Painful but important to do." Ed Carp chastised:
An extremely shortsighted thing to do. The x86 platform isn't the only platform that runs Linux, and these sorts of changes should be done for ALL platforms at once, rather than fixing "ugly" kernel stuff (which should've been fixed before anyway) at the expense of non-x86 platforms.
Then again, since it's Alan's personal release, maybe it doesn't matter, as long as it gets fixed for all platforms by the time that 2.4 gets kicked out the door.
There were three replies to this. Matthew Wilcox said, "you don't understand. this is the normal way of doing changes which require assembly or machine-specific changes. break the build on those architectures then the port maintainers notice immediately and fix them."
Alan also replied to Ed, "Tough. Its up to you to fix the other platforms. Its always worked like this. I am not playing nanny and co-ordinator to all the port maintainers. It broke the PPC and Alpha folks have updated their ports. Either fix the sparc one or wait until someone does. If I wait for all the maintainers then everyone suffers. If we break ports now and then most of them don't." Ed replied, "In other words, Alan tells the non-x86 folks that they can (in the words of one of my favorite movies) "all line up and kiss his ass!" ROFL!" And in the same post, he went on, "Oh, pretty please, DOCUMENT THE ASM CHANGES so we don't have to figure out what the hell you did to fix it? I hate to be a pain in the ass and scream yet again about kernel docs, but it really needs to be done if people want to consider themselves professional programmers working on a system they want to be accepted by the mainstream. It also makes porting code to other platforms A LOT LESS PAINFUL. See, one of my clients is this really big hardware company and I'd really like them to continue down the Linux road, but it's a tough sell if the technical gang tells them that VxWorks or PsOS is a lot better because they're designed to be relatively easy to port to other platforms." To this last, Matthew replied, "that's bollocks. who told you to use the latest development kernel? would you go to VxWorks, demand the latest nightly snapshot of their build and judge them on that? just because the latest snapshot of linux is available to you, doesn't mean we recommend it." And to the idea of Alan telling the non-x86 folks to kiss his ass, Matthew said, "you'd prefer alan to guess what the correct sparc, mips, ppc, arm, s/390, m68k, alpha and parisc asm code is? or ship a kernel with a known security hole? or not ship a kernel at all for weeks until all the port maintainers have had enough time to submit changes? face it, this is the only way to work. and you're the only one who has a problem with it."
Peter also replied to Ed's initial post, "it's already been fixed for Alpha, and I believe sparc & sparc64 either work or almost work now as well. What is _really_ annoying is when there are releases put out where nobody says anything about non-x86 ports being broken until someone complains and then the reaction is "oh yeah, we knew about that". I'm not complaining, it's just frustrating when I'm trying to get work done and I have to waste time trying to update to a release that is known at release time not to work." In the same post, he requested, "Alan - any chance we could get a list of things known to be broken/not to work added to the 2.4.0test-acXX release announcements? I don't care that things get broken, I'd just like to not have to find it out if people already know. Or, smack me in the head and tell me "moron, things are always documented - look here". Either way - I'm not offended... ;)" Alan replied that there would be so much to report, that he'd have to gzip his posts. But he also pointed out where the release notes had said "Fix ptrace races | may need tweaks for non x86". And Peter concluded the thread with, "Good 'nuff - my bad. I just didn't realize that "may need tweaks" could also mean "doesn't build". I'm just gonna shut up now and try to find some more caffeine. :)"
11. Developers Argue Over Virtual Memory: 'classzone' Vs. 'strict zone'
12 Jun 2000 - 15 Jun 2000 (31 posts) Subject: "[patch] improve streaming I/O [bug in shrink_mmap()]"
Topics: Virtual Memory
People: Zlatko Calusic, Andrea Arcangeli, Rik van Riel, Stephen C. Tweedie
The discussion started peacefully enough. Zlatko Calusic posted a one-liner to fix a long-standing problem in the virtual memory system. He explained, "While searching for a discardable page in shrink_mmap() Linux was too easily failing and subsequently falling back to swapping. The problem was that shrink_mmap() counted pages from the wrong zone, and in case of balancing a relatively smaller zone (e.g. DMA zone on a 128MB computer) "count" would be mistakenly spent dealing with pages from the wrong zone. The net effect of all this was spurious swapping that hurt performance greatly." Stephen C. Tweedie was impressed, and added that Zlatko's bug might be the same thing causing the excessive CPU usage recently reported. For more on this, see Issue #66, Section #3 (22 Apr 2000: 'kswapd' Instability; Debugging Deadlocks), also Issue #68, Section #5 (8 May 2000: Virtual Memory Problems Persist In Development Series), and Issue #69, Section #6 (16 May 2000: Possible Fix For 'kswapd' CPU Overuse). Classzone had a brief mention in Issue #70, Section #2 (18 May 2000: Things To Do Before 2.4: Saga Continues), and there was a brief exchange in Issue #72, Section #6 (3 Jun 2000: More VM Bug Hunting).
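The effect Zlatko described can be illustrated with a toy model. This is emphatically not the kernel code or his patch: the function scan_for_page, the list-of-tuples "LRU", and the charge_wrong_zone flag are all invented here, on the assumption that the scan has a fixed budget ("count") and gives up when it runs out. When pages from other zones are charged against the budget, a small zone's freeable page may never be reached.

```python
def scan_for_page(lru, target_zone, count, charge_wrong_zone):
    """Toy LRU scan (hypothetical model, not kernel code).

    lru is a list of (zone_name, freeable) tuples. Returns the index of the
    first freeable page in target_zone, or None if the budget runs out first.
    """
    for index, (zone, freeable) in enumerate(lru):
        if count <= 0:
            return None  # budget spent: the caller would fall back to swapping
        if zone != target_zone:
            if charge_wrong_zone:
                count -= 1  # the behavior described as the bug
            continue
        count -= 1
        if freeable:
            return index
    return None


if __name__ == "__main__":
    # Ten pages from a large zone sit in front of one freeable DMA page.
    lru = [("NORMAL", True)] * 10 + [("DMA", True)]
    print(scan_for_page(lru, "DMA", 5, charge_wrong_zone=True))
    print(scan_for_page(lru, "DMA", 5, charge_wrong_zone=False))
```

In this model the first call gives up before reaching the DMA page, while the second finds it, which matches the "spurious swapping" Zlatko reported in spirit only.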
Rik van Riel, the main VM maintainer, agreed with Stephen, and added that only one known bug remained with the current 'shrink_mmap()'. Andrea Arcangeli, author of the classzone patch (the only patch confirmed to solve the 'kswapd of death' problem), also replied to Stephen. He remarked that the "strict zone" approach preferred by Rik and others would have higher loads anyway, just by its nature. He described a scenario, "You boot, you allocate all the normal zone in cache doing some fs load, then you start netscape and you allocate the lower 16mbyte of RAM into it, then doing some other thing you trigger kswapd to run because also the lower 16mbyte are been allocated now. Then netscape exists and release all the lower 16m but kswapd keeps shrinking the normal zone (this shouldn't happen and it wouldn't happen with classzone design)." He went on, "I think Linus's argument about the above scenario is simply that the above isn't going to happen very often, but how can I ignore this broken behaviour? I hate code that works in the common case but that have drawbacks in the corner case. It would be better if I wouldn't know what the current code is doing, then I could accept it more easily."
Rik agreed that there was a theoretical load increase, but that it was balanced out by other benefits. He went on:
Let me summarise the drawbacks of classzone and the strict zone approach:
Strict zone approach:
Here you'll see that both systems have their advantages and disadvantages. The zoned approach has a few (minimal) performance disadvantages while classzone has a few stability disadvantages. Personally I'd chose stability over performance any day, but that's just me.
The big gains in classzone are most likely from the _other_ changes that are somewhere inside the classzone patch. If we focus on merging some of those (and maybe even improving some of the others before merging), we can have a 2.4 which performs as good as or better than the current classzone code but without the drawbacks.
Andrea replied point-by-point, disagreeing with the drawbacks Rik found with classzone. He argued that it was strict zone, and not classzone, that had the incorrect behavior. He put it, "Classzone provides the correct behaviour but at a potentially major fixed cost during allocations/deallocations and the lock is not per-zone anymore. However this additional information that we collect we'll avoid us to waste CPU and memory so it's not obvious that classzone will decrease performance."
Elsewhere Rik and Andrea had a long angry staircase, in which they both had to step back at times and take a deep breath before going on. Finally, they were unable to see eye to eye on the technical points, and the thread ended inconclusively. Since Rik is the official maintainer of the current code, the burden of proof seems to rest on Andrea, if he wants to see his classzone patch in the main tree.
12. Alan Cox Not Updating EXTRAVERSION In -ac Patches
13 Jun 2000 - 19 Jun 2000 (9 posts) Subject: "Alan, tie a string around your finger"
People: Xuan Baldauf, Alan Cox, Garst R. Reese
Garst R. Reese chastised Alan Cox, saying that Alan kept forgetting to update the EXTRAVERSION variable to reflect the new -ac version numbers. Xuan Baldauf agreed, and explained:
everytime you forget to update EXTRAVERSION, I get my modules overwritten in the wrong place. As a consequence, I cannot use the modules on the yet running ac18 anymore, because the modversions are for ac19, not ac18.
So *please* use a script which automatically increments EXTRAVERSION (I'm sure you use a script to produce patches). If that's to difficult to act on a whole file, create a file "extraversion", and import it into the makefile. It's not important how you do it, but please do it.
Alan had no reply.
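The kind of script Xuan asks for could look something like the following. This is a minimal sketch, not anything Alan is known to use: the function name bump_extraversion() is hypothetical, and it assumes the top-level Makefile carries a line of the form "EXTRAVERSION = ...-acN" (the sample value "-test1-ac18" is an assumption made for illustration).

```python
import re

# Matches e.g. "EXTRAVERSION = -test1-ac18" on a line of its own.
# [ \t]* is used instead of \s* so the match never swallows a newline.
_EXTRAVERSION_RE = re.compile(
    r"^(EXTRAVERSION[ \t]*=[ \t]*\S*-ac)(\d+)[ \t]*$",
    re.MULTILINE,
)


def bump_extraversion(makefile_text):
    """Return `makefile_text` with the -ac number in EXTRAVERSION incremented."""

    def increment(match):
        return match.group(1) + str(int(match.group(2)) + 1)

    new_text, replaced = _EXTRAVERSION_RE.subn(increment, makefile_text, count=1)
    if replaced == 0:
        raise ValueError("no 'EXTRAVERSION = ...-acN' line found")
    return new_text


# Assumed layout of the version block at the top of the Makefile.
sample = (
    "VERSION = 2\n"
    "PATCHLEVEL = 4\n"
    "SUBLEVEL = 0\n"
    "EXTRAVERSION = -test1-ac18\n"
)
bumped = bump_extraversion(sample)
print(bumped)
```

Run as part of whatever produces each patch, a step like this would keep the version string in step with the patch name, so that modules built for one -ac release are not installed over those of another.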
13. Alan's Latest List Of Things To Do Before 2.4 Can Come Out
13 Jun 2000 - 17 Jun 2000 (64 posts) Subject: "Semi up to date JOBS list"
Topics: Compression, Disk Arrays: RAID, Disks: IDE, Disks: SCSI, FS: FAT, FS: NFS, FS: NTFS, FS: UMSDOS, FS: devfs, FS: ext2, FS: ramfs, Forward Port, I2O, Networking, PCI, Power Management: ACPI, Real-Time, SMP, Samba, Security, USB, Virtual Memory, VisWS
People: Alan Cox, Jeremy Katz, Richard Gooch, Rik van Riel, Rogier Wolff, Alexander Viro
Alan Cox posted the most recent version of his list of things to do before 2.4 could come out. He listed:
Should Be Fixed (Confirmation Wanted)
Capable Of Corrupting Your FS
E820 memory setup causes crashes/corruption on some laptops
Use PCI DMA by default in IDE is unsafe (must not do so on via VPx x<3)
Boot Time Failures
Obvious Projects For People (well if you have the hardware..)
Fix Exists But Isnt Merged
To Do But Non Showstopper
Probably Post 2.4
Jeremy Katz replied with an entry to add to section 4 (boot time failures). He said, "Add "AIC7xxx driver doesn't work with Western Digital drives". This was fixed in Doug's 2.2.x drivers (the 5.1.29 one was the one with the fix iirc) but it hasn't been forward ported to 2.3 yet."
To item 8.3 (Devfs races, Sockfs (removing NULL ->i_sb stuf) (Al Viro)), Richard Gooch replied, "Actually, I've been working on fixing the devfs races. My latest patch improves things a lot."
To item 3.1 (Fix module remove race bug (mostly done - Al Viro)), Alexander Viro replied that this was in progress.
To item 14.3 (VM needs rebalancing or we have a bad leak), Rik van Riel replied:
There are two small things still needed to be done.
(I could have finished the active/inactive/scavenge queue thing a week ago, but I want the code to be obvious, readable and understandable for the untrained eye ... in the long run I want all VM code to be easily readable and maintainable, if only so it's easy to spot and fix bugs when somebody needs to do some modifications)
Alan added to Rik's list, item 3: "Figure out why it all went to shit about ac14." Rik pointed out that there had been virtually no VM changes since ac10, and Alan replied, "I suspect the ac10 change may be the actual one that did the damage." They went on to have a brief hunt for the exact time of breakage.
14. Dell Binary-Only Drivers May Go Open Source
14 Jun 2000 - 15 Jun 2000 (8 posts) Subject: "Compiling Linux 2.2.16+ with Dell Proprietary PERC 3/Si Raid Device"
People: Byron Stanoszek, Matt Domsch, Rik van Riel, Alan Cox
Byron Stanoszek asked:
I have a couple of Dell PowerEdge 4350 computers with a PERC 3/Si raid controller installed as the boot device. I know about Dell's website that has a binary-only version of their percraid.o module, but they provide no means of upgrading this kernel to 2.2.15 or 2.2.16 while using the same binary driver.
Is there any chance that anyone is working on an open-source driver that can be included into the kernel to enable support for this device in the future?
There were several replies, including one from Dell representative Matt Domsch, who said, "Yes. Dell and our partners have been working on making this driver open-source. We're in the "seriously-pound-it-until-you-find-most-bugs" stage, and it's not quite ready for release. We understand the concern, and are anxious to make this driver ready for everyone. Supporting binary-only drivers (particularly with MODVERSIONS enabled) is really really hard, and I'm looking forward to dropping that one from my list of worries."
Alan Cox confirmed that he'd heard a rumor about the driver being Open Sourced at some time in the future. Nicholas Marouf asked what he and others could do to speed the process along, and Rik van Riel replied, "Not much. The only thing we can do is advice people to buy elsewhere until the Dell hardware is properly supported."
15. Possible Solution For Recent VM CPU Hogging
15 Jun 2000 - 18 Jun 2000 (14 posts) Subject: "kswapd at 96% CPU on my 16Mb system"
Topics: Virtual Memory
People: Rik van Riel
Kees Bakker reported bad 'kswapd' performance on 2.4.0test1-ac17 (see Issue #73, Section #11 (12 Jun 2000: Developers Argue Over Virtual Memory: 'classzone' Vs. 'strict zone') in this issue). After some back-and-forth, in which Kees posted some log output, Rik van Riel said:
Ahh, I see the problem. The function do_try_to_free_pages() continues to free pages long after we have reached enough free memory.
The problem is that if that happens, shrink_mmap() will loop for a long long time but at the same time refuse to free pages because zone->free_pages > zone->pages_high.
In effect, shrink_mmap() enters something quite close to an infinite loop ...
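The behaviour Rik describes can be modelled roughly as follows. This is a toy simulation in Python, not kernel code: the Zone class, the scan budget, and the stop_when_satisfied flag are hypothetical, and the early exit shown is only one obvious remedy used for contrast, not necessarily the change Rik went on to make. The functions borrow the kernel names purely as labels.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    free_pages: int
    pages_high: int  # high watermark: above this the zone needs no more freeing


def shrink_mmap(zone, scan_budget):
    """Toy stand-in: scan up to `scan_budget` pages looking for one to free.

    Refuses to free anything while the zone is above its high watermark.
    Returns (freed_a_page, pages_scanned).
    """
    scanned = 0
    while scanned < scan_budget:
        scanned += 1
        if zone.free_pages > zone.pages_high:
            continue  # zone already satisfied: keep scanning, free nothing
        zone.free_pages += 1
        return True, scanned
    return False, scanned


def do_try_to_free_pages(zone, want, scan_budget, stop_when_satisfied):
    """Make `want` freeing attempts; return the total number of pages scanned."""
    total_scanned = 0
    for _ in range(want):
        if stop_when_satisfied and zone.free_pages > zone.pages_high:
            break  # hypothetical early exit once enough memory is free
        _, scanned = shrink_mmap(zone, scan_budget)
        total_scanned += scanned
    return total_scanned


wasted = do_try_to_free_pages(Zone(95, 100), want=32, scan_budget=1000,
                              stop_when_satisfied=False)
bounded = do_try_to_free_pages(Zone(95, 100), want=32, scan_budget=1000,
                               stop_when_satisfied=True)
print(wasted, bounded)
```

Without the early exit, every attempt made after the zone crosses its watermark burns its whole scan budget and frees nothing, which is the long, unproductive loop Rik points to.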
Sharon And Joy
Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.