Kernel Traffic #24 For 24 Jun 1999

By Zack Brown

Mailing List Stats For This Week

We looked at 1475 posts in 6146K.

There were 475 different contributors. 211 posted more than once. 166 posted last week too.

1. Virtual Memory Performance Patch For 2.2.x

6 Jun 1999 - 14 Jun 1999 (11 posts) Archive Link: "[patch] 2.2.9_andrea-VM1.gz"

Topics: FS: ext2, Virtual Memory

People: Andrea Arcangeli, Chuck Lever, Peter Steiner

Andrea Arcangeli posted a big virtual memory patch for 2.2.x (ftp://ftp.suse.com/pub/people/andrea/kernel-patches/), to fix high-load performance problems that had actually caused some folks to downgrade to 2.0.x. Peter Steiner reported the strange problem that the system was very stable and responsive under high loads, but tended to crash when idle. He included some strace output, and Andrea fingered the culprit as ext2/truncate, which was buggy under 2.2.9, although the bug remained dormant without his patch (and has since been fixed in 2.3.x). He posted a small patch just for that problem, and uploaded a new VM patch with the fix included. Peter gave it the thumbs up, then discussed implementation with Chuck Lever.

2. Ooooooo!

8 Jun 1999 - 12 Jun 1999 (131 posts) Archive Link: "Profanity in the Linux Kernel?!?!?"

Topics: Humor, Mailing List Administration, Microsoft

People: David S. Miller, Barry Treahy, Jeff V. Merkey, Linus Torvalds, Alan Cox, Ingo Molnar

Scott Jaderholm felt it would be a good idea to take certain words of low usage out of the kernel sources. David S. Miller replied, "This topic came up several times before, and the result is that arbitrary censorship of code/comments is not going to happen. The only exception is in kernel messages which can reach the user in the message logs, and those have been cleaned up." Barrett G. Lyon did a grep that showed a number of places where profanity was explicitly printed, but Ingo Molnar explained that those prints were theoretically impossible, and were there just for debugging purposes.

Barry Treahy put forth the idea that "people are working hard to present Linux as an acceptable component in corporate environments and that anything which casts a negative impression about the Linux makes it much harder to fight corporate bean counters that own MicroSloth stock."

Jeff V. Merkey replied, "Look at Microsoft, now there's a paragon of morality -- their predatory and ruthless behavior has not slowed or hindered sales of their products at all, in fact, it's their behavior that has made them the most powerful and successful software company on the planet.."

Meanwhile, Linus Torvalds made his official pronouncement:

I'll just re-iterate my standpoint on this before it gets out of hand..

Profanity and bad taste can be fine. It's usually not a problem, and the only cases where it should be =really= avoided is in messages to the user that aren't absolutely lethal this-should-not-ever-happen kind.

But I feel that trying to be too politically correct is a much worse disease than =any= amount of profanity, and I'm personally much happier seeing the output of a grep like the above than I would be trying to clean it up in the name of PC.

That means, for example, that I won't accept cleanup-patches from anybody else than the maintainers of the specific subsystems. And I won't do the cleanup myself. I _might_ reject a patch because I felt is was too foul-mouthed, but to be quite frank I don't think I have ever done so.

So don't worry. People have sometimes worried that it is "unprofessional" to use profanity, but if you think professionals don't swear you've either been living in a monastery or playing golf your whole life ;)

I'll start worrying when the profanities start to occupy a noticeable amount of kernel space.

Elsewhere, Alan Cox also gave his opinion:

You should see non Linux source code if you think that is a problem.

Linux 2.2 never prints any message that contains obscenities.

Its all very silly anyway. Things you class now as obsceneties are regarded as standard by most of the English speaking world. Indeed if anything they simply had a dip in usage. They were routinely used by the people drafting your US constitution, and publically so.

Anyway if it offends you the license allows you distribute your own 'fuck free linux' patch.

The thread went on and on.

3. Major 'fsck' And 'rm' Speedup Vs. Small Slowdown Of Normal Data Operations

10 Jun 1999 - 22 Jun 1999 (19 posts) Archive Link: "Speeding up fsck 2 times"

Topics: FS

People: Jan Kara, Stephen C. Tweedie, Steve Bergman, Pavel Machek, Stefan Monnier

Pavel Machek posted a patch to allocate indirect blocks close to one another, which would double the speed of 'fsck' and triple the speed of 'rm'. He warned that you couldn't just start using the patch: it would only take effect after running mke2fs, so that the filesystem was created with the new layout.

Steve Bergman reported success. Pavel wasn't sure if he should try to get it into the main kernel, but Stefan Monnier said that as long as the benchmarks for the worst case weren't too bad, he felt the patch should go in. Jan Kara gave his analysis: "Worst case would be reading two blocks which are on the "reference block boundary" - one block referenced as last one in the block so for next one we have to read in reference block which would mean seek - read block - seek - read data block instead of current read block - read data block. But this is not very probable case so the worst reasonable case would be linear reading (long at least a few MB) of file large enough in the part where are triple indirects... But as we are going to loose only each 256KB when using 1KB blocks I don't think we will loose more than a few percents (my personal estimation is <5%)."
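Jan's figure of one extra seek per 256KB can be reproduced with a little arithmetic. The sketch below assumes that each indirect block is filled with 4-byte block numbers, which is how ext2 stores them; the function name is made up for illustration and is not taken from the patch.

```python
BLOCK_POINTER_BYTES = 4  # assumption: ext2 block numbers are 32-bit values


def data_per_indirect_block(block_size):
    """Bytes of file data referenced by one full indirect block."""
    pointers = block_size // BLOCK_POINTER_BYTES
    return pointers * block_size


for size in (1024, 2048, 4096):
    covered_kb = data_per_indirect_block(size) // 1024
    print(f"{size}-byte blocks: one indirect block per {covered_kb}KB of data")
```

With 1KB blocks an indirect block holds 256 pointers, so a linear read crosses into a new indirect block once every 256KB, which matches the number in Jan's estimate.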

Stephen C. Tweedie dropped a fly in the ointment, pointing out that the previous behavior of having indirect blocks spread over the disk kept the blocks "close to the data, though. Placing indirect information in a separate cluster of blocks may make it easier to do metadata-only operations like fsck and unlink, but it will just slow down things which actually access data too. That seems like a crazy thing to want to do!"

Pavel felt the data operation slowdown would be insignificant compared to the metadata operation speedup, but no one could really say for sure without proper benchmarks. No one did any proper benchmarks, although the folks who tried the patch noticed approximately a 100% speedup of things like 'fsck' and 'rm', and only a 5% slowdown of data operations for actual user programs.

There was no real resolution during the thread.

4. More FireWire Conflicts And A Painful Resolution

10 Jun 1999 - 15 Jun 1999 (4 posts) Archive Link: "Linux IEEE-1394 (FireWire) clarification"

Topics: Disks: SCSI

People: Emanuel Pirker, Andreas Bombe, Srdjan Sobajic

Last week in Issue #23, Section #8 (5 Jun 1999: FireWire Conflicting Development) there was some discussion of FireWire development, and this week Emanuel Pirker, the first developer to make the attempt, had some words to say.

He disagreed with Andreas Bombe that his code had accomplished nothing, and in fact, "I have a working subsystem (there are some points I dislike about it but it works) and a working AIC-5800 driver. The AIC driver does outgoing and receiving transactions without problems." He added, "One person got IP packets over this driver." He felt that the thing to do was merge the projects. He had nothing to say about Srdjan Sobajic's extension of his own work; Emanuel was not subscribed to the list, and may have missed the rest of last week's discussion.

Andreas replied unsympathetically that his own code was better, and that Emanuel had primarily copied code he hadn't understood from the SCSI subsystem. But he did admit that Emanuel's Adaptec code was valuable, and he'd like to see it migrate to his subsystem.

Emanuel felt that he had understood the SCSI code well enough, and added, "Andreas, I am very glad you are in Linux-1394 business. But don't try to reinvent the wheel. If your subsystem is really THAT better, I'll move my Adaptec code to your structure ASAP. Quality wins, that's Linux development works. But please understand that I _thought_ about what I did and by doing everything again you'll make mistakes again I did or even other mistakes."

Andreas replied, "Look, I looked around your code and have a fairly good idea of how it works. Maybe you should download my patch and take a closer look at it and see how I do things. Judging from some of your concerns I don't think you did already."

End Of Thread.

However, under the Subject: IEEE1394: Development issues (http://kernelnotes.org/lnxlists/linux-kernel/lk_9906_03/msg00932.html), Emanuel said:

some kind of debate was on the lkml which has been already misunderstood by some people ("conflicting development"). Well, development is not conflicting, but quality wins. Andreas Bombe was too annoyed about the old, from SCSI derived code in the Linux-IEEE1394 subsystem that he began writing a new one, focusing on TILynx development.

This means:

So, there is NO development conflict. I have been out of business for quite a time, but the old code will be replaced by Andreas' code. Quality wins. I will bring my Adaptec AIC-5800 to the new architecture. I do not say "port", because the ideas of the new subsystem code are mainly the same. Only the code is much better. Of course there are some issues still to be resolved, but these are only technical issues.

Alas, I have failed to complete the whole thing alone while this project was "official" and part of my studies. It is much work and I was much too inexperienced. Well, now there are many developers out there. All of the issues Subsystem, TILynx, OHCI, AIC-5800, IP-over-1394 are covered by developers. There is work ongoing for usage of cameras, but I am not informed well about this. Volunteers for SBP-2 (and as a consequence, IEEE1394 hard drives) are still searched :-)

I am approaching the end of my graduate program. Chances are good that I can continue work on Linux1394 (at least more than I could do the last months). There is still much to do and I like the project very much, I want to continue, though I can not guarantee on it yet. That's why I am especially glad, that Andreas joined, because he seems to have enough resources to maintain the subsystem well.

5. Unwelcome Optimizations For Many Threads

11 Jun 1999 - 15 Jun 1999 (7 posts) Archive Link: "More new schedule() results ..."

People: Ingo Molnar, Davide Libenzi

Davide Libenzi posted the thread switching speed results for his new schedule() implementation. The results were interesting. For a very small number of threads (around 2) his patch was actually around 15% slower than straight 2.2.5; but for 10 or more threads he noticed a gain of at least 25%; for 450 threads the speedup was generally 45%, and sometimes as much as 80%.

Ingo Molnar said this was what he expected, and pointed out, "Most systems (even loaded servers) have typically less than 5 runnable processes. So those systems will see 15% scheduling slowdown. Some applications might use many threads - for those cases your patch is a nice improvement."

An irate Davide replied:

Sorry, but are we building a new version of MS-DOS here ???

Linux is well known as good server platform and You want to say me that more of Linux users will fall the 2 thread case ??!?!

In a schedule() algo filled of gotos to get a better prefetch queue that can improve speed no more then 10 % I post a patch that on tipical Linux machines will lead to a 30 up to 80 % of increasing performance.

Now one of the two things must be true:

  1. I'm crazy
  2. I'm in the wrong place

Let me know.

There was not a lot of discussion, but everyone agreed with Ingo, so the patch probably won't automatically go into the main tree.

6. New ioctl For Advanced Filesystems

11 Jun 1999 - 17 Jun 1999 (11 posts) Archive Link: "RFC: from FIBMAP to FIONDEV"

Topics: Disk Arrays: RAID, FS: ReiserFS, Ioctls

People: Werner Almesberger

Werner Almesberger posted a long description of FIONDEV, a replacement for the old FIBMAP ioctl ("Returns the block number in the fs corresponding to the argp'th block in the file." (http://step.polymtl.ca/~ldd/syscalls/syscalls_17.html)), which would be better able to handle the advanced filesystem structures of reiserfs, md, etc. He anticipated that the migration from FIBMAP to FIONDEV would take several years.
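For readers who have never used FIBMAP, here is a minimal sketch of what the quoted description means in practice. It is written in Python for brevity and is only an illustration: the helper name is hypothetical, the request value 1 is what _IO(0x00, 1) works out to on x86-style ioctl encodings (an assumption for other architectures), and the call normally needs root privileges and a filesystem that supports block mapping. FIONDEV itself was only a proposal at this point, so no call for it is shown.

```python
import fcntl
import struct

# Assumption: FIBMAP is _IO(0x00, 1) in <linux/fs.h>, which is 1 on x86.
FIBMAP = 1


def fibmap(path, logical_block):
    """Return the filesystem block number backing the given logical block
    of the file, or None if the ioctl is refused or unsupported."""
    with open(path, "rb") as handle:
        # The kernel reads the logical block index from the int argument
        # and writes the physical block number back in its place.
        arg = struct.pack("i", logical_block)
        try:
            result = fcntl.ioctl(handle.fileno(), FIBMAP, arg)
        except OSError:
            return None
    return struct.unpack("i", result)[0]
```

The interface hands back a single block number with no indication of which device it lives on, which is presumably the sort of limitation that makes it awkward for md and reiserfs.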

No one had any major objections to the idea, but everyone agreed that RAID (other than RAID-1) would be a problem. Werner acknowledged he wasn't quite sure what to do about RAIDs, and there was some discussion. No one found a solution, but neither did anyone feel that this was a real impediment to the project. My impression was that there was simply no alternative, and the developers were trying to arrange for as much functionality as possible.

7. Renovating 'mount'

11 Jun 1999 - 14 Jun 1999 (8 posts) Archive Link: "sys_mount cleanup"

Topics: Backward Compatibility

People: Alan Cox, Matthew Wilcox, Alexander Viro, H. Peter Anvin

Matthew Wilcox asked about the consequences of ditching some backward compatibility left over from 1.0.0's sys_mount() expecting 3 arguments rather than 5. Alan Cox replied, "Traditionally we've put in a bitch about something being obsolete for one stable version then squashed it the next," but Alexander Viro pointed out that to be bitten by this particular change you'd have to be running a 5-year-old mount() under a bleeding edge kernel. He suggested putting a warning in 2.2.10, and changing the functionality for 2.4.

Meanwhile, H. Peter Anvin asked if the fmount() syscall could be implemented, which would get rid of some races. But Matthew explained that he and Alexander had been working on a way to replace mount() that would also obsolete fmount(). Neither he nor Alexander gave any more information than that.

8. Possible CDROM Changes

12 Jun 1999 - 15 Jun 1999 (4 posts) Archive Link: "UDF patch to 2.2.9"

Topics: Disks: IDE, Disks: SCSI, FS

People: Dave Boynton, Grant R. Guenther, Jens Axboe

Dave Boynton announced and gave a pointer to the latest UDF filesystem patch (http://trylinux.com/projects/udf/) (for DVDs and packet-written CDRWs, etc). He added, "I'd like to take this opportunity to plead for some needed updates to the SCSI and IDE cdrom modules. The UDF module currently makes direct calls to the SCSI driver in some places that make it mostly incompatible to IDE drives. It would certainly be cleaner if the functions we need were implemented in the Uniform CDROM interface. What we need is inclusion of the SCSI3/mmc2 command set, which is used by DVDs, CDRs, CDRWs, etc."

Jens Axboe, the Linux CDROM maintainer, agreed, and said something should be appearing within a month. Grant R. Guenther added, "Don't forget about the parallel port ATAPI drives. There are a lot of HP-7200e users out there who'd love to see a UDF implementation. Once there's a standard interface in the Uniform CD-ROM layer, and a reference SCSI implementation, I'll add the support to 'pcd'."

9. The State Of The Bleeding Edge

13 Jun 1999 - 14 Jun 1999 (10 posts) Archive Link: "pre-2.3.7-1 fails compile"

Topics: FS: ext2

People: Linus Torvalds, Andrea Arcangeli, Ingo Molnar

Pete Clements got compiler errors on 2.3.7-pre1, and posted some 'make' output. Linus Torvalds explained:

I'd like to point out that the current pre-2.3.7 series is fairly experimental. As amply demonstrated by the filename (the "dangerous" part in the filename hopefully made some people go "Hmm..").

We're working on re-architecting (or rather, cleaning up so that it works like it really was supposed to) the page cache writing, and as a result a number of filesystems are probably going to be broken for a while unless we get people jumping in to help.

Right now 2.3.7-1 (aka "dangerous") is not stable even with ext2, in that swapping doesn't work. Ingo just sent me patches to fix that, and I'm hoping to remove the "dangerous" part from 2.3.7-2, but even then a number of filesystems will be broken.

We _may_ end up just re-introducing the "update_vm_cache()" code for filesystems that really don't need the added performance, but it would actually be preferable if people really wanted to make them perform well with the new direct write-through cache code.

Andrea Arcangeli replied that his admittedly introductory view of the code suggested other causes of FS corruption. He and Ingo Molnar had a brief technical discussion, and Linus also said:

the real problem to some degree is that we just got this working. I finally have patches that =really= fix the writable mapping case, and only now do we have a system that appears really stable.

We'll end up fixing up small things, both performance- and correctness- wise for the next two months, I'm sure. This is going to be one of the "big changes" between 2.2 and 2.4, so don't expect it to be completely painless. You haven't seen the rather broken versions that Ingo has been fighting for the last few weeks - you're only seeing the end result ;)

10. Problems With 'patch'

14 Jun 1999 - 16 Jun 1999 (31 posts) Archive Link: "Patch 2.2.10 is wrong"

Topics: POSIX, Version Control

People: Linus Torvalds, Alan Cox, Jes Sorensen

Ricardo Galli Granada had trouble applying the 2.2.10 patch, and Alan Cox pointed out that it required at least 'patch' version 2.5. An irate Jes Sorensen replied that 'patch' 2.5 changed the behavior of command-line parameters.

Linus Torvalds posted his opinion:

I have to admit that I think the POSIX patch behaviour is less than optimal, and the first time I saw it I went "oh, crap, who came up with this idea?"

However, I've become less hateful of it. The new behaviour is exactly the one you'd want if everything goes right. It just silently does the right thing for people who use patch without ever getting patch errors or anything like that.

The new behaviour is fairly horrible if something goes wrong, though: it leaves the ".orig" file only for the files that had trouble, not the files that were successfully patched without warnings. That makes it harder to "revert" a patch that had partial problems. It's still possible, but it's definitely less user-friendly for that case.

Oh, well. The best behaviour would probably be to always do the backup files, and then if everything patches cleanly you remove the files at the very end - but if there is any problem what-so-ever you'd leave all backup files alone, even for files that were successfully patched.

This is one of the things that source control makes a non-issue, of course, so in that sense the new behaviour is more source-control- oriented.
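The "best behaviour" Linus describes is easy to state as a small policy. The following Python sketch is only an illustration of that policy, not of how 'patch' itself works: the function name is made up, each "patch" is modelled as a plain function from old text to new text, and a ValueError stands in for a hunk that fails to apply.

```python
import os
import shutil


def apply_with_backups(edits):
    """Apply edits (a dict of path -> transform function) to files.

    Every file gets a ".orig" backup first. The backups are removed only
    if every edit applied cleanly; after any failure they are all kept,
    including those of files that were changed successfully.
    """
    backups = []
    all_clean = True
    for path, transform in edits.items():
        backup = path + ".orig"
        shutil.copy2(path, backup)      # always back up before touching
        backups.append(backup)
        with open(path) as handle:
            old_text = handle.read()
        try:
            new_text = transform(old_text)
        except ValueError:              # stand-in for "hunk failed"
            all_clean = False
            continue
        with open(path, "w") as handle:
            handle.write(new_text)
    if all_clean:
        for backup in backups:
            os.remove(backup)           # clean run: leave no clutter
    return all_clean
```

After a partial failure every touched file still has its ".orig" copy, so reverting is a matter of copying the backups back over the originals.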

11. Status Of Integration Of fdset Patch Into The Main Tree

14 Jun 1999 - 22 Jun 1999 (20 posts) Archive Link: "why no fdset patch in kernel?"

People: Stephen C. Tweedie, Alan Cox, Linus Torvalds

Brian Feeny asked why the fdset patch to allow large numbers of file descriptors and processes had not made it into the main kernel tree, since without the patch, Linux wasn't useful as a high-hit web server. Alan Cox replied that it had simply been too late for 2.2.x, and Linus Torvalds wanted certain issues addressed before he'd put it into 2.3.x.

Stephen C. Tweedie pointed out that the patch was in the 2.2.x-ac series, so folks who wanted it under 2.2 could have it. It also came out that Red Hat included the patch in the default kernel of their 6.0 distribution.

12. USB Printer Fix

15 Jun 1999 (1 post) Archive Link: "making usb printer work"

Topics: USB

People: Pavel Machek

Pavel Machek posted a patch to fix a USB printer problem. Apparently his printer wasn't being detected, and when he forced it, it just printed the same text over and over.

There were no replies.

13. FAT Patch Lingers Unapplied

15 Jun 1999 (1 post) Archive Link: "[Announcement] FAT patch for 2.2.10"

Topics: FS: FAT

People: Alexander Viro, Linus Torvalds

Alexander Viro gave the URL to his FAT patch (ftp://ftp.math.psu.edu/pub/viro/fat-patch-21.gz) for 2.2.10; apparently there have been no changes since May, but Linus Torvalds has still not incorporated it. He added that new optimizations will be kept in a separate patch from now on.

14. mkdir() Problems And Uncertainties

16 Jun 1999 - 17 Jun 1999 (16 posts) Archive Link: "[RFC] Bug in mkdir(2)"

Topics: BSD

People: Alexander Viro, Alan Cox, Peter Samuelson, Linus Torvalds

Alexander Viro reported:

Sigh... Looks like we got Yet Another Symlink Hole(tm). Not too serious one, since probably no suid-root stuff is perverted enough to trigger it, but anyway. Scenario:

$ ln -sf b a
$ ls -ld a b
ls: b: No such file or directory
lrwxrwxrwx   1 al       al              1 Jun 16 12:56 a -> b
$ mkdir a
mkdir: cannot make directory `a': File exists
$ mkdir a/
$ ls -ld a b
lrwxrwxrwx   1 al       al              1 Jun 16 12:56 a -> b
drwxrwxr-x   2 al       al           1024 Jun 16 12:58 b

In other words, if foo is a dangling symlink mkdir("foo/") will merrily follow it. Which it shouldn't.

There are 3 reasonable variants of fix and they give different error values - -ENOENT (if we are treating it as a dangling link in the middle of lookup), -EEXIST (if we refuse to follow link here and ignore the trailing /) or -ENOTDIR (ditto, but noticed that it's not a directory). Take your pick ;-) Solaris prefers the second variant and IMO it's the right thing.

BTW, rmdir("foo/") also shouldn't follow links. rmdir(1) works around that (it trims the trailing slashes), but IMHO rmdir(2) should return -ENOTDIR here. Actually it happily follows the link.

Patching it either way is fairly trivial and I'll submit the patches as soon as you will choose the variant. I think that the right thing to do here is to -EEXIST for mkdir() and -ENOTDIR for rmdir(). Up to you, indeed.
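The scenario in the transcript above can be probed on any system with a few lines of Python. This is a hedged sketch rather than a statement about what a given kernel does: the function name is invented for illustration, and the result depends on the kernel and filesystem it runs on, so no expected output is given.

```python
import errno
import os
import tempfile


def probe_mkdir_through_dangling_symlink():
    """Recreate the transcript: a -> b (dangling), then mkdir("a/").

    Returns ("followed", None) if the call created directory b, or
    ("refused", errno_name) if the kernel rejected it.
    """
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "a")
        target = os.path.join(tmp, "b")
        os.symlink("b", link)            # dangling: b does not exist yet
        try:
            os.mkdir(link + "/")         # trailing slash, as in the report
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            return ("refused", name)
        if os.path.isdir(target):
            return ("followed", None)
        return ("other", None)


print(probe_mkdir_through_dangling_symlink())
```

A result of ("followed", None) corresponds to the behaviour Alexander is complaining about, while a refusal with EEXIST, ENOENT or ENOTDIR corresponds to one of the three fixes he lists.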

Linus Torvalds actually didn't see any problem with mkdir following symlinks in the way it did, and felt it would be consistent with open(). Alexander replied, "Oh, well... Looks like it's a really borderline case - everybody seem to be doing whatever they want here. In situations when the last component is a link and call normally wouldn't follow it adding slashes seems to be ignored on Solaris and forces the link expansion on Linux and 4.4BSD... I still think that following the link is bogus, but after all, if somebody wants to hang let's give him the rope..."

Elsewhere, Alan Cox pointed out, "What everyone else relies on is the modern unix guarantee that mkdir does not follow links. Whats more some packages test for this property and compile differently if the mkdir is sane. Linux packages assume it is sane, so we _must_ preserve this property"

There were about 15 posts all told, but no absolute decision seems to have been made, considering that Linus and Alan disagreed but didn't discuss it.

(Thanks go to Peter Samuelson for emailing me about this: the above quote from Alan Cox was not necessarily in disagreement with Linus. Apparently, Alan was talking about the general case of 'mkdir("foo")' not following links. As Peter pointed out to me in his email, Alan really didn't weigh in on the issue of 'mkdir("foo/")' at all. His point was only that changing mkdir() to follow links in every case would be a mistake -- thanks, Peter! -- Ed: [27 Jun 1999 00:00:00 -0800])

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.