Kernel Traffic #147 For 24 Dec 2001

By Zack Brown

Mailing List Stats For This Week

We looked at 1200 posts in 5176K.

There were 435 different contributors. 207 posted more than once. 158 posted last week too.

1. The VM Subsystem: The Saga Continues

10 Dec 2001 - 13 Dec 2001 (43 posts) Archive Link: "Re: 2.4.16 & OOM killer screw up (fwd)"

Topics: Version Control, Virtual Memory

People: Marcelo Tosatti, Rik van Riel, Andrea Arcangeli, Andrew Morton, Daniel Phillips, Henning P. Schmiedehausen, Alan Cox

Marcelo Tosatti asked Andrea Arcangeli to follow up on any Virtual Memory Subsystem issues that arose, and added, "Just please make sure that when sending a fix for something, send me _one_ problem and a patch which fixes _that_ problem." In the course of discussion, Rik van Riel added:

Andrea, it seems -aa is not the holy grail VM-wise. If you want to merge your good stuff with marcelo, please do it in the "one patch with explanation per problem" style marcelo asked.

If nothing happens I'll take my chainsaw and remove the whole use-once stuff just so 2.4 will avoid the worst cases, even if it happens to remove some of the nice stuff you've been working on.

Andrea replied, "it may be not a holy grail in swap benchmarks and flood of writes to disk, those are minor performance regressions, but I have no one single bug report related to "stability"." He added, "as far as I'm concerned 2.4.15aa1 and 2.4.17pre?aa? are just rock solid and usable in production. We'll keep doing background benchmarking and changes that cannot affect stability, but the core design is finished as far I can tell." Andrew Morton replied, "Your patch increases the time to untar a kernel tree by seventy five percent. That's a fairly major minor regression." Elsewhere, he said, "Bugs which are only fixed in -aa aren't much use to anyone. The VM code lacks comments, and nobody except yourself understands what it is supposed to be doing. That's a bug, don't you think?" Andrea replied:

Lack of documentation is not a bug, period. Also it's not true that I'm the only one who understands it. For instance Linus understands it completely, I am 100% sure.

Anyways I wrote a dozen of slides on the VM with some graph showing the design of the VM if anybody can better learn from a slide than from the code.

I believe the slides are useful to understand the design, but if you want to change one line of code slides or not you've to read the code. Everybody is complaining about documentation. This is a red-herring. There's no documentation that allows you to hack the previous VM code. I'd ask how many of the people happy with the previous documentation were effectively VM developers. Except for some possible misleading comment in the current code that we may have not updated yet, I don't think there's been a regression in documentation.

Rik replied, "Without documentation, you can only know what the code does, never what it is supposed to do or why it does it. This makes fixing problems a lot harder, especially since people will never agree on what a piece of code is supposed to do." Andrea countered:

I only care about "what the code does" and "what are the results and the bugreports". Anything else is vaporware and I don't care about that.

As said I wrote some documentation on the VM for my last speech at one of the most important Italian Linux events, it explains the basic design. It should be published on their website as soon as I find the time to send them the slides. I can post a link once it is online. It should allow non VM-developers to understand the logic behind the VM algorithm, but understanding those slides is far from allowing anyone to hack the VM.

I _totally_ agree with Linus when he said "real world is totally dominated by the implementation details". I was thinking this way before reading his recent email to l-k (however I totally disagree about evolution being random and the other kernel-offtopic part of such thread :).

For developers the real freedom is the code, not the documentation and the code is there. And I think it's much easier to understand the current code (ok I'm biased, but still I believe for outsiders it's simpler).

Daniel Phillips replied, "Judging by the number of complaints, it's not easy enough. I know that, personally, decoding your vm is something that's always on my 'things I could do if I didn't have a lot of other things to do' list. So far, only Linus, Marcelo, Andrew and maybe Rik seem to have made the investment. You'd have a lot more helpers by now if you gave just a little higher priority to documentation." He also suggested that Andrea post his slides on the web right away instead of waiting. Andrea did so, and gave a link to a tarball (ftp://ftp.suse.com//pub/people/andrea/talks/english/2001/pluto-dec-pub-0.tar.gz).

Elsewhere, Henning P. Schmiedehausen was appalled by Andrea's lack of concern for documentation, particularly by the fact that Andrea didn't consider that lack to be a bug. He said, "I'm not happy about your usage of magic numbers, either. So it is still running on solid 2.2.19 until further notice (or until Rik loses his patience)." Rik replied, "I've lost patience and have decided to move development away from the main tree. http://linuxvm.bkbits.net/." Alan Cox complained about the use of BitKeeper on that site, so Rik put the patches up on http://surriel.com/patches/.

2. Historical Digression

12 Dec 2001 - 13 Dec 2001 (9 posts) Archive Link: "Where does 'vmlinuz' come from?"

Topics: Compression, Virtual Memory

People: Jesse Pollard, Pozsar Balazs

Pozsar Balazs asked where the name 'vmlinuz' came from. Jesse Pollard explained:

It is partly historical:

Original boot on PDP-11, the kernel was kept in the file /unix (date was mid to late 1970s)

When virtual memory was added it was changed to /vmunix (early 80s I think) to distinguish the difference on those systems that could do both (Mid 80s I had a Motorola 68020 that still used /unix since the VM hadn't been finished yet).

Then on to Linux, which added compression. Since the name UNIX (in all its forms) was copyrighted and couldn't be used to name the system the OS became linux, and, following the progression, vmlinux hence - with compressed files having a Z or gz extension - vmlinuz

3. Approaching 2.4.17

13 Dec 2001 - 17 Dec 2001 (24 posts) Archive Link: "Linux 2.4.17-rc1"

Topics: FS: ReiserFS, FS: devfs

People: Marcelo Tosatti, Richard Gooch, Daniel Phillips

Marcelo Tosatti announced 2.4.17-rc1, saying:

I've just copied 2.4.17-rc1 to ftp.kernel.org... Its mirroring yet, probably.

Well, I want people with the "unfreeable" buffer/cache problem to confirm with me that 2.4.17-rc1 is working ok.

The same change which should fix that problem also should make 2.4 a bit less "swap happy".

Daniel Phillips asked if there would be an -rc2, and several folks went over various problems still remaining to be fixed. Elsewhere, Marcelo replied to Daniel, "Yes there will. There have been reiserfs bug reports (I'm waiting for the fix for -rc2), and I'm waiting for Richard's patch to fix a devfs update issue. I want to test those before 2.4.17." Richard Gooch said he had the devfs patch, but was just waiting for some bug reports from various testers. He added, "I'm pretty confident that my current patch is an improvement, even if it doesn't fix everything." Marcelo replied, "Ok, as soon as you get the reports from people, please send me the patch or tell me it's broken :)"

4. Some Developers Unhappy With Linus

15 Dec 2001 - 18 Dec 2001 (11 posts) Archive Link: "PDC20265 IDE controller trouble"

Topics: Disks: IDE

People: Andre Hedrick, Rene Rebe, Jeff Garzik, Benjamin LaHaise

In the course of a bug hunt, it was agreed that the bug did exist, and Andre Hedrick said bitterly, "Well blame that on the folks that are not taking kernel code that will allow you to solve this problem. Linus is the number one offender." Rene Rebe replied, "Maybe under Marcelo this might change ..." But Jeff Garzik objected:

Linus is taking some patches and not others right now... so what? A couple of my patches, isolated and clearly unrelated to bio and mochel's driver work, made it in. Others got dropped.

I see several people (not just you Andre) whining about the dropped patches, when it seems clear to me that only a few things in specific areas are getting applied right now. For you specifically, Andre, Jens' patches have been slated for 2.5.x for a while, so it seems blindingly obvious that he would not take your IDE patches at least until the bio subsystem is finished and clean, since your IDE patches would clearly depend on the bio changes.

I do not believe this is a personal condemnation of your patches, or bcrl's, or anyone else's.

Patience is a virtue ;-) We have a long devel series in front of us and we are only at the pre-patches to the FIRST 2.5.x release.

Regarding the length of the development cycle, Benjamin LaHaise remarked, "There is no reason not to have a 6 month devel cycle, and plenty of reasons in favour of it. If people aren't going to bother reviewing patches in a timely fashion, they should tell people when a good time to resend patches is. Given the whole vm fiasco in 2.4 (which is still a mess and falling apart for heavy loads) which stems from a lot of random direction with patches, I hope that some of the underlying problems will get fixed. But it really doesn't look that way." Close by, Gunther Mayer recommended a patch to fix the original bug, and Andre spat out, "I acknowledge the validity of the patch to you and Linus and agreed for its need. As you can see he has not got a clue nor could you sell him one. His attitude toward laptops is /dev/null, otherwise he would have taken the patches a long time ago and had the infrastructure for proper APM calls in place."

5. Some Discussion Of Linus' Development Philosophy

16 Dec 2001 - 18 Dec 2001 (14 posts) Archive Link: "2.5.1 - intermediate bio stuff.."

Topics: Disks: IDE, Disks: SCSI, Kernel Release Announcement, Networking, SPI, Virtual Memory

People: Linus Torvalds, Andre Hedrick, Troy Benjegerdes, Larry McVoy, Rik van Riel

Linus Torvalds announced Linux 2.5.1, saying:

I just made a 2.5.1, but I'm still concentrating on bio stuff, so don't bother sending me other patches unless they are serious bug-fixes to something else.

2.5.1 is hopefully a good interim stage - many block drivers should work fine, but many more do not. However, the pre-patches were getting largish, so I'd rather do a 2.5.1 than wait for all the details.

As to other stuff - note the separation of drivers for new and old tulip chips: if you have an old 2104x tulip chip (as opposed to the newer 2114x chips) the regular tulip driver doesn't work any more for you. Don't be surprised, select CONFIG_DE2104X.
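For anyone wanting to check whether an existing build is affected by the driver split, a rough Python sketch follows. It assumes the usual .config text format (CONFIG_FOO=y, CONFIG_FOO=m, or "# CONFIG_FOO is not set"). The helper names and the sample text are made up for illustration, and CONFIG_TULIP is assumed to be the symbol for the regular tulip driver; only CONFIG_DE2104X comes from Linus' announcement.

```python
import re

def config_state(config_text, symbol):
    """Return the value of a symbol ('y', 'm', ...) in .config text, or 'n' if unset.

    Hypothetical helper for illustration; not part of any kernel tooling.
    """
    pattern = re.compile(r"^" + re.escape(symbol) + r"=(\S+)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else "n"

def old_tulip_supported(config_text):
    """True if the driver for old 2104x chips is built in or built as a module."""
    return config_state(config_text, "CONFIG_DE2104X") in ("y", "m")

# Invented sample: regular tulip driver as a module, 2104x driver not selected.
SAMPLE = """\
CONFIG_TULIP=m
# CONFIG_DE2104X is not set
"""

if __name__ == "__main__":
    print("2104x driver enabled:", old_tulip_supported(SAMPLE))
```

With the sample above, the regular driver is present but the 2104x driver is not, which is the configuration Linus warns will no longer drive the older chips.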

Andre Hedrick asked:

to be completely clear on your point, you do not want any patches that describe the rules for driver "domain validation". Next, you do not want any patches that fix gross things, too. IE, exiting of any ISR's to perform BH events. Noting that one is not able to kludge it anymore, the solution is to cut off the beast and start from scratch.

Now the significance of driver "domain validation", in block/storage is the inner-play with the VM layer via a swap partition or file.

Until you can validate the new block io is correct at the data-transport layer, where the requests are converted to the actual data-io to the disk you have nothing but a WAG. You will also have no way to separate issues of FS/Memory corruption should it not be gone yet. Otherwise you have to disable any and all forms of SWAP real or file.

Since there is no way to validate the drivers and many believe it is not important to perform such tests, how can you assure any one given user their data is safe? Right now you are giving the impression that you do not care about data integrity, and refusing to acknowledge this will further prove you are in the same camp.

I remember all the crap taken over FS Corruption in the past, and now present to you a perfect driver and a way to authenticate the data transport and you thumb down the idea, directly or indirectly. I had plans to try and do the same for SCSI to become compliant to SPI 4, but given the total rejection of layer isolation for regression testing it does not seem practical. This is stated because the simple case is being rejected so I see no way to even present the more complex case ever.

So do us all the favor by answering and explaining your position on the scale of this sensitive issue. I am sure everyone would like to hear your views on the need or useless bloat that would result from having a testable diskdrive data transport layer.

My bets are on you will call it "useless bloat".

To Andre's third paragraph, Troy Benjegerdes replied:

Translation: Andre has been in a few too many ATA meetings and can't think without using storage industry insider-speak ;)

I only had a 6 months internship in storage, but I believe what he's talking about are sound engineering principles.

The first of which is, if we are trying to find a problem in a complex system, you try and isolate that system from influences of others. And if you are trying to prevent new problems from showing up, you try and test each component of a complex system as an ongoing process.

Andre is focusing on the block IO layer here, because that's his area of expertise. I think he points out a symptom of a problem that needs to be addressed for damn near every area of the kernel.

We REALLY need to have some sort of coherent strategy for testing different components to determine whether they are worth putting in the mainline kernel, and catch bugs sooner. Yes, given enough eyeballs, all bugs are shallow, but given a little effort on setting up an ongoing test system, we can reduce the workload of the 'core' kernel people by not having to have them sift through a bunch of useless bug reports because a user didn't know what we all know about debugging.

We need to have some way of isolating different subsystems, and a catalog of 'regression tests' to verify that new changes aren't causing subsystems to fail. I don't expect regression tests to be able to catch every possible mistake, but I *DO* expect that we should be able to catch every mistake we have previously made. This way a core kernel person only has to debug a problem once, and write a test to catch it, instead of seeing the same problem over, and over, and over again from 300 different users.
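Troy's "catalog of regression tests" idea can be illustrated with a small Python sketch. Nothing here comes from the thread or from any kernel test suite: the registry, the bug identifiers, and the toy clamp_percent() function standing in for a "subsystem" are all invented. The point is only the shape of the idea: each bug, once debugged, leaves behind a named test that runs against every later change.

```python
# Catalog mapping a (hypothetical) bug identifier to the test that guards against it.
REGRESSION_TESTS = {}

def regression_test(bug_id):
    """Decorator that files a test function in the catalog under bug_id."""
    def register(func):
        REGRESSION_TESTS[bug_id] = func
        return func
    return register

def clamp_percent(value):
    """Toy stand-in for a subsystem under test: clamp a value to 0..100."""
    return max(0, min(100, value))

@regression_test("negative-input")
def check_negative_input():
    # Pretend this once returned a negative number; the test pins the fix.
    assert clamp_percent(-5) == 0

@regression_test("overflow-input")
def check_overflow_input():
    assert clamp_percent(250) == 100

def run_catalog():
    """Run every catalogued test and return the ids of those that fail."""
    failures = []
    for bug_id, test in sorted(REGRESSION_TESTS.items()):
        try:
            test()
        except AssertionError:
            failures.append(bug_id)
    return failures

if __name__ == "__main__":
    print("regressions:", run_catalog())
```

An empty list from run_catalog() means no previously fixed mistake has resurfaced, which is exactly the guarantee Troy asks for. It says nothing about mistakes nobody has made yet.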

To Troy's first paragraph, Linus smiled, "We know ;)" And to Troy's second paragraph, Linus said:

No. Sound software engineering principles is to design good interfaces, and make the low level code adhere to them.

Andre comes from the other end - he writes and talks about low-level code, and thinks that should drive how upper layers work.

Larry McVoy contrasted Linus' first paragraph with an earlier statement by him, "I _am_ claiming that the people who think you "design" software are seriously simplifying the issue, and don't actually realize how they themselves work." Larry asked, "So which is it?" Rik van Riel quipped, "It must be the latter, since Linus has always stated a preference for simplifying issues. Oh wait, that one is incompatible with both ;)" But Linus said to Larry, "Can you go back, and _read_ the messages? In particular, microscopic vs macroscopic." But Larry replied:

I read them, I read them again, and I've read them a third time. If you want, I'll put together a summary of your statements so that you can read what you wrote again and think about it. You can wiggle all you want, Linus, your statements were clear and you are trying to have it both ways. But I doubt you'll admit it, nobody likes to look foolish, so let's let it go.

What I would like is for you to make a clear statement about what you think is a good way to approach systems problems. You've bounced from "design is good" to "design is bad" and I don't want to nitpick, I want to know what you think. After you've *thought* about it, as opposed to just some kneejerk reaction for effect.

I'm not alone in this, either. Since you are the final decision maker on what goes in, many people in the world would like to know how to do things "correctly" from your point of view.

A thoughtful writeup on how to make something happen in the Linux kernel would be well received.

There was no reply.

6. Developer Unhappiness With Linus

17 Dec 2001 - 18 Dec 2001 (7 posts) Archive Link: "Limits broken in 2.4.x kernel."

People: Andrew Morton, Alan Cox, Rik van Riel, Justin Piszcz

Justin Piszcz reported that per-user process limits didn't seem to be working correctly in the 2.4 tree. Andrew Morton owned up to having caused that particular breakage, and said he hadn't realized that some changes he'd made would have that effect. He said, "I didn't have a clear reason for moving the UID to root's - it just didn't seem a good idea to have kernel threads running with non-root UIDs. But we have a reason now - process accounting." He said he'd do a patch for it. Alan Cox also replied to Justin, saying that regarding the available fixes, "Linus kept rejecting it. Now we have Marcelo as 2.4.x maintainer I'll look at submitting it. 2.5 will no doubt stay broken for a while." Rik van Riel quipped with glee, "One of the things to remember for when marcelo takes over 2.6, I guess ;)" And Alan replied soberly, "Not what I meant - process counting is not block I/O stuff." End of thread.
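For readers who want to see the limit in question on their own system: the thread does not name it, but the per-user process limit is assumed here to be the one userspace sees as RLIMIT_NPROC. A minimal Python sketch using the standard resource module (Unix only) reads the current soft and hard values. The helper names are invented for illustration.

```python
import resource

def nproc_limits():
    """Return the (soft, hard) per-user process limits for the calling process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)

def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

if __name__ == "__main__":
    soft, hard = nproc_limits()
    print("max user processes: soft=%s hard=%s" % (describe(soft), describe(hard)))
```

This only reports the configured limit. It does not check whether the kernel enforces it, which was the actual subject of Justin's report.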

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.