Kernel Traffic #168 For 26 May 2002

By Zack Brown

Mailing List Stats For This Week

We looked at 1608 posts in 7624K.

There were 416 different contributors. 214 posted more than once. 164 posted last week too.

1. Status Of 2.5 VM, ext3, And IDE Code

5 May 2002 - 21 May 2002 (283 posts) Archive Link: "Linux-2.5.14.."

Topics: Clustering, Disks: IDE, FS: devfs, FS: driverfs, FS: ext3, Hot-Plugging, Kernel Release Announcement, PCI, Small Systems, USB, Virtual Memory

People: Linus Torvalds, Andrew Morton, Daniel Pittman, Anton Altaparmakov, Alan Cox, Bert Hubert, Martin Dalecki, Andrea Arcangeli, Rik van Riel

Linus Torvalds announced 2.5.14, saying:

There's a lot of stuff that has happened in the 2.5.x series lately, and you can see the gory details in the ChangeLog files that accompany releases these days, but I thought I'd point out 2.5.14, since it has some interesting fundamental changes to how dirty state is maintained in the VM.

(The big changes were actually in 2.5.12, but 2.5.13 contained various minor fixes and tweaks, and 2.5.14 contains a number of fixes especially wrt truncate, so hopefully it's fairly _stable_ as of 2.5.14.)

Credit goes to Andrew Morton, and not only does it clean up the code a lot, it also seems to perform a lot better in many circumstances.

There's a lot of other stuff in the 2.5.x tree too, but few things are so fundamental. Please test (but also, please be careful - backups are always a good idea).

Bert Hubert asked about the Virtual Memory Subsystem: in the continuum between Rik van Riel's rmap code, and Andrea Arcangeli's 2.4 rewrite, where did the actual VM subsystem lie? Andrew Morton explained:

"VM" is a broad term. The memory allocator, page replacement, swap and all that stuff is unaltered - it is the same as 2.4.current. ie: Andrea's VM from when his changes stopped going into the mainline kernel.

I made minimal changes in there to teach the page allocator that all dirty memory is written back via pages and not sometimes-pages, sometimes-buffers. Also to add support for the new `clustering writeback' which address_spaces can perform.

It's probably not as well tuned as it could be at present, but I don't see a lot of point in fiddling with it. As long as the VM doesn't actually impede 2.5 development and evaluation of 2.5 performance, best to leave it alone until a VM developer steps up to do the 2.6 VM.

The change to which Linus refers is:

In 2.4, dirty data from the write(2) system call is encapsulated in buffer_heads and is placed on a global buffer list for writeout. And dirty data from shared mappings is attached to its inode.

In 2.5, the buffer list went away, and dirty data from write(2) is now managed in the same way as dirty data from mmap().

And because the kupdate and bdflush functions used to work against the buffer LRU, replacements were introduced which do the same thing against the inodes, instead of against the buffers.

So it's all page-oriented now.
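
The shift Andrew describes, from a global buffer list to dirty pages tracked per inode, can be pictured with a toy model. This is only an illustrative sketch, not kernel code: the `Inode` and `PageOrientedWriteback` classes and their methods are hypothetical names invented here, and real writeback involves locking, ordering and I/O that this ignores.

```python
class Inode:
    """Toy stand-in for an inode that tracks its own dirty page indices."""

    def __init__(self, name):
        self.name = name
        self.dirty_pages = set()


class PageOrientedWriteback:
    """Hypothetical model: dirty state lives on inodes, one entry per page."""

    def __init__(self):
        # name -> Inode; a dict keeps the order in which inodes first got dirty
        self.dirty_inodes = {}

    def mark_dirty(self, inode, page_index):
        # In this model, write(2)-style and mmap()-style dirtying share one path.
        inode.dirty_pages.add(page_index)
        self.dirty_inodes[inode.name] = inode

    def write_back(self):
        """Walk the dirty inodes (not a global buffer list) and flush their pages."""
        flushed = []
        for inode in list(self.dirty_inodes.values()):
            for page_index in sorted(inode.dirty_pages):
                flushed.append((inode.name, page_index))
            inode.dirty_pages.clear()
        self.dirty_inodes.clear()
        return flushed


wb = PageOrientedWriteback()
a, b = Inode("a"), Inode("b")
wb.mark_dirty(a, 3)
wb.mark_dirty(b, 0)
wb.mark_dirty(a, 1)
print(wb.write_back())  # [('a', 1), ('a', 3), ('b', 0)]
```

The point of the sketch is only that the flusher iterates over inodes and their pages, which is the role the kupdate and bdflush replacements are described as playing.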

Daniel Pittman, who'd been experiencing ext3 file corruption under certain circumstances, asked if he could "expect" to see these fixed in the current release. Linus replied:

"Expect" is too strong a word. I'd say "hope" - a number of truncate bugs were fixed, but whether that was what bit you, nobody knows.

I suspect the real answer is that we'd love for you to test things out, but that if it ends up being too painful to recover if the problems happen again, you probably shouldn't..

Daniel got up the courage to do some testing, and reported, "I believe that 2.5.14 is working correctly with 2k ext3 filesystems, at least for minimal use. I didn't do any sort of extreme load testing or anything like that, being cautious about it." But he did report some boot-time warnings. He and Andrew exchanged a couple more emails in search of the problem, and the subthread ended.

Also in reply to Linus' initial announcement, Martin Dalecki posted a ton of patches to the IDE code, and folks debated the various technical points. At one point Anton Altaparmakov burst out, "As the new IDE maintainer so far we have only seen you removing one feature after the other in the name of cleanup, without adequate or even any at all(!) replacements, renaming all functions to hell and back, and breaking the ide core here there and everywhere. All critical bug fixes seem to have been contributed by other people looking at your code which doesn't inspire a lot of confidence in you... Even Alan Cox said a while ago that you have his vote of no confidence (probably slightly rephrased here) because of changes you were introducing." But Linus said:

Who cares? Have you found _anything_ that Martin removed that was at all worthwhile? I sure haven't.

Guys, you have to realize that the IDE layer has eight YEARS of absolute crap in it. Seriously. It's _never_ been cleaned up before. It has stuff so distasteful that it's scary.

Take it from me: it's a _lot_ easier to add cruft and crap on top of clean code. You can do it yourself if you want to. You don't need a maintainer to add barnacles.

All the information that /proc/ide gave you is basically available in hdparm, and for your dear embedded system it apparently takes up less space by being in user space. So what is the problem?

My vote is to remove as much as humanly possible.

"Everything should be made as simple as possible, but not simpler" - Albert Einstein

Think about it, and really _understand_ it.

At one point Alan Cox pointed out, "/proc/ide has useful information in it that you can't get easily by other means at the moment - which controller is driving the disks, what devices are present etc." But Linus replied:

I'd love for somebody to add the devices to the real device tree, at which point this kind of information would be very much visible..

Right now devicefs isn't even mounted by default, but it's the only _really_ generic way of showing things like this that we have. For people who haven't seen it before, do a

mount -t driverfs /driverfs /driverfs

and go look in there.. In particular, if you have a PCI system with a USB device tree (or _multiple_ such trees), notice how you can look at things like

/driverfs/root/pci0/00:1f.4/usb_bus/000/

and it wouldn't be impossible (or even necessarily very hard) to make an IDE controller export the "IDE device tree" the same way a USB controller now exports the "USB device tree".

For things like hotplug etc, I think driverfs is eventually the only way to go, simply because it gives you the full (and unambiguous) path to _any_ device, and is completely bus-agnostic.

But there is definitely a potential backwards-compatibility-issue.
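
To browse a tree like the one Linus describes, a short script can list every directory below the mount point as a path relative to it. This is a minimal sketch under assumptions: the helper name `device_paths` is hypothetical, the mount point is whatever directory the filesystem was mounted on, and nothing here depends on driverfs internals beyond it being an ordinary directory hierarchy.

```python
import os


def device_paths(mount_point):
    """Return every directory below mount_point, relative to it, in walk order."""
    paths = []
    for dirpath, dirnames, _filenames in os.walk(mount_point):
        dirnames.sort()  # sort in place so the traversal order is deterministic
        rel = os.path.relpath(dirpath, mount_point)
        if rel != ".":
            paths.append(rel)
    return paths
```

Called on a mounted tree, each returned entry is the kind of full, bus-agnostic path the quote refers to, for example a path ending in `usb_bus/000`.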

2. Status Of Big File Support

9 May 2002 - 17 May 2002 (41 posts) Archive Link: "[PATCH] remove 2TB block device limit"

Topics: Big File Support, Disks: IDE, FS: JFS

People: Peter Chubb, Andrew Morton, Christoph Hellwig, Martin Dalecki

Peter Chubb announced, "At present, linux is limited to 2TB filesystems even on 64-bit systems, because there are various places where the block offset on disc are assigned to unsigned or int 32-bit variables." He linked to a patch (http://www.gelato.unsw.edu.au/patches/2.5.14-largefile-patch) that would implement "a type, sector_t, that's meant to hold offsets in sectors and blocks." He added, "On an old pentium I now have a 15Tb file mounted as JFS on the loop device -- and it seems to work for almost everything. There are a few user-mode programs that'll have to be fixed (notably parted, mkfs.??? etc) to cope with the new BLKGETSIZE failure (they should use alternate mechanisms, e.g., BLKGETSIZE64, or just seek to the end of the partition and look at the offset)." Andrew Morton replied, "My vote would be: just merge the sucker while it still (almost) applies. 2TB is a showstopper for some people in 2.4 today. Obviously 2.6 will need 64-bit block numbers. The next obstacle will be page cache indices into the blockdev mapping. That's either an 8TB or 16TB limit, depending on signedness correctness."
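
The limits quoted above follow from simple arithmetic. The sketch below assumes 512-byte sectors, a 32-bit index, and 4096-byte pages (the page size is an assumption, typical of i386, and is not stated in the thread); the helper name `max_bytes` is hypothetical.

```python
SECTOR_SIZE = 512   # bytes per sector (assumed)
PAGE_SIZE = 4096    # bytes per page (assumed; typical on i386)
TIB = 2 ** 40       # one tebibyte


def max_bytes(index_bits, unit_size, signed=False):
    """Largest byte range addressable by an index of the given width."""
    usable_bits = index_bits - 1 if signed else index_bits
    return (2 ** usable_bits) * unit_size


# 32-bit unsigned sector numbers: the 2TB block device limit
print(max_bytes(32, SECTOR_SIZE) // TIB)             # 2
# 32-bit page cache index, unsigned vs. signed: the 16TB or 8TB limit
print(max_bytes(32, PAGE_SIZE) // TIB)               # 16
print(max_bytes(32, PAGE_SIZE, signed=True) // TIB)  # 8
```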

Various other folks also liked it. Martin Dalecki, IDE maintainer, said he'd apply the IDE portions of the patch.

Later on, Peter announced a revised patch for kernel 2.5.15 (http://www.gelato.unsw.edu.au/patches/2.5.15-largefile-patch), and Christoph Hellwig voiced his support, saying, "This looks really good, I'd like to see something like that merged soon!"

Various folks also had a fairly sizable implementation discussion.
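
Peter's fallback suggestion for user-mode tools, seeking to the end of the partition and reading back the offset, might look roughly like the following. It is a sketch rather than what parted or mkfs actually do: the function name `size_by_seeking` is hypothetical, and it relies on the platform providing 64-bit file offsets, which Python's `os.lseek` uses where the operating system supports them.

```python
import os


def size_by_seeking(path):
    """Return the size in bytes of a file or block device by seeking to its end."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # The offset returned by a seek to the end is the total size in bytes.
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)
```

The same call works on a regular file, which makes the approach easy to try without access to a large device.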

3. Status Of kbuild

16 May 2002 - 17 May 2002 (22 posts) Archive Link: "[PATCH] Fix BUG macro"

Topics: Kernel Build System

People: Andrew Morton, Linus Torvalds

In the course of discussion, folks began talking about how long it took to compile a kernel. At one point, Andrew Morton remarked, regarding a specific cause of slow-down, "The final solution to all problems is to merge kbuild-2.5 and then to teach it to use relative pathnames when performing a build within the source tree. Presumably that's not hard, but I'm surely about to learn why it's not feasible." And Linus Torvalds replied:

I'm hoping we can get there in small steps, rather than a big traumatic merge. I'd love to just try to merge it piecemeal.

Especially as I don't find the existing system so broken.

4. More kbuild Discussion

16 May 2002 - 20 May 2002 (60 posts) Archive Link: "kbuild 2.5 is ready for inclusion in the 2.5 kernel - take 3"

Topics: FS: NTFS, Kernel Build System

People: Keith Owens, Nicolas Pitre, Tomas Szepe, Russell King, Wayne Brown, Dave Jones, Linus Torvalds

Keith Owens tried again to get a response from Linus Torvalds regarding his kbuild work. He said:

Third and final attempt. Original sent on May 2, second mail sent on May 14, still no response from Linus.

Linus, kbuild 2.5 is ready for inclusion in the main 2.5 kernel tree. It is faster, better documented, easier to write build rules in, has better install facilities, allows separate source and object trees, can do concurrent builds from the same source tree and is significantly more accurate than the existing kernel build system.

Possibly referring to the discussion covered in Issue #87, Section #1 (2 Sep 2000: Possible GPL Violations By Microsoft; Kernel Debugger In Official Sources), Nicolas Pitre said, "Linus is a bastard. Did you forget?" And Tomas Szepe added:

This is getting ridiculous all right.

Linus, what makes you ignore Keith's work?

Would you tend to think he's worked on kbuild25 this long to end up having to send a linus-dammit-would-you-have-a-look-at-last-i'm-not-going-to-keep-asking-forever msg?

Russell King said:

I'm not going to answer for Linus, except to say that Linus is taking patches to fix and improve the existing kbuild in 2.5.

Maybe the right thing to do is to let Linus and others try to fix the existing kbuild, and when/if it doesn't work we have something that does work.

And Wayne Brown put in at one point, "OTOH, those of us who are not looking forward to kbuild 2.5 are grateful for any delays we can get."

Elsewhere, Jeff Millar replied to Keith's initial announcement. He asked if any of the other kernel branches (such as the one maintained by Dave Jones) would adopt the new kbuild. Keith replied that once Linus took it, everyone else would follow suit. And Dave also said to Jeff:

I've thought it over a few times over the last few weeks, and tbh inclusion in any tree other than Linus' doesn't really make much sense other than perhaps to get some more 'early adopter' testers.

The current kbuild2.5 patches will apply cleanly against my tree, but due to things like the new input layer still not being completely merged in Linus' tree, some files are in different places, so the Makefile.in's in kbuild2.5 point to the wrong places.

Sure, I could merge it, but tbh it's not worth the effort right now of fixing up those files until Linus actually says yay or nay.

5. Status Of HCF Modem Support

21 May 2002 - 22 May 2002 (7 posts) Archive Link: "Support for HCF modem.?"

Topics: BSD, Licencing, Modems, PCI, SMP

People: Alok K. Dhir

Halil Demirezen asked if there were any Linux support for the HCF Conexant PCI modem. Alok K. Dhir pointed him to http://www.mbsi.ca/cnxtlindrv/, and added, "Caveat: These only work on non-SMP kernels. My SMP box freezes hard when accessing the modem using this HCF driver." Elsewhere, Halil observed that some of the code was available under the GPL, and some under the BSD license. And someone else pointed out that the code itself was available in source form, but linked against a binary-only library, containing the soft modem core from Conexant.

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.