Kernel Traffic #195 For 9 Dec 2002

By Zack Brown

Introduction: Request For Assistance

Hi folks. I'm trying to find a good solid server to host kerneltraffic.org. It would be the main Kernel Traffic site, and development environment for new issues. If anyone's interested in this, please let me know.

Mailing List Stats For This Week

We looked at 1441 posts in 8002K.

There were 462 different contributors. 244 posted more than once. 176 posted last week too.

The top posters of the week were:

1. New Athlon 'Bug'

21 Nov 2002 - 29 Nov 2002 (5 posts) Archive Link: "A new Athlon 'bug'."

People: Dave Jones, Daniel Nofftz, Pavel Machek

Dave Jones reported:

Very recent Athlons (Model 8 stepping 1 and above) (XPs/MPs and mobiles) have an interesting problem. Certain bits in the CLK_CTL register need to be programmed differently to those in earlier models. The problem arises when people plug these new CPUs into boards running BIOSes that are unaware of this fact.

The fix is to reprogram CLK_CTL to 0x200xxxxx instead of 0x600xxxxx as it was in previous models. The AMD folks have found that this improves stability.

The patch below does this reprogramming if an affected model/bios is detected.

He asked for someone with an affected box to run some benchmarks comparing the before and after effects of his patch, but no one spoke up to do so. Pavel Machek, however, asked what behavior would result from setting the CLK_CTL bits improperly. Dave replied, "The documentation I have says nothing other than "...platforms are more robust..." with the fix. It's purely a reliability thing, but as it's fiddling with the CPU clock, it's possible that it may *slightly* affect performance too." Daniel Nofftz recalled from his own Athlon work, "the clk_ctl register is importand when the athlon processor comes back from acpi-c2 mode. in c2 he is disconnected from the system bus and the internal clock is clocked down. in some cases a false value in this register could prevent the athlon processor to come back from c2 -> lockup of the machine or something like it ... (bug 11 of the athlon processor revision guide)". Dave confirmed this, adding that the register "contains values that indicate to the CPU how to ramp up the CPU clock during low-power modes."
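The reprogramming Dave describes can be illustrated with a small bit-manipulation sketch. This is not his patch, which is not reproduced in this summary. It assumes the quoted patterns are 32-bit hexadecimal values (0x200xxxxx and 0x600xxxxx) in which the top three hex digits are replaced and the low five digits ("xxxxx") are preserved. The names `TOP_BITS_MASK`, `needs_fix` and `reprogram` are hypothetical.

```python
# Assumption: CLK_CTL is a 32-bit value; the top 12 bits carry the setting
# that changes between models, and the low 20 bits must be left untouched.
TOP_BITS_MASK = 0xFFF00000
LOW_BITS_MASK = 0x000FFFFF
NEW_TOP_BITS = 0x20000000   # the "200xxxxx" pattern from the report


def needs_fix(clk_ctl):
    """True if the top bits do not already match the newer pattern."""
    return (clk_ctl & TOP_BITS_MASK) != NEW_TOP_BITS


def reprogram(clk_ctl):
    """Keep the low 20 bits and force the top bits to the newer pattern."""
    return (clk_ctl & LOW_BITS_MASK) | NEW_TOP_BITS
```

Under these assumptions, a BIOS-programmed value such as 0x6001ABCD would become 0x2001ABCD, and a value that already starts with 0x200 would be left alone.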

2. Wish List For Module Features

25 Nov 2002 - 29 Nov 2002 (21 posts) Archive Link: "Modules with list"

Topics: FS: initramfs, FS: ramfs, Security

People: Adam J. Richter, Rusty Russell, Ingo Oeser, Roman Zippel

Adam J. Richter posted:

Here is a list of changes that I'm thinking about trying to make to modules, in case anyone is interested or wants to show me the error of my ways. Most of these changes do not depend on whether the module loader is in the kernel or a user level program. I've labelled the items that are only applicable to user level modules with "user level version:".

  1. Allow multiple MODULE_DEVICE_TABLE's of the same type in the same .c file instead of the combined_dev_id_table hack that is now used by modules that really need to load separate but related drivers.
  2. Eventually have the same build command for modules and compiled in objects so that distribution makes can ship an "all modules" build and link script to allow much more customization by users who do not want to recompile kernel code.

    1. Compile module_init, subsys_init, etc. by the same mechanism used by kernel objects.
    2. Pass module parameter by __setup() rather than MODULE_PARM().
    3. Eliminate "#ifdef MODULE" init.h, module.h, and, eventually, almost everywhere.
    4. In the core kernel, THIS_MODULE would point to a struct module rather than being NULL (eliminating many little banches).

  3. To prevent rmmod's during modprobe, have rmmod do flock(/proc/modules, LOCK_EX) and modprobe do flock(/proc/modules, LOCK_SH). Yes, you can detect this already, but this way you it does not cause failure and you do not need retry code.
Other wishes that probably do not effect module-init-tools, at least when the module loader is in the kernel:

  4. failureless raceless module unloading by the module->rwsem_list system that I described toward the bottom of this message: http://marc.theaimsgroup.com/?l=linux-kernel&m=103773401411324&w=2
  5. At modprobe time, being able to decide to load a module as non-removable to avoid loading .exit{,data} for a smaller kernel footprint. This might only require insmod changes for the user level insmod.
  6. kmalloc'ing small modules for less memory consumption and perhaps so that they can avoid using TLB entries on certain architectures (412 of 1129 modules on my system have .text + .data < 4096).

    1. maybe load .text and .data separately for modules where .text + .data >= 4096 && .text < 4096 && .data < 4096 (26 of 1129 modules have this property on my system). Probably not worth it.

  7. User level version: optionally be able to move all symbols to user land at the expense of losing kksymoops (would save ~100kB on my system).
  8. User level version (already done in kernel loader version): eliminate dependence on struct module using a module-start.o based on what Roman Zippel proposed at http://marc.theaimsgroup.com/?l=linux-kernel&m=103740379811285&w=2 (but using a module-end.o file and eliminating the linker script).
  9. User level version: load module contents with mmap(/dev/kmem), reducing initial memory requirements by avoiding a malloc and copy.
  10. Move tracking of dependencies among loaded modules to user land (and be able to reconstruct in some cases from modules.dep).

Hopefully, posting this list will reduce the chances of duplication of effort or help expose a problem or potential improvement I hadn't considered.
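Adam's flock() idea (rmmod takes an exclusive lock, modprobe a shared one) can be sketched in user space. This is only an illustration of shared versus exclusive advisory locks. The helper names are hypothetical, the caller supplies the path (an ordinary file can stand in for /proc/modules), and the sketch uses non-blocking locks purely to make contention visible. Adam's proposal implies blocking locks, so that neither tool needs retry code.

```python
import fcntl
import os


def take_lock(path, exclusive):
    """Try to lock path without blocking; return the fd, or None if busy.

    exclusive=True models the proposed rmmod (LOCK_EX);
    exclusive=False models the proposed modprobe (LOCK_SH).
    """
    fd = os.open(path, os.O_RDONLY)
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        # LOCK_NB is for demonstration only; drop it to wait instead.
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def release_lock(fd):
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Any number of shared holders can coexist, while an exclusive request is refused (or, without LOCK_NB, waits) until all of them have released the lock.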

Rusty Russell responded to most items on this list, with two main threads of discussion, and many little threadlets. To Adam's item 1 (multiple MODULE_DEVICE_TABLEs), Rusty felt this was sensible, and should be no problem. To item 2 (same build command for modules and compiled in objects), he said, "I've never really aimed for this, and as you've noticed, there are a few issues." Ingo Oeser suggested, "Maybe that could be done already by having a list of modules for initramfs? That's Alans plan anyway, so we might as well solve it here." But there was no follow-up to this.

To Adam's item 3 (preventing rmmods during modprobe), Rusty felt Adam's method seemed reasonable. To item 4 (failureless, raceless module unloading), Rusty studied the link (http://marc.theaimsgroup.com/?l=linux-kernel&m=103773401411324&w=2) Adam had given, and approved of the general scheme, but felt the technical obstacles were significant. In particular he felt the locking issues were nontrivial; and the two of them went back and forth on the technical issues for a bit.

To Adam's item 5 (deciding at modprobe time to load a module as non-removable), Rusty said, "it'd be a cute hack to let the user do this." But Ingo felt this would lead to a proliferation of dangling pointers. Adam disagreed, saying the problem was really quite simple; and they went back and forth on it for a bit, without coming to any final agreement.

To Adam's item 6 (kmalloc()ing small modules for less memory consumption), Rusty said, "Yeah, this is trivial with the current scheme, and was one of the aims. The alloc is arch-specific."

Rusty had nothing to say about Adam's other items, except item 10 (tracking dependencies of loaded modules from user-space), to which he said:

Personally, I think the userspace module loaders are clearly inferior, especially as you're gonna break userspace with almost every one of these changes. Sure, you can use a kernel-specific library to give you back the interface flexibility, but why? You gain complexity and your kernel doesn't get any smaller anyway.

Anyway, I think supporting both doesn't make sense. Either the in-kernel module loader is better, in which case it should be kept, or it isn't in which case it should be junked.

Ingo replied:

At least resolving module name aliases to modules and options hould be done in user space, because that's critical to auto configuration and readable configuration of the system.

module_name_deamon anyone?

This resolving is clearly seperateable and might not even require root privileges and can be done as a special user (passed as kernel parameter and defaulting to UID 0), because we just need to read a kind of database.

That reduces buffer overflow attacks and the like.

That was about it.

3. New Graphical GKC Tool For LinuxKernelConf Configuration System

26 Nov 2002 - 29 Nov 2002 (5 posts) Archive Link: "Re: kconfig (gkc): GTK tool released, please test again..."

Topics: Kernel Build System

People: Roman Zippel

Romain Lievin updated gkc (http://tilp.info/perso/gkc.html), his graphical front-end to Roman Zippel's LinuxKernelConf (http://www.xs4all.nl/%7Ezippel/lc/) CML1 replacement, to use the GTK+ 2.0 graphics library. There followed some offline discussion, with requests for a patch that could be applied to the kernel sources for testing purposes; and Romain posted a patch (http://tilp.info/perso/prepare.diff) a few days later.

4. NFS/ext3 Problems

26 Nov 2002 - 28 Nov 2002 (21 posts) Archive Link: "htree+NFS (NFS client bug?)"

Topics: FS: NFS, FS: XFS, FS: ext2, FS: ext3

People: Jeremy Fitzhardinge, Theodore Y. Ts'o, Trond Myklebust, Stephen C. Tweedie

Jeremy Fitzhardinge reported, "I'm having problems with exporting htree ext3 filesystems over NFS. Quite often, if a program on the client side is reading a directory it will simply stop and consume a vast amount of memory. It seems that the readdir never terminates, and endlessly returns the same dir entries again and again." He added, "Both the client and server machines are running 2.4.20-rc1, with Ted's extfs-update-2.4.20-rc1 patch (dating from Nov 8th or so). Is there a more up to date patch?" As far as he could make out, there seemed to be "some sort of problem managing the NFS readdir cookies, but it isn't clear to me whether this is the NFS server/ext3 generating bad cookies, or the NFS client handling them wrongly." Theodore Y. Ts'o replied:

Well, even if the NFS server is generating bad cookies (and that may be possible), the NFS client should be more robust and not spin in a loop forever.

Can you send me a directory list (from the server) of the directory in question? Also, can you send me the output of dumpe2fs -h on the filesystem. I'll need the later to get the seed for the htree hash, so I can try replicating this on my end.

Also, have you tried running e2fsck on filesystem on the server? It would be very interesting to confirm whether or not the filesystem is consistent.

Jeremy sent the information, and added, "This" [problem] "happens very regularly. Each time it does I create a new directory and move everything over, which presumably works because it rehashes everything and/or changes the alignment of particular direntries with respect to the NFS reply packets. What I'm saying is that if the filesystem is getting into an inconsistent state, it is doing so at a very high rate. I'll check it" [with e2fsck] "anyway... all OK."

There was no followup to this, but elsewhere, Trond Myklebust said that, in order to identify whether the problem was at the server end or the client end, it would be helpful "if you could print out the cookies from that listing or better still: if you could provide us with the raw tcpdump output. Please remember to use an 8k snaplen for the tcpdump..." Jeremy replied with a pcap dump file of the transaction, adding, "I changed [rw]size to 1024 because I couldn't work out how to get ethereal to reassemble the fragments. It doesn't seem to have affected the behaviour at all. This was with a 2.5.47 client (same server as before)." Stephen C. Tweedie took a look at the dump, but after some analysis was unable to say for certain where the problem lay. But he did say, "I suspect that this is a root a client problem --- the client has repeated a READDIR despite being told that the previous reply was EOF --- but that the best solution is actually to change the server to poke a dedicated EOF cookie into the final dirent of the stream. One of the reasons we need to do that is to cope with the unclear situation when an application is using telldir/seekdir to reposition the directory stream (tcsh is really bad for trying to do that with 32-bit offsets when globbing, for example, although the right answer there is "use readdir64".)" Trond disagreed with this interpretation, and they argued peacefully back and forth for a while. Finally, Stephen managed to reproduce the problem himself, and trace it back to ext3. He concluded, "The problem is that the htree readdir code is not updating f_pos after returning the very last chunk of data to the caller. That doesn't hurt most callers because the location is cached in the filp->private data, but it really upsets NFS." He replied to himself:

In fact, it's not clear what we _can_ return as f_pos after the last dirent.

We're only using 31-bit hashes right now. Trond, how will other NFS clients react if we return an NFS cookie 32-bits wide? We could easily use something like 0x80000000 as an f_pos to represent EOF in the Linux side of things, but will that cookie work if passed over the wire on NFSv2?

The alternative is to hack in a special case so that (for example) we consider a major htree hash of 0x7fffffff to map to an f_pos of 0x7ffffffe and just consider that a possible collision, so that 0x7fffffff is a unique EOF for the htree tree walker.
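Stephen's alternative (folding the largest hash onto its neighbour so that 0x7fffffff is free to mean EOF) can be sketched as a simple mapping. This is an illustration of the idea as described in his message, not ext3 code. The names `HTREE_EOF`, `hash_to_fpos` and `is_eof` are hypothetical.

```python
HTREE_EOF = 0x7FFFFFFF  # reserved: never handed out for a real directory entry


def hash_to_fpos(major_hash):
    """Map a 31-bit major hash to an f_pos, keeping HTREE_EOF unique.

    A hash of 0x7fffffff is folded onto 0x7ffffffe and simply treated as
    a possible collision with any entry that genuinely hashes there.
    """
    if not 0 <= major_hash <= 0x7FFFFFFF:
        raise ValueError("expected a 31-bit hash")
    return min(major_hash, HTREE_EOF - 1)


def is_eof(f_pos):
    """True only for the dedicated end-of-directory position."""
    return f_pos == HTREE_EOF
```

The cost is one extra potential collision at the top of the hash range. In exchange, every value stays within 31 bits, which avoids the question of how a 32-bit-wide cookie would be treated.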

To the idea of a 32-bit wide NFS cookie, Trond replied:

For all other NFS clients that I know of, this is perfectly acceptable. As far as the Linux kernel goes, it is quite OK, but when you get to userland, glibc-2.2 and above will insist that this is an illegal value (they like to sign extend 32-bit values). Causes no end of trouble, since XFS tends to use '0xffffffff' as the EOF cookie.

I have a patch that hacks the values of such cookies so that glibc will accept them. That hack will never go in to the official kernel, so it would be nice if ext2/ext3 could avoid the need for it.
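Trond's remark about sign extension can be made concrete with a little arithmetic. The sketch below is not glibc code. It only shows, under the assumption that a cookie is reinterpreted as a signed 32-bit quantity, why values with the top bit set (such as 0x80000000 or the 0xffffffff he attributes to XFS) come out negative while 31-bit values do not. The function name is hypothetical.

```python
def sign_extend_32(value):
    """Interpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        return value - (1 << 32)
    return value
```

With this reading, 0x7fffffff stays positive, 0x80000000 becomes -2147483648, and 0xffffffff becomes -1, which is consistent with the preference expressed above for keeping ext2/ext3 cookies within 31 bits.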

To Stephen's proposed alternative special case, Trond felt this would be fine. Jeremy, however, said:

Even if you fix this, there's another problem.

It seems that htree basically can't work with NFS in its current state - it only works at all on small directories, which aren't hashed and therefore use the non-htree cookie scheme. This can be fixed creating a distinct EOF cookie.

However, in the transformation from a non-hashed to hashed directory the cookie scheme completely changes, and in effect invalidates all cookies currently known by clients. The obvious problem is that sometimes adding a single entry to a directory will kill all concurrent readdirs.

I know that changing a directory while scanning it has at least some undefined effects (allowed to miss entries, but not allowed to duplicate, if I remember correctly), but if you add a single entry to a directory, is it allowed to completely break any pending readdir operation?

One solution I can think of is to always use name hashes as directory cookies, even for non-hashed directories. This means that scans of a small directory will require linear searching to find the entry corresponding to a particular cookie, but since the directory is small by definition it shouldn't be a bad performance hit.
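Jeremy's suggestion (name hashes as cookies even for small, unhashed directories) might look roughly like the following. This is a sketch under several assumptions: `zlib.crc32` stands in for whatever hash ext3 actually uses, a start cookie of -1 means "from the beginning", and names that collide on the same hash are ignored for simplicity. All function and constant names are hypothetical.

```python
import zlib

EOF_COOKIE = 0x7FFFFFFF  # dedicated end-of-directory cookie


def name_cookie(name):
    """Stand-in 31-bit name hash, clamped so it never equals EOF_COOKIE."""
    return min(zlib.crc32(name.encode()) & 0x7FFFFFFF, EOF_COOKIE - 1)


def readdir_chunk(names, cookie, count):
    """Return (entries, next_cookie) for entries hashing above cookie.

    The directory is small and unhashed, so we scan it linearly and
    sort the remaining entries by hash on every call.
    """
    pending = sorted((name_cookie(n), n) for n in names if name_cookie(n) > cookie)
    chunk = pending[:count]
    if len(pending) <= count:
        next_cookie = EOF_COOKIE          # nothing left after this chunk
    else:
        next_cookie = chunk[-1][0]        # resume after the last hash returned
    return [n for _, n in chunk], next_cookie
```

Because a cookie depends only on the entry's name, adding an entry in the middle of a scan does not invalidate cookies already handed to a client. The new entry is either picked up later or missed, but nothing is returned twice.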

There was no reply to this, and the discussion ended.

5. Setting Per-User Resource Limits

27 Nov 2002 - 28 Nov 2002 (7 posts) Archive Link: "Limiting max cpu usage per user (old Conectiva patch)"

Topics: Forward Port

People: Marc-Christian Petersen, Martin Waitz, Rik van Riel

Frederik Dannemare asked if there were a way to limit the maximum CPU usage, on a per-user basis. He'd searched the linux-kernel archives and found only a single thread (http://www.uwsg.iu.edu/hypermail/linux/kernel/0108.2/0362.html) in which Rik van Riel had described a patch by Conectiva against the 2.2 kernel. Frederik asked if anyone knew whether that patch had been ported to 2.4 or 2.5.

Marc-Christian Petersen said, "I can offer you a really nice small patch from Karol Golab. Find it here: http://www.tls-technologies.com/CPU/cpu-intro.html. works great for me." Frederik said he'd definitely check it out.

Rik van Riel also said the Conectiva patch had been forward-ported, and was on his patches page (http://surriel.com/patches/). Hu Gang confirmed that the patch worked, and Frederik said he'd test it out right away. Elsewhere, Martin Waitz also said:

i'm working on a resource container implementation for linux for my diploma thesis.

resource container provide a hierarchical way to account and limit resources. this way not only per user limits can be achieved, but any policy you can think of (per service, per client, ...)

the work is due january, but interested people could have a look at the (undocumented for now ;) source earlier.

6. Support For SGI Visual Workstation In 2.5

27 Nov 2002 - 28 Nov 2002 (5 posts) Archive Link: "[PATCH] ressurection of VISWS support in 2.5-ac"

Topics: SMP, VisWS

People: Andrey Panin, Alan Cox

Andrey Panin announced:

after about month of heavy nightly work :)) , I'm proud to present updated VISWS support for 2.5.xx kernels.

Attached patch is against 2.5.49-ac1, but can be applied to 2.5.47-ac with small tweaking in Kconfig. 2.5.47-ac is even preferrable for people who don't want to bother with modules mess^H^H^H^Hproblems in newer kernels.

This kernel loads and works fine in my VISWS 320 with serial console or via ssh connection.

Problems that still remains:

Alan, can you apply this patch to 2.5.49-ac or it should be splitted to smaller per-area patches ?

Alan Cox said he'd probably apply the patch, adding, "thats truely demented. Care to port ucLinux to the Amiga 500 (http://www.obsoletecomputermuseum.org/amiga500/) next 8)"

7. Linux 2.5.50 Released

27 Nov 2002 - 4 Dec 2002 (21 posts) Archive Link: "Linux v2.5.50"

Topics: Disks: SCSI, Kernel Release Announcement, Power Management: ACPI, USB

People: Linus Torvalds, Nathan Walp

Linus Torvalds announced Linux 2.5.50, saying:

Taking a small thanksgiving break, but before that here's 2.5.50 (http://www.kernel.org/pub/linux/kernel/v2.5/ChangeLog-2.5.50).

Merges from Alan, Dave and Andrew. ACPI, USB, LSM and SCSI updates. Sparc, ARM and v850 architecture updates.

He added, "For the non-US aware of you out there: it's the time of year when the whole country turns into one big turkey-filled trough, and pretty much everybody just pigs out. The amount of turkey consumed would roughly reach 5.4 times to the moon and back, if all the turkeys were laid in a straight line. Small black holes form where enough fat people get together. It's not pretty. And I'll do my best to blend in ;^)."

Nathan Walp mentioned, "Hrmm then that would make the 2.5.x tree one year old now. Hats off to all of you who have accomplished so much in 12 short months."

8. Module Development Continues; Some Developers Unhappy

27 Nov 2002 - 29 Nov 2002 (16 posts) Archive Link: "[RELEASE] module-init-tools 0.8"

Topics: Kernel Build System, PCI, USB

People: Rusty Russell, Gerd Knorr, Bill Davidsen, Jan-Benedict Glaw, Tomas Szepe, Christoph Hellwig, Linus Torvalds

Rusty Russell gave a link (http://www.us.kernel.org/pub/linux/kernel/people/rusty/modules) to module-init-tools version 0.8, and asked Linus Torvalds to apply the patch. He said:

This release needs depmod again, which should help speed for those of you with 1300 modules. A replacement depmod is provided, since the previous one gets rightfully confused by 2.5.47+ kernels. You will require a small kernel patch to 2.5.50 (below) for PCI and USB tables to work.

Also included is modules.conf2modprobe.conf, which is fairly simplistic but should get most people up and running. This will be enhanced as new features go into the new modprobe.

Some dummy options are implemented, and "modprobe -c" is implemented too, which should help Mandrake and RedHat's init scripts deal with the change.

Many thanks to those who provided patches, bug reports, and copies of their init scripts. Your feedback is greatly appreciated!

Please report any bugs to rusty@rustcorp.com.au.

Gerd Knorr reported that his system still wouldn't boot properly, with Rusty's patch. He found a number of modprobe processes hanging around in his 'ps' output, at which point 'lsmod' would hang as well. No other modules could be loaded after that, making his system virtually unusable. He complained, "Module debugging is next to impossible right now. The apm.o module oopses for me in 2.5.50. ksymoops isn't able to translate any symbol located in modules. The in-kernel symbol decoder (CONFIG_KALLSYMS) doesn't work too." Rusty gave another patch, adding that he was unable to reproduce the problem on his own systems. He added elsewhere, "Linus unfortunately has been dropping my patches." Bill Davidsen said at one point:

The new module stuff has been in for about three weeks now, many people are having problems with it, and I have yet to see a single post praising the *actual* benefits. Will there be a time when this is reverted and rescheduled for a future release (2.7?) or is this a do-or-die feature?

It doesn't have the feel of something solid having a few corner cases fixed, it feels like a bunch of band-aids which will unstick in future releases and continue to be high maintenence.

Jan-Benedict Glaw added, "It's not only that i386 is b0rked to some degree. Ever looked at some other architectures? Again, most (if not all) won't compile again. Eg. last Alpha kernel which worked for me (TM) was 2.5.44..." Close by, Tomas Szepe remarked:

Also I can't see how the new module infrastructure could have made it in w/o having been complete, *functional*, proven and thoroughly reviewed off-tree in the first place (which I thought was pretty much a standard around here). Mature, drop-in replacement projects like Keith Owen's kbuild 2.5 are getting ignored while something as wild as Rusty's "welcome in module hell exhibit" is merged right when the tree is supposed to start stabilizing.

And heck, I haven't even seen Viro and Hellwig complaining! What's going on? :)

Christoph Hellwig replied:

I have complained once in the very beginning. Reading the arches might help.

I'm back at 2.5.47-xfs for daily work now until some brave souls get the new module stuff in shape. It doesn't look like this is going to happen anytime soon, though.

9. Linux 2.4.20 Released

28 Nov 2002 (1 post) Archive Link: "linux-2.4.20 released"

People: Marcelo Tosatti

Marcelo Tosatti announced Linux kernel 2.4.20 (http://www.kernel.org/pub/linux/kernel/v2.4/ChangeLog-2.4.20).

10. Kernel-Based Debugger For 2.4.20

28 Nov 2002 (1 post) Archive Link: "Announce: kdb v2.5 is available for kernel 2.4.20"

Topics: Debugging

People: Keith Owens

Keith Owens announced the kernel-based debugger (ftp://oss.sgi.com/projects/kdb/download/v2.5/) for 2.4.20.

11. XFS Patches For 2.4.20

28 Nov 2002 (1 post) Archive Link: "Announce: XFS split patches for 2.4.20"

Topics: Access Control Lists, FS: XFS

People: Keith Owens

Keith Owens posted a URL (ftp://oss.sgi.com/projects/xfs/download/patches/2.4.20) and announced:

For some time the XFS group have been producing split patches for XFS, separating the core XFS changes from additional patches such as kdb, xattr, acl, dmapi. The split patches are released to the world with the hope that developers and distributors will find them useful.

Read the README in each directory very carefully, the split patch format has changed over a few kernel releases. Any questions that are covered by the README will be ignored. There is even a 2.4.21/README for the terminally impatient :).

12. WOLK 3.8 Released

29 Nov 2002 (1 post) Archive Link: "[ANNOUNCE] WOLK v3.8 FINAL // [PATCH | PATCHSET | FULLKERNEL | UPDATE]"

Topics: Big O Notation

People: Marc-Christian Petersen

Marc-Christian Petersen posted a URL (http://sf.net/projects/wolk) and announced WOLK 3.8, saying:

When I released 3.7 FINAL I said development for 3.x series has stopped, but I cannot stop ... It's just like a drug ;) ... Next release will be definitively WOLK4.0 with 2.4.20 ... Just waiting for O(1), Lowlat, Preempt for O(1) patches for 2.4.20 ...

I guess you'll like this release very much!

13. Linux 2.2.23 Released

29 Nov 2002 (1 post) Archive Link: "Linux 2.2.23"

People: Alan Cox

Alan Cox announced Linux Kernel 2.2.23 (http://www.uwsg.indiana.edu/hypermail/linux/kernel/0211.3/0221.html).

14. Support For POSIX Message Queues

30 Nov 2002 - 2 Dec 2002 (5 posts) Archive Link: "[PATCH] POSIX message queues, 2.5.50"

Topics: Ioctls, POSIX, SMP

People: Krzysztof Benedyczak, Manfred Spraul, Russell King

Krzysztof Benedyczak announced:

After some (rather long) work we have finished and tested (also on SMP machine) new implementation of POSIX message queues.

As it was suggested it is made as a filesystem.

Recently there were Peter Waechtler's patch with the same functionality. As it is concurrent to us I will point out some key differences (and advantages of our implementation as I think :-)

and two most important ones:

Finally one note about library - it is possible to wget it from www.mat.uni.torun.pl/~golbi/mqueuelib-3.0beta.tar.gz (http://www.mat.uni.torun.pl/~golbi/mqueuelib-3.0beta.tar.gz) but it is beta version (mainly because docs weren't updated) We plan to finish it (and update our page) on next Tuesday.

Russell King and Manfred Spraul had some technical objections, and Krzysztof posted some updated versions of the patch.

15. Status Of The Various Stable Series

30 Nov 2002 - 2 Dec 2002 (3 posts) Subject: "2.4.20: and next ?"

Topics: FS: ext3

People: Bert Hubert, Bill Davidsen

Romain Lievin wanted to backport the tipar char driver to 2.4, and asked if there would be any releases after 2.4.20; Bert Hubert replied, "Sure, there is 2.2 development too. Don't count on 2.6 being mainstream anytime soon." And Bill Davidsen said also:

There were 2.0 releases until 2001, and three 2.2 releases this year alone, I think you can assume that there will be releases for at least another year.

Actually with all the problem reports and partial fixes for data loss on journaling filesystems (or perhaps only ext3), I would expect another release fairly soon.

Certainly if you submit a patch by the end of the year it would be soon enough, although it still has to be accepted. I can't see any reason why it wouldn't, although that doesn't mean much.

16. Data Corruption In ext3 Under 2.4.20

1 Dec 2002 - 2 Dec 2002 (8 posts) Archive Link: "data corrupting bug in 2.4.20 ext3, data=journal"

Topics: FS: ext3

People: Andrew Morton, Nick Piggin, Stephen C. Tweedie

Andrew Morton reported:

In 2.4.20-pre5 an optimisation was made to the ext3 fsync function which can very easily cause file data corruption at unmount time. This was first reported by Nick Piggin on November 29th (one day after 2.4.20 was released, and three months after the bug was merged. Unfortunate timing)

This only affects filesystems which were mounted with the `data=journal' option. Or files which are operating under `chattr -j'. So most people are unaffected. The problem is not present in 2.5 kernels.

The symptoms are that any file data which was written within the thirty seconds prior to the unmount may not make it to disk. A workaround is to run `sync' before unmounting.

The optimisation was intended to avoid writing out and waiting on the inode's buffers when the subsequent commit would do that anyway. This optimisation was applied to both data=journal and data=ordered modes. But it is only valid for data=ordered mode.

In data=journal mode the data is left dirty in memory and the unmount will silently discard it.

The fix is to only apply the optimisation to inodes which are operating under data=ordered.

But he replied to himself with an update, saying that the fix proposed in his last paragraph wasn't actually a proper fix. He added, "Please avoid ext3/data=journal until it is sorted out."

Nick Piggin said, "In fact it was reported on lkml on 18th July IIRC before 2.4.19 was released if that is any help to you. 2.4.19 and 2.4.20 are affected and I haven't tested previous releases. I was going to re-report it sometime, but Alan brought it to light just the other day." Andrew replied, "Are you sure? I can't make it happen on 2.4.19. And disabling the new BH_Freed logic (which went into 2.4.20-pre5) makes it go away." Stephen C. Tweedie felt that removing the BH_Freed logic would re-expose bugs that the BH_Freed code had been intended to fix. He gave his own take on the problem, and after a very brief bit of technical debate, the thread ended inconclusively, with both Andrew and Stephen saying they'd found the correct fix.

17. Linux PnP Support 0.93 For Kernel 2.5.50

1 Dec 2002 (1 post) Archive Link: "[PATCH] Linux PnP Support V0.93 - 2.5.50"

People: Adam Belay

Adam Belay announced:

Attached is a patch, gzipped for size, that updates the 2.5.50 to the latest pnp version. It includes all 9 of the previously submitted patches.

Highlights are as follows:

PnP developers please use this patch.

18. i2c-amd766 Driver For 2.5.50

1 Dec 2002 (4 posts) Archive Link: "i2c-amd766 driver for 2.5.50"

Topics: I2C

People: Pavel Machek

Pavel Machek posted a big patch and said, "This brings amd766 driver along with adm1021 and lm75 to 2.5.50. Does it look mergeable? If so, please apply."

19. Maximum Physical RAM

1 Dec 2002 (4 posts) Archive Link: "Maximum Physical Memory on 2.4 and ia32"

Topics: Big Memory Support, FS: ext2

People: Andrew Morton, Martin J. Bligh, Stephen Rothwell

Stephen Rothwell quoted a statement from Red Hat (http://www.redhat.com/services/techsupport/production/GSS_caveat.html) in which they said their 2.4-based operating system would support no more than 16G of RAM. He asked why this limit existed. Andrew Morton replied:

It's a practical limit. The mem_map array alone would consume 720 megabytes, so you have no normal-zone memory left.

At 16G, mem_map[] consumes 180 megabytes and there's 540ish megabytes of normal zone left for general use.

Even at this 20:1 highmem:lowmem ratio, the system will be struggling. Any time you have normal-zone data structures which are pinned by pages, the maths gets you in the end.

buffer_heads, pagetable pages, radix-tree nodes, pte_chains and inodes are normal-zone data structures which, depending on the kernel version, may be pinned into the normal zone by highmem pages.

In 2.5, with ext2's no-buffer-head option, shared pagetables, highpte, with your fingers crossed and the wind in the south east, 32G might be practical.
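Andrew's figures can be checked with a quick calculation. The sketch assumes 4 KiB pages and a per-page mem_map entry of 45 bytes. That entry size is not stated in the thread. It is inferred here from the quoted "180 megabytes at 16G" figure. The names are hypothetical.

```python
PAGE_SIZE = 4096            # assumption: 4 KiB pages on ia32
MIB = 1 << 20
GIB = 1 << 30
BYTES_PER_PAGE_ENTRY = 45   # inferred: 180 MiB / (16 GiB / 4 KiB) = 45 bytes


def mem_map_mib(ram_gib, entry_bytes=BYTES_PER_PAGE_ENTRY):
    """Approximate size of the mem_map array, in MiB, for ram_gib of RAM."""
    pages = ram_gib * GIB // PAGE_SIZE
    return pages * entry_bytes / MIB
```

Under these assumptions, `mem_map_mib(16)` gives 180.0 and `mem_map_mib(64)` gives 720.0, so the 720 megabyte figure corresponds to a machine with 64G of RAM, four times the 16G case.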

And Martin J. Bligh added:

32Gb was indeed what we've been working towards for 2.6, and we've been running that on some workloads.

However, if you're willing to run with a 2:2 or even a 1:3 user:kernel split instead of the default 3:1, you can get away with all sorts of things, probably including 64Gb. I've got the hardware to build such a beast, but haven't bothered yet (we have enough problems already ;-)). Big databases won't like it, but other workloads without huge individual processes (or shared mappings) will be fine.

20. Backporting HDLC To 2.4

1 Dec 2002 - 2 Dec 2002 (3 posts) Archive Link: "[PATCH] generic HDLC update for 2.4.21-pre"

Topics: Ioctls

People: Krzysztof Halasa, Francois Romieu

Krzysztof Halasa announced:

I've uploaded the HDLC update discussed here to: ftp://hq.pm.waw.pl/pub/linux/hdlc/current/hdlc-2.4.20.patch.gz

This patch is essentially a downport from current 2.5 kernel line, which means quite a big rewrite and a binary incompatibility of userspace utility "sethdlc".

There seem to be an agreement that this patch should be applied to 2.4, despite the compatibility problem. Most users are already using the updated version anyway (my own hw drivers only support 2 older ISA cards, while manufacturers of newer cards have drivers working only with the newer code).

The new code isn't really that new, it has been in use for over a year.

This patch doesn't change anything outside "generic HDLC" area (except that it adds a new SIOCWANDEV net device ioctl, which is used instead of various SIOCDEVPRIVATEs, but it's a trivial change).

Please apply to 2.4. Thanks.

Francois Romieu objected:

I'd rather avoid pushing the 2.5.x core code for the dscc4 chipset in 2.4 now as some sides of it still suck. Is it fine to wait for me to update the current 2.4.x dscc4 code to the new API? ETA = now + a few days at worst.

Btw if someone can get in touch with Infineon, it would be interesting to know if recent releases of the chipset still behave as the old ones (I only have rather old ones).

And Krzysztof replied, regarding waiting for Francois' update, "I can see no problem with that, as long as we are at early 2.4.21 stages."

21. kexec-tools 1.8 Released

1 Dec 2002 - 2 Dec 2002 (4 posts) Archive Link: "[ANNOUNCE] kexec-tools-1.8"

Topics: Big Memory Support, Kexec

People: Eric W. Biederman, Dave Hansen, Andy Pfiffer

Eric W. Biederman announced:

kexec-tools-1.8 is now available at: http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.8.tar.gz

Dave Hansen has a patch that allows /proc/iomem to export resources above 4GB, which is needed on machines with > 4GB of RAM.

Changes:

That should make kexec quite useable.

The syscall: http://www.xmission.com/~ebiederm/files/kexec/linux-2.5.48.x86kexec.diff and the fixes http://www.xmission.com/~ebiederm/files/kexec/linux-2.5.48.x86kexec-hwfixes.diff continue to apply to 2.5.50 so I have not updated them.

The archive is at: http://www.xmission.com/~ebiederm/files/kexec/

My apologies for not getting this sooner. Along with the holidays I have been battling a cold...

Dave Hansen was very happy with this. He said, "It booted on my first try, even with the 64-bit /proc/iomem changes. I tried it on machines with 16GB and 1GB of RAM. (insert clapping here)" Eric replied:

Thanks. The code for reading /proc/iomem was modeled after Andy Pfiffer's work, and your earlier patch. I just cleaned them up and integrated it cleanly with my existing code base.

I guess that means I should shake off the bit rot and resubmit to Linus.
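To illustrate what reading /proc/iomem with resources above 4GB involves, here is a minimal sketch in Python. It is not the kexec-tools code. The sample text is invented for the example, and the layout it assumes (hexadecimal "start-end : name" lines, with nested entries indented by two spaces) is an assumption about the file format.

```python
# Hypothetical sample in /proc/iomem style; the last range lies above 4GB.
SAMPLE_IOMEM = """\
00000000-0009fbff : System RAM
000a0000-000bffff : Video RAM area
00100000-dfffffff : System RAM
  00100000-002fffff : Kernel code
100000000-41fffffff : System RAM
"""

def parse_iomem(text):
    """Yield (start, end, name, depth) for each line of /proc/iomem-style text."""
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: nested resources are indented two spaces per level.
        depth = (len(line) - len(line.lstrip(" "))) // 2
        span, _, name = line.strip().partition(" : ")
        start, _, end = span.partition("-")
        yield int(start, 16), int(end, 16), name, depth

def system_ram_bytes(text):
    """Total size of top-level 'System RAM' ranges (end addresses are inclusive)."""
    return sum(end - start + 1
               for start, end, name, depth in parse_iomem(text)
               if depth == 0 and name == "System RAM")

above_4g = [(s, e) for s, e, n, d in parse_iomem(SAMPLE_IOMEM) if e >= 2**32]
print(f"System RAM: {system_ram_bytes(SAMPLE_IOMEM) / 2**30:.2f} GiB")
print("ranges above 4GB:", [(hex(s), hex(e)) for s, e in above_4g])
```

Python integers are unbounded, so the wide addresses need no special handling here; a C tool would have to read them into a 64-bit type rather than a 32-bit unsigned long.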

22. Bugzilla: The Saga Continues

2 Dec 2002 - 5 Dec 2002 (15 posts) Archive Link: "lkml, bugme.osdl.org?"

Topics: Bug Tracking

People: Martin J. Bligh, Dave Jones, Valdis Kletnieks

Valdis Kletnieks asked if it was preferable for bug reports to go only to the linux-kernel mailing list, or only to the Bugzilla database (http://bugzilla.kernel.org), or both. Martin J. Bligh replied:

I'd say both. Be careful not to file duplicates in Bugzilla though. People attaching patches to existing bugs in Bugzilla are especially welcome ;-)

Bugs will get closed out once they're fixed in the next full release of mainline, so Bugzilla shouldn't get too cluttered. We need to have a better (more searchable) version field, but that needs some more complex Bugzilla rework ... we're thinking about how best to do it.

Elsewhere, Dave Jones said:

whilst on the subject of bugzilla: a few people (myself included) go through the bug database once a week or so pruning out-of-date/fixed entries. So far the ones I've closed have been quite sensible, but there are a few there of the form..

"xxx doesn't work in 2.5.47", then Rusty's module rewrite happened, and the tester didn't (or couldn't) see if it got fixed in subsequent kernels. I'll send out pings to such reports when they get to something like 5 kernels old. If the problem then doesn't get re-ACKed, I'll close it.

23. Dynamic Power Management Proposal

3 Dec 2002 - 5 Dec 2002 (11 posts) Archive Link: "IBM/MontaVista Dynamic Power Management Project"

Topics: Patents

People: Bishop Brock, Alan Cox, Arjan van de Ven, Dominik Brodowski, Hollis Blanchard

Bishop Brock of IBM announced:

IBM and MontaVista have initiated a joint project to develop a dynamic power management control and policy mechanism for Linux for processors supporting dynamic voltage and frequency scaling. A paper describing the proposal can be obtained from

http://www.research.ibm.com/arl/projects/dpm.html

A working prototype of the proposed framework for the IBM PowerPC 405LP processor exists and will be made public in the near future.

Alan Cox replied, "Interesting. One small question however. The paper says "Others have also explored the possibilities of this type of fine grained control". More to the point however they have patents covering them. What does IBM intend to do about that ?" Bishop said:

This is an important and complicated question. Our code has passed an internal IBM legal review, however we are still discussing the implications of the patent with our attorneys. The best I can offer at this point is that we hope to have a definitive answer next week.

The patent in question (US 6,298,448) deals with application-specific dynamic scaling. Although this is an important part of our proposal, it is not the central idea, and I believe the proposal has merit even if this portion were suppressed.

Elsewhere, on a more technical level, Arjan van de Ven asked, "any idea if/how this will fit into the existing cross platform cpufreq framework ?" Dominik Brodowski replied, "if I understand IBM's proposal right, it seems to be an alternative to cpufreq: a different "mid-layer" between the low-level processor drivers, other kernel code, and the user. So it's not an extension to an existing feature, but a new feature." Hollis Blanchard added, "The idea is that you want scaling events to be generated by the kernel rather than only scaling on userland input. The paper" [...] "give you some ideas of when and why..." Dominik felt that IBM's proposal was really just a duplication of CPUFREQ, with some added elements. He said, "CPUfreq is about providing a cross-arch interface between other kernel code, user-space and processor drivers for static and/or dynamic frequency and voltage scaling. The DPM proposal seems to try to be another such "mid-layer". And while it might be possible to emulate CPUFREQ as a driver for DPM, it will be possible in the same way to emulate DPM as a driver for CPUFREQ." He concluded, "I'd suggest that we work together in integrating your DPM proposal into the existing cpufreq framework. As I said before, if any changes to the cpufreq core are necessary, these can easily be done as long as they don't reduce functionality or break existing features."

There was no reply.

24. AGPGART Redesign

3 Dec 2002 - 4 Dec 2002 (2 posts) Archive Link: "[CFT][2.5] AGPGART reworking."

Topics: Maintainership, Sound: i810, Version Control

People: Dave Jones, Jeff Hartmann

Dave Jones announced:

As per private discussion with Linus over the last few days, I've reworked the AGPGART driver considerably. There's likely some gotchas left here, so I'd appreciate testers/code-review of this quite large change (>200KB worth of diff, but several files get moved about a bit).

You can get this from bk://linux-dj.bkbits.net/agpgart or for the bk-challenged, you can get the gnu diff at ftp://ftp.kernel.org/pub/linux/kernel/people/davej/patches/2.5/2.5.50/agpgart-recore-2.diff.gz

Rudmer van Dijk tested it out, and reported that it seemed to be working perfectly.

25. NTFS 2.1.0 Released For 2.4.20 Kernel

3 Dec 2002 (1 post) Archive Link: "[ANN] NTFS 2.1.0a for Linux 2.4.20"

Topics: FS: NTFS

People: Matthew J. Fanto

Matthew J. Fanto announced:

The new NTFS driver 2.1.0 (a) is now available for the 2.4.20 kernel.

You can grab it along with ntfstools and other patches from http://linux-ntfs.sf.net

26. Status List For 2.5

4 Dec 2002 (1 post) Archive Link: "[STATUS 2.5] December 4, 2002"

Topics: Bug Tracking

People: Guillaume Boissiere

Guillaume Boissiere posted his 2.5 Status List (http://www.uwsg.indiana.edu/hypermail/linux/kernel/0212.0/0742.html) for December 4, 2002, saying:

Of note this week: the merge of the syscall compatibility layer. Also more bug reports in bugzilla (http://bugme.osdl.org/), which probably means more people are testing 2.5 these days.

Full status list is at http://www.kernelnewbies.org/status/. Please let me know if anything is inaccurate or missing.


Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.