Kernel Traffic #194 For 2 Dec 2002

By Zack Brown

Introduction

Well, the Topic Index (topics.html) is a bit more complete now, though it definitely needs some refinement. The index is essentially generated by a script using many Perl regular expressions, but a lot of hand-tuning is also part of the process. I've been going over everything bit by bit, crafting new regexes, fixing topic tags by hand, but it is slow going on my own. I'd appreciate feedback, particularly in the following areas:

Please send me all your corrections, no matter how small. The full page is well over 400K, and I do plan to split it up in a future incarnation, so no need to ask for that. I'm aware.

Mailing List Stats For This Week

We looked at 1525 posts in 8713K.

There were 446 different contributors. 232 posted more than once. 193 posted last week too.

The top posters of the week were:

1. Linux 2.4.20-rc2 Released

15 Nov 2002 - 21 Nov 2002 (11 posts) Archive Link: "Linux 2.4.20-rc2"

Topics: Backward Compatibility, Framebuffer, Ioctls, Networking, Security, Version Control

People: Marcelo Tosatti, Trond Myklebust, Tom Rini, David S. Miller, Petr Vandrovec, Pete Zaitcev, Alan Cox, Vojtech Pavlik, Joshua Uziel, Keith Owens, Ben Collins

Marcelo Tosatti announced 2.4.20-rc2 and listed the summary of changes from -rc1:


<cel@citi.umich.edu>:
  o sock_writable not appropriate for TCP sockets

<hch@sgi.com>:
  o fix file system corruption under load   

<jgarzik@redhat.com>:
  o Use dev_kfree_skb_any not dev_kfree_skb in tg3 net driver function tg3_free_rings.

<marcelo@freak.distro.conectiva>:
  o Undo latest hid-input fixes: they are broken
  o Reverse order of BK config checkout entries
  o Changed EXTRAVERSION to -rc2

<mkp@mkp.net>:
  o Update credits

<rth@are.twiddle.net>:
  o Fix carry ripple in 3 and 4 word addition and subtraction macros

<tytso@think.thunk.org>:
  o HTREE backwards compatibility patch

Alan Cox <alan@lxorguk.ukuu.org.uk>:
  o Enable the merged AMD pm driver

Andries E. Brouwer <Andries.Brouwer@cwi.nl>:
  o [TCP] Do not update rcv_nxt until ts_recent is updated

Ben Collins <bcollins@debian.org>:
  o [TG3]: TG3_HW_STATUS_SIZE should be 0x50 not 0x80

c-d.hailfinger.kernel.2002-q4@gmx.net <c-d.hailfinger.kernel.2002-Q4@gmx.net>:
  o restore framebuffer console after suspend

David S. Miller <davem@nuts.ninka.net>:
  o [SPARC64]: Translate SO_{SND,RCV}TIMEO socket options
  o [SPARC64]: Handle kernel integer divide by zero properly
  o [SPARC64]: Check DRM_NEW not DRM in ioctl32.c
  o [SPARC64]: Fix accidental clobbering of register on cheetahplus

David S. Miller <davem@redhat.com>:
  o Fix tg3 net driver to properly disable interrupts during some TX operations

Edward Peng <edward_peng@dlink.com.tw>:
  o sundance net driver updates

Joshua Uziel <uzi@uzix.org>:
  o [SPARC64]: 0x22/0x10 is Ultra-I/spitfire

Pete Zaitcev <zaitcev@redhat.com>:
  o [sparc] Fix off-by-one in s/g handling

Petr Vandrovec <VANDROVE@vc.cvut.cz>:
  o Fix ncpfs file creation issue

Petr Vandrovec <vandrove@vc.cvut.cz>:
  o Fix lcall DoS

r.e.wolff@bitwizard.nl <R.E.Wolff@BitWizard.nl>:
  o Fix SX driver detection

Tom Rini <trini@kernel.crashing.org>:
  o Fix a thinko in arch/ppc/kernel/ppc_ksyms.c

Trond Myklebust <trond.myklebust@fys.uio.no>:
  o another kmap imbalance in 2.4.x/2.5.x RPC

Vojtech Pavlik <vojtech@suse.cz>:
  o Add vt8235 support

Keith Owens pointed out that none of these changes had any obvious impact on the kdb kernel debugger, so folks could use kdb-v2.5-2.4.20-rc1 with the -rc2 kernel.

2. Linux 2.5.48-mm1 Released

19 Nov 2002 - 21 Nov 2002 (5 posts) Archive Link: "2.5.48-mm1"

Topics: FS: ext2, Real-Time

People: Andrew Morton

Andrew Morton announced:

url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.48/2.5.48-mm1/

Lots of little bits and pieces here. Most notably I've started to look at the scheduling latency of the uniprocessor preemptible kernel. It was actually pretty awful. Most everything in the MM/VFS area has been fixed here, and it is now achieving 500 microseconds max latency at 500MHz.

With the notable exception of the case where a task exits while holding a large amount of mmapped memory. 300 milliseconds for a 700 megabyte mapping, and a couple of milliseconds while just running cvs. This will take quite some fixing.

Only ext2 has been done. Other filesystems will need attention.

3. subarch Cleanup; Header File Organization

19 Nov 2002 - 22 Nov 2002 (14 posts) Archive Link: "[RFC] [PATCH] subarch cleanup"

Topics: Disks: SCSI, Source Tree, User-Mode Linux

People: John Stultz, Martin J. Bligh, Russell King, James Bottomley, Christoph Hellwig, Sam Ravnborg

John Stultz posted a patch and said:

This is a small patch to try to somewhat clean up the subarch code. First it moves all the subarch .h files out of arch/i386/mach-xyz into include/asm-i386/mach-xyz, then it changes the include path to include include/asm-i386/mach-xyz and include/asm-i386/mach-generic when compiling. This allows the compiler to use the arch specific .h files when needed, and then falls back to the generic .h files if no subarch specific changes are needed.

Obviously this doesn't work with .c files, so I've split up the Makefile MACHINE variable into MACHINE_H and MACHINE_C, so subarchs like summit, which do not need any subarch specific .c files, can just use the generic files.

Sam Ravnborg asked why John had moved all the .h files, and Martin J. Bligh replied, "Because they're in a silly place now. They should be wherever all the other include files are" [...] "Header files go under include ...."

A couple of folks disagreed with this principle. Russell King said:

Think about UML. UML has:

include/asm-um/*.h
include/asm-um/arch -> include/asm-i386

When building for UML, what happens if you need to get to a machine specific file for something, and the i386 include files do:

#include <asm/mach-generic/foo.h>

Yep, it fails.

Now guess why we in the ARM community haven't even bothered to look at UML yet? There's over 1MB of include files that would need to be moved, along with countless #include statements needing to be fixed up.

For something that would be nice to have, and probably run quite well on the ARM architecture (due to some nice features ARM has, especially for UML's jail mode) there isn't enough interest in it to warrant such a painful reorganisation.

I'd therefore strongly recommend NOT going down the path of adding subdirectories to include/asm-*.

Martin came back with, "I don't understand what UML is trying to do here, but if it wants the architecture specific stuff, it has to do it in the same way as the arch itself. That's UML's problem ... it sounds like it's just making invalid assumptions. Maybe the include paths need to be in a separate makefile to avoid maintenance problems."

There was no reply to that, but elsewhere, James Bottomley also replied to Martin's assertion that headers should go under the include directory. James said:

Externally useful header files go in include. Header files only used internally to the subsystem go in local directories.

The reason I put them under arch/i386 is because I didn't want the guts of the subarch splitup spilling into the kernel core.

While the subarch is local to i386, I think the headers should stay there. If you want to make the subarch a global framework (and thus get agreement with Russell and ARM to use it) then putting them under the global include directories would probably make sense.

This didn't sit well with Martin. He pointed out that there were plenty of header files under include/ that were not externally useful; and that all the other headers were still in the include path. "putting them all mixed in with C files is just making a mess," he said. James admitted that not everyone followed the proper custom, but said that keeping only externally useful headers in the include directory was a custom nonetheless. He gave the SCSI code as an example. He said, "We have a scsi.h file in drivers/scsi which defines subsystem specific things that we only use within SCSI. We have include/scsi/scsi.h which defines things other subsystems can use."

Martin didn't consider that such a hot model to follow, and Christoph Hellwig also thought it was not the smartest idea. Christoph said, "There are more than enough scsi HBA drivers or emulation drivers outside drivers/scsi. I have a longstanding plan to rationalize the scsi headers (the current state is really really messy), and that includes moving everything but the truly midlayer-specific parts like non-exported functions to headers in include/scsi/."

4. Current Work On Disk-Array Support

19 Nov 2002 - 22 Nov 2002 (41 posts) Archive Link: "RFC - new raid superblock layout for md driver"

Topics: Device Mapper, Disk Arrays: EVMS, Disk Arrays: LVM, Disk Arrays: MD, Disk Arrays: RAID, Virtual Memory

People: Neil Brown, Joel Becker, Doug Ledford, Steven Dake, Kevin Corry

Neil Brown said:

The md driver in linux uses a 'superblock' written to all devices in a RAID to record the current state and geometry of a RAID and to allow the various parts to be re-assembled reliably.

The current superblock layout is sub-optimal. It contains a lot of redundancy and wastes space. In 4K it can only fit 27 component devices. It has other limitations.

I (and others) would like to define a new (version 1) format that resolves the problems in the current (0.90.0) format.

The code in 2.5.latest has all the superblock handling factored out so that defining a new format is very straightforward.

I would like to propose a new layout, and to receive comments on it.

He posted his design, and various folks discussed it. At one point Joel Becker asked, "Hmm, what is the intended future interaction of DM and MD? Two ways at the same problem? Just curious." Neil replied:

I see MD and DM as quite different, though I haven't looked much at DM so I could be wrong.

I see raid1 and raid5 as being the key elements of MD. i.e. handling redundancy, rebuilding hot spares, stuff like that. raid0 and linear are almost optional frills.

DM on the other hand doesn't do redundancy (I don't think) but helps to chop devices up into little bits and put them back together into other devices... a bit like a filesystem really, but it provides block devices instead of files.

So raid0 and linear are more the domain of DM than MD in my mind. But they are currently supported by MD and there is no real need to change that.

Doug Ledford said:

I haven't yet played with the new dm code, but if it's like I expect it to be, then I predict that in a few years, or maybe much less, md and dm will be two parts of the same whole. The purpose of md is to map from a single logical device to all the underlying physical devices. The purpose of LVM code in general is to handle the creation, organization, and mapping of multiple physical devices into a single logical device. LVM code is usually shy on advanced mapping routines like RAID5, relying instead on underlying hardware to handle things like that while the LVM code itself just concentrates on physical volumes in the logical volume similar to how linear would do things. But the things LVM does do that are very handy are things like adding a new disk to a volume group and having the volume group automatically expand to fill the additional space, making it possible to increase the size of a logical volume on the fly.

When you get right down to it, MD is 95% advanced mapping of physical disks with different possibilities for redundancy and performance. DM is 95% advanced handling of logical volumes including snapshot support, shrink/grow on the fly support, labelling, sharing, etc. The best of both worlds would be to make all of the MD modules be plug-ins in the DM code so that anyone creating a logical volume from a group of physical disks could pick which mapping they want used; linear, raid0, raid1, raid5, etc. You would also want all the md modules inside the DM/LVM core to support the advanced features of LVM, with the online resizing being the primary one that the md modules would need to implement and export an interface for. I would think that the snapshot support would be done at the LVM/DM level instead of in the individual md modules.

Anyway, that's my take on how the two *should* go over the next year or so, who knows if that's what will actually happen.

Joel pointed out:

Most LVMs support mirroring as an essential function. They don't usually support RAID5, leaving that to hardware.

I certainly don't want to have to deal with two disparate systems to get my code up and running. I don't want to be limited in my mirroring options at the block device level.

DM supports mirroring. It's a simple 1:2 map. Imagine this LVM volume layout, where volume 1 is data and mirrored, and volume 2 is some scratch space crossing both disks.

          [Disk 1]        [Disk 2]
          [volume 1]      [volume 1 copy]
          [        volume 2             ]

If DM handles the mirroring, this works great. Disk 1 and disk 2 are handled either as the whole disk (sd[ab]) or one big partition on each disk (sd[ab]1), with DM handling the sizing and layout, even dynamically.

If MD is handling this, then the disks have to be partitioned. sd[ab]1 contain the portions of md0, and sd[ab]2 are managed by DM. I can't resize the partitions on the fly, I can't break the mirror to add space to volume 2 quickly, etc.

Doug replied, "Not at all. That was the point of my entire email, that the LVM code should handle these types of shuffles of space and simply use md modules as the underlying mapper technology. Then, you go to one place to both specify how things are laid out and what mapping is used in those laid out spaces. Basically, I'm saying how I think things *should* be, and you're telling me how they *are*. I know this, and I'm saying how things *are* is wrong. There *should* be no md superblocks, there should only be dm superblocks on LVM physical devices and those DM superblocks should include the data needed to fire up the proper md module on the proper physical extents based upon what mapper technology is specified in the DM superblock and what layout is specified in the DM superblock. In my opinion, the existence of both an MD and DM driver is wrong because they are inherently two sides of the same coin, logical device mapping support, with one being better at putting physical disks into intelligent arrays and one being better at mapping different logical volumes onto one or more physical volume groups." Steven Dake replied:

EVMS integrates all of this stuff together into one cohesive piece of technology.

But I agree, LVM should be modified to support RAID 1 and RAID 5, or MD should be modified to support volume management. Since RAID 1 and RAID 5 are easier to implement, LVM is probably the best place to put all this stuff.

Doug said, "Yep. I tend to agree there. A little work to make device mapping modular in LVM, and a little work to make the md modules plug into LVM, and you could be done. All that would be left then is adding the right stuff into the user space tools. Basically, what irks me about the current situation is that right now in the Red Hat installer, if I want LVM features I have to create one type of object with a disk, and if I want reasonable software RAID I have to create another type of object with partitions. That shouldn't be the case, I should just create an LVM logical volume, assign physical disks to it, and then additionally assign the redundancy or performance layout I want (IMNSHO) :-)" And Steven replied:

Yup this would be ideal and I think this is what EVMS tries to do, although I haven't tried it.

The advantage of doing such a thing would also be that MD could be made to work with shared LVM VGs for shared storage environments.

now to write the code...

And Kevin Corry put in:

This is indeed what EVMS's new design does. It has user-space plugins that understand a variety of on-disk-metadata formats. There are plugins for LVM volumes, for MD RAID devices, for partitions, as well as others. The plugins communicate with the MD driver to activate MD devices, and with the device-mapper driver to activate other devices.

As for whether DM and MD kernel drivers should be merged: I imagine it could be done, since DM already has support for easily adding new modules, but I don't see any overwhelming reason to merge them right now. I'm sure it will be discussed more when 2.7 comes out. For now they seem to work fine as separate drivers doing what each specializes in. All the integration issues that have been brought up can usually be dealt with in user-space.

5. Module Problems In 2.5

20 Nov 2002 - 21 Nov 2002 (5 posts) Archive Link: "2.5.48 QM_MODULES: Function not implemented"

People: Felix Seeger, Chris Friesen, Rusty Russell

Felix Seeger complained:

I compiled 2.5.48 and now all the files like modules.dep in /lib/modules are gone.

I can't load a module, I get this: modprobe: Can't open dependencies file /lib/modules/2.5.48/modules.dep ...

depmod: QM_MODULES: Function not implemented

I enabled all options in the module config.

Someone explained that it was necessary to get Rusty Russell's module-init-tools package (ftp://ftp.kernel.org/pub/linux/kernel/people/rusty/modules/module-init-tools-0.7.tar.bz2), and also to patch kernel/module.c (http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&selm=20021120125211.GA446%40apocalipsis) to avoid the "Cannot Allocate Memory" message. Chris Friesen added, "Rusty's stuff will let you load, but you'll still get the depmod error. It would have been nice to have a version of depmod that does nothing normally but calls the old version on an older kernel." Felix downloaded Rusty's work and gave it a try, and agreed, "this doesn't fix the depmod problem even with the new kernel, I have modprobe, rmmod and so on back but the package doesn't include depmod." And Chris said, "Yep. There is no depmod. Bang on Rusty to fix things."

6. Massive SMP Slowdown In 2.4.17

20 Nov 2002 - 26 Nov 2002 (7 posts) Archive Link: "2.4.17 SMP hangs .."

Topics: SMP, Virtual Memory

People: Manish Lachwani, Andrew Morton, Theodore Y. Ts'o

Manish Lachwani reported, "I am seeing system hangs with 2.4.17 SMP kernel when doing mke2fs across 12 drives in parallel. However, the hangs only occur when the I/O rate from vmstat is high." Andrew Morton suggested, "Quite possibly it has not hung. You just need to wait half an hour or so. The algorithm isn't very good." And Theodore Y. Ts'o said:

Try setting the environment variable "MKE2FS_SYNC" to a value such as 10. This will cause mke2fs to force a sync after writing out every 10 block groups worth of inode tables.

If this fixes the problem, then it means that the kernel isn't handling write throttling correctly, and the system is thrashing itself to death. Write throttling is one of these kernel bugs which gets fixed and broken in the kernel multiple times. I've considered making MKE2FS_SYNC the default, but I haven't, mainly because current behaviour is a great way of pointing out these write throttling bugs in the VM. (Stephen has fixed this bug multiple times over the years, and he suggested that having a good test case for noticing when someone has broken write throttling would be a Good Thing --- and it seems to get broken fairly often, as people try to make improvements to the VM layer.....)

Andrew agreed that setting MKE2FS_SYNC would solve the problem.

7. Status Of Sound Support For VIA VT8233A

21 Nov 2002 (2 posts) Archive Link: "PROBLEM: Sound VIA VT8233 on K7VTA3 motherboard"

Topics: Sound: ALSA

People: Jeff Garzik

Dmitry Kudryavtsev encountered problems getting sound to work with his VIA VT8233A on a K7VTA3 motherboard. Jeff Garzik replied, "Kernel 2.4.x does not support your audio chip. I hope to add support soon. ALSA 2.4.x or the in-kernel ALSA in 2.5.x does support your audio."

8. RTLinuxFree 3.2 Released

21 Nov 2002 - 25 Nov 2002 (2 posts) Archive Link: "ANNOUNCE RTLinuxFree 3.2"

Topics: Real-Time: RTLinux

People: Victor Yodaiken, Pavel Machek

Victor Yodaiken announced:

RTLinuxFree 3.2-pre1 released. This release is a pre-release for RTLinuxFree 3.2, but is considered stable. It contains more than a year's worth of contributions from us and others.

RTLinuxFree 3.2 is available at http://www2.fsmlabs.com/3.2-free.html

Timings with Linux 2.4.18 on a P4 are about 16 microseconds worst case error for a 1 millisecond periodic task. K7s are really good too. You can expect 30-40 microseconds on earlier machines.

Some days later, Pavel Machek asked, "What is difference between RTLinux and RTLinuxFree?" But there was no reply.

9. Tightening Up The inode Structure

21 Nov 2002 (5 posts) Archive Link: "[PATCH] kill i_dev"

Topics: BSD: FreeBSD

People: Andries Brouwer, Linus Torvalds

Andries Brouwer posted a patch and said:

This serves to decrease the size of struct inode a bit, and makes sure that a later increase of sizeof(dev_t) does not make struct inode bigger than it used to be.

The i_dev field is deleted and the few uses are replaced by i_sb->s_dev.

There is a single side effect: a stat on a socket now sees a nonzero st_dev. There is nothing against that - FreeBSD has a nonzero value as well - but there is at least one utility (fuser) that will need an update.

Linus Torvalds applied the patch, and said:

Looking at the patch (not testing it), as far as I can tell we'll return a basically random number that is just whatever the anonymous super-block was allocated, right?

I'm not convinced that returning random numbers to user space is necessarily a great idea. That said, I think we already do it for unnamed pipes anyway, so I'm more wondering if we should have some way to map these numbers (in user space) to a valid thing, so that they wouldn't just be random numbers.

(In other words: I like the patch, and I'm not really complaining about this new behaviour at all. It's just the "randomness" as far as user space goes that bothers me a bit, since it seems to imply bad interface design).

Andries confirmed Linus' interpretation of the return value, but as for his suggestions about mapping the numbers to anything valid, he said, "I don't know. We can try in-kernel to give these well-known services well-known numbers. Or we can give them essentially random numbers like today but publish these somewhere, e.g. under /sys. Both are easy, but seem too heavy for a value used by nobody. We have process IDs and anonymous fs IDs, and both are just what they happen to be."

10. Compatibility Layer Between 32- And 64-Bit Architectures

21 Nov 2002 - 26 Nov 2002 (10 posts) Archive Link: "[PATCH] Beginnings of conpat 32 code cleanups"

Topics: POSIX

People: Stephen Rothwell, Linus Torvalds, David S. Miller, Anton Blanchard, Pavel Machek, Andi Kleen

Stephen Rothwell said:

There is a lot of duplicated code among the 32 bit compatibility layers in our 64 bit architectures. I am proposing to consolidate this as much as possible. To that end, I first need to tidy up the relevant header files and make them as common as possible. Discussions with Dave Miller, Andi Kleen, and Anton Blanchard have led to the creation of compat32.h to contain all the 32 bit compatibility data types.

This patch merely adds include/asm-generic/compat32.h which is the header information that is common to all the 32 bit compatibility code across all the architectures (except parisc as I don't pretend to understand that :-)).

I will follow this up with patches for each architecture that I can. I also intend to introduce a CONFIG_COMPAT32 define that will be used to wrap generic implementations of some of the 32 bit compatibility code in our architecture independent code. The idea is to share as much of the non compatibility code and where that is not possible, to keep the 32 bit versions of code near their "normal" (i.e. 64 bit) counterparts in order to minimise the number of bugs introduced and maximise the number of bugs fixed.

Pavel Machek offered to test anything Stephen had on x86-64. But Linus Torvalds said testily:

What kind of strange _crap_ is this?

        +typedef unsigned int           __kernel_size_t32;   
        +typedef int                    __kernel_ssize_t32;
        +typedef int                    __kernel_time_t32;
        +typedef int                    __kernel_clock_t32;
        +typedef int                    __kernel_pid_t32;
        +typedef unsigned int           __kernel_ino_t32;
        +typedef int                    __kernel_daddr_t32;
        +typedef int                    __kernel_off_t32;
        +typedef unsigned int           __kernel_caddr_t32;
        +typedef long                   __kernel_loff_t32;

You're doing a compat layer, and then you're using various undefined types that can be random sizes, and calling them xxx_t32.

For christ sake, somebody is on drugs here.

If they are called "xxx_t32", then that means that you _know_ the size already statically, and you should use "u32" or "s32" which are shorter and clearer anyway. You should sure as hell not use some random C type that can be different depending on compiler options etc, and then call it a "compat" library.

Quite frankly, I don't see the point of this AT ALL. You're introducing new types that cannot be sanely used directly anyway. What's the point?

Make your compat stuff use u32/s32/u64 directly, instead of making up ugly new types that make no sense.

Anton Blanchard pointed out that Stephen was only merging what Anton, David S. Miller, and Andi Kleen had coded. And Stephen said to Linus, "If you had bothered to look, you would have noticed that these are taken straight from the compatibility code that already exists in all the 64 bit architectures that have 32 bit compatibility layers. I am into doing incremental changes that people can see easily are not harmful and then doing better cleanups later." He added, "Thanks for abusing me in public - it's just what I needed after having my attempts at reducing the mess in the arch dependent code ignored for so long!" Linus replied:

It's _still_ crap.

The fact is, that a

unsigned int xxxx

in a <asm-i386/xxxxx.h> file is fine. On asm-i386, "unsigned int" has well-defined behaviour, and is a well-defined type within that world.

However, if you consolidate different files from different architectures, what is acceptable in some random architecture header file is _not_ any more acceptable in a "consolidated" entry. Suddenly, "unsigned int" has no well-defined meaning any more, and certainly not something like "long" which will sometimes change even on the same architecture depending on compiler options etc.

THAT is what I'm complaining about. Code sharing without thinking is BAD.

Get rid of made-up types and clean them up to use well-defined types. The whole point of your patch was to clean stuff up, no?

In other words: for something like this, don't even bother with strange types like "__kernel_loffset_t32". The type simply does not make sense. The only reason we have __kernel_xxxx types is to export architecture-specific stuff to user space through "types.h", without polluting the POSIX-mandated namespaces.

For the 32-bit compatibility stuff, that simply doesn't make sense:

This is why I think the whole file makes zero sense. Yes, people have copied crap before. But that doesn't make it any better.

11. Linux v2.5.49 Released

22 Nov 2002 - 26 Nov 2002 (10 posts) Archive Link: "Linux v2.5.49"

People: Linus Torvalds

Linus Torvalds announced 2.5.49 (http://www.kernel.org/pub/linux/kernel/v2.5/ChangeLog-2.5.49) and said:

Ok, I appear to be without network connectivity at home at least over the weekend, and master.kernel.org is going down for some maintenance this afternoon, so here's my current tree.

Architecture updates, threading improvements, shm fix (the cause of the Oracle problems), networking, scsi, modules, you name it, it's here.

Due to my lacking network connection over the weekend, I'd suggest discussing issues on linux-kernel, since emailing them to me won't much help ;/

The joys of switching ISPs..

12. Trouble With Module Code In 2.5

22 Nov 2002 - 24 Nov 2002 (10 posts) Archive Link: "New module loader makes kernel debugging much harder"

People: Keith Owens, Jamie Lokier, John Levon, Rusty Russell

Keith Owens complained:

The new module loader makes kernel debugging much harder.

There is no section information available, which means ...

The complete lack of kernel and module symbols (no /proc/ksyms) means that ksymoops is now useless on 2.5 kernels. If you get an oops in a kernel built without kksymoops, there is no way to decode the oops.

Jamie Lokier added, "Also "depmod" is now broken, so even with Rusty's replacement modutils a system won't demand load modules properly."

John Levon said, "Somebody (this includes Rusty himself) needs to come up with a workable solution. For my own needs, the start address of the mapped module is good enough, but it seems you need more than that. I'd be quite happy with the re-introduction of /proc/ksyms as it would mean no userspace code changes for me :)" At one point Rusty Russell said, "The patch to restore /proc/ksyms is trivial. As is the addition of a start entry /proc/modules. When combined with rth's simplified loader, it should be sufficient for both ksymoops and oprofile. kgdb needs a patch to work, anyway: you might want to restore /proc/ksyms in that patch? (I don't use kgdb, so my ignorance here is complete)."

13. syscalltrack 0.80 Released

23 Nov 2002 - 27 Nov 2002 (4 posts) Subject: "ANN: syscalltrack 0.80 "Tanned Otter" released"

Topics: Debugging

People: Muli Ben-Yehuda, Pavel Machek

Muli Ben-Yehuda announced:

syscalltrack-0.80, the 12th alpha release of the Linux kernel system call tracker, is now available. syscalltrack supports version 2.4.x of the Linux kernel on the i386 platform.

This release contains many bug fixes and logging improvements.

* What is syscalltrack?

syscalltrack is made of a pair of Linux kernel modules and supporting user space environment which allow interception, logging and possibly taking action upon system calls that match user defined criteria. syscalltrack can operate either in "tweezers mode", where only very specific operations are tracked, such as "only track and log attempts to delete /etc/passwd", or in strace(1) compatible mode, where all of the supported system calls are traced. syscalltrack can do things that are impossible to do with the ptrace mechanism, because its core operates in kernel space.

* Where can I get it?

Information on syscalltrack is available on the project's homepage: http://syscalltrack.sourceforge.net, and in the project's file release.

The source for the latest version can be downloaded directly from: http://osdn.dl.sourceforge.net/sourceforge/syscalltrack/syscalltrack-0.80.tar.gz or any of the other sourceforge mirrors.

* Call for developers:

The syscalltrack project is looking for developers, both for kernel space and user space. If you want to join in on the fun, get in touch with us on the syscalltrack-hackers mailing list (http://lists.sourceforge.net/lists/listinfo/syscalltrack-hackers).

* License and NO Warranty

syscalltrack is Free Software, licensed under the GNU General Public License (GPL) version 2. The 'sct_ctrl_lib' library is licensed under the GNU Lesser General Public License (LGPL).

syscalltrack is in _alpha_ stages and comes with NO warranty. We put it through extensive testing and routinely run it on our systems, but if it breaks something, you get to keep all of the pieces.

* PGP Signature

All syscalltrack releases from now on will be signed. This release is signed with my pgp public key, which you can get from http://www.mulix.org/pubkey.asc or via
'gpg --keyserver wwwkeys.pgp.net --recv-keys 0xBFD537CB'

Happy syscalltracking!

New in version 0.80, "Tanned Otter":

Pavel Machek asked what this could do that ptrace couldn't, and Muli replied:

Everything that stems from being 1) kernel based and 2) system wide. ptrace is inherently process based - "show me what this process did". syscalltrack is system wide - "show me *which* process did this or that." (You can probably emulate syscalltrack's system wide behaviour by ptracing init and all of its forked children, but your system will slow to a crawl. With syscalltrack, you'll barely feel anything.)

syscalltrack also has better filtering than strace, and supports actions - fail the system call if it passed that filter, suspend the process if it passed that filter, etc.

Basically, there are things which strace is good for, and there are things subterfuge is good for, and there are things syscalltrack is good for. Use the right tool for the job. You can see more about syscalltrack's capabilities on the website.

Pavel agreed that the speed difference was huge.
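The filter-plus-action model Muli describes can be pictured as a small rule table. The sketch below is purely illustrative: `Rule`, `Event`, `Action` and `first_matching_action` are hypothetical names, not syscalltrack's actual rule syntax or API, and the real matching happens inside the kernel modules rather than in Python. It shows first-match filtering on system call name and path ("tweezers mode" is just a short list of narrow rules), and why a system-wide tracer naturally reports *which* process made the call.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    LOG = "log"
    FAIL = "fail"        # make the system call return an error
    SUSPEND = "suspend"  # stop the calling process


@dataclass(frozen=True)
class Rule:
    syscall: str
    path: Optional[str] = None  # None matches any path
    action: Action = Action.LOG


@dataclass(frozen=True)
class Event:
    pid: int                    # system-wide: any process can trigger a rule
    syscall: str
    path: Optional[str] = None


def first_matching_action(rules, event):
    """Return the action of the first rule matching `event`, or None."""
    for rule in rules:
        if rule.syscall != event.syscall:
            continue
        if rule.path is not None and rule.path != event.path:
            continue
        return rule.action
    return None


def log_line(event, action):
    """Format a log entry that names the offending process."""
    return f"pid {event.pid}: {event.syscall}({event.path}) -> {action.value}"
```

For example, the single rule `Rule("unlink", "/etc/passwd", Action.FAIL)` corresponds to the "only track attempts to delete /etc/passwd" case, extended with a fail action; an strace-like mode would instead be one path-less rule per system call.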

14. IDE Update

25 Nov 2002 - 27 Nov 2002 (4 posts) Archive Link: "[PATCH-2.5.47-ac6] More IDE updates (BIOS, simplex, etc)"

Topics: Disks: IDE, Hot-Plugging

People: Torben Mathiasen, Alan Cox, Andre Hedrick

Torben Mathiasen posted a patch and explained to Alan Cox:

Please apply the attached patch. It's an update on my previous patch with a rewrite of the simplex code. The patch now does the following:

I'm not sure whether you already applied my previous patch, so let me know if you want this on top of that one.

Andre Hedrick was thrilled with this.

15. perfctr 2.4.2 Released

25 Nov 2002 (1 post) Archive Link: "perfctr-2.4.2 released"

Topics: Profiling, Real-Time

People: Mikael Pettersson

Mikael Pettersson announced:

perfctr-2.4.2 is now available at the usual place: http://www.csd.uu.se/~mikpe/linux/perfctr/

The perfctr-2.4 branch is again the main branch. The perfctr-3.1 branch is suspended since (a) its only purpose was to be a cleaned up version for the 2.5 kernel, (b) it was ignored w/o comment, and (c) it removed too many features (module support, old kernels support) in its attempt to be as clean as possible.

Proper support for hyper-threaded P4s is next on my TODO list.

Version 2.4.2, 2002-11-25

16. Direction Of User-Mode Linux

25 Nov 2002 - 26 Nov 2002 (16 posts) Archive Link: "uml-patch-2.5.49-1"

Topics: FS: devfs, SMP, User-Mode Linux

People: Jeff Dike, H. Peter Anvin, Andi Kleen, Andreas Dilger, Jim Nance

Jeff Dike announced:

This patch merges the rework that has been in my 2.4 pool for the last month or so. I'm going to describe what happened in some detail since it hasn't been discussed on lkml at all, and there are some generic kernel changes involved which may be of wider interest.

The design of UML has had as its main points:

This is insecure, because protecting UML kernel data from its processes is hard to do right, and impossible to do quickly. UML does have a 'jail' mode which implements this, but it is many times slower than non-jail mode.

It's also slow, because entry to userspace involves a signal delivery to the process entering the kernel and a signal return when leaving the kernel.

To fix these problems, I followed up an observation by Ingo a few months ago that a full process context switch is several times faster than an in-process signal delivery.

I implemented a new mode which puts the UML kernel into a completely separate address space from its processes. skas mode ("separate kernel address space"; the traditional mode is now called tt, for "tracing thread") has these main points:

skas mode has a number of advantages over the traditional tt mode:

There is one major disadvantage to skas mode - it can't be implemented given the support currently in the stock kernel.

I've added some stuff into the generic and i386 code to make skas mode possible. This support includes:

/proc/mm has the following semantics -

The ptrace extensions are:

The host support patch is available with all the other UML downloads at
http://user-mode-linux.sf.net/dl-sf.html

I welcome any comments on it. The /proc/mm write semantics are less than ideal - I especially would like suggestions for improvements.

The 2.5.49 UML patch is available at
http://uml-pub.ists.dartmouth.edu/uml/uml-patch-2.5.49-1.bz2

For the other UML mirrors and other downloads, see http://user-mode-linux.sourceforge.net/dl-sf.html

Other links of interest:

The UML project home page : http://user-mode-linux.sourceforge.net
The UML Community site : http://usermodelinux.org

Jim Nance really liked Jeff's UML work, but suggested using /dev/mm instead of /proc/mm. Jeff didn't see anything to gain from that, but H. Peter Anvin explained, "Access control, ability to work in a chroot, ..." Jeff said he'd make the switch if H. Peter really thought it would be an improvement, and H. Peter replied, "Absolutely. I think /proc is heavily overused as a really bad devfs."

Elsewhere, Andi Kleen asked, "Can you quickly describe why you didn't use one host process per uml process ? That would have avoided the need for a /proc/mm extension too I guess." Jeff explained:

Yes, it would have. And it was done that way during an intermediate stage of the skas work.

There were a few reasons for /proc/mm -

Some smaller reasons -

An unexpected benefit - UML is noticeably faster with /proc/mm. That knocked ~10% off its kernel build time. With it doing a build about 40% slower than the host, the 10% reduction in overall run time represents ~25% reduction in UML's virtualization overhead.
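Jeff's closing estimate can be checked with a short calculation. Normalize the host's build time to 1.0, so UML's virtualization overhead is its build time minus 1.0. The message doesn't say whether the 40% slowdown was measured before or after the /proc/mm change, so this sketch (a hypothetical helper, not anything from UML) computes both cases: if 40% is the post-change figure, the overhead falls by about 28%; if it is the pre-change figure, by about 35%. Either way the quoted ~25% is in the right ballpark, if slightly low.

```python
def overhead_reduction(slowdown: float, runtime_cut: float,
                       measured_after: bool = True) -> float:
    """Fraction of UML's virtualization overhead removed by a run-time cut.

    Host build time is normalized to 1.0, so overhead = UML time - 1.0.
    `slowdown` is UML's extra time relative to the host (0.40 == 40% slower);
    `measured_after` says whether that figure was taken after the change.
    """
    if measured_after:
        after = 1.0 + slowdown
        before = after / (1.0 - runtime_cut)
    else:
        before = 1.0 + slowdown
        after = before * (1.0 - runtime_cut)
    overhead_before = before - 1.0
    overhead_after = after - 1.0
    return (overhead_before - overhead_after) / overhead_before


print(f"{overhead_reduction(0.40, 0.10):.0%}")                        # 28%
print(f"{overhead_reduction(0.40, 0.10, measured_after=False):.0%}")  # 35%
```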

Andreas Dilger asked:

How does GDB now distinguish between UML processes? Previously, with GDB and UML one would "det; att <host pid>" to trace another process. Will there be equivalent functionality in the new setup?

I was just thinking about hacking the UML PID allocation code so that the UML process PID == host process PID, so that it is easier to debug multiple kernel threads (which are all called "kernel thread" and are hard to align with a specific UML kernel thread).

Will SMP UML "just" be a matter of forking the host process and sharing the /proc/mm file descriptors, along with a UML SMP scheduler and some IPC to decide which host process is running each UML process?

Regarding how GDB could distinguish between UML processes, Jeff replied, "It now doesn't. What I'm considering is some function you can call from gdb which would longjmp to the stack that you want to look at and execute a breakpoint (or maybe just hit a breakpoint that was put there earlier). That should give you equivalent functionality to the current det/att." And for SMP UML, he said Andreas had the right idea. He said, "It's basically the same as SMP in tt mode, except that starting the idle threads will be slightly different."

17. Linux 2.4.20-rc4 Released

26 Nov 2002 (2 posts) Archive Link: "Linux 2.4.20-rc4"

Topics: FS: ReiserFS, Ioctls, Networking, Samba

People: Marcelo Tosatti, Paul Mackerras, Adrian Bunk, Oleg Drokin, Kai Germaschewski

Marcelo Tosatti announced:

The main reason for -rc4 is that the ISDN fix in -rc3 was problematic.

Summary of changes from v2.4.20-rc3 to v2.4.20-rc4
============================================

<marcelo@freak.distro.conectiva>:
  o Changed EXTRAVERSION to -rc4
  o Cset exclude: ralf@dea.linux-mips.net|ChangeSet|20021030170645|00078

<ralf@dea.linux-mips.net>:
  o The BLKGETSIZE ioctl expects an unsigned long argument

Adrian Bunk <bunk@fs.tum.de>:
  o Fix ips driver .text.exit errors

Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de>:
  o ISDN: Fix the fix

Oleg Drokin <green@namesys.com>:
  o Do not allow to mount reiserfs with blocksize != 4k

Paul Mackerras <paulus@samba.org>:
  o PPC32: Fix arch/ppc/Makefile so it builds on POWER3
  o PPC32: Ignore SIGURG if not caught

Rui Sousa <rui.sousa@laposte.net>:
  o Emu10k1 bugfixes
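The BLKGETSIZE entry touches a familiar portability trap: the ioctl stores the device size, in 512-byte sectors, into a native `unsigned long`, which is 8 bytes on 64-bit platforms, so handing it a pointer to a 4-byte int lets the kernel write past the variable. Here is a minimal user-space sketch, assuming Linux and Python's `fcntl` module; `block_device_sectors` and `device_size_bytes` are hypothetical helper names, and the request number is computed from the `_IO(0x12, 96)` definition in `<linux/fs.h>`.

```python
import fcntl
import os
import struct

# BLKGETSIZE is _IO(0x12, 96) in <linux/fs.h>; _IO() encodes no size/direction bits.
BLKGETSIZE = (0x12 << 8) | 96  # == 0x1260


def block_device_sectors(fd: int) -> int:
    """Return the size of the block device open on `fd`, in 512-byte sectors.

    The kernel writes a native unsigned long, so the buffer is sized with
    struct's native "@L" format rather than a fixed 4-byte int.
    """
    buf = struct.pack("@L", 0)
    result = fcntl.ioctl(fd, BLKGETSIZE, buf)  # returns a bytes copy of buf
    (sectors,) = struct.unpack("@L", result)
    return sectors


def device_size_bytes(path: str) -> int:
    """Open `path` read-only and return its size in bytes via BLKGETSIZE."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return block_device_sectors(fd) * 512
    finally:
        os.close(fd)
```

Calling `device_size_bytes("/dev/sda")` (an example path; it needs read permission on the device) returns the disk size, while a non-block file such as `/dev/null` makes the ioctl fail with `OSError`.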

18. Reverse-Mapping Virtual Memory Subsystem

26 Nov 2002 (1 post) Archive Link: "[PATCH] rmap 15a"

Topics: Big Memory Support, Big O Notation, SMP, Version Control, Virtual Memory

People: Rik van Riel, Andrew Morton

Rik van Riel announced:

The first maintenance release of the 15th version of the reverse mapping based VM is now available. This is an attempt at making a more robust and flexible VM subsystem, while cleaning up a lot of code at the same time. The patch is available from:

http://surriel.com/patches/2.4/2.4.19-rmap15a and http://linuxvm.bkbits.net/
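Rik's announcement takes the term for granted. In a reverse-mapping VM, each physical page keeps a record of the page-table entries that map it, so the pageout code can find and unmap every user of a page directly instead of scanning all processes' page tables. The following is a toy Python model of that idea only; `ToyRmap` and its methods are hypothetical names, and the real rmap patch does this in C on the kernel's own page structures, with locking that this sketch ignores.

```python
from collections import defaultdict


class ToyRmap:
    """Toy reverse map: every physical page remembers which
    (address space, virtual address) pairs currently map it."""

    def __init__(self):
        # Forward direction: one page table (vaddr -> page) per address space.
        self.page_tables = defaultdict(dict)
        # Reverse direction: page -> set of (mm, vaddr) pairs mapping it.
        self.rmap = defaultdict(set)

    def map(self, mm, vaddr, page):
        """Install mm:vaddr -> page and record the reverse entry."""
        old = self.page_tables[mm].get(vaddr)
        if old is not None:
            # Drop the stale reverse entry if vaddr is being remapped.
            self.rmap[old].discard((mm, vaddr))
        self.page_tables[mm][vaddr] = page
        self.rmap[page].add((mm, vaddr))

    def mappers(self, page):
        """Everyone mapping `page`, found without scanning any page table."""
        return set(self.rmap.get(page, ()))

    def unmap_page(self, page):
        """Remove every mapping of `page`, e.g. before paging it out.
        Cost is proportional to the number of mappers, not to the total
        number of page tables in the system."""
        for mm, vaddr in self.rmap.pop(page, set()):
            del self.page_tables[mm][vaddr]
```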

My big TODO items for a next release are:

rmap 15a:

19. XFS Patches For 2.4.20-rc3

26 Nov 2002 (1 post) Archive Link: "Announce: XFS split patches for 2.4.20-rc3"

Topics: Access Control Lists, FS: XFS

People: Keith Owens

Keith Owens announced:

ftp://oss.sgi.com/projects/xfs/download/patches/2.4.20-rc3.

For some time the XFS group have been producing split patches for XFS, separating the core XFS changes from additional patches such as kdb, xattr, acl, dmapi. The split patches are released to the world with the hope that developers and distributors will find them useful.

Read the README in each directory very carefully, the split patch format has changed over a few kernel releases. Any questions that are covered by the README will be ignored. There is even a 2.4.20/README for the terminally impatient :).

20. Status List For 2.5

27 Nov 2002 (2 posts) Archive Link: "[STATUS 2.5] November 27, 2002"

Topics: Bug Tracking, Disks: SCSI

People: Guillaume Boissiere, Steven Dake

Guillaume Boissiere posted his 2.5 status list (see the specifics (http://www.uwsg.indiana.edu/hypermail/linux/kernel/0211.3/0674.html)), and explained:

Things seem to be stabilizing quite a bit. No new features, but tons of bug fixes have been merged. Below is the list of what I have pending in the status list, along with the 58 open bugs currently in Bugzilla (http://bugzilla.kernel.org/).

Full status list is at http://www.kernelnewbies.org/status

Steven Dake replied, "It appears that some minor problems with the SCSI and FibreChannel hotswap driver have caused it to be rejected by the SCSI maintainers at this time. Unfortunately I don't have much time before 2.6 releases to work on it, so I'd ask if you could move this to a 2.7 delivery."

21. List Of Kernel Maintainers

27 Nov 2002 (2 posts) Archive Link: "lk maintainers"

People: Denis Vlasenko

Denis Vlasenko posted his list of kernel maintainers (see the specifics (http://www.uwsg.indiana.edu/hypermail/linux/kernel/0211.3/0636.html)) and said:

This document is mailed to lkml regularly and will be modified whenever a new victim wishes to be listed in it or someone can no longer devote his time to maintainer work.

If you want your entry added/updated/removed, contact me.

BTW, requests to move your entry to the top of the list without actually changing the text are fine too: that will indicate that entry is not outdated, so don't be shy ;-)

22. OSDL Scalable Test Platform Release 2.00

27 Nov 2002 (1 post) Archive Link: "[Announce] OSDL Scalable Test Platform Release 2.00"

Topics: Debugging

People: Craig Thomas

Craig Thomas announced:

OSDL has released a new version of the Scalable Test Platform, an automated test environment that allows a developer to test kernel patches on a variety of hardware platforms against a number of performance tests.

The code and documentation are at:

http://sourceforge.net/projects/stp/

OSDL's instance of STP is at:

http://www.osdl.org/stp.

Types of tests to exercise a kernel patch include:

Changes to the new release include the following:

We encourage the community to use the framework at OSDL to test your patches. Become an OSDL associate (free signup) at http://www.osdl.org and take advantage of the various platforms offered in the lab.

Sharon And Joy
 

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.