Kernel Traffic #180 For 18 Aug 2002

By Zack Brown

Mailing List Stats For This Week

We looked at 1614 posts in 8177K.

There were 386 different contributors. 208 posted more than once. 156 posted last week too.

1. Status Of InfiniBand Support

28 Jul 2002 - 12 Aug 2002 (31 posts) Archive Link: "Re: [2.6] The List, pass #2"

Topics: Disks: SCSI, Feature Freeze, Microsoft

People: Roland Dreier, Albert D. Cahalan, Linus Torvalds, Ben Greear, Austin Gonyou, Rob Landley, Matt Domsch, Guillaume Boissiere

Guillaume Boissiere had written that InfiniBand support would probably not make it into 2.6, and would have to wait until 2.7; Albert D. Cahalan asked why this was. He said that if folks stuck to the simplest goals, such as SCSI or IP over InfiniBand, it wouldn't be so difficult to code. Roland Dreier pointed out, "Look at http://infiniband.sf.net to see all the infrastructure required just to get to the point of being able to start to write an IP-over-IB driver." Albert replied, "It's pretty obvious that you could do SCSI and IP with much less code" [...] "Ditch the lofty goals, and you might make the 2.6.xx kernel. You can stick to being a FireWire alternative for now." But Roland said, "I agree that it's a shame that the IB spec is so absurdly complex. And it probably is true that you could come up with a simplified way of using IB hardware that would be easier to code for. However, I don't think you'll find much interest in the IB world in implementing Linux-specific, non-interoperable, non-IB-spec-compliant software."

Linus Torvalds also replied to Albert's initial question, saying:

It's big, it's complex, and nobody seems to take it that seriously (the only people who ever asked _me_ about it was Intel, and they seem to have cancelled their own projects).

If it turns out to be a big hit, it can be backported. But as it looks now, it has very little relevance for any 2.6 freeze schedule.

Ben Greear added, "I read that Microsoft was cancelling support for it as well." Austin Gonyou said to Linus:

True, and to second that with some facts, companies who were making this their mainstay, are in much pain, on top of their economic status, because of Intel seemingly cancelling their projects.

Several companies in Austin, TX were chomping at the bit for people to come work on Infiniband technology, but there has been nothing new relating to this climate for a very long time...and probably won't be unless Intel, HP, IBM, someone with deep pockets and a marketing machine can show it's viable for consumer use.

Rob Landley replied:

Ah, the austin rumor mill.

According to a friend of mine who works at AMD:

  1. Intel licensed that hyper-transport thing when they licensed x86-64 (Yamhill?).
  2. Dell gave people refunds on the few itanium machines they actually managed to sell. (I believe the number I heard was a total of about two hundred and fifty total itanium "development" systems sold...)

There would appear to be a distinct trend here, but you know rumors... :)

Matt Domsch from Dell commented on the Dell rumor, saying, "In this case, rumors are completely untrue. :-) Standard Operating Procedure is to give refunds for systems returned within 30 days if the customer chooses to do that. We didn't do anything akin to a recall. In fact, we're still selling 4P Itanium servers (PowerEdge 7150)."

2. Ethernet Driver Documentation

5 Aug 2002 - 8 Aug 2002 (38 posts) Archive Link: "ethtool documentation"

Topics: Ioctls, Networking

People: Jeff Garzik, Tim Hockin, Abraham vd Merwe

Abraham vd Merwe asked if there were any documentation describing the ioctls that needed to be implemented in each ethernet driver, and Jeff Garzik replied, "Unfortunately not. There is a distinct lack of network driver docs at the moment... The best documentation is looking at source code of drivers that implement the most ioctls." But Tim Hockin said he'd written a quick overview document. He added, "I need to add docs for a few of the newer commands, still, and I want to get into the structs for each call in more detail, too. I want to re-examine a few of recent additions, before they become too ubiquitous - am I too late to pipe up for my own aesthetics?" He posted the entire doc:

These are the valid parameters to the SIOCETHTOOL ioctl(). Network drivers should support these as much as possible.

ETHTOOL_GSET
ETHTOOL_SSET

Get/set NIC settings. These commands expect a 'struct ethtool_cmd *' argument. This struct includes fields for supported features (speed, duplex, transceiver), advertised features, speed, duplex, port, transceiver, and autonegotiation. If the caller attempts to set an invalid value for any field, return -EINVAL.

ETHTOOL_GDRVINFO

Get driver information. This command expects a 'struct ethtool_drvinfo *' argument. This struct includes the driver identifier as a string, the driver version as a string, bus information for the interface, and length information for other ETHTOOL_* commands.

ETHTOOL_GREGS

Get a register dump from the NIC. This command expects a 'struct ethtool_regs *' argument. This struct has a driver-specific version field and a length field. The length field indicates the length of the data field to be populated with register information.

ETHTOOL_GWOL
ETHTOOL_SWOL

Get/set wake-on-lan options for the NIC. These commands expect a 'struct ethtool_wolinfo *' argument. This struct has fields for supported and active WoL options, and the SecureOn password, if active. If the caller attempts to set an invalid value, return -EINVAL.

ETHTOOL_GMSGLVL
ETHTOOL_SMSGLVL

Get/set the driver message-level value for the NIC. This command expects a 'struct ethtool_value *' argument.

ETHTOOL_NWAY_RST

Force auto-negotiation to restart, if it is enabled. If it is not enabled, return -EINVAL.

ETHTOOL_GLINK

Read the current link status. This command expects a 'struct ethtool_value *' argument.

ETHTOOL_GEEPROM
ETHTOOL_SEEPROM

Get/set EEPROM data. These commands expect a 'struct ethtool_eeprom *' argument. This struct has a magic number, an offset and length pair, and a data field. If the offset+length are longer than the maximum size, the extra is silently ignored.

ETHTOOL_GCOALESCE
ETHTOOL_SCOALESCE

Get/set coalescing parameters. These commands expect a 'struct ethtool_coalesce *' argument. This struct has several fields for configuring coalescing - see ethtool.h for details. If the caller attempts to set an invalid value, return -EINVAL.

ETHTOOL_GRINGPARAM
ETHTOOL_SRINGPARAM

Get/set RX/TX ring parameters. These commands expect a 'struct ethtool_ringparam *' argument. This struct has fields for several rx pending options, and tx pending. If the caller attempts to set an invalid value, return -EINVAL.

ETHTOOL_GPAUSEPARAM
ETHTOOL_SPAUSEPARAM

Get/set the RX/TX pause parameters. These commands expect a 'struct ethtool_pauseparam *' argument. This struct has fields to enable autonegotiation of pause parameters and to force RX and TX pause control.

ETHTOOL_GRXCSUM
ETHTOOL_SRXCSUM

Get/set the RX hardware checksum capability/flag. These commands expect a 'struct ethtool_value *' argument. If the caller attempts to enable RX hardware checksumming on an interface that does not support it, return -EINVAL.

ETHTOOL_GTXCSUM
ETHTOOL_STXCSUM

Get/set the TX hardware checksum capability/flag. These commands expect a 'struct ethtool_value *' argument. If the caller attempts to enable TX hardware checksumming on an interface that does not support it, return -EINVAL.

ETHTOOL_GSG
ETHTOOL_SSG

Get/set the scatter/gather capability/flag. These commands expect a 'struct ethtool_value *' argument. If the caller attempts to set an invalid value, return -EINVAL.

ETHTOOL_TEST
/* execute NIC self-test, priv. */

ETHTOOL_GSTRINGS
/* get specified string set */

ETHTOOL_PHYS_ID
/* identify the NIC */

ETHTOOL_GSTATS
/* get NIC-specific statistics */

3. driverFS API Updates

5 Aug 2002 - 8 Aug 2002 (9 posts) Archive Link: "driverfs API Updates"

Topics: FS: driverfs

People: Patrick Mochel

Patrick Mochel described:

A series of driverfs changes went into Linus' tree last Friday. This is a short summary of those changes, and some notes on how to use them.

Firstly, I changed the name of the structure that must be declared from struct driver_file_entry to struct device_attribute. This more accurately represents what is going on.

I've also created a macro[1] for defining device attributes, that goes like this:

DEVICE_ATTR(name,"strname",mode,show,store);

This will create a structure by the name of 'dev_attr_##name', where ##name is the first parameter, which can then be passed to device_create_file(). [2]

The definition of device_remove_file has changed to:

void device_remove_file(struct device * dev, struct device_attribute * attr);

(The second parameter is now the same type as what is passed to device_create_file, for consistency's sake (instead of a char *)).

I've added support for bus and device driver attributes. The mechanism for manipulating them is analogous to that of device attributes. To declare them, you do:

BUS_ATTR(name,"strname",mode,show,store);
DRIVER_ATTR(name,"strname",mode,show,store);

which create the objects

struct bus_attribute bus_attr_##name;

and

struct driver_attribute driver_attr_##name;

respectively.

You can then use

int bus_create_file(struct bus_type *, struct bus_attribute *);
void bus_remove_file(struct bus_type *, struct bus_attribute *);

int driver_create_file(struct device_driver *, struct driver_attribute *);
void driver_remove_file(struct device_driver *, struct driver_attribute *);

to add and remove them.

Bus attribute files appear in the bus's directory (bus/<bus>/ under the driverfs mountpoint).

Driver attributes appear in the driver's directory (bus/<bus>/<driver>/ under the driverfs mountpoint).

The bus show and store routines are identical to the device show and store routines, though they take a pointer of type struct bus_type as the first parameter.

Similarly for drivers; they take a struct device_driver * as the first parameter.

Please see include/linux/device.h for the structure definitions, and drivers/base/fs/*.c for implementation details.

driverfs now has the ability to support attributes for any object type. I've updated the documentation (Documentation/filesystems/driverfs.txt) to hopefully include enough information to hack on it. I'm also open to all questions and suggestions, so please feel free to ask.
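
The naming scheme Patrick describes, where DEVICE_ATTR(name, ...) yields an object called dev_attr_##name, can be modelled in ordinary user-space C. The sketch below is only that: the struct layouts, the show/store signatures and the label_show() routine are simplified assumptions, not the kernel's definitions (those live in include/linux/device.h).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Simplified stand-ins for illustration; NOT the kernel's structures. */
struct device {
    const char *name;
};

struct device_attribute {
    const char *name;   /* file name that would appear in the device's directory */
    mode_t mode;        /* permission bits for that file */
    ssize_t (*show)(struct device *dev, char *buf, size_t count);
    ssize_t (*store)(struct device *dev, const char *buf, size_t count);
};

/* Token pasting turns the first argument into the object name dev_attr_<ident>. */
#define DEVICE_ATTR(ident, fname, perm, show_fn, store_fn) \
    struct device_attribute dev_attr_##ident = {           \
        .name = fname,                                      \
        .mode = perm,                                       \
        .show = show_fn,                                    \
        .store = store_fn,                                  \
    }

/* Hypothetical show routine: report the device name, newline-terminated. */
static ssize_t label_show(struct device *dev, char *buf, size_t count)
{
    int n;

    assert(dev != NULL && buf != NULL);
    if (count == 0)
        return 0;
    n = snprintf(buf, count, "%s\n", dev->name);
    if (n < 0)
        return -1;
    /* Report the number of bytes actually placed in buf. */
    return (size_t)n < count ? (ssize_t)n : (ssize_t)(count - 1);
}

/* Expands to: static struct device_attribute dev_attr_label = { ... }; */
static DEVICE_ATTR(label, "label", 0444, label_show, NULL);
```

In the kernel the resulting object would be handed to device_create_file(); here dev_attr_label.show can simply be called directly to see what a read of the file would return.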

4. uClinux With Memory Management

6 Aug 2002 - 8 Aug 2002 (6 posts) Archive Link: "uclinux on MMU platforms - query"

Topics: User-Mode Linux, Virtual Memory

People: Alan Cox, Greg Ungerer, David Weinehall

Amol Lad asked if uClinux would run on a platform that did have a memory management unit (MMU). Greg Ungerer said it was theoretically possible, but that if you could use virtual memory, you were better off doing so than not, so uClinux would not be the best choice for that hardware. But Alan Cox said, "Being able to run true ucLinux on i386 makes debugging and verification of software so much less painful sometimes." Greg replied, "For some things yes. But it is a real pain trying to track down memory corruption and stack overflow problems in applications. They have a tendency to take your the whole system..."

At this point David Weinehall joked that a uClinux version of UML would be great, and Greg laughed, saying, "We sort of have better than that now. There are quite a few emulators that run under standard Linux that will quite happily run uClinux. There is xcopilot (m68328), coldfire (5206), ARMulator (ARM), tsim (SPARC leon) and or1ksim (OPENcores OR1000). I am sure there is more!"

5. Status Of NTFS Write Support

6 Aug 2002 - 8 Aug 2002 (5 posts) Archive Link: "[BK-PATCH-2.5] NTFS 2.0.24: Cleanups"

Topics: FS: NTFS

People: Anton Altaparmakov, Erik Andersen, Christoph Hellwig, Adam J. Richter, Denis Vlasenko

Anton Altaparmakov announced an NTFS update, explaining, "This is just some cleanups, mostly the BUG_ON() cleanups from Adam J. Richter. Just to relax a bit after the big changes in the last ntfs update. (-;" He posted the ChangeSet:

NTFS: 2.0.24 - Cleanups.

Erik Andersen noticed the item about read-write remounts, and remarked, "I thought the current NTFS driver does not yet support writing..." Anton explained, "Correct, and if you look at the code you will notice the #ifdef NTFS_RW around it... The read-only compiled driver doesn't have any write related code. Only the read-write compiled driver has, but at the moment this is just adding necessary safety bits before starting to add actual write code. Writing is under development and you will be seeing more and more bits related to it appearing. (-:" Erik was very happy about this, for the occasional times he needed to interoperate with Windows 2000; and Denis Vlasenko also gave the thumbs up, offering to help write and test the code.

6. Tigon3 Crash Bug

7 Aug 2002 - 8 Aug 2002 (23 posts) Archive Link: "kernel BUG at tg3.c:1557"

Topics: Disk Arrays: RAID, Networking, PCI

People: Roland Kuhn, Alan Cox, David S. Miller

Roland Kuhn reported, "On a dual Athlon MP with a 3ware-7850 RAID (640GB RAID-5) and 3C996B-T GE NIC I can crash the machine with the above BUG message in virtually no time simply by copying data both ways between the RAID and the NIC. The BUG message shows that this can happen any time, it doesn't matter if the interrupt is received in cpu_idle or something else. I tried noapic, but to no avail." Alan Cox replied, "I've never been able to get a broadcom chipset ethernet card stable on a dual athlon with AMD 76x chipset. I have no idea what the problem is although it certainly appears to be PCI versus main memory ordering funnies." David S. Miller hung on a bit longer with a few suggestions and patches, but eventually he said, "I'm stumped, sorry." Pressing on by himself, Roland said:

Just out of curiosity I tried it with

static void tg3_write_mailbox_reg32(struct tg3 *tp, u32 off, u32 val)
{ 
        unsigned long flags;

        spin_lock_irqsave(&tp->indirect_lock, flags);
        writel(val, tp->regs + off);
        spin_unlock_irqrestore(&tp->indirect_lock, flags);
}

and that plain works. That means that only the mailbox writes have additional locking around the otherwise unchanged writel() call. Does the spin_lock_irqsave/spin_unlock_irqrestore take care of the PCI ordering?

And he replied to himself with:

While loading properly, this still crashed the machine. After giving it some thought I tried to add a dummy pci_read_config_dword() just before the writel(), and that works! I use this function both for tw32 and tw32_mailbox. I hammered it over one hour with a script that crashed it always in five seconds, and not so much as a hiccup :-)

Only one question is left: can this effect be achieved more elegantly?

But there was no reply.

7. Sharing Thread Credentials

8 Aug 2002 - 12 Aug 2002 (14 posts) Archive Link: "[PATCH 2.5.30+] Second attempt at a shared credentials patch"

Topics: BSD, FS: InterMezzo, FS: NFS, POSIX, SMP, Version Control

People: Dave McCracken, Trond Myklebust, Linus Torvalds

Dave McCracken posted a patch and announced:

This patch allows tasks to share credentials via a flag to clone().

This version fixes the problem with exec() that Linus found. Tasks that call exec() get their own copy of the credentials at that point.

The URL is here because it's too big to include in email:

http://www.ibm.com/linux/ltc/patches/misc/cred-2.5.30-3.diff.gz

The patch is against Linus' BK tree as of this morning.

Trond Myklebust complained that the patch was too big and monolithic. He pointed out an NFS bug that had slipped into the patch, saying, "Instead of doing this as one big unreadable monolithic patch and risking getting things wrong like in the above case, it would be nice if you could go via a set of wrapper functions" [which] "would allow you to make the changes to the lower level filesystem code in smaller babysteps, and make the actual move to 'struct cred' a trivial patch..." He added, "As I argued before when Ben first presented this, that will also allow us the flexibility to change the structure at a later date. Several filesystems could benefit from a shared *BSD-style 'struct ucred' to replace the tuple current->{ fsuid, fsgid, groups }."

Dave found the bug and posted an updated patch, but didn't like Trond's suggestion. He said, "I *could* do a big monolithic patch to change all references to current->*id to macros, then change the macros in a separate patch. But then we'd be stuck with macros for all those references forever, and they're not likely to change again any time soon. I don't think we'd really want to have macros for all our structure references on the off chance that someone might change it in the future." Trond explained:

Macros (and inlined functions) have the advantage that they enforce good policy. Doing 'task->cred->uid = a' on tasks other than 'current' is in general not a very safe thing to do. This sort of issue w.r.t. safe policies should in particular be worrying you when you start adding CRED_CLONE...

There are good precedents for this sort of argument: see 'set_current_state()' & friends.

In addition, those macros would allow you to set up compatibility with 2.4.x and simplify patch backports.

As for changing the structure: As I said previously I'd like to unify all those { fsuid, fsgid, group } things into a proper ucred, so that we can share these objects around the VFS, and cache them... Your 'struct cred' as it stands will not suffice to do all that since it does not provide the necessary Copy On Write protection. (For instance if some thread temporarily raises my process privileges, I will *not* want all my already opened 'struct file's to suddenly gain root access).

Dave was iffy about Trond's example. He said, "If a thread makes a system call to change its credentials, all other threads should see it. That's POSIX behavior, and the whole point of the patch. If you're talking about kernel code that assumes another identity under the covers, then yes, that's interesting. And could be achieved by allocating a temporary cred structure and attaching it to the task for the duration of the operation." Trond replied:

Authentication under UNIX usually requires you to check the process' uid/gid/groups affiliation. As such, it is useful to be able to pass that information around the kernel. Most OSes use some variation of the BSD 'ucred' structure which is reference counted and obeys COW (copy on write).

struct ucred {
  atomic_t count;
  uid_t    uid;      /* == fsuid if you like */
  gid_t    gid;      /* == fsgid  "  "   "   */
  int      ngroups;
  gid_t    *groups;
};

This means that 'struct file', the underlying filesystems, whoever else... can hold a reference to the above structure and be assured that it will never change. Changing the fsuid etc. are extremely rare operations compared to opening/closing a file, so the whole idea is precisely to *avoid* having to copy the above information all the time (which, given all the races that CLONE_CRED introduces, is a good thing).

As for POSIX behaviour: it is quite compatible with the above. The only change would be that your shared 'struct cred' would require a reference to a struct ucred rather than including fsuid, fsgid, groups as cred structure members.

Note: Given that Linux has adopted the 'capability' model on top of the standard UNIX authentication model, it might perhaps be necessary to move the capabilities into the ucred in order to make them COW too?

This made sense to Dave. He said it was a good idea, but not really related to his patch, and he didn't want to tie them together.

Earlier in the discussion, when Trond had said that macros and inline functions enforced good policy, Dave had replied, "I don't really see the benefit. The macros you're talking about are only there to provide different behavior for MP and UP. There aren't macros for any of the other shareable structures hanging off the task struct." And Trond said, "... which begs the question: are you saying that there are no SMP issues with CLONE_CRED and setting/reading the 'struct cred' members?" Dave replied, "Yes, I'm saying there are no SMP issues with the shared cred structure. I looked for them and failed to find any. Credentials are not set cross-task, and are always done via atomic ops. I also failed to find any broader race conditions that would require a lock." But Trond said:

What if one thread is doing an RPC call while the other is changing the 'groups' entry?

Given that the first thread wants both to copy and/or compare the groups entry what prevents scenarios of the form?

  Thread 1                       Thread 2

                                change cred->ngroups
  copy cred->ngroups
  copy cred->groups
                                change cred->groups

Dave's eyes bugged out, and he said, "Good point. Ok, I've added locking to the cred structure to handle this." He posted another patch (http://www.ibm.com/linux/ltc/patches/misc/cred-2.5.30-5.diff.gz) and said he didn't see any other places where locking was necessary. At this point Linus Torvalds chimed in, with:

Please don't do this with locking, I really think the right thing to do is to have a "duplicate()" function, and when you pass credentials off to something, you dup them at that point.

If you start off as non-root, and then execve suid into root, a pending NFS request should _not_ suddenly have the credentials changed under it. Yet clearly that kind of thing can't just be locked either.

Along with copy-on-write semantics, this should perform perfectly well (ie "duplicate()" would only increment a count), and then setuid() would have to have code something like

        if (cred->count > 1) {
                newcred = alloc_cred();
                copy_cred(newcred, cred);
                for_each_cred_group(p) {
                        p->cred = newcred;
                        atomic_inc(&newcred->count);
                        putcred(cred);
                }
        }

instead.

There was no reply to this, but Trond also said to Dave, "Err... Well my original point about your changes to the sunrpc code still stand: no spinlocking there AFAICS. In addition, you'll want to talk to the Intermezzo people: they do allocation of buffers based on the (volatile) value of cred->ngroups." Dave agreed he'd missed the sunrpc case, and posted another patch (http://www.ibm.com/linux/ltc/patches/misc/cred-2.5.30-6.diff.gz). Trond still had some problems with the patch, and they probably took it to private email.
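
The reference-counted, copy-on-write credential that Trond and Linus argue for can be sketched in user space roughly as follows. This is a single-threaded model under stated assumptions: the type name cow_cred and the helpers cred_alloc(), cred_dup(), cred_put() and cred_make_private() are hypothetical, the count is a plain int rather than an atomic_t, and the sketch does not attempt Linus's step of moving every task in a sharing group to the new copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Model of a shared credential, loosely following Trond's 'struct ucred' layout. */
struct cow_cred {
    int    count;     /* reference count (plain int: single-threaded sketch) */
    uid_t  uid;
    gid_t  gid;
    int    ngroups;
    gid_t *groups;
};

/* Allocate a credential holding one reference and a private copy of groups. */
static struct cow_cred *cred_alloc(uid_t uid, gid_t gid,
                                   const gid_t *groups, int ngroups)
{
    struct cow_cred *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    c->groups = NULL;
    if (ngroups > 0) {
        c->groups = malloc(sizeof(gid_t) * (size_t)ngroups);
        if (!c->groups) {
            free(c);
            return NULL;
        }
        memcpy(c->groups, groups, sizeof(gid_t) * (size_t)ngroups);
    }
    c->count = 1;
    c->uid = uid;
    c->gid = gid;
    c->ngroups = ngroups;
    return c;
}

/* "duplicate()": hand out another reference; nothing is copied. */
static struct cow_cred *cred_dup(struct cow_cred *c)
{
    assert(c != NULL);
    c->count++;
    return c;
}

/* Drop one reference and free the credential when the last one goes away. */
static void cred_put(struct cow_cred *c)
{
    assert(c != NULL && c->count > 0);
    if (--c->count == 0) {
        free(c->groups);
        free(c);
    }
}

/*
 * Return a credential the caller may modify. If the credential is shared,
 * the caller's reference is moved to a fresh copy, so other holders keep
 * seeing the old, unchanged values. Returns NULL if the copy cannot be made.
 */
static struct cow_cred *cred_make_private(struct cow_cred *c)
{
    struct cow_cred *n;

    assert(c != NULL);
    if (c->count == 1)
        return c;               /* sole owner: modify in place */
    n = cred_alloc(c->uid, c->gid, c->groups, c->ngroups);
    if (n)
        cred_put(c);            /* our reference now lives in the copy */
    return n;
}
```

With this shape, an open file or a pending request holds its own reference from cred_dup(), and a later change of uid goes through cred_make_private() first, so the earlier holder never sees the new identity.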

8. Linux Test Project Update

8 Aug 2002 (1 post) Archive Link: "ANNOUNCE: August Linux Test Project Announcement"

Topics: Bug Tracking, Networking, Version Control

People: Airong Zhang, Nathan Straz, Davide Libenzi, Jeff Martin, Paul Larson

Airong Zhang announced:

The Linux Test Project test suite LTP-20020807.tgz has been released. Visit our website ( http://ltp.sourceforge.net ) to download the latest version of the test suite, and for information on test results on pre-release, release candidate and stable releases of the kernel. There is also a list of test cases that are expected to fail, please find the list at http://ltp.sourceforge.net/expected-errors.php.

Highlights

We encourage the community to post results, patches or new tests on our mailing list and use the CVS bug tracking facility to report problems that you might encounter with the test suite. More details are available at our web-site.

Change Log

CVS Bugs closed

#591695 getgroups03 fails on some distros
#591698 sendfile02 fails with 2.5 kernels

Acknowledgements

9. Daily Snapshots Of The Unstable Series

8 Aug 2002 - 9 Aug 2002 (10 posts) Archive Link: "Announce: daily 2.5 BK snapshots"

Topics: SMP, Version Control

People: Jeff Garzik, Rik van Riel, Paul Larson, Larry McVoy, David Woodhouse, H. Peter Anvin

Jeff Garzik announced:

Since Linus does not do pre-patches anymore, he mentioned some time ago it would be nice if somebody created an automated BK snapshot process to make BK changes accessible between kernel releases. I've done that.

ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/snap/2.5/

All BK changes not in the current 2.5 kernel release will be posted at this URL, in GNU patch form, on a daily basis. [snap happens at one minute past midnight, California time] The full changelog log is also extracted. Each time Linus releases a new kernel, the contents of this directory will be wiped out.

He added in reply to himself:

Two more details:

H. Peter Anvin asked Jeff in private, if they could choose a directory that wasn't in people/, but Jeff replied on the list, "I would like to test it for a while and see if Linus objects, before doing so..."

Rik van Riel also replied to Jeff's initial announcement:

Heh, I've had something vaguely like this on NL.linux.org:

ftp://ftp.nl.linux.org/pub/linux/bk2patch/

Every 3 hours it creates a unidiff between the latest tagged version and the head of the bk tree, for both 2.5 and 2.4.

Jeff replied:

Just to forestall other private responses [already gotten two], mine is slightly different than yours, and David Woodhouse's setup. My goal was basically to create a daily pre-patch, complete with hacked EXTRAVERSION. That's something that is familiar to testers (pre-patch form), and the snapshot is not so often that people will get buried in a flurry of patches and csets. can you say "2.5.30-bk439" ;-)

So I consider my dailies as a complement to your bk2patch and dwmw2's output, not redundant. Programmers would probably find dwmw2's per-cset patches to be more useful, while testers and power users, and maybe maintainers, would prefer daily pre-patches to test and sync against.

Paul Larson described his own system, saying, "I have a setup that does a nightly pull of 2.5, builds it for UP and SMP, pushes to two machines (UP and SMP) and runs LTP on it. Then sends me back the results of all of it. Of course if something fails that didn't fail the previous day, I have a limited set of Changesets as culprits so it's easier for me to find the cause of problems when I do more frequent testing like this. Any major problems are reported immediately of course, but would anyone be interested in seeing the results of this more often? I don't know if I have enough space on the LTP website to post all the data that's gathered every single day (It would add up REALLY fast), but would a weekly rollup to lkml be something people would like to see?" Larry McVoy pointed out, "if you look at Documentation/BUG-HUNTING which describes how to do binary search to track down bugs, you'll notice you can now do the same thing with BK at a much finer granularity. It's possible to track down bugs to the changeset which caused the bug, rather than the release. Which is what Paul is talking about, but he's talking about doing it over a small set of csets. You can do it over a large set of csets as well. File this away as a thing you can do and if you ever need the details, contact me and I'll walk you through it if it isn't obvious."
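
The binary search Larry refers to can be sketched independently of any particular tool. The code below assumes the changesets are numbered 0 to n-1 in commit order, that the first is known good and the last known bad, and that the bug persists once introduced; the "check out, build and test this changeset" step is abstracted into a callback. All names here are hypothetical, and no BK commands are implied.

```c
#include <assert.h>
#include <stddef.h>

/* Caller-supplied check: nonzero if the tree at changeset `cset` shows the bug. */
typedef int (*is_bad_fn)(size_t cset, void *ctx);

/*
 * Find the first bad changeset among indices 0..n-1.
 * Preconditions: n >= 2, index 0 is good, index n-1 is bad, and every
 * changeset after the first bad one is bad too.
 * Needs about log2(n) calls to is_bad().
 */
static size_t first_bad_cset(size_t n, is_bad_fn is_bad, void *ctx)
{
    size_t good = 0, bad;

    assert(n >= 2 && is_bad != NULL);
    bad = n - 1;

    /* Invariant: `good` is known good, `bad` is known bad. */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Toy check for demonstration: the bug appears at the index stored in ctx. */
static int bad_at_or_after(size_t cset, void *ctx)
{
    return cset >= *(const size_t *)ctx;
}
```

Over 1000 changesets this takes about ten build-and-test rounds, which is why the approach scales from Paul's one-day window to a whole release.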

10. Status Of -dj Series

8 Aug 2002 - 13 Aug 2002 (4 posts) Archive Link: "What patches I need for s stable 2.5.x"

Topics: Disks: IDE

People: Alan Cox, Dave Jones, Brad Chapman

Brad Chapman asked what patches to apply to the 2.5 tree to get something stable. Alan Cox replied, "The IDE stuff might get you a usable 2.5, even then the error handling is not correct in all cases so treat it with care. On my test boxes the foreport didnt hang the machine the way 2.5.* did so its an improvement. You might just want to follow the 2.5.*-dj tree though." Brad saw the wisdom of that, and asked if there were any caveats with regard to Dave Jones' tree. Dave replied, "The biggest gotcha right now is that it's lagging behind mainline. (Current is 2.5.30-dj1, which doesn't actually boot.. Last working one was against 2.5.27). Hopefully I'll be back on track by the end of the week."

11. PCI Fix For NUMA-Q

8 Aug 2002 (2 posts) Archive Link: "[patch] PCI configuration fix for NUMA-Q"

Topics: PCI

People: Matthew Dobson, Alan Cox

Matthew Dobson said, "The PCI code for NUMA-Q machines has been broken for a while... The kernel currently can't find PCI busses on quad's other than the first. This patch fixes that problem. Please apply." Alan Cox replied, "Its a tiny bit more code to implement a set of pci ops instead of hacking CONFIG_MULTIQUAD into the core code and it gets rid of the ifdefs for BUS2QUAD and the like if you instead of ideffing it all split the pci_conf1_ ops so you have your own copy with BUS2QUAD bits called pci_conf1_mq_.... and put the originals back cleanly without multiquad." He posted some sample code, and added, "Less ifdefs, less magic macros, minutely better performance and it scales for future stuff when Intel/Dell/whoever releases their NUMA chipset (See 2.4.20pre-ac although the effect is less obvious there as its all in one file anyway in 2.4)"

12. klibc And Licensing

8 Aug 2002 - 11 Aug 2002 (34 posts) Archive Link: "klibc development release"

Topics: BSD, FS: initramfs, FS: ramfs, FS: rootfs, Klibc, Version Control

People: H. Peter Anvin, Alexander Viro, Oliver Xymoron, Albert D. Cahalan, Rob Landley

H. Peter Anvin announced:

Okay, I'm starting to have enough guts about this to release for testing...

klibc is a tiny C library subset intended to be integrated into the kernel source tree and being used for initramfs stuff. Thus, initramfs+rootfs can be used to move things that are currently in kernel space, such as ip autoconfiguration or nfsroot (in fact, mounting root in general) into user space.

I would particularly appreciate portability comments, since I'm flying blind on non-i386 machines (anyone want to send me hardware?), although any bug reports would be appreciated.

ftp://ftp.kernel.org/pub/linux/libs/klibc/klibc.tar.gz

I haven't bothered putting a version number on it, since it changes quite often. I have also published the CVS repository in the directory above.

Albert D. Cahalan asked if he could link 4-clause BSD code against klibc, since the 4-clause BSD license was incompatible with the GPL. H. Peter replied that he was planning to release klibc under a BSD-style license, either 3-clause BSD, MIT or the X license. Rob Landley asked what was wrong with the LGPL, and Alexander Viro replied, "klibc is static-only. So for all practical purposes LGPL would be every bit as viral as GPV itself." Oliver Xymoron replied:

You say that as if it were a bad thing.

(And technically incorrect, if you also provide .o files so that the end user can relink as they desire.)

That aside for the moment, isn't the plan to pull stuff that's currently GPL out of the kernel and link against this lib anyway?

Second point - if it goes into the kernel source tree as suggested, but with a 'copycenter' license, we can expect to have the nVidia problem but worse.

Making it GPL shouldn't be a big deal. It is intended to be a small amount of code, after all. And I'd hate to get into a situation where people started shipping their magic 'make the hardware work' bits as closed replacements for the early boot code.

Alexander replied:

I have no problems with people choosing whatever license they prefer for their work and I have no problems with using GPL when I'm working on projects that are already under it, but it's not the license I would choose for my work in cases when I have a choice.

As for the "make the hardware work" code, there's nothing to stop people from doing that _NOW_. I'm not too fond of that, but as long as we are talking about userland code it

We are talking about libc. _Nothing_ in that code couldn't be reimplemented by any half-competent programmer. It's a textbook stuff. Those who don't like GPL would be trivially able to reimplement all these functions in their own code anyway. End of story. Whatever license is chosen, it won't prevent people from putting their code under any license they like.

There is a crucial difference from the situation with nVidia, Veritrash and the rest of let's-bugger-the-kernel team. _They_ want more than using syscalls from user mode - they want an access to guts of the kernel and that's a very different can of worms. And _that_ I have problems with. A lot. Especially when they expect us to abstain from changes of kernel internals that might break their junk and when they whine when such changes are done.

People do have a right to put their code under whatever license they like. Now, _I_ won't use the stuff I don't have a source for unless I have exceptionally good reason to believe that authors of that stuff are among the few percents of programmers who *can* find their arse without outside help. But that has nothing to do with licensing or any moral considerations and everything to the fact that I know what kind of crap most of the software is.

13. KDB Update

8 Aug 2002 - 10 Aug 2002 (2 posts) Archive Link: "Announce: kdb v2.3 i386 updates for kernels 2.4.18 and 2.4.19"

People: Keith Owens

Keith Owens announced an update (ftp://oss.sgi.com/projects/kdb/download/v2.3/) of the kernel debugger for 2.4.18 and 2.4.19.

14. IDE Update

9 Aug 2002 - 13 Aug 2002 (7 posts) Archive Link: "[PATCH] 2.5.30 IDE 115"

Topics: Disks: IDE, Ioctls

People: Marcin Dalecki, Jens Axboe, Adam J. Richter, Morten Helgesen

Marcin Dalecki posted an IDE update and announced:

- Fix small typo introduced in 113, which prevented CD-ROMs from working altogether.

- Eliminate block_ioctl(). This code can't be shared in the way proposed by this file. We will port it to the proper blk_insert_request() soon. This will eliminate the _elv_add_request() "layering violation".

- Don't play IRQ wreck havoc in ata_dump().

- Fix delays on seeks in ide-cd.c

- Integrate special_intr() and tcq_nop_intr() in to one single IRQ handler. This way we don't have to kmalloc anything for sending a NOP to the drive in TCQ.

- Eliminate the usage of the XXX_handler from the ata_taskfile structure. Now we always deduce which handler will be needed from the command which will be executed. This makes the usage of the channel IRQ handler pointer much more cleaner now.

- Don't pass taskfiles through rq->special. Pass them through rq->cmd[] as every other block device does or at least should do. This way we don't pass pointers to structures on local stack around any more.

- pdc4030 code doesn't have anything to do with the normal taskfile stuff.

Marcin and Jens Axboe proceeded to argue about various technical details, and at one point Jens remarked, "I would appreciate it if you would keep your hands out of the block code." Marcin replied, "OK. I have enough." Adam J. Richter interpreted this exchange as Jens asking Marcin to step down as IDE maintainer, and Marcin agreeing. He queried both of them privately, and said on-list:

I have confirmed by email with Jens (cc'ed to Martin) that Jens did not mean that Martin should step down as IDE maintainer or anything like that.

Jens was referring to the generic block code that he maintains (including elv_add_request and drivers/block/block_ioctl.c, which Martin had submitted patch for in IDE 115 without consulting with Jens).

Personally, I hope that Martin stays on as IDE maintainer. Getting IDE to a maintainable state was a minefield that had to be crossed. Could someone else have done it with fewer mistakes? Maybe, but there was plenty of time for someone else to do it, and nobody stepped up to the plate. Of course there is a trade-off point, and that point is more conservatively set with the software that controls disk storage, but, in general, I think it's important to be supportive of those who actually produce.

Morten Helgesen hadn't thought the Jens/Marcin exchange was unclear, but added, "I agree with you - there`s no point discussing whether or not someone else would have been able to do what Martin has done with fewer mistakes - I think we should focus on helping Martin make 2.5 IDE stable ..."

15. Maintainer List

9 Aug 2002 (1 post) Archive Link: "lk maintainers"

Topics: Bug Tracking, Disk Arrays: RAID, Disks: IDE, Disks: SCSI, FS: NFS, FS: NTFS, FS: ReiserFS, FS: autofs, FS: devfs, FS: ext2, FS: ext3, FS: smbfs, Framebuffer, Hot-Plugging, I2O, Kernel Build System, Networking, PCI, Real-Time: RTLinux, Samba, Serial ATA, Software Suspend, Sound: ALSA, Spam, USB, Virtual Memory

People: Denis Vlasenko, Trond Myklebust, Arnaldo Carvalho de Melo, Alexander Viro, Hans Reiser, Rik van Riel, Linus Torvalds, Vojtech Pavlik, Geert Uytterhoeven, Jeff Garzik, Andre Hedrick, Greg KH, Jaroslav Kysela, Anton Altaparmakov, Tigran Aivazian, Martin J. Bligh, Arjan van de Ven, Eric S. Raymond, Mike Phillips, Oleg Drokin, H. Peter Anvin, Alan Cox, Pavel Machek, Dave Jones, Richard Gooch, Andrew Morton, Jens Axboe, Ingo Molnar, Victor Yodaiken, James Simmons, Tim Waugh, Rusty Russell, Gerd Knorr, Andrea Arcangeli, Martin Dalecki, David S. Miller, Rogier Wolff, Urban Widmark, Petr Vandrovec, Marcelo Tosatti, Neil Brown, Ralf Baechle, Russell King, Keith Owens, Robert Love, Maksim Krasnyanskiy

Denis Vlasenko posted the kernel maintainers list, saying, "This document is mailed to lkml regularly and will be modified whenever new victim wishes to be listed in it or someone can no longer devote his time to maintainer work. If you want your entry added/updated/removed, contact me." The list:

So, you are new to Linux kernel hacking and want to submit a kernel bug report or a patch but don't know how to do it and _where_ to report it? Then save this file for future reference.

Preparing bug report:

Compile problems: report GCC output and result of "grep '^CONFIG_' .config"
Oops: decode it with ksymoops
Unkillable process: Alt-SysRq-T and ksymoops relevant part

Yes it means you should have ksymoops installed and tested, which is easy to get wrong. I've done that too often.

More info in the FAQ at http://www.tux.org/lkml/
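
As an illustration of the compile-problem advice above, here is a minimal Python sketch that selects the same lines as "grep '^CONFIG_' .config". The function name `config_lines` and the sample text are assumptions made for this example; they are not part of the maintainers file.

```python
def config_lines(config_text):
    """Return the lines that start with CONFIG_, like grep '^CONFIG_'."""
    return [line for line in config_text.splitlines()
            if line.startswith("CONFIG_")]


# Illustrative sample only; a real .config is much longer.
sample = """\
#
# Automatically generated make config: don't edit
#
CONFIG_X86=y
# CONFIG_SMP is not set
CONFIG_BLK_DEV_IDE=y
"""

for line in config_lines(sample):
    print(line)
```

Options that are switched off appear as comments ("# CONFIG_SMP is not set"), so they are skipped by the pattern, just as they would be by grep.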

Sending bug report/patch:

Current Linux kernel people

Note that this list is sorted in reversed date order, most recent entries first. This means that entries at bottom can be outdated :-(

Linux kernel mailing list <linux-kernel@vger.kernel.org>
        Post anything related to Linux kernel here, but nothing else :-)

Ingo Molnar <mingo@elte.hu> [30 jul 2002]
        Ingo wrote the new scheduler for 2.5.

Ralf Baechle <ralf@uni-koblenz.de> [30 jul 2002]
        I am maintainer of the AX.25 code

Victor Yodaiken <yodaiken@fsmlabs.com> [30 jul 2002]
        RTLinux patches, updates, contributions, drivers.
        Please send first to the list: rtl@rtlinux.org

Pavel Machek <pavel@ucw.cz> [27 jul 2002]
        I am network block device maintainer. Visit http://nbd.sf.net.
        (see Steven Whitehouse <steve@gw.chygwyn.com> entry)
        I am working on software suspend.

William Irwin <wli@holomorphy.com> [02 jul 2002]
        Send bug reports and/or feature requests related to many tasks,
        rmap, space consumption, or allocators to me. I'm involved in
        * rmap
        * memory allocators
        * reducing space consumed by data structures (e.g. struct page)
        * issues arising in workloads with many tasks
        * kernel janitoring
        See also:
        Rik van Riel <riel@surriel.com>
        Andrea Arcangeli <andrea@suse.de>
        Martin Bligh <Martin.Bligh@us.ibm.com>
        Andrew Morton <akpm@zip.com.au>

Dave Jones <davej@suse.de> [23 apr 2002]
        I collect various bits and pieces for inclusion in 2.5,
        especially small and trivial ones and driver updates.
        I'll feed them to Linus when (and if) they
        are proved to be worthy.

Andre Hedrick <andre@linux-ide.org> [09 apr 2002]
        ATA/ATAPI Storage Architect [2.0,2.2,2.4]
        HBA interface developer
        Serial ATA Architect [future release]
        Voting NCITS member AT-Attachment Committee

Andrea Arcangeli <andrea@suse.de> [28 mar 2002]
        Send VM related bug reports and patches to me.
        I'm especially interested in VM issues with:
        * lots of RAM and CPUs
        * NUMA
        * heavy swap scenarios
        * performance of I/O intensive workloads (in particular
          with lots of async buffer flushing involved)
        See also Martin J. Bligh <Martin.Bligh@us.ibm.com> entry
        Mail also:
        Arjan van de Ven <arjanv@redhat.com>

Martin J. Bligh <Martin.Bligh@us.ibm.com> [28 mar 2002]
        I'm interested in VM issues with lots (>4G for i386)
        of RAM, lots of CPUs, NUMA

Steven Whitehouse <steve@chygwyn.com> [27 mar 2002]
        I am the Linux DECnet network stack maintainer
        Visit http://www.chygwyn.com/decnet/

Arnaldo Carvalho de Melo <acme@conectiva.com.br> [26 mar 2002]
        IPX, 802.2 LLC, NetBEUI, http://kerneljanitors.org,
        cyclom2x sync card driver

John Cagle <jcagle@kernel.org> [19 mar 2002]
        The current maintainer of devices.txt, the list of
        assigned device numbers for LANANA.  Consult the web
        site (www.lanana.org) for instructions on submitting
        requests for new device numbers.  Send all device
        related email to <device@lanana.org>.

Tigran Aivazian <tigran@veritas.com>
        I am author and maintainer of BFS filesystem and IA32
        microcode update driver.

Rogier Wolff <R.E.Wolff@BitWizard.nl> [12 mar 2002]
        I do "specialix serial ports":
        drivers/char/specialix.c (IO8+)
        drivers/char/sx.c        (SX, SI, SIO)
        drivers/char/rio/*.c     (RIO)

Martin Dalecki <martin@dalecki.de> [11 mar 2002]
        IDE subsystem maintainer for 2.5
        (mail Vojtech Pavlik <vojtech@suse.cz> too)

Ed Vance <serial24@macrolink.com> [05 mar 2002]
        Maintainer for the generic serial driver, serial.c,
        for 2.2 and 2.4 kernels.  Please post patches to
        linux-serial@vger.kernel.org for tested bug
        fixes or to add support for a new serial device.
        Limited to time available. If I have not responded
        in a week, yell at serial24@macrolink.com

netfilter/iptables development <netfilter-devel@lists.samba.org> [23 feb 2002]
        Please report all netfilter/iptables related problems
        to this mailinglist, where all netfilter developers are present.
        See also http://www.netfilter.org/contact.html

Hans Reiser <reiser@namesys.com> [16 feb 2002]
        Send me all reiserfs related patches with a cc to
        reiserfs-dev@namesys.com, send bug reports to
        reiserfs-dev@namesys.com, send paid support requests to
        support@namesys.com after going to www.namesys.com/support.html
        to pay, send discussions (not bug reports unless they are
        interesting to most persons) to reiserfs-list@namesys.com.
        If we sit on your patch for a week without responding,
        yell at us, we deserve it.  Look at our web page
        at www.namesys.com for more about sending us code,
        working with us, and our patch submission and tracking system.

Paul Bristow <paul@paulbristow.net> [16 feb 2002]
        I am an ide-floppy driver maintainer
        (ATAPI ZIP, LS-120/240 Superdisk, Clik! drives).

Mike Phillips <phillim2@comcast.net> [15 feb 2002]
        Token ring subsystem and drivers.

Anton Altaparmakov <aia21@cam.ac.uk> [15 feb 2002]
        I am the NTFS guy.

https://bugzilla.redhat.com/bugzilla [14 feb 2002]
        Reports of problems with the Red Hat shipped kernels.

Alan Cox <alan@lxorguk.ukuu.org.uk> [14 feb 2002]
        Linux 2.2 maintainer (maintenance fixes only).
        Collator of patches for unmaintained things in 2.2/2.4.
        Maintainer of the 2.4-ac (2.4 plus stuff being tested) tree.
        I2O, sound, 3c501 maintainer for 2.2/2.4.

Robert Love <rml@tech9.net> [14 feb 2002]
        Preemptible kernel is mine.

ALSA development <alsa-devel@alsa-project.org> [12 feb 2002]
Jaroslav Kysela <perex@perex.cz> [12 feb 2002]
        Advanced Linux Sound Architecture
        ALSA patches are available at
        ftp://ftp.alsa-project.org/pub/kernel-patches/*

Neil Brown <neilb@cse.unsw.edu.au> [08 feb 2002]
        I am interested in any issues with the code in:
        NFS server    (fs/nfsd/*)
        software RAID (drivers/md/{md,raid,linear}*)
        or related include files.

Maksim Krasnyanskiy <maxk@qualcomm.com> [08 feb 2002]
        I'm author and maintainer of the Bluetooth subsystem
        and Universal TUN/TAP device driver.
        These days mostly working on Bluetooth stuff.

Rik van Riel <riel@conectiva.com.br> [07 feb 2002]
        Send me VM related stuff, please CC to linux-mm@kvack.org

Geert Uytterhoeven <geert@linux-m68k.org> [07 feb 2002]
        I work on the frame buffer subsystem, the m68k port (Amiga part),
        and the PPC port (CHRP LongTrail part).
        Unfortunately I barely have spare time to really work on these
        things. My job is not Linux-related (so far :-). I can not
        promise anything about my maintainership performance.

H. Peter Anvin <hpa@zytor.com> [07 feb 2002]
        i386 boot and feature code, i386 boot protocol, autofs3,
        compressed iso9660 (but I'll accept all iso9660-related
        changes.)  kernel.org site manager; please contact me
        for sponsorship-related issues.

kernel.org admins <ftpadmin@kernel.org> [07 feb 2002]
        Kernel.org sysadmins.  Contact us if you notice something breaks,
        or if you want a change make sure you give us at least 1-2 weeks.
        Please note that we got a lot of feature requests, a lot of
        which conflict or simply aren't practical; we don't have time to
        respond to all requests.

Greg KH <greg@kroah.com> [07 feb 2002]
        I am USB and PCI Hotplug maintainer.

Trond Myklebust <trond.myklebust@fys.uio.no> [07 feb 2002]
        I am NFS client maintainer.

James Simmons <jsimmons@transvirtual.com> [07 feb 2002]
        Console and framebuffer subsystems.
        I also play around with the input layer.

Richard Gooch <rgooch@atnf.csiro.au> [07 feb 2002]
        I maintain devfs. I want people to Cc: me when reporting devfs
        problems, since I don't read all messages on linux-kernel.
        Send devfs related patches to me directly, rather than
        bypassing me and sending to Linus/Marcelo/Alan/Dave etc.

Russell King <rmk@arm.linux.org.uk> [06 feb 2002]
        ARM architecture maintainer.  Please send all ARM patches through
        the patch system at http://www.arm.linux.org.uk/developer/patches/
        New serial drivers maintainer for 2.5.  Submit patches to
        rmk+serial@arm.linux.org.uk

Andrew Morton <akpm@zip.com.au> [05 feb 2002]
        I'm receptive to any reproducible bug anywhere in the 2.4 kernel.
        Specialising in ext2, ext3 and network drivers.
        Not thinking about 2.5.x at this time.

Petr Vandrovec <vandrove@vc.cvut.cz> [05 feb 2002]
        ncpfs filesystem, matrox framebuffer driver, problems related
        to VMware - in all of 2.2.x, 2.4.x and 2.5.x.

Reiserfs developers list <reiserfs-dev@namesys.com> [05 feb 2002]
        Send all reiserfs-related stuff here including but not limited to bug
        reports, fixes, suggestions.

Oleg Drokin <green@linuxhacker.ru> [05 feb 2002]
        SA11x0 USB-ethernet and SA11x0 watchdog are mine.

Vojtech Pavlik <vojtech@ucw.cz> [05 feb 2002]
        Feel free to send me bug reports and patches to input device drivers
        (drivers/input/*, drivers/char/joystick/*)
        I also want to receive bug reports and patches for following
        USB drivers: printer, acm, catc, hid*, usbmouse, usbkbd, wacom.
        All other (not in the list) USB driver changes should go to USB
        maintainer (hopefully there is one listed here :-).
        Also CC me if you are posting VIA IDE driver related message
        (although I am not IDE subsystem maintainer).

======= These entries are suggested by lkml folks ========

Ralf Baechle <ralf@gnu.org> [27 mar 2002]
        I am mips/mips64 maintainer.

David S. Miller <davem@redhat.com> [07 feb 2002]
        I am Sparc64 and networking core maintainer.

======= These ones I made myself ========
======= I am waiting confirmation/correction from these people ========

Urban Widmark <urban@teststation.com> [13 feb 2002]
        smbfs

Jeff Garzik <jgarzik@mandrakesoft.com> [12 feb 2002]
        I am the network-card-drivers guy (8139 for instance).
        CC me and Andrew Morton <akpm@zip.com.au> on network driver patches.

video4linux list <video4linux-list@redhat.com> [12 feb 2002]
Gerd Knorr <kraxel@bytesex.org> [12 feb 2002]
        video4linux

Tim Waugh <twaugh@redhat.com> [08 feb 2002]
        > Who is maintaining the linux iomega stuff?
        For 2.4.x, me (in theory). I don't have time for 2.5.x at the moment.

Alexander Viro <viro@math.psu.edu> [5 feb 2002]
        I am NOT a fs subsystem maintainer. But I won't kill
        you if you send me some generic fs bug reports and (hopefully) patches.

Eric S. Raymond <esr@thyrsus.com> [5 feb 2002]
        Send kernel configuration bug reports and suggestions to me.
        Also I'll be more than happy to accept help entries for kernel config
        options (Configure.help).

Gérard Roudier <groudier@free.fr> [5 feb 2002]
        I am SCSI guy.

Jens Axboe <axboe@suse.de> [5 feb 2002]
        I am block device subsystem maintainer.

Keith Owens <kaos@ocs.com.au> [5 feb 2002]
        ksymoops, kbuild, .. .. .. .. .  are mine.

Linus Torvalds <torvalds@transmeta.com> [5 feb 2002]
        Do not send anything to me unless it is for 2.5, well tested,
        discussed on lkml and is used by significant number of people.
        In general it is a bad idea to send me small fixes and driver
        updates, send them to subsystem maintainers and/or
        "small stuff" integrator (currently Dave Jones <davej@suse.de>,
        see his entry). Sorry, I can't do everything.

Marcelo Tosatti <marcelo@conectiva.com.br> [5 feb 2002]
        Do not send anything to me unless it is for 2.4 and well tested.
        If you are sending me small fixes and driver updates, send
        a copy to subsystem maintainers and/or "small stuff" integrators:
        - Alan Cox <alan@lxorguk.ukuu.org.uk>,
        - Rusty Russell <trivial@rustcorp.com.au>.

Rusty Russell <rusty@rustcorp.com.au> [5 feb 2002]
        > Here are some cleanups of whitespace in .....
        Want me to add this to the trivial patch collection for tracking?
        If so just send (or cc:) it to trivial@rustcorp.com.au.

16. New Block Allocator For ReiserFS In 2.4

9 Aug 2002 - 12 Aug 2002 (11 posts) Archive Link: "[BK] [PATCH] reiserfs changeset 7 of 7 to include into 2.4 tree"

Topics: FS: ReiserFS

People: Hans Reiser, Andrew Morton

Hans Reiser posted a patch to implement a new block allocator for ReiserFS in the 2.4 tree. A number of folks were concerned that this should go into 2.5 first, since it wasn't technically a bug fix. Hans replied, "I understand why all of you are doubtful about it going into 2.4, it is not that you are crazy, but my closeness to the code makes me think it is stable enough that it should go in. Also remember that we have an extensive test suite, we have been benchmarking variations on this code for months, and I frankly don't think that 2.5 insertion will get it enough testing to be instructive to us." Andrew Morton asked, "Block allocation algorithms are really, really important. I'd be very interested in a description of what this change does, what problems it is solving, how it solves them, observed results, testing methodology, etc. Is such a thing available?" Hans replied:

The block allocator code is one of the key remaining pieces we would have fixed before 2.4 shipped if we had had time. The block allocator code that shipped is simply ugly.

The block allocation algorithms in ReiserFS were once extremely simple. They would attempt to allocate a block near to its left neighbor in the tree ordering, searching for a free block starting from the block number of that neighbor, and doing the search in increasing block number order. (Increasing not decreasing block number order was significant to performance we found.)

The problem with this algorithm occurred when there were no free blocks anywhere near that neighbor. It would perform a linear scan of the bitmaps, and this scan might consume quite a lot of CPU as it checked each bit. Additionally, if you cannot get a free block near the neighbor, then proximity to the neighbor is actually a bad thing to achieve, because it means proximity to a full part of the disk.

So long ago I suggested that we try attaching a count to the bitmap, and not bother to scan its bits if that count was not zero. This new code does that.

Additionally, Jeff Mahoney wrote code to pick a random bitmap to go to if the current bitmap was full, and to try to make it a bitmap that is less than 90% full. This new code does that by default. (Oleg rewrote Jeff's code, and I have lost track of what aspects of it are Jeff's vs. Oleg's.)

However, we also tried a whole bunch of other things, and it looks like Jeff's/Oleg's code makes those other things not so valuable because those other things were achieving value by doing what Jeff's/Oleg's code does but less thoroughly or even as an unintended side effect.

In 2.4 we have code that takes all of the formatted nodes, and tries to put them into the first 10% of the disk. This makes me uncomfortable, because the 10% number is inflexible. Maybe I am wrong to dislike this. More experiments are needed, though I may wait for V4.1 to do them. It also does things with displacing things according to a hash function, which was a broken hash function at one time (you could tell that it didn't work the way that the programmer intended, and that it put things near to the start of the disk by accident due to directory ids tending to be numerically smaller than the device size in block numbers). I can't remember if that hash function has been fixed in the 2.4.19 code tree or if it is only fixed in our new patch.

We experimented with dispersing directories randomly across this disk.

We experimented with randomly displacing files large enough to have unformatted nodes (option in new allocator allows you to displace files larger than some arbitrary size).

In the end I decided that the improved bitmap scanning code plus the avoidance of 90% full bitmaps when nothing near is free plus starting from the left neighbor was close to as good as any other combination, and had the advantage of being simpler, so I made it the default, because I trust more in the robustness of simpler algorithms that I understand more fully.

The default code path is either far simpler than the current code, or clearly superior, depending on what part of it one considers. I do not claim that I have found the right answer, but I have probably found the best that I will invent for V3.

Almost everything that we at Namesys are going to change in V3 is written and going into the next several 2.4.20-pre* releases. The only thing that I know of that remains and is unwritten is to perhaps revert to the tail conversion policy used in Linux 2.2.* (the current code is very inefficient in its tail handling, and one of us thinks it might speed up if we go back to the old way, and I'd be interested to see a benchmark of it.) We would probably like to also junk the 4k at a time read code, but most likely that will be done in V4 (Linux 2.5/2.6) not V3.

V3 will probably change very little after 2.4.20, and that is what our users need in the period while V4 stabilizes --- they need something that always just works albeit not as fast as V4.
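
The default policy Hans describes (start from the left neighbor, scan in increasing block order, keep a per-bitmap count so that full bitmaps are never scanned, and fall back to a random bitmap that is less than 90% full) can be sketched roughly as follows. This is a toy model in Python, not the ReiserFS code: the class and method names, the list-of-lists bitmap layout, and the choice to treat "near" as "at or after the neighbor within the same bitmap" are all assumptions made for illustration.

```python
import random


class BlockAllocator:
    """Toy model of the allocation policy; not the actual ReiserFS code."""

    def __init__(self, n_bitmaps, blocks_per_bitmap, rng=None):
        self.blocks_per_bitmap = blocks_per_bitmap
        self.used = [[False] * blocks_per_bitmap for _ in range(n_bitmaps)]
        # Per-bitmap count of free blocks, so full bitmaps can be skipped.
        self.free_count = [blocks_per_bitmap] * n_bitmaps
        self.rng = rng or random.Random()

    def _take(self, bitmap, offset):
        self.used[bitmap][offset] = True
        self.free_count[bitmap] -= 1
        return bitmap * self.blocks_per_bitmap + offset

    def _scan(self, bitmap, start=0):
        """Scan one bitmap in increasing block order, beginning at start."""
        if self.free_count[bitmap] == 0:
            return None  # the counter saves the bit-by-bit scan
        for offset in range(start, self.blocks_per_bitmap):
            if not self.used[bitmap][offset]:
                return self._take(bitmap, offset)
        return None

    def allocate(self, neighbor):
        """Allocate a block near `neighbor`; return None if the disk is full."""
        bitmap, offset = divmod(neighbor, self.blocks_per_bitmap)
        block = self._scan(bitmap, offset)
        if block is not None:
            return block
        # Nothing free near the neighbor: prefer a random bitmap that is
        # less than 90% full, i.e. more than 10% of its blocks are free.
        roomy = [i for i, free in enumerate(self.free_count)
                 if free * 10 > self.blocks_per_bitmap]
        candidates = roomy or [i for i, free in enumerate(self.free_count)
                               if free > 0]
        if not candidates:
            return None
        return self._scan(self.rng.choice(candidates))
```

In this model, a request near a crowded region costs one counter check per full bitmap rather than a scan of every bit, and the fallback moves the allocation away from the full part of the disk instead of staying close to it.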

17. Multimedia Card Support In 2.4

9 Aug 2002 - 10 Aug 2002 (2 posts) Archive Link: "new driver: multimedia card (mmc) framework, patch against 2.4.19"

People: James Hicks, Garst R. Reese

James Hicks posted a patch and announced:

This patch implements the core framework for Multimedia Cards (MMC). It has been tested with iPAQ H3800 handhelds with MMC storage cards.

At the moment, access to the information required to write a driver for SD or SDIO requires signing an NDA that precludes the release of an open source driver, so only MMC is supported at this time.

Garst R. Reese remarked, "I am currently using an MMC card via usb with a SanDisk SDDR-33 reader. I can use either MMC cards or SD cards using the sddr09 driver in usb/storage. All I had to do was mount -t msdos /dev/sda1 /mnt. But your post sent me scurrying through that code and I looked at unusual_devs.h and noticed that sddr-33 was not mentioned, but sddr-31, -09 etc were. I'd love to have an MMC slot in place of my floppy drive." There was no reply.

18. USB Update To New Driver Model

9 Aug 2002 - 12 Aug 2002 (9 posts) Archive Link: "[RFC] USB driver conversion to use "struct device_driver""

Topics: FS: driverfs, USB, Version Control

People: Greg KH

Greg KH posted a patch and announced:

Here's a patch against 2.5.30 that is the start of converting the USB code over to using the new core "struct device_driver" logic and functionality. Right now only the HUB and HID drivers are converted, and so they are the only ones that will work properly (the others will compile, but no devices are ever bound to them.)

The code is still quite rough in places, but the basic functions seem to work well (driver callbacks, etc.) There are some odd USB specific tweaks that needed to be done in order to get this working properly.

The USB subsystem only binds drivers to USB "interfaces". A USB device may have many "interfaces", so a single device may have many drivers attached to it, handling different portions of it (think of a USB speaker, which has a audio driver for the audio stream, and a HID driver for the speaker buttons.) Because of this I had to create a "empty" device driver that I attach to the USB device structure. This ensures it shows up properly in the driverfs tree, and that no USB drivers try to bind to it.

Also, the driverfs representation of the USB devices has changed, possibly for the worse. Just try the patch to see what I mean :)

There is a known bug that happens in put_device() when a USB device is removed from the system, but the proper person already knows about it.

Comments?

If I don't hear any objections, I'll work on converting all of the USB drivers over to this model (the probe and disconnect function parameters have changed) and send the patch to Linus.

Many thanks to Pat Mochel for the original version of this patch (way back against 2.5.15) and for putting up with all of the USB device / interface madness.
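
The binding model Greg describes, where drivers attach to interfaces rather than to the device and the device itself carries only an empty placeholder driver, can be pictured with a small Python model. This is a sketch of the idea only: the class names, the `kind` field, the `bind_interfaces` helper and the driver names are hypothetical and do not correspond to the kernel's `struct device_driver` API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Interface:
    kind: str                     # e.g. "audio" or "hid" (illustrative labels)
    driver: Optional[str] = None  # filled in when a driver binds


@dataclass
class UsbDevice:
    name: str
    interfaces: List[Interface] = field(default_factory=list)
    # The device itself only gets an "empty" driver so that it shows up
    # in the device tree and no real USB driver tries to bind to it.
    driver: str = "placeholder"


def bind_interfaces(device: UsbDevice, drivers: Dict[str, str]) -> None:
    """Bind each interface to the driver registered for its kind, if any."""
    for intf in device.interfaces:
        intf.driver = drivers.get(intf.kind)


# The USB speaker from the announcement: one device, two interfaces.
speaker = UsbDevice("usb-speaker", [Interface("audio"), Interface("hid")])
bind_interfaces(speaker, {"audio": "audio-driver", "hid": "hid-driver"})
for intf in speaker.interfaces:
    print(speaker.name, intf.kind, "->", intf.driver)
```

An interface whose kind has no registered driver simply stays unbound, which loosely mirrors the state of the patch, where drivers that have not been converted compile but never get a device.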

Elsewhere, under the Subject: [BK PATCH] USB changes for 2.5.31, he posted a new update, saying:

This patch against 2.5.31 fixes some problems in the konicawc driver

19. IDE Comparison Between 2.4 And 2.5

10 Aug 2002 (2 posts) Archive Link: "what's the difference between 2.4-IDE and 2.5-IDE in ATA ?"

Topics: Disks: IDE

People: Ed Tomlinson, Dave Jones

Stephane Wirtel asked what the difference was between the 2.4 and 2.5 IDE code. Ed Tomlinson replied:

2.4 IDE was (re)written on top of the old ide code by Andre Hedrick. Seems that Andre, who it seems does really understand IDE and its quirks, can be a pain to work with... In early 2.5.x Linus started taking patches from Marcin (or Martin) Dalecki. Since then, with over 115 patches, Marcin has proceeded to rewrite the IDE code. This has tended to make IDE a bit unstable in 2.5. About 2.5.26ish Jens Axboe got tired of the 2.5 ide problems and ported 2.4 ide to 2.5. This change was picked up in the Dave Jones series of kernels - it allows other pieces of the kernel to be tested while Marcin works towards a cleaner IDE.

IDE 2.5 is stable for some but not for others (like me). Depending on it is iffy - have backups. If you do not want to test 2.5 IDE, use -dj (Dave is a bit behind Linus now due to other work and a much needed vacation).

20. Alpha Updates For 2.5

10 Aug 2002 (8 posts) Archive Link: "[patch 2.5.30] alpha: pte/pfn/page/tlb macros update [1/10]"

Topics: Assembly, Forward Port, Real-Time, SMP

People: Ivan Kokshaysky

Ivan Kokshaysky posted a large set of Alpha updates that had been building up since around the 2.5.18 time frame. He listed the changes (grouped here from multiple emails):

21. Linux 2.5.31

10 Aug 2002 - 13 Aug 2002 (12 posts) Archive Link: "Linux 2.5.31"

Topics: FS: JFS, FS: NTFS, FS: driverfs, Kernel Release Announcement, Networking, USB

People: Linus Torvalds

Linus Torvalds announced 2.5.31:

Hmm. I've switched home machines this week, and people have been reasonably busy, so there is most likely a lot of dropped stuff.

There's a lot of merged stuff too, including various architectures getting up to speed with all the big changes (tlb and irq etc, and rmk is apparently trying to shrink his arm patch). Sparc64, alpha, ppc32, ARM..

The series of patches from Al Viro should fix a semaphore deadlock when partition reading was triggered while a new device was opened, and lay the groundwork for more disk description cleanups.

NTFS, JFS, driverfs and networking updates. USB, ISDN and network driver updates, partial merge with akpm (you've seen the discussions about some of the stuff dropped) etc. And let's see how much fallout there is from the 30-bit pids etc.

Here is the ChangeLog (http://www.kernel.org/pub/linux/kernel/v2.5/ChangeLog-2.5.31) .

22. uClinux Update Against 2.5.31

11 Aug 2002 (1 post) Archive Link: "[PATCH]: linux-2.5.31uc0 MMU-less patches"

People: Greg Ungerer

Greg Ungerer announced:

I have put the latest uClinux (MMU-less) patches at:

http://www.uclinux.org/pub/uClinux/uClinux-2.5.x/linux-2.5.31uc0.patch.gz

Nothing much new, just updated against 2.5.31.

23. VM Regress: Benchmarking The VM Subsystem

11 Aug 2002 - 12 Aug 2002 (9 posts) Archive Link: "[ANNOUNCE] VM Regress - A VM regression and test tool"

Topics: User-Mode Linux, Virtual Memory

People: Mel Gorman, Bernd Eckenfels, Daniel Phillips, Rik van Riel

Mel Gorman announced:

Project page: http://www.csn.ul.ie/~mel/projects/vmregress/
Download: http://www.csn.ul.ie/~mel/projects/vmregress/vmregress-0.4.tar.gz

This is the first public release of VM Regress v0.4 (BumbleBee). It is the beginnings of a regression, benchmarking and test tool for the Linux VM. The web page has an introduction and the project itself has quite comprehensive documentation and commentary so I am not going to go into heavy detail here.

There appears to be frequent trouble reliably testing the VM and comparing the impact (beneficial or otherwise) of VM features. As best as I can tell, there is heavy reliance on stress testing or intuitive decisions made by individual kernel developers to prove a VM is working or that it is better than another implementation. This tool eventually will be able to provide empirical data on VM performance as well as acting as a regression tool to make sure changes don't break anything.

It works by using kernel modules to get a definite view of what the kernel is at and to provide reliable, reproducible tests. Modules are divided up into 4 categories. Core modules provide infrastructure for the tool. Sense modules tell what is going on in the VM. Test modules test particular features and bench modules (none yet) will benchmark different sections of the VM.

The aim is to eventually eliminate guesswork in development. The tool will be able to tell for definite if a feature works. If it does work, it will be able to tell how well or poorly the feature performed. This will hopefully replace ad-hoc shell script tests and provide concrete performance data any developer can reliably produce and use as proof of "Feature X is better"

The interface to the tests is via proc at /proc/vmregress. Help is provided for most of them by cat'ing the entries after module load. The README and manual are very comprehensive and each c file has a detailed description at the top so I'm not going to go into heavy detail in this mail. The README includes a sample set of tests to illustrate how the tool can be used to provide useful information about performance.

This was developed against 2.4.18 and 2.4.19 but will compile with 2.5.30 and takes into account the existence of rmap (will compile and work with or without rmap). Bear in mind the tool is far from complete and I'm just looking for feedback on the viability and usefulness (or the lack thereof) of this tool. Consequently, it doesn't do much. Currently it

This has been tested heavily with UML 2.4.18 and with a dual PII350 running 2.4.19 . It is known to compile with 2.5.30 but I haven't done any 2.5 testing yet due to the lack of a crash box. It will work with or without rmap as the tool was written with it (as well as every other VM feature) in mind.

Bernd Eckenfels replied:

This sounds more like a micro benchmark tool, which is a good start, but the real problem with VM optimizations is that they have to take into account real world load and especially user experience.

A simple example is the fact that an idle desktop box will feel very sluggish if a user comes back after a few hours' break, because all visible programs are paged out. To improve this, one could think about adding a flag to applications like "connected to gui". This feature would then need a test, which is no usual micro benchmark.

So I think it is a good idea to avoid introducing slow operations in hot code paths, but it does not much help developers with the problem of simulating workload and measuring the interactive and real throughput.

But perhaps you can take this into account?

Mel agreed that VM Regress was indeed a micro benchmark tool, and was intended to be so. He explained that VM Regress was intended to benchmark individual parts of the VM subsystem, not the computer as a whole. He went on:

I am more interested in answering questions like

For example, in time, it'll be able to tell exactly how well rmap is performing and compare it to a VM without rmap in terms of "how long it took to find a page to replace" and "what did the address space look like after kswapd worked". I should be able to show that rmap kept the correct pages in memory for instance where as an overall benchmarking tool is going to tell me nothing new. Used in combination with a profiling tool like oprofile, I should be able to get very specific performance data that I suspect will be useful to developers and to a much lesser extent, users.

I am making a presumption that if it can be shown that each individual component is working and performs well, then overall performance should improve.

There was no reply to this, but close by, Daniel Phillips also replied to Bernd:

We get too hung up on 'real world' loads; that is not a productive way for VM developers to spend their time. Developers need to use tests that focus on very specific aspects of VM performance. Yes, this testing should be backed up by 'real world' tests to confirm what the VM developer thinks, that improved performance on a subsystem translates into improved overall performance, and to keep a watch out for unexpected or undesirable interactions. That's called a 'reality test'.

If you want to help with 'interactive performance', i.e., user experience, then *quantify what contributes to that* and write a micro-measurement tool that measures such things. E.g., latency of response to keyboard events under load. It's not rocket science, it just takes time and effort to set this kind of thing up so it's accurate and predictive.

It's an incredible waste of developer's time to be running 'reality tests' all the time, and never using more precise measurement methods. Anyone who wants to run reality tests and post the results is more than welcome to, and this is valuable. It's not valuable to throw mud at a testing/measurement tool because you think it's not 'realistic'.

Rik van Riel replied:

The thing is that developers need some benchmarking thing they can script to run overnight. Watching vmstat for hours on end is not a useful way of spending development time.

On the other hand, if somebody could code up some scriptable benchmarks that approximate real workloads better than the current benchmarks do, I'd certainly appreciate it.

For web serving, for example, I wouldn't mind a benchmark that:

  1. simulates a number of users, that:
    1. load a page with 10 to 20 associated images
    2. sleep for a random time between 3 and 60 seconds, "reading the page"
    3. follow a link and grab another page with N images
  2. varies the number of users from 1 to N
  3. measures
    1. the server's response time until it starts answering the request
    2. the time it takes to download each full page

Then we can plot both kinds of response time against the number of users and we have an idea of the web serving performance of a particular system ... without focussing on, or even measuring, the unrealistic "serves N pages per minute" number.
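The benchmark Rik outlines could be scripted along the following lines. This is only a minimal sketch: `fake_fetch`, `simulated_user` and `run_benchmark` are hypothetical names, and `fake_fetch` stands in for a real HTTP request so that the example runs without a server. A real harness would substitute an actual HTTP client and keep the 3 to 60 second think time.

```python
import random
import statistics
import threading
import time

def fake_fetch(url):
    """Hypothetical stand-in for an HTTP request; returns (first_byte_s, total_s)."""
    start = time.perf_counter()
    time.sleep(0.0005)                      # pretend latency before the first byte
    first_byte = time.perf_counter() - start
    time.sleep(0.0005)                      # pretend transfer time
    return first_byte, time.perf_counter() - start

def simulated_user(fetch, page_loads, rng, think, results, lock):
    """One user: load a page plus 10-20 images, 'read' it, follow a link."""
    for n in range(page_loads):
        page_start = time.perf_counter()
        first_byte, _ = fetch(f"/page{n}.html")
        for i in range(rng.randint(10, 20)):
            fetch(f"/page{n}/img{i}.png")
        full_page = time.perf_counter() - page_start
        with lock:
            results.append((first_byte, full_page))
        think(rng.uniform(3, 60))           # 3-60 s of "reading the page"

def run_benchmark(max_users, fetch=fake_fetch, page_loads=2,
                  think=time.sleep, seed=0):
    """Return {users: (mean first-byte time, mean full-page time)} for 1..max_users."""
    table = {}
    for users in range(1, max_users + 1):
        results, lock = [], threading.Lock()
        threads = [
            threading.Thread(
                target=simulated_user,
                args=(fetch, page_loads, random.Random(seed + u),
                      think, results, lock))
            for u in range(users)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        table[users] = (statistics.mean(r[0] for r in results),
                        statistics.mean(r[1] for r in results))
    return table

if __name__ == "__main__":
    # The think time is skipped here so the demonstration finishes quickly.
    for users, (ttfb, full) in run_benchmark(3, think=lambda s: None).items():
        print(users, f"{ttfb:.4f}", f"{full:.4f}")
```

The two columns printed per user count are the quantities Rik proposes plotting: time until the server starts answering, and time to download a full page.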

Mel reiterated that this was not the purpose of VM Regress, and added:

In VM Regress land, I would be much more likely to provide a benchmark that did something like the following. (Remember that VM Regress aims to provide more than being a pure benchmarking tool. Benchmarking is just one aspect)

  1. Memory map with MAP_SHARED a number of regions
    1. Each region is 512 pages large (2MB on a x86)
    2. Create a number of regions until a percentage of memory is used that would hit the various watermarks of the zones
  2. Over 1 hour, reference regions with a Gaussian pattern to simulate popular pages and images
  3. At the end, give the best, worst and average time to read a region. Print out what regions are still resident in memory and compare that to the references. Regions referenced often should still be in memory and dead regions should be in swap
  4. Repeat the test altering the following parameters

With a low percentage of physical memory used, there shouldn't be anything too interesting happening because cache should be doing most of the work. With more regions, it should be noted how the VM holds up, how well it selects regions to swap out and how long it takes to find the proper pages and so on.

This type of benchmark is far away but I already do most of this work with the fault.o module. I memory map a region of which the size is related to the amount of physical memory (more accurately it's related to the zone watermarks for the zone known to be affected by the test) and touch every page in the region. For n passes, I check if each page is present, if it's swapped out, I touch it to swap it back in. I then print out how many pages were swapped in, how many pages are physically present in the region and how long that pass took in milliseconds.

That is most of the work there so this isn't quite vapourware, more a really dense fog. I just need a few more bits and pieces such as printing graphs of present pages vs references and meaningful data is easily accessible.
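The access pattern Mel describes can be approximated from user space, as in the sketch below. It is an illustration only, not VM Regress code: the function names are hypothetical, the Gaussian is centred on the middle region as an assumption, the run is bounded by a reference count rather than one hour, and residency is not checked because plain Python has no portable way to ask which pages are still in memory.

```python
import math
import mmap
import random
import time

PAGE = mmap.PAGESIZE
PAGES_PER_REGION = 512          # 2MB per region with 4KB pages, as in the text

def make_regions(count, pages=PAGES_PER_REGION):
    """Create anonymous mappings (shared by default on Unix) and touch each page."""
    regions = []
    for _ in range(count):
        m = mmap.mmap(-1, pages * PAGE)
        for off in range(0, len(m), PAGE):
            m[off] = 1                      # write one byte to fault the page in
        regions.append(m)
    return regions

def read_region(m):
    """Read one byte from every page of a region; return elapsed seconds."""
    start = time.perf_counter()
    for off in range(0, len(m), PAGE):
        m[off]
    return time.perf_counter() - start

def gaussian_index(rng, count):
    """Pick a region index, favouring the middle regions (the 'popular' ones)."""
    while True:
        i = math.floor(rng.gauss(count / 2, count / 6))
        if 0 <= i < count:
            return i

def run(region_count=8, references=200, pages=PAGES_PER_REGION, seed=0):
    """Return (reference counts per region, best, worst, average read time)."""
    rng = random.Random(seed)
    regions = make_regions(region_count, pages)
    hits = [0] * region_count
    times = []
    for _ in range(references):
        i = gaussian_index(rng, region_count)
        hits[i] += 1
        times.append(read_region(regions[i]))
    for m in regions:
        m.close()
    return hits, min(times), max(times), sum(times) / len(times)

if __name__ == "__main__":
    hits, best, worst, avg = run()
    print("references per region:", hits)
    print(f"best {best:.6f}s worst {worst:.6f}s average {avg:.6f}s")
```

Comparing the per-region reference counts with the regions still resident afterwards is the part that needs kernel-side help, which is what the fault.o module described above provides.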

24. Submitting Patches To trivial@rustcorp.com.au

11 Aug 2002 - 13 Aug 2002 (3 posts) Archive Link: "Trivial Patch Policy (trivial@rustcorp.com.au)"

People: Rusty Russell, Dave Jones

Rusty Russell said:

I've been collecting trivial patches for a few months now, and it's time to solidify some rules:

  1. Trivial patches must qualify for one of the following to be accepted:

    1. Spelling fixes (useful for grep, and sets a good example)
    2. Warning fixes (cluttering with useless warnings is bad)
    3. Compilation fixes (only if they are actually correct)
    4. Runtime fixes (only if they actually fix things)
    5. Removing use of deprecated functions/macros (eg. check_region).
    6. Contact detail and documentation fixes
    7. Non-portable code replaced by portable code (even in arch-specific, since people copy, as long as it's trivial)
    8. Any fix by the author/maintainer of the file. (ie. patch monkey in re-transmission mode)

    They must also be "trivial" by my definition of trivial. Best patches contain enough context for me to judge without opening the file (diff -C<nn> -u is your friend).

    NOTE: This means I'll only take whitespace/indentation fixes from the author or maintainer.

  2. The patch will not be forwarded to anyone until a new kernel has been released after I receive the patch, *unless* no one else is sent the patch. So if you cc: the trivial patch monkey, it'll only be forwarded from there if it doesn't make the next kernel.
  3. The first time the patch is forwarded, it will be sent to the author and/or maintainer. If they say they've included it in their tree, no more forwards will occur (modulo some timeout eventually). If they NAK it, the patch will be closed. Otherwise, the patch will be sent directly to Linus or Marcelo on future forwards (the maintainer will still be cc'd).

Hopefully this will be a good compromise between coordinating with maintainers who want control of their files, and stopping trivial patches from slipping through the cracks.
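Rules 2 and 3 amount to a small decision procedure, which the sketch below tries to capture. The `Patch` fields and the `next_action` function are hypothetical names invented for illustration; the eventual timeout Rusty mentions and the check for whether a patch has already been merged are not modelled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Patch:
    sent_to_others: bool                      # was anyone besides the trivial address sent it?
    kernels_since_receipt: int = 0            # kernel releases since the patch arrived
    forwards: int = 0                         # times the patch has been forwarded so far
    maintainer_reply: Optional[str] = None    # "included", "nak", or None

def next_action(p: Patch) -> str:
    """Decide what happens to a patch, following rules 2 and 3 as read here."""
    if p.maintainer_reply == "nak":
        return "close"
    if p.maintainer_reply == "included":
        return "wait"                         # no more forwards (timeout not modelled)
    if p.sent_to_others and p.kernels_since_receipt == 0:
        return "wait"                         # rule 2: hold until a new kernel is out
    if p.forwards == 0:
        return "forward to author/maintainer" # rule 3: first forward
    return "forward to Linus or Marcelo, cc maintainer"

if __name__ == "__main__":
    print(next_action(Patch(sent_to_others=True)))
    print(next_action(Patch(sent_to_others=False)))
    print(next_action(Patch(sent_to_others=True, kernels_since_receipt=1, forwards=1)))
```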

Dave Jones had a few questions. He asked about point 2:

What happens in this case..

person a sends the monkey a patch.
person b replies to l-k (cc'ing monkey) with a "no do it this way" ?

do you have a hand-operated means to say "this patch supersedes the previous" ?

Rusty replied, "Yes, I close the old patch, and add the new one. Low-tech, I know 8). The original person will get a one-liner on why the patch was closed (like, "obsoleted by new patch")."

In his same post, Dave also asked about point 3:

What would be *really* good, for the case where retransmits are necessary, if Alan hasn't picked it up for 2.4 (or me for 2.5), you could add us to the relevant Cc's, (and remove after Alan/Myself takes it).

This could however get tricky, as the same patch may need a bit of hand-merging to fit against -ac/-dj.

Maybe just simpler to remove us when Alan/I send an ACK ?

Rusty thought it wasn't worth the effort, and pointed out that it also added a "multiple paths to Linus" problem. As far as patches requiring massage in order to fit into -ac or -dj, Rusty said, "That's something I've simply refused to get into: patches either apply or they don't. With about 40 patches a week and other responsibilities, I rely on the author seeing that something broke and retransmitting."

25. JFS 1.0.21 Available

12 Aug 2002 (1 post) Archive Link: "[ANNOUNCE] Journaled File System (JFS) release 1.0.21"

Topics: Disk Arrays: EVMS, FS: JFS

People: Steve Best, Christoph Hellwig

Steve Best announced:

Release 1.0.21 of JFS was made available today.

Drop 59 on August 12, 2002 (jfs-2.4-1.0.21.tar.gz and jfsutils-1.0.21.tar.gz) includes fixes to the file system and utilities.

The new feature in this release is the capability to resize the file system.

Utilities changes

File System changes

For more details about JFS, please see the patch instructions or changelog.jfs files.

26. Linux 2.4.20-pre2

12 Aug 2002 - 13 Aug 2002 (5 posts) Archive Link: "Linux 2.4.20-pre2"

People: Bhavesh P. Davda, Mel Gorman, Marcelo Tosatti

Marcelo Tosatti announced 2.4.20-pre2, and Mel Gorman sent him a patch containing documentation and comments (http://www.csn.ul.ie/~mel/projects/vm/) . Elsewhere, Bhavesh P. Davda said:

For the Nth time, here again is the patch for fixing the scheduler for correct SCHED_FIFO and SCHED_RR behaviour.

I waited to see it in 2.4.20-pre1, not there, 2.4.20-pre2, not there...

Please apply it in 2.4.20-pre3

27. qla2xxx Driver Update

12 Aug 2002 (1 post) Archive Link: "[ANNOUNCE] QLogic FC Driver for Linux 6.01b4 Released."

Topics: Disks: SCSI, Forward Port, Networking

People: Andrew Vasquez, Arjan van de Ven

Andrew Vasquez said:

QLogic is pleased to announce the 6.01b4 release of its qla2xxx driver for ISP2100/ISP22xx/ISP23xx chips and HBAs.

Major improvements from the 5.3x driver series include:

The 6.xx series driver supports kernels 2.4.x and above ONLY, and contains most of Arjan van de Ven's (Redhat) changes made to the 5.31 driver.

A side note regarding ISP2100 support: QLogic has formally retired the QLA2100 card and will *not* provide technical support for these chips and HBAs. The initial 6.xx series drivers were stripped of ISP2100 card recognition. The current ISP2100 support was forward-ported from 5.3x. From an engineering standpoint, any patches or fixes for the ISP2100 will be considered.

The driver distribution can be downloaded at:

http://download.qlogic.com/drivers/5537/qla2x00-v6.1b4-dist.tgz

The distribution contains three main components:

qla2x00src-vX.YY.tgz -- The FC/SCSI driver
qla2xipsrc-vM.NN.tgz -- The IP network driver
qlapi-P.QQ-rel.tgz -- The SNIA API library

Changes since 6.01b2 distribution:

QLA2X00:

QLA2XIP:

28. XFS Split Patches For 2.4

12 Aug 2002 (1 post) Archive Link: "Announce: XFS split patches for 2.4.19 - respin"

Topics: Access Control Lists, FS: XFS, Kernel Build System

People: Keith Owens

Keith Owens announced:

ftp://oss.sgi.com/projects/xfs/download/patches/2.4.19.

The xfs patches for 2.4.19 have been respun as of 2002-08-13 01:22 UTC. This includes kdb v2.3 2.4.19 common-2, i386-3 plus some recent quota and acl fixes.

For some time the XFS group have been producing split patches for XFS, separating the core XFS changes from additional patches such as kdb, xattr, acl, dmapi, kbuild 2.5. These patches were initially intended for internal use and for feeding to Linus but we got no response at all. The split patches are now being released to the world with the hope that developers and distributors will find them useful.

Read the README in each directory very carefully, the split patch format has changed over a few kernel releases. Any questions that are covered by the README will be ignored. There is even a 2.4.20/README for the terminally impatient :).

29. Status Of khttpd And Tux In 2.4

13 Aug 2002 (4 posts) Archive Link: "TUX in 2.4.20?"

Topics: Web Servers

People: Alan Cox, Oliver Neukum, Roy Sigurd Karlsbakk

Roy Sigurd Karlsbakk suggested that 2.4.20 would be a good target for removing khttpd and replacing it with Tux. Alan Cox replied, "Tux is invasive. It isn't a clean simple patching job." And Oliver Neukum added, "2.4 is a stable series. Removing something needs a very good reason, which just isn't there."

30. User Mode Linux Update For 2.5.31

13 Aug 2002 (1 post) Archive Link: "UML 2.5.31"

Topics: User-Mode Linux

People: Jeff Dike

Jeff Dike announced:

UML has been updated to 2.5.31 and UML 2.4.18-52.

The changes since 2.5.30 have been mostly bug fixes and cleanups.

I fixed a "I'm tracing myself and can't get out" race.

The kernel entry and exit code was cleaned up and reduced from three copies to one.

telnetd is now killed when UML shuts down, so telnet connections to UML consoles die properly.

Fixed a crash caused by a non-GFP_ATOMIC allocation in an interrupt.

UML now exits when 'debug' is asked for and CONFIG_PT_PROXY is disabled.

Fixed some bugs in the ubd device plugging code.

Fixed ethertap by making CLOEXEC optional in os_pipe.

Made UML build on 2.2 by defining the SHUT_* macros if no header file does.

The patch is available at
http://uml-pub.ists.dartmouth.edu/uml/uml-patch-2.5.31-1.bz2

For the other UML mirrors and other downloads, see
http://user-mode-linux.sourceforge.net/dl-sf.html

Other links of interest:

The UML project home page : http://user-mode-linux.sourceforge.net
The UML Community site : http://usermodelinux.org

31. Linux Test Project Update

13 Aug 2002 (1 post) Archive Link: "Release announcement"

Topics: Ioctls

People: Paul Larson, Andreas Jaeger

Paul Larson announced:

The Linux Test Project test suite LTP-20020813 has been released. Visit our website ( http://ltp.sourceforge.net ) to download the latest version of the test suite.

There are some important fixes in this release, so please consider upgrading.

LTP-20020813

* Fixes

32. NFSv4 Under 2.5

13 Aug 2002 (1 post) Archive Link: "announcing NFSv4 patches against 2.5.31"

Topics: Access Control Lists, Extended Attributes, FS: NFS

People: Kendrick M. Smith

Kendrick M. Smith announced:

This is an announcement of a set of 38 patches against Linux 2.5.31, which implement basic NFSv4 functionality. The patches were developed as part of the NFSv4 project at the University of Michigan. We are aiming to work with other kernel developers in the next few weeks, to get the patches included in 2.5.x. In addition, we could always use more testing, so if anyone wants to apply the patches and bang on them, that would be extremely helpful as well.

For now, only the bare minimum of features have been implemented -- just enough to create a functional network file system. Byte-range file locking, state management, recovery from server reboot, extended attributes, ACL's, the NFSv4 "pseudo filesystem", and strong security are still unimplemented in 2.5.x. Actually, we have implementations of most of these features in 2.4.x, but are waiting to port them to 2.5.x until the "minimal" patches have been approved.

(For those familiar with the NFSv4 protocol, I should elaborate by saying that the client does all of its READs and WRITEs using "magic stateids" and uses OPEN and CLOSE only when it needs to create a file. The server treats all state-manipulating operations such as SETCLIENTID,OPEN,CLOSE... as no-ops and does not remember state. Instead of a proper implementation of the pseudo filesystem, the first entry in /etc/exports is chosen as the nominal root.)

Here are directions for running NFSv4. Note that a few userland utilities are necessary (URL's given as needed):

  1. Apply the NFSv4 patches to a clean 2.5.31 kernel, either one at a time from the messages following this one, or all at once by downloading the "grand unified patch" from www.citi.umich.edu/projects/nfsv4/patches-2.5/patch-2.5.31-E (http://www.citi.umich.edu/projects/nfsv4/patches-2.5/patch-2.5.31-E) . There will be separate kernel config options for NFSv4 support in the client and server.
  2. Download and install the GSSD daemon from www.citi.umich.edu/projects/nfsv4/simple-gssd/ (http://www.citi.umich.edu/projects/nfsv4/simple-gssd/) . This daemon must be running on both the client and server.
  3. Warning!! Special configuration on the server is necessary before an NFSv4 export will be accessible.

    First, it must be exported specifically to the client in question, without use of string wildcards, e.g.

    /some/export farringdon.citi.umich.edu(rw)

    in /etc/exports is OK, but

    /some/export *.citi.umich.edu(rw)

    is not.

    Second, only the _first_ export (by order in /etc/exports) will be visible to a given client.

    Finally, this export will appear to an NFSv4 client as '/' instead of the actual export pathname (/some/export in the example above).

    All of this special configuration is only necessary because this set of patches does not include a proper implementation of the NFSv4 pseudo filesystem. As soon as these patches (or a variant thereof) are incorporated into the kernel, our next goal is to develop a satisfactory pseudofs for 2.5.x, at which point the need for this special configuration will go away.

  4. On the client, you will need a patched version of the mount and umount utilities. Download util-linux-2.11t from kernel.org, apply the patch

    www.citi.umich.edu/projects/nfsv4/patches-2.5/patch-util-linux-2.11t-A (http://www.citi.umich.edu/projects/nfsv4/patches-2.5/patch-util-linux-2.11t-A)

    and build mount and umount as usual. If you are nervous about tampering with the built-in mount and umount on your system, we recommend creating symlinks, e.g.

    # ln -s /usr/src/util-linux-2.11t-nfsv4/mount/mount /usr/local/bin/mount4
    # ln -s /usr/src/util-linux-2.11t-nfsv4/mount/umount /usr/local/bin/umount4

    You should then be able to start the server as usual, and mount with

    # mount4 -t nfs -o vers=4,... somehost.somedomain.com:/ /mnt/nfs

    If this doesn't work, it is probably because GSSD is not running on either the client or the server, or because you're not mounting '/' (see 3. above).

These directions, as well as copies of the patches, are also available at www.citi.umich.edu/projects/nfsv4/patches-2.5 (http://www.citi.umich.edu/projects/nfsv4/patches-2.5) . This URL will always be kept up-to-date with the latest versions of the patches.
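The export restrictions in step 3 above lend themselves to a quick sanity check before starting the server. The following is a rough sketch, not part of the NFSv4 patches: `check_exports` is a hypothetical helper, its parsing of the exports format is deliberately simplified, and it only flags the wildcard characters `*`, `?` and `[` and an empty host name.

```python
def check_exports(text):
    """Inspect the first export in /etc/exports-style text.

    Returns (path, problems). Only the first export matters here because,
    per the announcement, only the first one is visible to a client.
    """
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()     # drop comments and blank lines
        if not line:
            continue
        path, *clients = line.split()
        problems = []
        if not clients:
            problems.append("no client listed for the export")
        for client in clients:
            host = client.split("(", 1)[0]      # strip the "(rw)" style options
            if not host or any(ch in host for ch in "*?["):
                problems.append(f"wildcard or empty host: {client!r}")
        return path, problems
    return None, ["no exports found"]

if __name__ == "__main__":
    good = "/some/export farringdon.citi.umich.edu(rw)\n"
    bad = "/some/export *.citi.umich.edu(rw)\n"
    print(check_exports(good))
    print(check_exports(bad))
```

Run against the two example lines from the announcement, the first reports no problems and the second reports the wildcard host.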


Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.