Kernel Traffic #218 For 7 Jun 2003

By Zack Brown

Mailing List Stats For This Week

We looked at 867 posts in 4150K.

There were 265 different contributors. 132 posted more than once. 152 posted last week too.

The top posters of the week were:

1. In-Core AFS Multiplexer And PAG Support

13 May 2003 - 18 May 2003 (78 posts) Archive Link: "[PATCH] in-core AFS multiplexor and PAG support"

Topics: FS: NFS, Samba

People: David Howells, Linus Torvalds, Christoph Hellwig

David Howells said:

Here's a patch to add three things that are required for AFS and that may be of use to other stuff (such as NFSv4 and Samba):

  1. PAG (Process Authentication Group) support. A PAG is ID'd by a unique number, and is represented in memory as a structure that has a ring of associated authentication tokens.

    Each process can either be part of a PAG, or it can be PAG-less, in which case it has no authentication tokens.

    Two new syscalls are added: setpag and getpag.

  2. Authentication token support. An authentication token is a blob of binary data that is keyed by filesystem name and fs-specific key (such as AFS cell name or SMB workgroup).

    These are retained in two places: each PAG has a ring of tokens appropriate to the group of processes within that PAG; and each struct file has a pointer to the single token governing that file (if there is one).

  3. AFS multiplexor support. Not complete at the moment, but implemented far enough to provide access to the PAG mechanism. Further patches will be forthcoming to make this fully functional.

It is my intention to add Trond's vfs_cred stuff in at some point, but that has a lot greater impact than this patch (which has negligible impact).
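As a rough illustration of the data model David describes, the following Python sketch models a PAG as a numbered group holding tokens keyed by filesystem name and fs-specific key. Every name here (Token, Pag, Process, and the Python methods setpag and getpag) is a hypothetical stand-in; the actual patch is kernel C, its structures are not shown in this summary, and the "ring" of tokens is simplified to a dictionary. The behaviour of setpag (starting a fresh, empty PAG) is also an assumption.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional, Tuple

_next_pag_id = count(1)  # hypothetical source of unique PAG numbers


@dataclass(frozen=True)
class Token:
    """Opaque authentication blob keyed by filesystem name and fs-specific key."""
    fs_name: str   # e.g. "afs"
    key: str       # e.g. an AFS cell name or SMB workgroup
    blob: bytes


@dataclass
class Pag:
    """A PAG: a unique number plus the tokens shared by its member processes."""
    pag_id: int = field(default_factory=lambda: next(_next_pag_id))
    tokens: Dict[Tuple[str, str], Token] = field(default_factory=dict)

    def add_token(self, token: Token) -> None:
        self.tokens[(token.fs_name, token.key)] = token

    def find_token(self, fs_name: str, key: str) -> Optional[Token]:
        return self.tokens.get((fs_name, key))


@dataclass
class Process:
    """A process is either in one PAG or PAG-less (no tokens at all)."""
    pag: Optional[Pag] = None

    def setpag(self) -> int:
        # ASSUMPTION: setpag starts a fresh, empty PAG for the caller.
        self.pag = Pag()
        return self.pag.pag_id

    def getpag(self) -> Optional[int]:
        return self.pag.pag_id if self.pag else None

    def find_token(self, fs_name: str, key: str) -> Optional[Token]:
        # A PAG-less process never finds a token.
        return self.pag.find_token(fs_name, key) if self.pag else None
```

In this model a filesystem would call find_token("afs", cell_name) on behalf of the current process and fall back to unauthenticated access when it returns None.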

Christoph Hellwig said this would be too significant a change to make it into the 2.6 tree, but David said he thought it could go in. Elsewhere, Linus Torvalds said the patch was ugly and offered some technical advice; a number of people then discussed the implementation details for a while.

Elsewhere, under the Subject: [PATCH] PAG support, try #2 (http://www.uwsg.indiana.edu/hypermail/linux/kernel/0305.1/1912.html), David posted an updated patch incorporating folks' comments. Linus replied:

I still really don't like this, and think it needs to be thought through a _lot_ more. I also think this is _way_ waaaay too late to get into 2.6.x anyway.

Anyway, the thing I think is just fundamentally broken about this is

I suspect both of these problems could be fixed by another level of indirection: a "user credential" is really a "list of PAG's", with the PAG being a "list of keys". Joining a PAG _adds_ that PAG to the user credentials, instead of replacing the old credentials with the new one.

And "pag_t" needs to be bigger, at least 64 bits. That, together with the "credential == 'list of PAG'" thing means that you can choose to do things like:

Anyway, I think the current patch is totally unusable for any reasonable MIS setup (ie you couldn't make it useful as a PAM addition even if you tried), and is totally special-cased for one (not very interesting, to me) use.

And I think this will be a 2.7.x issue, if only because you guys will need to convince me that I'm wrong.
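The extra level of indirection Linus suggests can be sketched roughly as follows: a credential holds a list of PAGs, each PAG holds a set of keys, and joining a PAG appends to the credential instead of replacing it. This is only a toy model in Python; the class names, the string key names, and the lookup policy (most recently joined PAG wins) are assumptions made for illustration and do not come from the thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Pag:
    """A PAG as a collection of keys, identified by a 64-bit number."""
    pag_id: int
    keys: Dict[str, bytes] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Linus asks for pag_t to be at least 64 bits wide.
        if not 0 <= self.pag_id < 2 ** 64:
            raise ValueError("pag_id must fit in 64 bits")


@dataclass
class Credential:
    """A user credential as a list of PAGs rather than a single PAG."""
    pags: List[Pag] = field(default_factory=list)

    def join(self, pag: Pag) -> None:
        # Joining ADDS the PAG; previously joined PAGs are kept.
        if pag not in self.pags:
            self.pags.append(pag)

    def lookup(self, name: str) -> Optional[bytes]:
        # ASSUMED policy: the most recently joined PAG takes precedence.
        for pag in reversed(self.pags):
            if name in pag.keys:
                return pag.keys[name]
        return None
```

With this shape, a login session could join one PAG at login time and another later without losing the keys it already held.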

The technical discussion continued, but no one pushed for inclusion in 2.6.

2. New submount Removable Media Handler

15 May 2003 - 19 May 2003 (10 posts) Archive Link: "[ANNOUNCE] submount: another removeable media handler"

Topics: FS: autofs, FS: ext2, Ioctls

People: Eugene Weiss, H. Peter Anvin, Alex Riesen

Eugene Weiss announced:

Submount: Yet another attempt at solving the removable media problem.

It has been tested only on 2.5.66 and 2.5.69 so far, but should work on many earlier 2.5.x kernels as well. I would greatly appreciate feedback from anyone who would like to check it out. It is available at http://sourceforge.net/projects/submount/

How it Works:

It is composed of two parts: a kernel module and a userspace program.

The kernel module, titled subfs, implements a dummy filesystem which is mounted on the desired mountpoint. Before a process can access a directory, or any file below it, one of two filesystem methods must be called: open() or lookup(). When subfs gets a call to either of these functions, it calls the userspace part of submount, which then mounts the appropriate filesystem on top of the subfs mountpoint, forks off a daemon for unmounting, and exits. If the mount was successful, subfs uses the signal handling system to restart the system call, which then is executed on the real filesystem. Subfs then restarts the system calls of any other requests that arrived while the mount was taking place.

The userspace portion of submount is titled /sbin/submountd. It is a small program that does some minimal options processing, and then makes the mount() system call. If the mount is successful, it forks off a new process which enters a one second loop checking whether the filesystem can be unmounted.
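The unmount side of that description amounts to a polling loop. A minimal Python sketch, assuming a hypothetical try_unmount callable that wraps the real unmount attempt and returns False while the filesystem is still busy (the actual submountd is a C program and its code is not shown here):

```python
import time
from typing import Callable


def unmount_when_idle(try_unmount: Callable[[], bool],
                      interval: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until try_unmount() succeeds; return the number of attempts made."""
    attempts = 0
    while True:
        attempts += 1
        if try_unmount():
            # The filesystem was idle and is now unmounted.
            return attempts
        # Still busy: wait one interval before checking again.
        sleep(interval)
```

The sleep parameter exists only so the loop can be exercised without real delays; a daemon would leave it at its default.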

Advantages:

Small, light, and fast. The kernel module is about 11kB, the user program about 21kB.

Requires no changes to the kernel code outside its own module.

The kernel portion is very simple. The feature set is implemented in userspace.

All IO is handled through the real filesystem at its full speed. When the IO is heaviest, submount imposes no performance penalty at all.

Flexible. Another program can be substituted for submountd if the system in question has particular needs. One could even use a shell script that calls the regular mount and umount utilities.

No configuration needed, except fstab.

Problems:

Not quite as fast as a permanently mounted filesystem, since the dentry cache is purged on unmounting. Directories must be read again each time they are called after unmounting even though the disk hasn't changed.

Errors are registered quietly. If the user makes a typo in the mount command, or in the fstab file, it may be necessary to read the system log to discover it. (Perhaps mount could be made to do some syntax checking when a subfs filesystem is mounted?)

Programs which automatically mount a cdrom directory from fstab can mount a second subfs directory over the filesystem mounted by the first. This could be checked for in subfs, but it would be better to do it in the mount utility.

Installation and usage:

The sources, both kernel and userspace, can be downloaded from http://sourceforge.net/projects/submount/. The userspace program is built in the usual way, and a makefile is provided for building the kernel module.

To mount a drive under subfs, use the usual syntax, except put subfs in the filesystem type field, and add the option fs=<fstype> in the options list.

for example

mount -t subfs /dev/scd0 /mnt/cdrom -o fs=iso9660,ro

or for fstab

/dev/scd0 /mnt/cdrom subfs fs=iso9660,ro

I've copied the function to find the filesystem type by reading the superblock from mount, so fs=auto will work. It can, however, cause a noticeable pause, particularly on floppies, so there is another method for using multiple filesystems. If a keyword is used in the fs= option, submountd will attempt to mount filesystems from a list. Currently there are two options: fs=floppyfss attempts vfat and ext2, and fs=cdfss tries iso9660 and udf. Submountd will strip the options "codepage", "iocharset" and "umask" from filesystems that don't take them, so these can be included in list mounts, or auto-detected mounts.

These fstab lines should work:

/dev/scd0 /mnt/cdrom subfs fs=cdfss,ro,iocharset=iso8859-1,umask=0 0 0
/dev/fd0 /mnt/floppy subfs fs=floppyfss,iocharset=iso8859-1,sync,umask=0 0 0

Once this is done, just access the mountpoint directory as usual.
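The fs= handling described above can be sketched in Python as follows. The two keyword lists come from the announcement; the table of which filesystem accepts which of the three strippable options is an assumption made for illustration, and superblock detection for fs=auto is left out entirely. The function name mount_attempts is hypothetical.

```python
# Keyword lists as described in the announcement.
FS_LISTS = {"floppyfss": ["vfat", "ext2"], "cdfss": ["iso9660", "udf"]}

# Options submountd strips from filesystems that do not take them.
STRIPPABLE = {"codepage", "iocharset", "umask"}

# ASSUMPTION: which filesystems accept those options. The announcement
# does not spell this out; the table is illustrative only.
ACCEPTS = {
    "vfat": {"codepage", "iocharset", "umask"},
    "iso9660": {"iocharset"},
    "udf": {"iocharset", "umask"},
    "ext2": set(),
}


def mount_attempts(options: str):
    """Turn a subfs option string into (fstype, options) pairs to try in order."""
    opts = options.split(",")
    fs = next((o[3:] for o in opts if o.startswith("fs=")), None)
    if fs is None:
        raise ValueError("subfs needs an fs=<fstype> option")
    rest = [o for o in opts if not o.startswith("fs=")]
    attempts = []
    for fstype in FS_LISTS.get(fs, [fs]):
        # Filesystem types missing from the table keep every option.
        accepted = ACCEPTS.get(fstype, STRIPPABLE)
        kept = [o for o in rest
                if o.split("=")[0] not in STRIPPABLE
                or o.split("=")[0] in accepted]
        attempts.append((fstype, ",".join(kept)))
    return attempts
```

For the first fstab line above, mount_attempts("fs=cdfss,ro,iocharset=iso8859-1,umask=0") yields an iso9660 attempt without umask followed by a udf attempt with all three options.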

Alex Riesen asked how this was different from the automounter (AutoFS) project, and Eugene replied, "Autofs works by creating a special filesystem above the vfs layer, and passing requests and data back and forth. Submount actually does much less than this: it puts a special filesystem underneath the real one, and the only things it returns to the VFS layer are error messages. It handles no IO operations whatsoever. Peter Anvin has called using the automounter for removable media 'abuse.' Submount is designed for it." H. Peter Anvin replied:

Sure, but it's not clear to me that you have listened to me saying *why* it is abuse.

Basically, in my opinion removable media should be handled by insert and removal detection, not by access detection. Obviously, there are some sticky issues with that in the case where media can be removed without notice (like PC floppies or other manual-eject devices), but overall I think that is the correct approach.

Eugene explained:

I managed to read several of your warnings about using autofs for media without coming across an explanation of why. I just assumed that as maintainer, you had good reasons to do so. I more-or-less agree with you about the desirability of insert and removal detection. I'm not sure if it could ever be made to work for floppies, but there is no reason why one solution should fit all cases. If there were common ioctls which could check the insertion and removal status of the various drives, I might have taken that approach.

I wanted to get the same functionality as supermount without the instability, and as far as I can tell, I have succeeded. It's not ideal, but it works for me, and hopefully will work for others as well until something better is produced.

3. Layer-7 Filter For Linux QoS

18 May 2003 (1 post) Archive Link: "[ANNOUNCE] Layer-7 Filter for Linux QoS"

Topics: Networking

People: Ethan Sommer

Ethan Sommer announced:

We have written a filter for the QoS infrastructure that looks at the data segment of packets and uses regular expressions to identify the protocol of a stream of traffic regardless of port number.

Many peer-to-peer programs (such as Kazaa and Gnucleus) will change to use a different port (including well known ports such as, say, 80) if they find that they can get better throughput there. That means that the port based filtering is no longer sufficient. However, by analyzing the application layer data, we can differentiate Kazaa from non-Kazaa HTTP, and lower the priority of whichever we deem to be less important. :)
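The idea of classifying a stream by its payload rather than its port can be illustrated with a few lines of Python. The patterns below are made up for this example and are not the patterns shipped by the l7-filter project; the real filter runs inside the kernel's QoS code, not in userspace.

```python
import re

# ILLUSTRATIVE patterns only; not taken from the l7-filter project.
PATTERNS = [
    ("ssh", re.compile(rb"^SSH-\d+\.\d+-")),
    ("http", re.compile(
        rb"^(GET|POST|HEAD) \S+ HTTP/\d\.\d\r\n|^HTTP/\d\.\d \d{3} ")),
    ("ftp", re.compile(rb"^220[ -].*ftp", re.IGNORECASE)),
]


def classify(payload: bytes) -> str:
    """Return the first protocol whose pattern matches the start of a stream.

    The port number is deliberately not an input: only the data is examined.
    """
    for name, pattern in PATTERNS:
        if pattern.search(payload):
            return name
    return "unknown"
```

A stream that opens with an SSH banner is labelled "ssh" whether it arrives on port 22 or port 80, which is the property the announcement relies on.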

It is a filter in the existing QoS infrastructure, so it can be used in conjunction with u32 filters, HTB or CBQ scheduling, SFQ queueing etc, etc...

Commercial companies sell devices which do layer-7 classification for anywhere from $6000-$80,000 depending on the bandwidth required. If we can build a comprehensive set of patterns I don't see any reason why Linux can't beat the pants off the commercial devices; we already have excellent queueing, and scheduling.

Our home page is http://l7-filter.sourceforge.net/ but if you want to skip right to the downloads go to http://sourceforge.net/projects/l7-filter/ (there is a kernel patch, a patched version of tc, and some sample patterns for HTTP, POP3, IMAP, SSH, Kazaa, and FTP.) You'll notice the patch is somewhat large; most of that is regexp code.

We're still working on it. It currently only does TCP for example... Do you guys/gals have any comments/suggestions/etc? I suspect that this is a post 2.6 thing, but it is very non-invasive (it only adds approx. 2 lines of code that would affect anything if the user were not using the layer-7 filters,) so I still have a little bit of hope.

4. QLogic qla2xxx Driver Update Released

19 May 2003 (1 post) Archive Link: "[ANNOUNCE] QLogic qla2xxx driver update available (v8.00.00b2)."

Topics: Hot-Plugging

People: Andrew Vasquez

Andrew Vasquez announced:

A new version of the 8.x series driver for Linux 2.5.x kernels has been uploaded to SourceForge:

http://sourceforge.net/projects/linux-qla2xxx/

In addition to the standard kernel-tree and external build tar-balls, a patch file is provided to update v8.00.00b1 sources to v8.00.00b2.

Changes include:

5. New wiggle Tools For Applying Patches With Conflicts

19 May 2003 (2 posts) Archive Link: "ANNOUNCE: wiggle - a tools for applying patches with conflicts"

People: Neil Brown, Andrew Morton

Neil Brown announced:

I am pleased to announce the first public release of 'wiggle'.

Wiggle is a program for applying patches that 'patch' cannot apply due to conflicting changes in the original.

Wiggle will always apply all changes in the patch to the original. If it cannot find a way to cleanly apply a patch, it inserts it in the original in a manner similar to 'merge', and reports an unresolvable conflict. Such a conflict will look like:

<<<<<<<
Some text from
the original file
|||||||
Some text that the patch changes
=======
Some text that is the result of the patch
>>>>>>>

with the meaning that the "text that the patch changes" was expected somewhere in the "text from the original file" and should be replaced with "the result of the patch".
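A small Python sketch of reading such conflict blocks back out of a file, assuming the four marker lines appear exactly as in the example above. The function name parse_conflicts is hypothetical and this is not part of wiggle itself.

```python
def parse_conflicts(text: str):
    """Return a list of (original, before, after) tuples, one per conflict."""
    conflicts, section, parts = [], None, None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            # Start of a conflict: text from the original file follows.
            parts = {"original": [], "before": [], "after": []}
            section = "original"
        elif parts is not None and line.startswith("|||||||"):
            section = "before"   # text the patch expected to change
        elif parts is not None and line.startswith("======="):
            section = "after"    # text the patch wanted to produce
        elif parts is not None and line.startswith(">>>>>>>"):
            conflicts.append(tuple("\n".join(parts[k])
                                   for k in ("original", "before", "after")))
            parts, section = None, None
        elif parts is not None:
            parts[section].append(line)
    return conflicts
```

Lines outside any conflict block are ignored, so the function can be pointed at a whole merged file.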

wiggle analyses the file and the patch in terms of words rather than whole lines and so is able to find matches that patch is unable to find. If a patch changes a word at the end of a line, and a word at the start of that line has been modified since the patch was made, then wiggle will have no trouble applying the patch.
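To see why word-level matching helps in the case Neil describes, here is a toy Python sketch built on the standard difflib module. It is not wiggle's algorithm: it splits on whitespace (so line breaks are lost), maps each word of the patch's "before" text onto the current file, and replays the patch's word edits wherever the affected words can still be found. The names word_merge and Conflict are hypothetical.

```python
import difflib


class Conflict(Exception):
    """Raised when a changed run of words cannot be located in the current text."""


def word_merge(base: str, patched: str, current: str) -> str:
    """Apply the base->patched change to `current`, word by word."""
    b, p, c = base.split(), patched.split(), current.split()

    # Map each base word index to its position in the current text.
    to_current = {}
    matcher = difflib.SequenceMatcher(None, b, c, autojunk=False)
    for blk in matcher.get_matching_blocks():
        for k in range(blk.size):
            to_current[blk.a + k] = blk.b + k

    result = list(c)
    ops = difflib.SequenceMatcher(None, b, p, autojunk=False).get_opcodes()
    # Replay edits from the end so earlier positions stay valid.
    for tag, i1, i2, j1, j2 in reversed(ops):
        if tag == "equal":
            continue
        if tag == "insert":
            # Anchor the insertion on a neighbouring word that still exists.
            if i1 > 0 and (i1 - 1) in to_current:
                start = end = to_current[i1 - 1] + 1
            elif i1 in to_current:
                start = end = to_current[i1]
            else:
                raise Conflict("no anchor for inserted words")
        else:
            spots = [to_current.get(i) for i in range(i1, i2)]
            expected = None if None in spots else list(
                range(spots[0], spots[0] + len(spots)))
            if spots != expected:
                raise Conflict("changed words not found intact")
            start, end = spots[0], spots[-1] + 1
        result[start:end] = p[j1:j2]
    return " ".join(result)
```

If the patch turns "int count = 0;" into "int count = 1;" and the file now reads "long count = 0;", a line-based tool sees a mismatched line, while word_merge still finds the trailing "0;" and replaces it.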

wiggle has proved very useful for back-porting patches that were generated for the development kernel, onto the stable kernel. Sometimes it does exactly the right thing with the patch. When it doesn't it reports a conflict which is easy to resolve with an understanding of what the code and the patch were trying to achieve.

Wiggle is available under the GPL and can be fetched from:

http://www.cse.unsw.edu.au/~neilb/source/wiggle/

The name 'wiggle' was inspired by Andrew Morton's comment:

The problem I find is that I often want to take
(file1+patch) -> file2,
when I don't have file1. But merge tools want to take
(file1|file2) -> file3.
I haven't seen a graphical tool which helps you to wiggle a patch into a file.

which google can find for you: http://www.google.com/search?q=graphical+tool+which+helps+you+to+wiggle+a+patch

It isn't a graphical tool, but it is a good first step.

NOTES:

This release contains a 'tests' directory with a number of test cases that have proved invaluable in developing the program and my understanding of the subtleties of some of the issues involved. If you find a case where wiggle behaves sub-optimally (e.g. dumps core), please consider sending me a test case to add to the tests directory.

This release also contains a script 'p' and accompanying 'p.help'. This is a script that I use for patch management for my kernel patches and it makes use of wiggle to allow me to apply patches that 'patch' cannot manage. It is included both as an example of how wiggle can be used, and as a tool that some might find useful.

One shortcoming I find with wiggle is that I would like to be able to 'see' what it has done. I would love it if someone were to write a program that allowed the results of wiggle to be visualised. The closest that I have come to imagining a workable UI is to have two side-by-side windows, one of which shows the original patch, and the other shows a "diff -u" of before and after wiggle has done its thing, and to have these windows automatically aligned so that when a change is shown in one, the corresponding change appears in the other. Maybe something like tkdiff, but that knows about patches and knows about word-based diffs....

Wiggle is also able to perform a function similar to 'diff' and show the differences and similarities between two files. It can show these differences and similarities at a word-by-word level. The output format is not machine readable as the character sequences used to delimit inserted and deleted words are not quoted in the output. Hence this format will probably change at some stage and should not be depended upon.

If you read the source, beware of comments: they were probably written while I was still trying to understand the issues myself, and so are probably wrong and out-of-date. I would like to review all the code and comments, but if I wait until I do that before releasing it, it'll never get released!

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.