Kernel Traffic #143 For 26 Nov 2001

By Zack Brown

Mailing List Stats For This Week

We looked at 1926 posts in 7692K.

There were 566 different contributors. 297 posted more than once. 210 posted last week too.

The top posters of the week were:

 

1. Common ACL API For All Filesystems (Specifically NTFS)
10 Nov 2001 - 15 Nov 2001 (14 posts) Archive Link: "[RFC][PATCH] extended attributes"
Topics: Access Control Lists, Extended Attributes, FS: NTFS, FS: ext2, Samba
People: Timothy Thomas Ringenbach, Nathan Scott, Anton Altaparmakov, Alexander Viro, Andreas Gruenbacher

Timothy Thomas Ringenbach asked, "I'm glad to see you guys are working on a common acl api for ext2/3 and xfs. I was just wondering if this api provided what would be needed for linux to support NTFS's acls." Nathan Scott replied, "The API doesn't favour any one form of ACL - it allows for any implementation to be layered above it, provided the semantics of those ACLs can be expressed using extended attributes, of course." Anton Altaparmakov, meanwhile, replied to Tim at greater length:

Comments/problems for NTFS with proposed EA/ACL API:

I think the API is good for extended attributes, no doubt. If we ever get round to implementing EAs in NTFS then I would be happy to use the API. It fully satisfies the needs of the NTFS EAs. The only addition I would put into the API is that the names of the extended attributes have to be able to have different name spaces themselves. For example I am fairly sure that the name of an EA in NTFS cannot contain just any character and it certainly cannot have a name of any length... This is something that needs to be considered. At least there must be defined error return values of "EILSEQ" (bad name namespace) and "ENAMETOOLONG" (self evident).
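To make Anton's two error returns concrete, here is a minimal Python sketch of an EA name check a filesystem might run before storing a name. The 255-byte limit is an assumption (it supposes NTFS keeps an EA name's length in a single byte), the printable-ASCII rule is a made-up stand-in for NTFS's real character rules, and `validate_ea_name` is a hypothetical helper, not part of the proposed API. It follows the kernel habit of returning a negative errno.

```python
import errno

# Assumed limit for illustration: an EA name length stored in one byte.
MAX_EA_NAME_LEN = 255

def validate_ea_name(name: bytes) -> int:
    """Return 0 if `name` is acceptable, else a negative errno value.

    The character rule (non-empty, printable ASCII only) is a hypothetical
    namespace, not NTFS's actual rule.
    """
    if len(name) > MAX_EA_NAME_LEN:
        return -errno.ENAMETOOLONG
    if not name or any(b < 0x20 or b > 0x7E for b in name):
        return -errno.EILSEQ
    return 0
```

Checking the length first means an overlong name reports ENAMETOOLONG even if it also contains bad characters; the opposite order would be an equally defensible choice.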

But for ACLs I am not so positive:

I guess the real problem is that NTFS security doesn't map very well onto Unix/Linux type of security model because the NTFS model has way more features.

If you are asking the question whether NTFS can work with the proposed API then yes, it can support all its features, but not the other way round...

Particular problems:

  • The proposed API puts ACLs inside extended attributes (EAs). On NTFS ACLs have nothing to do with extended attributes. They are two entirely different things. I suppose they could be merged into one API and the NTFS driver would have to parse and decide whether it is supposed to be operating on ACLs or EAs. But that will be a pain, especially as there may be ways of abusing the system, depending on how exactly it is implemented.
  • The ACLs in NTFS are _way_ more complex than the suggested ones. So mapping from one to the other is possible only when creating new files. When reading/writing existing ACLs a lot of information would be lost.

    Further each inode has a "user" owner and a group "owner" plus two types of ACLs: system one (SACL) and discretionary "normal" one (DACL).

    These four things are stored within a self relative security descriptor. And some of them are optional or can be inherited from parent inode or can be defaulted. - This actually breaks the current API which says that files cannot inherit/default file ACLs. In NTFS they can.

    The actual permissions in NTFS are not just RWX but they are a lot more granular (a 32 bitfield, see below URL for a list of all defined values) and some of them even determine the access rights to extended attributes, which needless to say causes a problem if ACLs are treated as EAs...

  • NTFS doesn't store uids but Security Identifiers (SID ones not Security_ID ones, both are separate things on NTFS. Are you confused yet? I am...) so mapping would need to exist between NTFS SID and Linux UIDs. Samba needs to do this (and does it already AFAIK), too, but that is more a problem of NTFS and not a Linux ACL API.
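A short Python sketch shows why Anton expects information loss. The constants are the standard Windows/NTFS access-right bits (a subset of the 32-bit mask); the rule that folds them into Unix rwx is a hypothetical choice for illustration, not something NTFS or the proposed API defines. Note that `FILE_READ_EA` and `FILE_WRITE_EA` live in the same mask, which is the overlap with extended attributes that Anton mentions.

```python
from typing import Tuple

# Standard access-right bits (subset of the 32-bit NTFS access mask).
FILE_READ_DATA        = 0x00000001
FILE_WRITE_DATA       = 0x00000002
FILE_APPEND_DATA      = 0x00000004
FILE_READ_EA          = 0x00000008
FILE_WRITE_EA         = 0x00000010
FILE_EXECUTE          = 0x00000020
FILE_READ_ATTRIBUTES  = 0x00000080
FILE_WRITE_ATTRIBUTES = 0x00000100
DELETE                = 0x00010000
READ_CONTROL          = 0x00020000
WRITE_DAC             = 0x00040000
WRITE_OWNER           = 0x00080000

# Bits the hypothetical mapping below knows how to express as rwx.
MAPPED = FILE_READ_DATA | FILE_WRITE_DATA | FILE_APPEND_DATA | FILE_EXECUTE

def mask_to_rwx(mask: int) -> Tuple[str, int]:
    """Fold an access mask into an rwx string (hypothetical mapping).

    Returns the rwx string and the bits with no rwx equivalent,
    i.e. the rights that would be lost in the conversion.
    """
    r = "r" if mask & FILE_READ_DATA else "-"
    w = "w" if mask & (FILE_WRITE_DATA | FILE_APPEND_DATA) else "-"
    x = "x" if mask & FILE_EXECUTE else "-"
    return r + w + x, mask & ~MAPPED
```

For example, `mask_to_rwx(FILE_READ_DATA | FILE_READ_EA | WRITE_DAC)` gives `"r--"` plus a leftover of `FILE_READ_EA | WRITE_DAC`: two rights that plain rwx cannot represent.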

All NTFS security stuff can be seen at the following URL - just search for IDENTIFIER_AUTHORITY and read from there on... all security related structures are defined there and there are quite a few comments.

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/linux-ntfs/ntfs-driver-tng/linux/include/linux/ntfs_layout.h?rev=1.11&content-type=text/vnd.viewcvs-markup

You can also read the NTFS documentation on SF but note that this is not as complete as the header file above but it might be easier to understand. The url with the description of the security descriptor is:

http://linux-ntfs.sourceforge.net/ntfs/attributes/security_descriptor.html
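A binary SID consists of a revision byte, a sub-authority count, a 6-byte IDENTIFIER_AUTHORITY stored most significant byte first, and a list of little-endian 32-bit sub-authorities. The Python sketch below decodes that layout into the familiar `S-1-5-32-544` text form and then looks the result up in a table, the SID-to-UID step Anton says NTFS (and Samba) needs. The table and the fallback UID 65534 are hypothetical; a real driver would need a proper, configurable mapping. For simplicity the authority is always printed in decimal.

```python
import struct

def sid_to_string(raw: bytes) -> str:
    """Render a binary SID in the usual S-R-I-S-S... text form."""
    revision, count = raw[0], raw[1]
    # IDENTIFIER_AUTHORITY: 6 bytes, most significant byte first.
    authority = int.from_bytes(raw[2:8], "big")
    # Sub-authorities: `count` little-endian 32-bit values after byte 8.
    subs = struct.unpack_from("<%dI" % count, raw, 8)
    return "S-%d-%d" % (revision, authority) + "".join("-%d" % s for s in subs)

# Hypothetical SID -> UID table standing in for a real mapping service.
SID_TO_UID = {"S-1-5-32-544": 0}

def sid_to_uid(raw: bytes, default_uid: int = 65534) -> int:
    """Map a binary SID to a Linux UID, falling back to `default_uid`."""
    return SID_TO_UID.get(sid_to_string(raw), default_uid)
```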

Nathan had a few technical comments, then replied again to himself with a (very rough) patch, announcing:

Andreas and I have been looking at several different VFS mechanisms for extended attributes, I've included the code for one below, and we're keen to get a bit of feedback here as well.

We started off with the simplest mechanism, just passing everything straight down into the filesystem. I then played around with some ways of separating out the different operations and then passing off to the filesystem that way (see patch) to give the interface a more rigid definition. Andreas' original mechanism was a lot like this, except it used NULLs in some field values instead of explicit flags to distinguish similar operations - that's another approach.

Yet another way would be to have an ea_operations vector separate to the inode_operations with an ea_operations pointer in struct inode, enumerating each EA operation and doing away with the flags (in the patch below) altogether.
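The two styles Nathan describes can be contrasted with a small Python sketch: a single entry point multiplexed by a flag, versus a separate operations table with one callable per operation, roughly analogous to his proposed ea_operations vector. Everything here (the `EA_*` flags, `EaOperations` and its field names, the dict-backed store) is hypothetical and only models the shape of each interface, not the actual patch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Style 1 (hypothetical): one entry point, the operation chosen by a flag.
EA_GET, EA_SET, EA_LIST, EA_REMOVE = range(4)

def ea_multiplexed(store: Dict[str, bytes], op: int,
                   name: Optional[str] = None, value: Optional[bytes] = None):
    if op == EA_GET:
        return store[name]
    if op == EA_SET:
        store[name] = value
        return None
    if op == EA_LIST:
        return sorted(store)
    if op == EA_REMOVE:
        del store[name]
        return None
    raise ValueError("unknown EA operation %d" % op)

# Style 2 (hypothetical): an operations vector, one callable per operation,
# so no flags are needed and each function does exactly one thing.
@dataclass
class EaOperations:
    getea: Callable[[str], bytes]
    setea: Callable[[str, bytes], None]
    listea: Callable[[], List[str]]
    removeea: Callable[[str], None]

def dict_ea_ops(store: Dict[str, bytes]) -> EaOperations:
    return EaOperations(
        getea=lambda name: store[name],
        setea=store.__setitem__,
        listea=lambda: sorted(store),
        removeea=store.__delitem__,
    )
```

The second form is closer in spirit to Alexander Viro's objection below: each entry point has one meaning, and there is no flag argument to validate.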

Alexander Viro replied that the patch was totally broken, and implemented a vastly too complex API. He said scornfully, "Folks, it's not a rocket science. Let a function do _one_ thing, don't turn it into a multiplexed monstrosity. Yes, you've used only 3 syscalls. But actually you've managed to hide ~20 of them in that code and the fact that you've spent only 3 syscall table entries doesn't make the things better." Andreas Gruenbacher defended the specific technical decisions made in the patch, while Anton asked Alexander, "Out of interest, which access interface(s) would you like to see used? Giving a few suggestions you would be happy with would be a lot easier on anyone trying to develop a filesystem API than for them having to come up with one after the other until one is found which you approve of..."

Later on in the thread, Nathan posted a new patch that attempted to answer some of Alexander's objections. He and Andreas exchanged a few technical comments, and the thread ended inconclusively.

 

2. New User-Space Filesystem: FUSE
12 Nov 2001 - 19 Nov 2001 (4 posts) Archive Link: "Introducing FUSE: Filesystem in USErspace"
Topics: FS: NFS, SMP
People: Miklos Szeredi, Jamie Lokier

Miklos Szeredi announced:

Had enough of life? Nothing to do? Write a filesystem!

What is FUSE?

FUSE (Filesystem in USErspace) provides a simple interface for userspace programs to export a virtual filesystem to the Linux kernel. FUSE also aims to provide a secure method for non privileged users to create and mount their own filesystem implementations.

There's NFS or CODA. Why FUSE?

Yes both NFS and CODA make it possible to create userspace filesystems. But none of them were designed for this task. The design of FUSE differs from the above in the following:

  • Ability to provide a _very_ simple userspace library interface.
  • Thin layer in kernel. Minimal caching, predictable behavior.
  • Communication is not over a network, and is optimized for local data transfer
  • Secure environment even if userspace client is non-cooperative.
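As an illustration of what a "very simple userspace library interface" can mean, here is a hypothetical Python sketch in which the filesystem author only supplies a table of handlers, and a library function answers each request from the kernel side with either a result or a negative errno. The request tuple format, the opcode names and `handle_request` are all invented for this example; they are not FUSE's actual protocol or API.

```python
import errno
from typing import Any, Callable, Dict, Tuple

# Toy in-memory filesystem contents (hypothetical).
FILES: Dict[str, bytes] = {"/hello.txt": b"Hello from userspace\n"}

def do_getattr(path: str) -> Any:
    if path not in FILES:
        return -errno.ENOENT
    return {"size": len(FILES[path])}

def do_read(path: str, offset: int, size: int) -> Any:
    if path not in FILES:
        return -errno.ENOENT
    return FILES[path][offset:offset + size]

# The part the filesystem author writes: a table of handlers.
HANDLERS: Dict[str, Callable[..., Any]] = {
    "getattr": do_getattr,
    "read": do_read,
}

def handle_request(request: Tuple[int, str, tuple]) -> Tuple[int, Any]:
    """Answer one (unique_id, opcode, args) request; the id pairs the reply."""
    unique, opcode, args = request
    handler = HANDLERS.get(opcode)
    if handler is None:
        return unique, -errno.ENOSYS
    return unique, handler(*args)
```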

All this is nice, but does it work?

I've tested fuse with a simple 'loopback' test program, and also with AVFS (http://www.inf.bme.hu/~mszeredi/avfs/), for which FUSE was designed. That doesn't mean that there are no bugs in it, but it's a good sign...

Is it available?

Yes it can be downloaded from

http://sourceforge.net/projects/avf

How can it be installed?

FUSE currently works only on 2.4.X kernels. Installation requires the kernel source to be present. The kernel does not need to be patched or recompiled: the kernel part of FUSE is installed as a module. The FUSE module is SMP safe.

There is also a kernel patch (for kernels 2.4.12 and up) included in the distribution, which makes mounting by non-privileged users secure.

Comments on design, implementation, and on my state of mind are welcome.

To the point that FUSE implemented a thin layer in the kernel, with only minimal caching, Jamie Lokier replied, "Minimal caching? I would hope for maximal caching, for when userspace is able to say 'yes the page you have is still valid'. Preferably without a round trip to userspace for every page." Miklos replied:

I made some performance tests with FUSE, and the raw throughput is about 60MBytes/s on a Celeron/360 for both reads and writes. And yes, that includes two context switches and a copy_to_user/copy_from_user pair for each page.

I think that at such speed it's not really such a grave problem if caching is not done in kernel, and it simplifies things a _lot_.
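Miklos's figures can be turned into a per-page cost with a little arithmetic. The Python sketch below assumes 4 KB pages (as on i386) and reads "60MBytes/s" as 60 × 2^20 bytes per second; with two context switches per page, that works out to 15,360 pages and 30,720 context switches per second, or roughly 65 µs per page round trip.

```python
PAGE_SIZE = 4096          # assumed i386 page size, in bytes
THROUGHPUT = 60 * 2**20   # "60MBytes/s", taking MB as 2**20 bytes (assumption)
SWITCHES_PER_PAGE = 2     # two context switches per page, as Miklos reports

pages_per_sec = THROUGHPUT // PAGE_SIZE
switches_per_sec = pages_per_sec * SWITCHES_PER_PAGE
usec_per_page = 1e6 / pages_per_sec  # time budget for one page round trip

print(f"{pages_per_sec} pages/s, {switches_per_sec} switches/s, "
      f"{usec_per_page:.1f} us per page")
```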

End of thread.

 

3. GPLONLY: The Saga Continues
12 Nov 2001 - 15 Nov 2001 (14 posts) Archive Link: "Changed message for GPLONLY symbols"
People: Keith Owens, Anthony DeRobertis, Mike Fedyk, Alex Bligh

Keith Owens had had enough of the ongoing confusion regarding GPL-only symbols. He announced:

When insmod detects a non-GPL module with unresolved symbols it currently says:

Note: modules without a GPL compatible license cannot use GPLONLY_ symbols

I thought that hint was self-explanatory, obviously it was not clear. Never underestimate the ability of lusers to misread a message. insmod 2.4.12 will say

Hint: You are trying to load a module without a GPL compatible license and it has unresolved symbols. The module may be trying to access GPLONLY symbols but the problem is more likely to be a coding or user error. Contact the module supplier for assistance.

Does anyone think that this message can be misunderstood by anybody with the "intelligence" of the normal Windoze user?

Anthony DeRobertis replied wistfully, "Make something idiot-proof and the universe will create a better idiot." But Mike Fedyk said, more optimistically, "Somebody would have to be *trying* to be an idiot with this new message..."

Elsewhere, Alex Bligh put in:

Yes I think it can be misunderstood, and, perhaps more importantly, still points the user at GPLONLY when it's more likely to be a straightforward version mismatch. Better might be:

Hint: You are trying to load a module which has unresolved symbols. These symbols may not be exported by this version of the kernel (perhaps you have a version mismatch), or they may be exported GPLONLY, (in which case they will not be available to your module which does not carry a GPL compatible license). In either case, contact the module supplier for assistance.
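The distinction Alex draws can be sketched as a per-symbol diagnosis: a symbol the kernel does not export at all points to a version mismatch, while a GPL-only symbol is a licensing problem only when the module's license is not GPL-compatible. This Python sketch is hypothetical and is not how insmod is implemented; the set of license strings is an assumption modeled on those the 2.4-era tools accepted, and `some_gpl_symbol`/`missing_symbol` in the example are made-up names.

```python
from typing import Dict, Iterable

# License strings assumed to count as GPL-compatible (illustrative list).
GPL_COMPATIBLE = {"GPL", "GPL v2", "GPL and additional rights",
                  "Dual BSD/GPL", "Dual MPL/GPL"}

def unresolved_reasons(needed: Iterable[str], exported: Dict[str, bool],
                       module_license: str) -> Dict[str, str]:
    """Explain why each symbol in `needed` cannot be resolved.

    `exported` maps each symbol the running kernel exports to True if it
    is exported GPL-only. Symbols that resolve are left out of the result.
    """
    gpl_ok = module_license in GPL_COMPATIBLE
    reasons = {}
    for sym in needed:
        if sym not in exported:
            reasons[sym] = "not exported by this kernel (version mismatch?)"
        elif exported[sym] and not gpl_ok:
            reasons[sym] = "exported GPL-only; module license is not GPL-compatible"
    return reasons
```

For instance, with `exported = {"printk": False, "some_gpl_symbol": True}`, a proprietary module needing `printk`, `some_gpl_symbol` and `missing_symbol` gets two separate explanations, while a GPL module gets only the version-mismatch one.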

 

4. Migration Toward Marcelo As 2.4 Maintainer
13 Nov 2001 - 22 Nov 2001 (16 posts) Archive Link: "Tuning Linux for high-speed disk subsystems"
Topics: Disk Arrays: RAID, Disks: SCSI, FS: sysfs, PCI
People: Roy Sigurd Karlsbakk, Craig I. Hagan, Marcelo Tosatti

Roy Sigurd Karlsbakk said, "After some testing at Compaq's lab in Oslo, I've come to the conclusion that Linux cannot scale higher than about 30-40MB/sec in or out of a hardware or software RAID-0 set with several stripe/chunk sizes tried out. The set is based on 5 18GB 10k disks running SCSI-3 (160MBps) alone on a 32bit/33MHz PCI bus." There were a number of replies. Among them, Craig I. Hagan posted a patch and said:

this isn't quite true. use either the RH kernel, the -ac series, or the attached patch (for 2.4.15-pre4). Then set /proc/sys/vm/max-readahead to 511 or 1023 (power of 2 minus 1)

this should allow you to generate large enough io's for streaming reads to do what you are looking for.
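A quick way to sanity-check Craig's numbers, assuming the 2.4 max-readahead value is counted in 4 KB pages: 511 pages is just under 2 MB of readahead and 1023 pages just under 4 MB. The Python sketch below computes that and checks the "power of 2 minus 1" property with the usual bit trick; the helper names are hypothetical. Applying the setting itself means writing the number to /proc/sys/vm/max-readahead as root.

```python
PAGE_SIZE = 4096  # assumed page size (i386), in bytes

def is_pow2_minus_1(n: int) -> bool:
    """True if n == 2**k - 1 for some k >= 1, i.e. n + 1 is a power of two."""
    return n > 0 and ((n + 1) & n) == 0

def readahead_bytes(pages: int) -> int:
    """Readahead window in bytes, assuming the setting counts pages."""
    return pages * PAGE_SIZE

for setting in (511, 1023):
    print(setting, is_pow2_minus_1(setting), readahead_bytes(setting))
```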

Marcelo Tosatti replied, "This patch is already on my pending list. So if Linus does not apply it, I will."

 

5. File Server Recommendations
13 Nov 2001 - 17 Nov 2001 (15 posts) Archive Link: "File server FS?"
Topics: Disk Arrays: LVM, Disk Arrays: RAID, FS: NFS, FS: XFS, FS: ext2, FS: ext3
People: Jamie Lokier, Andreas Dilger, Andrew Morton, Sean Elble, Steve Lord, Robert Szentmihalyi, Mike Fedyk

Someone wanted recommendations for the best filesystem to use for a file server, the requirements being that it had to support KNFSD, LVM, resizing, and quotas. Historians continue to speculate on why the expected flamewar did not erupt.

Mike Fedyk recommended ext3 as satisfying all of the above requirements except quotas, which in any case he said should be sorted out soon. Jamie Lokier was suspicious of this answer, as it seemed that resizing an ext3 filesystem might have some strange results. Specifically, if the journal file itself changed with the filesystem size, it would be necessary for resize2fs (or ext2resize) to take account of that fact, and adjust the journal file accordingly. He added, "When I have resized ext3 filesystems, I have removed then recreated the journal manually because it wasn't clear from the documentation whether resize2fs does the appropriate thing." Mike replied that he was sure it would work, though he hadn't actually tried it. He said he'd do some tests, and called upon Andrew Morton or Andreas Dilger to confirm his hopes.

Andreas replied that neither ext2resize nor resize2fs would adjust the journal file, but added, "Like Mike says, there should be very minimal impact to the filesystem operation, unless you are going from, say, a 16MB filesystem to a 500GB filesystem. You also have to watch out if you start with a filesystem smaller than 500MB - you will get 1kB blocks, and you don't want to have a large filesystem (10's of GB) with a 1kB blocksize. There is nothing that resize2fs or ext2resize can do about that, unfortunately." He also put in, for good measure, "It works just fine with ext2resize, and I'm pretty sure resize2fs also works on ext3 filesystems." Andrew also said, "mke2fs and tune2fs choose an initial journal size based on the size of the fs, so if you were increasing the fs size by a large ratio then there may be a case for increasing the journal size. But as you've pointed out, an 8, 16 or 32 megabyte journal covers an awful lot of metadata." He added that he had actually tested this a while ago, and would welcome a new test, if Mike wanted to do it. Mike said he'd give it a try in the next few weeks.
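One way to see Andreas's concern about 1kB blocks: ext2 and ext3 use one block for each block group's block bitmap, so a group spans 8 × block size blocks (8192 blocks, or 8MB, with 1kB blocks; 32768 blocks, or 128MB, with 4kB blocks). The Python sketch below counts groups for a hypothetical 10GB filesystem: the 1kB layout needs 16 times as many groups, each with its own bitmaps and inode table. It ignores details such as the reserved first block on 1kB filesystems.

```python
def ext2_block_groups(fs_bytes: int, block_size: int) -> int:
    """Approximate number of ext2/ext3 block groups for fs_bytes.

    A group's block bitmap fills one block, so a group covers
    8 * block_size blocks.
    """
    blocks = fs_bytes // block_size
    blocks_per_group = 8 * block_size
    return -(-blocks // blocks_per_group)  # ceiling division

TEN_GB = 10 * 2**30
print(ext2_block_groups(TEN_GB, 1024), ext2_block_groups(TEN_GB, 4096))
```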

Elsewhere, Sean Elble recommended XFS to the original poster, saying, "it supports the kernel mode NFS server very well, it supports LVM, an XFS file system can be enlarged (not reduced), and XFS has great quota support, just be sure you use a 3.0 or greater quota tools package. Why use XFS over Ext3 you ask? XFS is faster, and scales better, IMHO. Again just my opinion, but I hope that helps." Robert Szentmihalyi replied that he had recently built an 800G fileserver using XFS on a 3ware RAID. He said it worked great, even under heavy load, the only drawback being that group quotas were unsupported. But Steve Lord replied, "XFS on linux has had group quota support for quite a while - certainly longer than 3 months. All the other features are available too." Robert came back with, "I have not tried it since the FAQ at http://oss.sgi.com/projects/xfs/faq.html#quotaswork said it didn't. (It still does, by the way. Perhaps you could update the FAQ :-))" By KT press time, the FAQ had been updated. EOT.

 

6. New Kernel Configuration Tool: mconfig
16 Nov 2001 - 18 Nov 2001 (4 posts) Archive Link: "[ANNOUNCE] mconfig 0.20 available"
Topics: Kernel Build System
People: Christoph Hellwig, Keith Owens, Samium Gromoff, Michael Elizabeth Chastain

Christoph Hellwig announced:

The mconfig release 0.20 is now available.

Mconfig is a tool to configure the linux kernel, similar to make {menu,x,}config, but written in C and with a proper yacc parser.

The following changes have been made since the last public release, 0.18 by Michael Elizabeth Chastain:

  • switched to autoconf/automake.
  • build 'menu' mode only if curses are available.
  • added manpage (VERY simple).
  • added specfile for RPM builds.
  • help text moved from C source to external file.
  • modes 'text' and 'old' implemented.
  • verb 'dep_mbool' implemented.
  • relaxed error checking - moan in

This release is available as gzip/bzip compressed source tarball at:

ftp://ftp.kernel.org/pub/linux/kernel/people/hch/mconfig/

ftp://ftp.opengfs.org/pub/opengfs/0.0.91/opengfs-0.0.91.tar.gz

ftp://ftp.opengfs.org/pub/opengfs/0.0.91/opengfs-0.0.91-1.src.rpm

Keith Owens asked:

Christoph, could you explain why this is being added now and how it compares to CML1 and/or CML2?

kbuild 2.[45] is completely agnostic about how .config and autoconf.h are built, the only requirement is that .config be internally consistent before it goes into the main build phase. I don't care how .config is built, but I do want to understand why another version of CML is being developed.

Christoph replied, "It's not added now - Michael started the development about 5 years ago, in 1998 he stopped working on it. In 1999 or 2001 I started hacking on it, only adding what I needed. Now I finally found the time to make a formal release. The tool mconfig parses CML1 rules, and does so _much_ more strictly than any other parser." He added, "The current cml1 scripts are _very_ ugly, and even if cml2 makes it in 2.5 (yes, I don't like it - but I don't have to decide it..) kernels using cml1 will be around for a long time."
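To illustrate what stricter parsing of CML1 rules can look like, here is a small Python sketch that accepts only a few CML1 verbs (including the dep_mbool verb listed above) and rejects anything malformed instead of guessing. It is not mconfig's yacc grammar; the accepted subset, the function name `parse_cml1` and the requirement that dependencies be `$CONFIG_` symbols are simplifying assumptions.

```python
import re
import shlex

# Assumed subset of CML1 verbs handled by this sketch.
PLAIN_VERBS = {"bool", "tristate"}
DEP_VERBS = {"dep_bool", "dep_tristate", "dep_mbool"}
SYMBOL = re.compile(r"CONFIG_[A-Z0-9_]+\Z")

def parse_cml1(line: str):
    """Strictly parse one statement into (verb, prompt, symbol, deps).

    Raises ValueError on anything outside the assumed subset.
    """
    tokens = shlex.split(line)  # honours the single-quoted prompt
    if len(tokens) < 3 or tokens[0] not in PLAIN_VERBS | DEP_VERBS:
        raise ValueError("unknown or incomplete statement: %r" % line)
    verb, prompt, symbol, *deps = tokens
    if not SYMBOL.match(symbol):
        raise ValueError("bad symbol name: %r" % symbol)
    if verb in DEP_VERBS:
        if not deps or not all(d.startswith("$CONFIG_") for d in deps):
            raise ValueError("%s needs $CONFIG_ dependencies" % verb)
    elif deps:
        raise ValueError("%s takes no dependencies" % verb)
    return verb, prompt, symbol, deps
```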

Elsewhere, Samium Gromoff remarked, "Personally i tried mconfig just now, and i was charmed by its speed, comparing to the one of the current cml1 implementation... Yes that was poor old p166... I think i'll stick to it for a while now..."

End of thread.

 

7. Weird Developer Interaction
18 Nov 2001 - 19 Nov 2001 (3 posts) Archive Link: "[PATCH] nwfs-2.4.15-pre5-4 NWFS Patch"
Topics: FS: NTFS
People: Jeff V. Merkey, Andre Hedrick, Linus Torvalds

Jeff V. Merkey announced:

I've posted another patch. The previous patch for some reason had some fixes to NTFS included as well. Corrected. This patch is located at ftp.timpanogas.org:/nwfs/nwfs-2.4.15-pre5-4.gz (ftp://ftp.timpanogas.org/nwfs/nwfs-2.4.15-pre5-4.gz) and incorporates the NetWare File System (NWFS) into Linux kernel 2.4.15-pre5.

This patch is submitted to Linus for consideration of inclusion into the Linux kernel.

Andre Hedrick (former CTO of Jeff's company, Timpanogas Research Group) quoted a private email (obscuring the identity of the author) received in response to Jeff's post:

I don't understand the big secrecy or whatever, on IRC.

If nwfs is legal to submit to Linus, then no problem. If it's not, then problem. Either way it's an honest question, "is this legal to post/submit to Linus?"

Andre added:

Mr. Jeff V. Merkey,

As you can see above, there are great concerns over the nature and status of the source code you have submitted for inclusion to the main linux kernel source tree. Given that I have intimate and detailed personal knowledge of your company, the nature and evolution of the source code in question, the following requirement is issued to you and your general counsel Mr. Andrew McCullough.

The following issues must be settled in a legal brief to be reviewed by myself, any appointed general counsel, and key members of the core linux kernel development team and/or their employer. This may include other organizations where this content may be redistributed.

The first date and time the initial code base for the introduction of NetWare File System (classification to mimic) (now referred to as NWFS), which is to be a direct replacement for system infrastructure based upon your time as original author of the Novell 4.XX NWFS, during your employment as "Second Fellow" and "Chief Architect".

Based on general knowledge of case law in the State of Utah, you must legally define the date of first public disclosure of your rewrite of methodology to access/update native storage environments having the commercial product known to all as Novell OS (NOS) installed upon the media.

Working from that date forward, you are required to outline all formal actionable steps taken by your former employer Novell that either can be properly described as legal, acceptable attempts to satisfy terms and conditions for the two (2) year period that claims can be made against TRG et al. If this time period has expired and Novell may not take action, then to the best of my knowledge the materials submitted could be acceptable for review. However, should there be any references or capabilities to include the Novell Extended Directory Services (E-Directory), you will be required to remove such material.

Repeating the previous process above, but to include any environments which can be exercised in any other legal forum not described above.

I have now placed myself in an unstable position; however, until these issues can be addressed and verified, you should not expect the adoption of your work on this matter to be accepted at the current time.

Should you be able to satisfy my and others' concerns in the core development team, hand picked by Linus Torvalds, I would drop my concerns and suggest global review for adoption. You should know that I can only make suggestions, and raise issues of concern. The final decision is solely the responsibility of Linus Torvalds.

Jeff replied:

Dear Mr. Hedrick,

I will comply with the directives of the Linux Community Representatives and direct Mr. McCullough to prepare the requested documentation.

As you are aware, NWFS is a total rewrite of Novell's Native File System. TRG has versioning systems that have tracked the creation of this code since it was begun, and no Novell Source code was used, nor was any confidential information or trade secrets of Novell as defined by UTSA (Uniform Trade Secrets Act).

These materials will be completed and submitted to you NLT Wednesday this week. They will not be posted to LKML, since they are attorney work product. We are greatly honored to be afforded the opportunity to submit our technology to the Linux Community, and for being given this chance to contribute to Linus Torvalds' extraordinary efforts and the efforts of the other members of the Linux Community who have made a great success.

I will submit this documentation by Wednesday. Andrew will be available at your discretion for a telephonic conversation to address any specific concerns. His direct line is 801-222-9635.

End of thread.

 

8. Status Of NTFS Support
22 Nov 2001 (5 posts) Archive Link: "[vojkan@global.net.mt: Re: RAW NTFS Partition]"
Topics: FS: NTFS, Version Control
People: Vojkan Petkovic, Jeff V. Merkey, Anton Altaparmakov

Jeff V. Merkey forwarded a private email to the list, from Vojkan Petkovic, in which Vojkan had said, "sorry to bug you but I am in same serious trouble with W2k NTFS after bad memory crash trashing my disk with a year of video edited klips that I work with. Could you send me the tool to try fixing the problem with it, please?" Jeff added, "Can you help this person? My 18 months has now expired. I can help on NTFS now if you need some help." Anton Altaparmakov replied that he'd talk to Vojkan off-list, but that the problem sounded like he needed a data recovery company, rather than diskedit. Regarding Jeff's legal status, Anton added:

That's cool to know. I am developing a new NTFS driver - NTFS TNG. It is read-only for now and it is almost complete with regards to basic read support. I.e. it works NOW. - The only thing that is missing is attribute list attribute support but I am working on it as we speak. (-;

If you or anyone else of course is interested in participating in development, have a look. Code is in module ntfs-driver-tng in Sourceforge linux-ntfs CVS. URL with cvs access details:

http://sourceforge.net/cvs/?group_id=13956

Note that the module requires some small changes to the core kernel and the appropriate patch is maintained in ntfs-driver-tng/patches directory. Currently kernel 2.4.15-pre4 is supported but patch might apply to later -pres as well.

After applying the patch and installing the new NTFS module sources NTFS TNG is completely separate from the kernel tree (all code including headers is in fs/ntfs and nowhere else and the include/linux/fs.h dependencies are gone).

One word of warning: NTFS TNG requires gcc-2.96 or later to compile. It will NOT compile with earlier versions of gcc! You will just get a million or so errors if you try any earlier gcc compiler...
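Anton's compiler requirement amounts to a version gate. The Python sketch below checks a version string against the stated 2.96 minimum; it assumes the first `X.Y` number in the string is the compiler version, which holds for strings such as "2.96" or "gcc (GCC) 3.0.2" but is not guaranteed for every gcc build. Inside C code, the same check could instead be written with gcc's `__GNUC__` and `__GNUC_MINOR__` macros.

```python
import re

MIN_GCC = (2, 96)  # minimum stated for NTFS TNG

def gcc_ok(version_output: str) -> bool:
    """True if the first X.Y number found in the string is >= 2.96."""
    m = re.search(r"(\d+)\.(\d+)", version_output)
    if m is None:
        return False  # unrecognised format: treat as unsupported
    return (int(m.group(1)), int(m.group(2))) >= MIN_GCC
```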

Jeff replied, "I will download and review where the code is at. Write support is going to be very tough at this point -- they have changed some of the on-disk structures again for the journal and several meta files. However, with the DOJ settlement, they will be required to share this information, so our job could get easier. I will approach some folks and see where they are at. Hopefully, it won't go down the way it did last time." Anton replied that he wasn't going to worry about the journalling feature until he'd gotten NTFS working under Linux. He also added, "From what I have read I doubt the DOJ settlement will be any benefit to NTFS development on Linux but it's certainly worth a try." Jeff said he sent an email to some people he knew, to see where they stand on that issue. End of thread.

 


We Hope You Enjoy Kernel Traffic
 

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License, version 2.0.