Kernel Traffic

Kernel Traffic #15 For 22 Apr 1999

By Zack Brown

Mailing List Stats For This Week

We looked at 1252 posts in 5253K.

There were 428 different contributors. 186 posted more than once. 137 posted last week too.

1. Bug Hunt For 'Impossible' Errors

24 Mar 1999 - 14 Apr 1999 (31 posts) Archive Link: "`Out of memory for cc1' in linux-2.2.4"

Topics: Debugging

People: Stephen C. Tweedie, Linus Torvalds, Andrea Arcangeli, Mike Galbraith, David Ford

Under Linux 2.2.4, Russell Senior was getting "out of memory" errors, as well as "put_dirty_page: page already exists" errors, in /var/adm/messages when trying to compile something. Stephen C. Tweedie said, "This is very bad: something is seriously corrupt somewhere for these put_dirty_page messages to occur." Elsewhere he added, "those are supposed to be a this-can't-happen condition." Meanwhile, David Ford and others confirmed the problem on several Linux versions.

Stephen posted a patch to get more information, but the messages weren't reproducible on demand, so it didn't help much. For a moment it looked like he had a grasp on something, when Russell reported a "put_dirty_page: pte 00070040 already exists" and "put_dirty_page: pte 00078040 already exists" error. Stephen replied, "these are both SysV shared memory swap ptes. What on _earth_ are they doing there, in the fresh address space of a newly-exec()ed process?" He added, "Thanks for tracing this, at least it gives me somewhere to start looking for the problem." He asked Russell to try 'ipcs -a' as root, but when Russell gave the results, Stephen said, "I am completely at a loss to work out where your shm ptes are coming from."

This was April 7, and the thread ended on that sad note. But a couple of days later Mike Galbraith started a new thread with the Subject: [TESTCASE] `Out of memory for cc1', in which he found a way to reproduce the problem with some reliability. There was no reply for about 3 days, and then Andrea Arcangeli and Linus Torvalds hashed it out over the course of about a day.

Andrea said the problem was probably in free_pgtable() in mmap.c, and posted a patch. Linus' first impression was, "I think you're on to something, but I think your patch just hides the real problem." Two and a half hours later he had a bit more to say:

I was looking at the shared memory code, and just basically I cannot convince myself that it can be reliable. It has all these hacky things in it to bypass the normal mm layer logic, and I would not be at all surprised if it has serious problems when swapping. Whether this could explain any problems or not is unclear, but it does need to be cleaned up: at least enough that I can convince myself that it could possibly work..

Didn't somebody (Andrea?) have a good test-program for shared memory under low memory?

I have a truly cheezy patch that basically just rips out completely the mm-bypass code, but I've so far only verified that it compiles. It makes me feel happier about the shm code, but I don't have anything at _all_ that uses shm (I've even lost my xquake install in one of my machine upgrades). Anybody interested in checking the patch out and seeing what needs to be fixed up?

I bet there are things I missed - even though most of what I did was just to remove a lot of dubious code and rely on the generic mm layer doing the right thing instead.. But having _no_ testing it just basically is not very likely to work on the first try.

Andrea, Stephen, any comments/interest?

This should be further fixed to use the page cache etc etc (and remove the "attach" list that shouldn't be needed any more and other cleanups). Consider this a weak first step.

Later, he added, "I just verified that it at least runs doom, so while it is probably broken, it's at least not _completely_ broken, and that makes me happier about it all."

Stephen said that the shared memory test-program Linus had referred to was his. He added, "no amount of stressing was able to show up any shm fault under heavy swapping. However, one thing it did _not_ test was heavy mapping and unmapping of shm segments, which is what appears to be the problem here."

Responding to Linus, Andrea posted a new patch and gave some explanations of his thoughts; Linus replied with some explanations of his own; at this point Andrea jubilantly posted a two-line patch (at 4 AM) which he felt got right to the heart of the matter. He also posted another patch (untested due to lack of sleep) which he felt would improve the code, even though the first one was the actual fix.

To the first patch, Linus said, "Bingo!" and added, "completely agreed, this looks like a obvious silly thinko, and the fix looks obviously right. Good show, Andrea, thanks." But 10 minutes later he posted again, saying he felt Andrea's second patch was not correct. He added, "That function is just 30 lines of seemingly simple code, but even so it had two bugs in it (a off-by-one error that Ingo found, and now this)," and finished with, "Anyway, with your simple vm_end -> vm_start fix (ie your first patch), I believe we can put this function finally to rest."

And the thread was over.

2. Journalling And 'Capabilities' In ext3

31 Mar 1999 - 14 Apr 1999 (66 posts) Archive Link: "softupdates and ext2"

Topics: BSD, Backward Compatibility, Executable File Format, FS: NFS, FS: ext2, FS: ext3, POSIX

People: Theodore Y. Ts'o, Stephen C. Tweedie, Ingo Molnar, Matthew Kirkwood, Jeremy Fitzhardinge, Alan Cox, Pavel Machek, Albert D. Cahalan, Thomas Pornin

This series of threads was covered in LWN's kernel section two weeks ago and last week, and now looks like it could go on a few more months.

It all started as a softupdates question from Thomas Pornin. Since softupdates in BSD try to make unexpected reboots more recoverable, Thomas hoped something similar was in the works for Linux. Alan Cox pointed out that Stephen C. Tweedie was working on journalling for ext3 (which would altogether eliminate the need for fsck after unexpected reboots), and that dtfs (basically ext2 with logging) was available for 2.0 kernels. Some folks were in a sweat to see Stephen's code, but he said it really wasn't in a sharable state, though the journalling layer itself was nearly finished.

John Wojtowicz veered off with a question about whether 'capabilities' would be included in ext3. A later post of his, defining the basic technical aspects of the issue, was quoted in full in the Linux Weekly News. There's also a page on discussing it. Stephen replied that was the mailing list discussing capabilities, and the last set of patches from Andrew Morgan were available at (this link seems dead by KT press time, however).

Albert D. Cahalan objected to Andrew's approach of putting capability information in the filesystem, favoring a header in each executable. He felt it would maintain better compatibility, save inode space, and reduce the risk of buggy filesystem code.

His idea was to use the setuid bit to indicate that a capabilities header might exist. Then the kernel could check for it and use its information to give the specified capabilities to the executable.
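To make the shape of Albert's proposal concrete, here is a minimal sketch of the decision a kernel might make at exec time. Everything in it is an assumption for illustration: the capability names and bit values, the function `caps_at_exec`, the rule that a header is only honored on root-owned files, and the fallback to "all capabilities" for a plain setuid-root binary. None of it comes from Albert's posts or from any patch.

```python
import stat

# Hypothetical capability bits, invented for this sketch.
CAP_NET_BIND_SERVICE = 1 << 0
CAP_SYS_TIME = 1 << 1
ALL_CAPS = CAP_NET_BIND_SERVICE | CAP_SYS_TIME


def caps_at_exec(mode, owner_uid, header_caps=None):
    """Return the capability mask a toy kernel would grant at exec time.

    mode        -- the file's st_mode bits
    owner_uid   -- uid of the file's owner
    header_caps -- mask found in the executable's capability header,
                   or None if the executable carries no such header
    """
    if not mode & stat.S_ISUID:
        # No setuid bit: the header is never even looked at.
        return 0
    if owner_uid != 0:
        # Assumption: only root-owned files may hand out capabilities.
        return 0
    if header_caps is not None:
        # Setuid bit plus header: grant only what the header asks for.
        return header_caps & ALL_CAPS
    # Setuid root without a header: traditional all-or-nothing behavior.
    return ALL_CAPS
```

The last branch is also what a kernel that knows nothing about the header would do for every setuid-root binary, which is one way to see the older-systems worry raised in the thread: the same file gets a narrow mask on one system and full privileges on another.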

This apparently has some security questions (as do all other possible implementations). If the suid bit is going to have this overloaded meaning, there's the problem of making sure that older systems have the right behavior, regardless of the program or the capabilities involved. Albert seemed sure that the problems could all be resolved, but other folks had doubts. Alternatively, people like Matthew Kirkwood (in favor of handling the problem at the filesystem level) hoped ext3 wouldn't even bother with backwards compatibility with older kernels, but would clear out the crufty legacy code and start fresh. Albert was nonplussed with that idea, asserting that backwards compatibility was essential. He also came out against the whole idea of a new ext3 filesystem. He felt all the features people wanted could be handled by ext2. Matthew pointed out that journalling was already being done, and couldn't be done without breaking compatibility, so they might as well go whole hog and add other good stuff like capabilities while they were at it.

In a new thread, with Subject: [PATCH] So you want capabilities?, Pavel Machek said (essentially), "less words, more action!" and posted a patch to handle capability headers on ELF executables. In the same post he also included a (GPLed) script to add those headers to executables. There followed a bit of discussion about implementation.

A couple days later, Pavel posted a new patch based on the previous discussion, under the Subject: [PATCH] Capabilities, this time in elf section. One problem with his previous patch had been that the header had to be at a fixed position from the end of the executable. The new patch had the header in its own ELF section. He also asked if someone could write a utility to read and write those headers. Jeremy Fitzhardinge immediately volunteered (later under the Subject: [PATCH] Capabilities in elf he posted a third update to the patch).

Around here is where Ingo Molnar made the suggestion (covered in the LWN articles) that eventually root could be eliminated altogether and replaced with different capabilities. Pretty much everyone had a different opinion about every aspect of the situation by this time, which wasn't helped by the sheer complexity of the issues. Folks were finding unrelated flaws in each other's reasoning, leading to strange circles of disagreement and bizarre but fascinating tangents.

Finally Theodore Y. Ts'o said (quoted in part):

There are at least three definitions of "capabilities" floating around. The first is the one which Andrej Presern asks for, which is the academic C.S. definition of "capability". This is the model where all accesses are mediated by magic tokens which are called capabilities, which can be passed around between functions and between programs/processes. In order to actually gain access to a file, you have to present a capability which gives you the right to read it. This means that the open system call would require a new argument, which would be the "capability". Your user id has no meaning, and there is no "capability set" associated with a process. This is actually a very nice model, since it means that there is no implicit set of privileges associated with your process. Instead, all accesses are mediated via explicit use of capabilities for which your process has access, and the application program has to explicitly select which capability (out of potentially dozens or hundreds) it might have when it calls each and every system call.

The only problem with this model is that it is hopeless impractical. It involves making a completely change to the entire POSIX API, and a system which used this would not be able to use any of the programs ever written for Unix or POSIX systems, since it requires a change to the fundamental OS API. So, no one besides Andrej has ever agitated for this kind of "capabilities", and this is indeed something for which you might as well rewrite Linux from scratch, and it will fail since none of the existing Unix tools could be trivially ported to it.

The second model of "capabilities" is the one which folks like Pavel have argued for, and which I will call "capabilities lite". The argument here is for something which doesn't require making any changes to tar, cp, or any filesystem specific changes. The main goal is to solve the setuid root executable problem, but taken to its logical extreme, this model only makes sense if you still have a root account and you don't support the "inherited" capabilities set. That is, by default, all processes still inherit the capabilities of their parents across exec()'s. This has some of the advantages of the "full capabilities" model (described below), but it is a pale, castrated model of the full POSIX capabilties design. The main argument for adopting this seems to be as a short-cut, and possibly because some of the proponents may not understand all of the advantages of the full capabilities model.

The final model is the one that I and others who have been involved from the linux-privs project from the beginning have been argueing for, and that's the full capabilities model as worked on by the POSIX subcommittee. This model at its fundamental core does require making changes to the filesystem, and to programs like tar.(*) It also does require making fundamental changes in how a system is configured, and administered, and how system administrators go about their daily business. I expect at least at first, people will only use capabilities in the "capabilities lite" mode. That is, they will simply only give sendmail the few limited privileges it needs, and not actually try to make root go away. But I think it's really important that we not completely rule out the full capabilities model as originally envisioned by the POSIX subcommittee. It has a lot of advantages that the "capabilities lite" model doesn't have, and it's basically as far as you can take capabilities while still being true to the fundamental Unix design. It may not be as far as Andrej would like, but it's as far as you can practically go.

So, given that, what do we do? I think we should strive for the full capabilities model as enviisioned by POSIX. Yes, it's more work, but at the end of the day, you end up with a much great set of advantages. It also allows us to be compatible with other Secure OS's that have been implemented, and with the POSIX standard if/when it ever gets revived and completed.

The main thing that had been holding up the full capabilities folks had been the necessary ext2 changes so that you could store the capabilties information in the the filesystem, attached to the file. This is what Secure Solaris and Secure HPUX does, and it's what the POSIX draft clearly assumed how this would be implemented. The problem was that doing this in a clean way is non-trivial and Stephen and I both haven't had the time to take the initial set of patches which tried to implement this and clean them up. This is our fault, and those folks who came up with the hack of storing them in the ELF headers was trying to get around the fact that we couldn't get support into the ext2 filesystem quickly enough. Fair enough.

What I would suggest is either using a combination of the sticky bit plus the immutable flag, *or* define a new ext2 flag which means "this file has capability information". The second is probably the better choice, and since it's only a bit in the flags word, it's easy enough to implement. Now, people will inevitably complain that this means you can't use other filesystems. That's correct. And I think, that's OK. As I mentioned, all of the other Secure OS's that have done this have assumed filesystem support. And you wouldn't want to use capabilities over NFS anyway, since NFS stands for No File Security. People who use capabilities are generally serious about their security, and it's easy enough to corrupt an executable going over the network via NFS to insert an attacker's trojan horse code. So running setuid programs over NFS is a *really* bad idea to begin with.

By using a special ext2 flag meaning "there be capabilities here", and using the ELF header hack, it means that you do allow a program to be setuid daemon as well as having capabilities; and it allows you to build a system with the full set of capabilities in the POSIX capabilities models. It does mean that we will need to extend some tools and build a few new ones. In addition to writing a setuid scanner, we will need to write a capability scanner. But in the long run, I think the extra effort will be worth it.

(*) Secure HPUX has a very clever way of hiding the capabilities information in a tar file such that non-capability-aware tar programs can still read tar files with capability information. Basically, the capability information is stored in the tar file first, with a flag marking it as containing capability information, and with the same name as the file for which the capability information is for. It is then followed by the actual data contents of that executable. A non-capability aware tar program doing an extract of that tar file will ignore the flag indicating the contents of the capability information, write the capability information as a file to disk, and then when it sees the second file record, it overwrites the first file with actual data contents of the file, since it has the same name as the first.
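The overwrite trick in Ted's footnote is easy to model. The sketch below is a toy, not the tar format and not whatever Secure HPUX actually writes: the `Record` type, its `is_caps` flag, and the three functions are all invented here to show why an extractor that ignores the flag still ends up with the right file contents.

```python
from collections import namedtuple

# Toy archive record. "is_caps" stands in for the flag Ted describes;
# the real on-disk encoding is not given in his post.
Record = namedtuple("Record", "name is_caps data")


def pack(name, caps, contents):
    """Emit the capability record first, then the real file, under one name."""
    return [Record(name, True, caps), Record(name, False, contents)]


def extract_naive(archive):
    """Capability-unaware extractor: ignores the flag, so a later record
    with the same name simply overwrites the earlier one."""
    files = {}
    for rec in archive:
        files[rec.name] = rec.data
    return files


def extract_aware(archive):
    """Capability-aware extractor: keeps capability data apart from contents."""
    files, caps = {}, {}
    for rec in archive:
        if rec.is_caps:
            caps[rec.name] = rec.data
        else:
            files[rec.name] = rec.data
    return files, caps
```

With `archive = pack("sendmail", b"caps-blob", b"program-bytes")`, the naive extractor briefly "writes" the capability blob as the file and then replaces it with the program, while the aware extractor recovers both pieces. The ordering matters: if the real contents came first, the naive extractor would be left holding the capability blob.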

The conversations are still going on, with a mixture of flames, reasoned arguments, philosophical differences, and some code.

3. VFAT Hacks

10 Apr 1999 - 12 Apr 1999 (6 posts) Archive Link: "[RFC] change of lookup() method."

Topics: FS: VFAT

People: Alexander Viro, Linus Torvalds

Alexander Viro proposed that lookup() should (in case of success) return a pointer to a dentry, instead of just returning 0. According to him this would not impact current modules, but would allow him to scrape a lot of cruft out of the VFAT implementation. Linus Torvalds replied that he'd been planning this change for 2.3, and asked if Alexander had any overriding concerns that should up the priority to 2.2. Alexander replied, "VFAT has a nasty bug in rename() and I see no other way to fix it. Suppose we are doing rename("/mnt/LongDirName/index.html","/mnt/bar/index.html") and pwd of some process being /mnt/longdi~1. We'll either have the alias alive after the rename (hashed, under the old name, etc.) or we'll get a process with unhashed pwd. We've been through the latter variant already. Easily results in dentry leak. Former variant is outright BS - mv LongDirName foo; cd longdi~1; shouldn't succeed. Current code unhashes all aliases of source. Right thing for files, exploitable for directories. Notice that moving aliases does no good - target may have none," and added, "We could return -EBUSY if the source has busy aliases, but IMO it's extremely bogus, even for bandaid."
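One plausible reading of why the return value matters (this is an interpretation, not something spelled out in the thread): if lookup() can hand back a dentry, the filesystem is free to return one that already exists for the same directory, so a long name and its 8.3 short name need not end up as two separate aliases. The toy model below uses invented names throughout (`Dentry`, `ToyVfatDir`, `lookup_old`, `lookup_new`) and has nothing to do with the real kernel structures.

```python
class Dentry:
    """Toy stand-in for a directory cache entry."""

    def __init__(self, name, inode):
        self.name = name
        self.inode = inode


class ToyVfatDir:
    """Toy directory in which a long name and a short name share one inode."""

    def __init__(self):
        self.names = {}     # lowercased name -> inode number
        self.by_inode = {}  # inode number -> dentry already handed out

    def add(self, long_name, short_name, inode):
        self.names[long_name.lower()] = inode
        self.names[short_name.lower()] = inode

    def lookup_old(self, name):
        """Old style: report only a status; every name gets a fresh dentry."""
        inode = self.names.get(name.lower())
        if inode is None:
            return -2, None  # modeled on -ENOENT
        return 0, Dentry(name, inode)

    def lookup_new(self, name):
        """Proposed style: return the dentry to use, reusing an existing one."""
        inode = self.names.get(name.lower())
        if inode is None:
            return None
        if inode not in self.by_inode:
            self.by_inode[inode] = Dentry(name, inode)
        return self.by_inode[inode]
```

Looking up "LongDirName" and "longdi~1" with `lookup_old` yields two distinct dentries for one inode, which is the alias situation behind the rename() trouble Alexander describes. With `lookup_new` both names resolve to the same object, so there is no second alias for rename() to unhash or leave behind.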

4. WinModems Under Linux

8 Apr 1999 - 14 Apr 1999 (54 posts) Archive Link: "Please!! Help me to help us to use WinModems in Linux"

Topics: Assembly, Modems

People: Richard Reynolds, Gregory Maxwell, Matthew Kirkwood, Linus Torvalds, Riley Williams, Gerard Roudier

Richard Reynolds (an MS Windows user) found some specs for WinModems and wanted help programming a driver for Linux. Some folks replied that WinModems sucked. Tracy Camp said if Richard wanted to write the driver, he should be encouraged. There was more discussion about WinModems and the industry, and eventually Richard wrote under the Subject: I have information/driver source for winmodem's (no Joke):

I am re-writing this for many reasons. mostly because of some mis-information of the winmodem's that some people responded with. and I failed to inform the right people that I have the information. and what information I actually have.

Here is the thing the winmodem style modem is not actually a windows modem but a modem that is missing critual hardware to be considered a modem there is only 2 parts of windows that supports the winmodem 1 its multitasking(or sort of) and 2 companies have little encouragement to write for other operating systems. the greator numbers of phone-line modem users use windows version something.

there are no AT commands for any of the lucent modems! so having the commands programmed elseware doesnot help. the commands are proprietary.

Any one can distribute the binary code for the modems(at least viking the only one that i speak for in legal terms although there are some that use the same thing packaged as something else) the source code is protected and requires a nda agreement of which i have signed. and any work must be submitted to the company which I also will do.

The interface part of the drivers is the part I can not write. also there were comments as to which programming language i should and should not use well the one that the base drivers are written in is assembly for the intel processor (the only chip the modems will work with) however if i found that any one had responded i would have replied with a comment like "i have the internal hardware IO source in assembly and i can not share it in source code either electronically or printed but i can program that part" but i got a lot of replies so i am replying in as much detale as i can.

I do have source/documantation for the full driver including all the windows insertion commands which have no barring on linux.

thanks for your replies and this is a good topic according to the replies i got either on the list or thru private email so it does not have to die yet.

There was some discussion of implementation, and then Riley Williams pointed out that being under NDA meant Richard would not be able to release any code. Gregory Maxwell replied that Richard could release code if he wanted and Linux would benefit, but Richard would get burned. Matthew Kirkwood added as an aside that NDAs didn't necessarily specify that no code could be released; often the terms applied only to docs. Gerard Roudier also put in that binary modules could be linked into the kernel because of Linus Torvalds' licensing policy, so Richard could release a working module at least.

The thread died after a while. I suppose part of the problem is that a lot of folks just hate WinModems, so there's not much interest in helping out someone writing drivers for them.


Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.