Kernel Traffic #10 For 18 Mar 1999

By Zack Brown

Introduction

Thanks go out to Peter Samuelson, who noticed we had a lot of buggy html ("I ran weblint on your main page and it had a fit"). With one or two exceptions it's all been fixed, hopefully never to return. Thanks for the heads-up Peter!

Thanks also go to Axel Boldt for pointing us to the very well threaded linux-kernel archive at http://www.kernelnotes.org/lnxlists/linux-kernel/. This link has replaced the old one on our sidebar and in our article titles. Soon all KT back issues will use this link as well. Much obliged, Axel!

BTW, we have the latest Linus Torvalds interview (at LinuxWorld) on our interviews page (interviews.html).

We've also divided our quotes page into people quoted in a single issue of KT (singlequotes.html) and people quoted in more than one issue (quotes.html).

Mailing List Stats For This Week

We looked at 1082 posts in 4026K.

There were 372 different contributors. 146 posted more than once. 138 posted last week too.

1. PPP Problems In 2.2.x

6 Mar 1999 - 7 Mar 1999 (4 posts) Archive Link: "ppp & IDE problems on 2.2.2 ac7 :("

Topics: Disks: IDE, Disks: SCSI, Networking

People: Douglas Gilbert, Mark Lord

Enrico Demarin noticed that doing a cdparanoia rip caused ppp to start reporting errors on his machine. Stopping cdparanoia would bring ppp back to normal. He tried to set unmaskirq on all drives, but it didn't help. Previously, irqtune had fixed the problem for him on 2.0.x, but it wouldn't run on 2.2.x.

CaT confirmed the problem, and Douglas Gilbert asked Enrico if he was using IDE or SCSI emulation, adding that if it was SCSI emulation he might have a solution; but CaT replied that (at least in his case) it was pure IDE. So much for that.

At this point Enrico found a solution and started a new thread, with the Subject: more on IDE and PPP on 2.2.2 (http://www.kernelnotes.org/lnxlists/linux-kernel/lk_9903_01/msg00923.html). Apparently 'hdparm -u 1' in his init scripts solved the problem, but would give an error if there were no CD in the drive. That worked for CaT too, who added that the latest hdparm (http://www.dyer.vanderbilt.edu/server/udma/) gave no error.

Mark Lord broke in, saying that hdparm was the wrong tool for the job, and that 'echo "unmaskirq:1" > /proc/ide/hd?/settings' was better. End Of Thread.

2. Unsuccessful Bug Hunt

7 Mar 1999 - 8 Mar 1999 (5 posts) Archive Link: "matrox-fb"

Topics: Debugging

People: Petr Vandrovec

Matthias Runge and Riccardo Facchetti had problems with matrox-fb: Matthias got a blank screen on exiting X, while Riccardo's machine would (among other bad things) lock solid when starting X while sound was playing. Petr Vandrovec asked for some information, Riccardo gave it, Petr couldn't find the problem, and the thread died.

3. System Clock Losing Time in 2.2.x

6 Mar 1999 - 8 Mar 1999 (10 posts) Archive Link: "SCSI access creating lost time"

Topics: Disks: SCSI

People: Doug Ledford

Buddha Buck noticed that his system clock was losing time during SCSI accesses, both on 2.2.1 and 2.2.2. Colin McFadden confirmed the problem, and pointed out that this wasn't the case under 2.0.x. At this point Doug Ledford explained, "both of you have polling based SCSI controllers where the drivers sit in tight loops waiting for things to happen inside of a spin_lock_irqsave(). There's nothing intelligent that can be done until the locking in the drivers/mid-level SCSI code is redone except switching to a SCSI controller/driver combo that doesn't spin wait for things like individual bytes on the SCSI bus."
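Doug's explanation can be illustrated with a toy model. The sketch below assumes a periodic timer interrupt at HZ = 100 and assumes that only one timer interrupt can stay pending while interrupts are disabled, so any further ticks inside the same window are simply lost. The function name and the example windows are hypothetical; this is not kernel code.

```python
HZ = 100                  # assumed timer frequency (ticks per second)
TICK_MS = 1000 // HZ      # 10 ms between timer interrupts


def _ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def lost_ticks(irq_off_windows_ms):
    """Count timer ticks lost while interrupts are disabled.

    Each window is (start_ms, end_ms) with interrupts off in [start, end).
    Model assumption: ticks fire at multiples of TICK_MS, one tick can
    stay pending and is delivered when interrupts come back, and every
    additional tick inside the same window is dropped.
    """
    lost = 0
    for start, end in irq_off_windows_ms:
        fired = _ceil_div(end, TICK_MS) - _ceil_div(start, TICK_MS)
        lost += max(0, fired - 1)
    return lost


def clock_drift_ms(irq_off_windows_ms):
    """Wall-clock time the system clock falls behind under this model."""
    return lost_ticks(irq_off_windows_ms) * TICK_MS
```

Under these assumptions a driver that spins for 40 ms with interrupts off costs the clock 30 ms, while a 5 ms spin costs nothing, which is consistent with the drift showing up only during heavy SCSI access.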

4. 'gcc' Vs. 'egcs'

6 Mar 1999 - 7 Mar 1999 (17 posts) Archive Link: "glibc-2.1 upgrade headaches. Any ideas??"

Topics: Compiler

This thread started when Alex Kamalov had trouble getting gcc to work. In the course of the discussion it came out that the latest kernels will indeed compile with egcs (don't try it with 2.0.x though). Thus ends a great controversy: did Linux have bad coding, or did egcs break their implementation? Finally that question can quietly fade away. But oh, the flames it inspired!

5. Linux As A Microkernel

6 Mar 1999 - 15 Jun 1999 (5 posts) Archive Link: "Philosophical issue type thing...."

Topics: Microkernels

People: Oliver Xymoron, Alan Olsen, Linus Torvalds

Michael Loftis had the idea that Linux may be gradually becoming a micro kernel. He pointed to the relatively recent addition of kernel threads as support for this.

Oliver Xymoron replied, "This is still a far shot from a MK, however - there are no protection boundaries to cross between kernel threads. As the protection boundaries are simultaneously the big plus (stability) and big minus (performance) of the microkernel approach, this difference is quite significant."

Alan Olsen put in for good measure, "Actually the goal is to fold Wine into the kernel eventually. I figure it should be done around kernel version 6.6.6."

As of KT press time, Linus Torvalds has yet to pass final judgment on that idea, so we can only wait and hope. ;)

6. Crashing xterms In X

7 Mar 1999 (4 posts) Archive Link: "Linux-2.2.2 bug or feature?"

People: Thomas Zehetbauer

Espen Ellevseth upgraded from 2.0.36 to 2.2.2 and found some interesting behaviour. Apparently doing:

$ mkdir test
$ mkdir test/test2
$ cd test/test2
$ rmdir .
$ rmdir ..
$ cd ..

from X would crash the xterm. Thomas Zehetbauer replied that it was a bash bug, since he could reproduce it with bash and no X, but not with tcsh.

7. Developer Interaction

7 Mar 1999 - 9 Mar 1999 (20 posts) Archive Link: "TCP window updates [Was: PROBLEM: Sending mail-attachment]"

Topics: Developer Interaction, Networking

People: David S. Miller, Andrea Arcangeli, Stanislav Meduna

Matthias Moeller continued a TCP debugging session. David S. Miller gave a short patch. Matthias and Stanislav Meduna initially reported success, but Matthias posted again two hours later, still having a problem. Andrea Arcangeli posted a patch, and Dave had this harsh reply:

Andrea I'm going to start ignoring your TCP patches because:

  1. You continue to bundle them into larger and larger and larger patches, because you believe all of your fixes are correct, many of them are not. I refuse to have to scan through a monster set of TCP changes just to see the "new" fixes and you make some of the new fixes depend on the existing cruft in your monster patches.

    You cannot report fixes to me in this way, sorry.

  2. You code too quickly without understanding the issues first.

In this case, do you understand how ACK's are supposed to be unreliable and there are other mechanisms in TCP which deal with the fact that they may be lost? No, so you tried to "fix" something which was not broken. Perhaps you should add an ACK retransmit timer while you are at it? That would fix the problem right?

Take a look some time at the kinds of patches I am putting into 2.2.x TCP these days, small, well defined, and obviously correct. I'm not rewriting the entire ACK'ing mechnaism when a carefully considered 2 line fix will do the same for example. And that is what will go into 2.2.x if I have anything to do with it, small well contained and well defined fixes to TCP bugs, nothing more.

Andrea sent another lengthy patch in, and asked, "could you tell me what's wrong with this patch?" He went on, "if something is wrong with this patch I'd like to know why because I passed the whole last three days of my spare time into trying to fixing this bug with the help of Andi " [Kleen] "and Alan " [Cox] ". I hope I am not causing you a waste of time."

Later on, Dave had a different reaction (#19) to a patch from Andrea.

8. Running a.out Binaries Over NFS

7 Mar 1999 - 8 Mar 1999 (10 posts) Archive Link: "[PATCH] a.out don't exec over NFS"

Topics: Executable File Format, FS: NFS

People: Linus Torvalds, Alan Cox

Jan Rekorajski posted a one-line patch making NFS_DEF_FILE_IO_BUFFER_SIZE equal to 1024 instead of 4096, to allow a.out binaries to be executed from NFS-mounted partitions. Alan Cox pointed out that the patch would cause performance problems, since the kernel would have to do a lot of small IO operations.

Linus Torvalds jumped in with, "The blocksize only matters for filesystems that support the bmap operation, so the a.out test is really not strictly correct anyway. A more correct thing would probably be to do the blocksize check only if we have a bmap function for the file." Jan submitted another patch, but there was no reply.

9. Device Assignment Ordering

7 Mar 1999 - 8 Mar 1999 (5 posts) Archive Link: "eth-Devices and naming wishes"

Topics: PCI

People: Riley Williams, Martin Mares, Paul Wouters

Marc Haber had two ne2k-pci-network cards and wanted to put one on his trusted network, and one on the untrusted one. But in order to make sure the same card would be associated with its network each bootup, he was worried he'd have to put cruft into all his init and config files.

Riley Williams said that as far as his limited experience went, "for any particular pair of PCI NE2K cards, the BIOS will always allocate them the same way round, so if a particular card appears first on one occasion relative to a particular second card, then it will always appear first relative to that particular second card."

Marc objected that in that case a BIOS update or a board upgrade could change the order, but Martin Mares added, "If you switch off PCI BIOS support or enter "pci=nosort" at the command line (the latter works only with 2.2 kernels), the only thing the scanning order depends on is the actual numbering of PCI slots which should be fixed for every sane chipset."

Paul Wouters finished up with some different advice. He said, "You can load the modules in order. for example, say you have two 3c509 cards, do 'insmod 3c509 irq=10,11 io=0x300,0x320'. This will make eth0 be 10,0x300 and eth1 be 11,0x320. Alyways. in this order only." He added, "btw. you might want to have a look at www.linuxrouter.org (http://www.linuxrouter.org) "

10. Big Memory Machines

7 Mar 1999 - 8 Mar 1999 (9 posts) Archive Link: "Grr.."

Topics: Big Memory Support

People: Linus Torvalds, Alex Buell, Stephen Tweedie

Security

Alex Buell complained about the 4GB RAM limit on Intel architectures, and after a bit of discussion, Linus Torvalds came in with (as quoted in Issue #9, Section #7 (3 Mar 1999: Buffer Overflow Attacks; Big Memory Machines)):

Actually, while I've always looked at the 36-bit extensions with extreme distaste, Stephen Tweedie convinced me that we can really cleanly and fairly easily support it in a perfectly reasonable manner. It won't be 100% support, but it looks better than just using it as a ramdisk (we can basically use it for page caches and anonymous pages without having to get ugly about it - they have enough of an abstraction layer that it doesn't impact the rest of the system much at all)

So we may actually end up with a reasonable support for it. The page cache and anonymous memory is really what people want anyway, other memory uses are basically "fluff" compared to those two.

We'll see. The proof is in the code, and while Stephen made a very good case for something that I would accept, we'll just have to see how it works out. Right now the alpha/sparc64 approach is still the only reasonable choice with Linux.

11. Symlink Recursion

8 Mar 1999 - 11 Mar 1999 (12 posts) Archive Link: "Recursion level of symlinks limitted to five?"

Topics: POSIX

People: Andi Kleen, Alexander Viro

Ruben Schattevoy was very upset to discover a much smaller limit to symlink recursion than he expected. Andi Kleen summed up the principle, with, "The reason for the small number is that the routine that parses the path name in the kernel does actually recurse on symlinks. Kernel stack space is very limited (~6K - stack space for interrupt handlers on 2.2). To avoid overflowing the kernel stack the maximum nesting has to be limited. AFAIK there are no plans to change it." In another post, he added, "There is a POSIX requirement that at least 8 hard links are allowed to a single file, but that has nothing to do with symlinks."

Junio Hamano had the wild idea of moving symlink resolution into userspace, but Alexander Viro strongly objected, saying, "Current implementation of symlinks resolving is broken, but fixes belong to kernel, not to userland. It can (and I hope will) be done in 2.3. The Right Thing (tm): for normal symlinks add a new method that would do kernel-space readlink (current ->readlink() actually does exactly that and then copies the stuff to userland). Remove ->follow_link() for them - it is nothing but kernel-space readlink() followed by lookup_dentry() (I'm talking about the *current* code). For pseudo-symlinks in /proc leave the stuff as-is - they can't give any recursion anyway (and are *not* symlinks). Add a context for lookups (i.e. structure filled before call of lookup_dentry() and containing the pointer to name + flags + credentials of caller) (there is a lot of other reasons to do it). Keep the stack of pointers to buffers returned by kernel-space readlink() inside that structure. Make lookup_dentry() completely iterative. There you go - you need 2 pointers + integer for each level of recursion (pointer to symlink's dentry, pointer to current position in the symlink contents, length of remaining part of said contents)." He added, "Notice that you *can't* rely on arbitrary depth of nested symlinks in applications - all Unices impose some limit here. You can't assume that application will never receive ELOOP."
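The iterative lookup Alexander describes can be sketched outside the kernel. The example below is a simplified model, not his proposed implementation: it resolves paths against a toy namespace (a dict mapping a symlink's absolute path to its target) and keeps an explicit stack of the components still to be walked at each nesting level, instead of recursing. The names resolve and MAX_NESTING, and the limit of 8, are assumptions made for illustration.

```python
import errno

MAX_NESTING = 8   # hypothetical limit; real kernels choose their own


def resolve(path, symlinks, max_nesting=MAX_NESTING):
    """Resolve an absolute path iteratively against a toy namespace.

    `symlinks` maps an absolute path to its link target (absolute or
    relative). Each stack entry holds the components still pending at
    one nesting level, so no recursion is needed.
    """
    stack = [[c for c in path.split("/") if c]]
    resolved = []                       # components resolved so far
    while stack:
        pending = stack[-1]
        if not pending:
            stack.pop()                 # finished this level
            continue
        name = pending.pop(0)
        if name == ".":
            continue
        if name == "..":
            if resolved:
                resolved.pop()
            continue
        candidate = "/" + "/".join(resolved + [name])
        target = symlinks.get(candidate)
        if target is None:
            resolved.append(name)       # ordinary component
            continue
        if len(stack) > max_nesting:
            raise OSError(errno.ELOOP, "too many levels of symbolic links")
        if target.startswith("/"):
            resolved = []               # absolute target restarts at the root
        stack.append([c for c in target.split("/") if c])
    return "/" + "/".join(resolved)
```

For instance, with symlinks = {"/home": "/export/home"}, resolve("/home/zack/kt", symlinks) gives "/export/home/zack/kt", and a pair of links pointing at each other raises OSError with errno set to ELOOP. As Alexander notes, applications have to be prepared for that error whatever the limit is.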

12. RPC Issues In 2.2.2

8 Mar 1999 (7 posts) Archive Link: "2.2.2 Bug? RPC: sendmsg returned error 101"

Topics: FS: NFS, SMP

People: Jan Sevelin, Trond Myklebust, Steven N. Hirsch, Matthew G. Marsh

Jan Sevelin said, "symptom: Under kernel 2.2.2 when shutting down networking in runlevel 6 the shutdown process gets stuck repeating "RPC: sendmsg returned error 101"." The only difference between his machines with this problem and those without, was that the affected machines had SMP enabled and 'define MAX_MD_DEV 6'.

He also included a patch to add an ENETUNREACH case to /usr/src/linux/net/sunrpc/xprt.c, but Trond Myklebust asked in surprise, "Why are you getting ENETUNREACH? In principle that error should never occur at that point unless you've shut down the network in the middle of an RPC request. Please check your links in /etc/rc.d/rc6.d..."

Jan checked his links and found them correct, but found a workaround (aside from his patch) and said, "putting a sleep 10 after killproc portmap in /etc/rc.d/init.d/portmap also solves the problem... Those 10 seconds is a cheap price to pay for avoiding a half hour journey and an hour of ckraid and fsck :-)"

Meanwhile, Steven N. Hirsch confirmed the problem, saying, "I've been plagued with the same messages ever since the early days of knfsd, and have played with the system shutdown sequence endlessly in search of a solution. It doesn't happen all the time, and seems related to something that tries to hold an open file handle to a remote system during shutdown. My sysV sequence most certainly does not try to shut down the network prior to the nfs client subsystem - these messages are triggered _after_ nfs is downed. Something is trying to hang on for dear life, I guess.." and Matthew G. Marsh put his 2 cents in, with, "As a further data point - I get these whenever I forget that my X is running and also when I use nohup on an NFS mount as the nohup.out file is held open. Seems harmless - eventually the RPC times out and life continues on. But I can still see the NFS session being held open on the server (Netware NFS) for another few minutes until the server decides I really am gone."

13. Sound Card Preferences For Linux

8 Mar 1999 - 15 Mar 1999 (12 posts) Archive Link: "CMI8330 and SIS6326 [maybe offtopic]"

Topics: PCI, Sound

People: Ernest JW ter Kuile, Alan Cox, Conde Martinez Rodolfo, Dan Hollis

Catalin Boie asked how the CMI8330/C3D sound card and the SiS 6326 graphics card work under Linux.

Ernest JW ter Kuile said of the SIS6326, "It does need tweeking in the XF86Config file, and you will need the lastest (>=3.3.3.1) version of the xserver. I had some problems using more than 8 bit colors, 16 bit and 24 bits didn't work, but 32 bits did. and a few of the options (don't remember which) had to be on before X even tryed to start."

Alan Cox also replied to Catalin with, "they are low priced and relatively low performing parts. The CMI8330 for example isnt really comparable to any of the PCI cards in output quality. It does however do a nice sb and ad1848 emulation." He added, "similarly the SiS 6326 is hardly the worlds top end graphic card but it works."

Meanwhile, Conde Martinez Rodolfo interjected, "I had problems with the cmi8330, it works with the sb drivers, but only with sb 8, when i used snes9x and played games suddenly the game started to make a horrible noise......no more music, had to restart the emulator but the problem would pop up again sooner or later, this is because games usually require 16 bit output sound....." He added, "The best solution i think there is with the cmi8330 is to use it with the MSS/WSS drivers......"

Someone said that the CMI8330 and its cousin the CMI8338 both have the virtue of providing a cheap < 10ukp SPDIF input card, but Dan Hollis replied, "CMI8330 spdif is broken for sure. Reportedly fixed in the 8338 but then noones shipping 8338 cards yet. The only ones so far seem to be reference cards direct from cmedia." He went on, "its also always a hack job since 8330/8338 cards usually dont come with SPDIF connectors," and added, "also from what I understand the output is stucked at 44khz so DATheads might have a problem with that."

14. Mounting Loopback Issues

8 Mar 1999 (6 posts) Archive Link: "loopback devices are not freeed on umount"

Topics: FS

People: Andi Kleen, Matthew Wilcox, Andries Brouwer

Hans de Goede noticed that mounting a floppy image using loopback, and then umounting it, would leave the loopback module usage count at 1 in /proc/modules. This would block him from mounting any other images through /dev/loop0. Andi Kleen summed up the answer for him with, "mount does an implicit losetup, but umount does not do an implicit losetup (because that could be inconvenient e.g. with encrypted filesystems). So you have to remove the loop device afterwards with losetup -d."

Matthew Wilcox gave a similar answer, and asked Andries Brouwer if mtab could be extended "to have a "loop" option in the options field so that unmount can know to delete it if it was created by mount?" to which Andries replied, "Surprise! That is precisely how things work today."
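As a rough illustration of the bookkeeping Matthew asked about, the sketch below parses one mtab entry and reports which loop device, if any, was recorded for it. It assumes the usual whitespace-separated mtab fields and assumes the device is recorded as a loop=/dev/loopN option; the function name is hypothetical, and this is not code from mount or umount.

```python
def loop_device_to_free(mtab_line):
    """Return the loop device recorded in an mtab entry, or None.

    Assumed format: device, mount point, fs type, comma-separated
    options, then the dump and pass fields. A `loop=/dev/loopN` option
    is taken to mean mount set the loop device up itself.
    """
    fields = mtab_line.split()
    if len(fields) < 4:
        return None
    for option in fields[3].split(","):
        key, _, value = option.partition("=")
        if key == "loop" and value:
            return value
    return None
```

A umount-like tool could call this on the entry it is about to remove and then detach the returned device, the equivalent of running losetup -d by hand.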

15. Comments On Dual Pentium Systems

8 Mar 1999 - 9 Mar 1999 (5 posts) Archive Link: "dual pentium"

Topics: SMP

People: Vladimir Dergachev, Alan Cox, Keith Bennett, Alec Smith

Keith Bennett asked how dual pentiums did under Linux. Alan Cox said dual pentiums were junk. Alec Smith disagreed, and said Linux support could only improve. Vladimir Dergachev replied, "This argument has a drawback that when things do improve your board may not actually be able to use them. And the same board will cost a lot less at the time you'll be able to use the improvements." But he acknowledged, "On the other hand two processors are a good thing to have - and a lot of people are using dual cpu systems right now."

16. Hardware Vendors Reluctant To Release Specs

8 Mar 1999 - 10 Mar 1999 (5 posts) Archive Link: "[PATCH] linux-222.dj1"

Topics: PCI

People: Dave Jones, Alan Cox, Thomas Molina

Dave Jones announced his latest omnibus patch (http://www.comp.glam.ac.uk/students/djones2/computers/kernel/patches.html) against 2.2.2ac7. One of his fixes was, "Added support for DLink CT528 (http://www.dlink.com) This ugly is a clone of a RealTek 8029, which is a clone of an NE2k. Most bizarre. Even more bizarre, they used almost the same PCI info, so adding code to read the subsystem was necessary."

Thomas Molina said he'd used one of those cards without a problem, and asked what Dave's fix involved. Dave replied, "It's recognised in existing kernels as a RealTek 8029. Which effectively it is. The people at DLink wouldn't tell me any technical information. Their downloadable pdf's are little more than product placement. I asked if it was the same as a RealTek 8029, which got a "No". When I asked what was different they didn't answer." He added, "My code just reads the subsystem vendor to determine between a genuine RTL8029 and a clone (such as the CT528)."
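The kind of check Dave describes can be sketched over a raw copy of a card's PCI configuration space. The offsets below are those of the standard type 0 header (vendor ID at 0x00, device ID at 0x02, subsystem vendor ID at 0x2C). The RealTek and D-Link vendor IDs are the registered ones, but that a CT528 reports D-Link as its subsystem vendor is an assumption made for illustration, as is the function name; Dave's actual patch is not shown in the thread summary.

```python
import struct

REALTEK_VENDOR = 0x10EC
RTL8029_DEVICE = 0x8029
DLINK_VENDOR = 0x1186      # assumed subsystem vendor of the clone


def classify_ne2k_clone(config_space):
    """Classify a card from the first 64 bytes of its PCI config space.

    All fields are little-endian 16-bit values in the type 0 header.
    """
    vendor, device = struct.unpack_from("<HH", config_space, 0x00)
    (subsys_vendor,) = struct.unpack_from("<H", config_space, 0x2C)
    if (vendor, device) != (REALTEK_VENDOR, RTL8029_DEVICE):
        return "not an RTL8029"
    if subsys_vendor == DLINK_VENDOR:
        return "D-Link clone"
    return "RealTek RTL8029"
```

Because the clone shares the vendor and device IDs of the genuine part, only the subsystem fields can tell the two apart, which matches Dave's remark that reading the subsystem was necessary.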

Alan Cox put in, "This is quite common. Companies who used to fab their own innovative boards find it quite embarrasing to admit they are brand naming standard taiwanese chips nowdays."

17. Inconsistent File Size Listing

8 Mar 1999 (7 posts) Archive Link: "Wrong file size, is it dangerous?"

Topics: FS

People: Zygo Blaxell, Richard B. Johnson

Ricardo Galli Granada was worried about finding a 3.2 GB mailbox file on his system, when df only reported 2 GB used. Richard B. Johnson didn't think it was anything to worry about, since the file attributes showed it to be a normal file. He figured it was a sparse file, which can show up as being really big, and recommended just deleting it. Ricardo confirmed this later and thanked everyone for their help.
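The difference between a file's apparent size and the space it really occupies is easy to check. The sketch below is a minimal illustration, assuming a Linux-style stat() where st_blocks counts 512-byte units; the helper names are hypothetical, and whether the demo file actually ends up with a hole depends on the filesystem it is created on.

```python
import os
import tempfile


def looks_sparse(st_size, st_blocks):
    """True if the apparent size exceeds the space allocated on disk."""
    return st_blocks * 512 < st_size


def make_sparse_file(path, size):
    """Create a file of `size` bytes by seeking past a hole and writing 1 byte."""
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\0")


def apparent_and_allocated(path):
    """Return (apparent size, allocated bytes) for a file."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def demo(size=1 << 20):
    """Create a 1 MiB file with a hole in a temp dir and report both sizes."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "mbox")
        make_sparse_file(path, size)
        return apparent_and_allocated(path)
```

A listing like the one Ricardo saw reports the first number, while df accounts for the second, so a 3.2 GB mailbox on a disk with 2 GB in use is not by itself a sign of corruption.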

Although Richard turned out to be right, Zygo Blaxell (and others) had a more cautious approach. Zygo assumed filesystem corruption, and said, "reboot and fsck should fix that sort of problem, assuming there's nothing wrong with the physical disk where that inode is. You will probably lose the mailbox file though." He went on, "if you just delete it, e2fs may notice that the inode doesn't make sense and panic or remount the filesystem read-only. Either way, the mounted filesystem will quickly become useless anyway until you umount/fsck/mount it again."

18. Makefile Bug Discovery And Fix

8 Mar 1999 - 11 Mar 1999 (5 posts) Archive Link: "2.2.3-pre3 - 'Make modules_install' doesn't."

Topics: Debugging

People: Zygo Blaxell

Zygo Blaxell found that 'make modules; make modules_install' was failing to install a lot of modules, roughly since the mid 1.3.x days. His work-around was just to copy the binaries by hand, but he had trouble explaining this to folks interested in Linux.

There were a few replies, but then he apparently found the trouble himself: 'ls' was defined as a function on his system, which was messing up 'make modules_install'! He submitted a short patch to remove that dependency, and the thread was over.

19. TCP Bugfix

9 Mar 1999 - 10 Mar 1999 (5 posts) Archive Link: "TCP FIN-fragment failure in 2.2.x"

Topics: Networking

People: Greg Beeley, David S. Miller, Andrea Arcangeli

Greg Beeley solved an old TCP problem for himself, and posted a patch which he hoped more experienced folks could improve on. Apparently, "when a TCP connection is exhibiting backpressure from the receiving side of a long data transfer, and then the sender closes the connection, a FIN is added to the last packet on the send queue. Now, say, that the receiving end is a serial terminal server, and so the window on the receiver side opens up very slowly. Now, say that one packet is left to be transmitted and it is, for instance 500 bytes of data. The receiver then opens up its zero window after some time to perhaps 100 bytes. Linux then splits the packet into a 100 byte and 400 byte packet, and sends the 100 byte one to fill up the window. Unfortunately, the TCP code was including the FIN's sequence number in the count when computing the end_seq value for the first packet, so when the ACK comes back, it was one sequence number short of ACKing the packet, so Linux re-sends the packet over and over, with the receiving end re-ACKing the packet repeatedly." His solution was to change "the first packet (skb) end sequence number so that it does not include the FIN's sequence number, which of course originally it did." He added, "That corrects this particular problem, but I'm not sure if there are any other implications."
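The arithmetic behind Greg's report can be modelled in a few lines. This is a sketch of the sequence-number bookkeeping only, not the kernel's code or his patch: the Segment class and function names are hypothetical, and sequence-number wraparound is ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int        # sequence number of the first payload byte
    length: int     # payload bytes
    fin: bool       # True if this segment carries the FIN

    @property
    def end_seq(self):
        # A FIN consumes one sequence number after the payload.
        return self.seq + self.length + (1 if self.fin else 0)


def split_for_window(segment, window):
    """Split a queued segment so the first part fits the receiver's window.

    The FIN stays on the second part only, so the first part's end_seq
    covers just the bytes actually sent.
    """
    if not 0 < window < segment.length:
        raise ValueError("window must be positive and smaller than the segment")
    first = Segment(segment.seq, window, fin=False)
    second = Segment(segment.seq + window, segment.length - window, segment.fin)
    return first, second


def fully_acked(segment, ack):
    """A segment can leave the retransmit queue once ack reaches its end_seq."""
    return ack >= segment.end_seq
```

With Greg's numbers (500 bytes plus FIN queued, window opening to 100), the first fragment ends at seq + 100 and the receiver's ACK of seq + 100 retires it. If the FIN's sequence number is wrongly counted in the first fragment, its end becomes seq + 101, the same ACK never satisfies it, and the sender retransmits forever.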

At first Andrea Arcangeli was skeptical, but a couple posts later he saw what Greg was talking about, and posted an improved two-line patch, to which David S. Miller replied, "Nice work, this is certainly it. I'll test this a bit and send it off to Linus..."

They must have had some private email since Dave's rejection (#7) of a previous patch from Andrea.

20. Real Time And MP3 Skipping

9 Mar 1999 - 19 Mar 1999 (19 posts) Archive Link: "MP3 skippety skip skipageness"

Topics: Real-Time

People: Mike A. Harris

Mike A. Harris was upset about mp3s skipping. There was a bit of discussion about this, the main consensus being that without real-time in the kernel, there will always be this problem.

21. Dealing With Spam On linux-kernel

10 Mar 1999 - 12 Mar 1999 (5 posts) Archive Link: "That spam, and ways to block it (Amazing Breakthrough)"

Topics: Mailing List Administration, Spam

People: Aaron T Porter, David Monniaux, Matti Aarnio

Matti Aarnio is all set to implement some powerful anti-spam stuff for linux-kernel (basically just blocking all mail that's sent directly from a dialup machine), but he warned the list that it would also block legitimate posters who don't go through their ISP's SMTP relay.

Not surprisingly, there was some dissent. Aaron T Porter said, "I would sooner make the list members-only for posting than shutdown all dialup SMTP access. That definately crosses the line when considering cost to users vs. rewards for the list," and David Monniaux added, "It is IMHO really a bad idea not to allow posting from unsubscribed addresses. Many people read linux-kernel on a mailing-list to newsgroup gateway."

Unpleasant as it may sound, this looks like the wave of the future all over the internet, not just for linux-kernel. And after the recent mail-bombing of linux-kernel by an unknown assailant, some kind of unilateral protection will probably be implemented.

22. Hitachi ATA Flash Memory Card Problem In 2.2.x

10 Mar 1999 - 12 Mar 1999 (10 posts) Archive Link: "Problems with ATA FLASH MEMORY"

Topics: Disks: IDE, FS, Flash RAM

People: Joao Marigonda, Alan Cox, David Hinds

Joao Marigonda wrote:

Im trying to install linux in an ATA Flash Memory Card from Hitachi. In DOS mode, this card works like an IDE HDD but the same doesnt happens with linux. While DOS fdisk works fine, linux fdisk returns the following errors when trying to write the partition table:

Re-reading the partition table
hda hdb ..

hda:drive_cmd:status=0x51 {DriveReadt SeekComplete Error}
hda:drive_cmd:error=0x04 {DriveStatusError}

The partition table is writen but mkswap returns the same errors!

Alan Cox replied, "So does mine. The error return is nonsensical too. As far as I can guess the hitachi cards dont have a 512 byte block size, but I've yet to manage to find out."

David Hinds added, "I'm pretty sure that these messages are harmless. They are generated when the IDE driver tries to lock the drive door on a "removable" drive that doesn't support this command. Depending on kernel version, the messages may be generated each time the device is opened and closed. I thought this was fixed in a 2.1 kernel."

23. dquota Fixes

11 Mar 1999 - 18 Mar 1999 (4 posts) Archive Link: "[patch] fix for major securty problem in dquota code"

Topics: FS: NFS, SMP

People: Andrea Arcangeli, Jes Sorensen, Simon Kirby

Andrea Arcangeli said Andrea Borgia found that dquota could be overridden by the nfs-server on 2.2.x kernels. He posted a patch, and said, "The problem is that dquot think that the initiator of the operation is the euid of nfsd (and nfsd run as root) and not the euid of the student that is using the NFS client (stored instead in the fsuid field)."
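The distinction Andrea draws between euid and fsuid can be shown with a toy accounting model. This is only a sketch under stated assumptions: the Credentials class, the dict-based usage and limit tables, and the rule that uid 0 is exempt are all hypothetical, and none of it is taken from the kernel's dquot code or from his patch.

```python
from dataclasses import dataclass


@dataclass
class Credentials:
    euid: int    # effective uid of the process (0 for an nfsd running as root)
    fsuid: int   # uid used for filesystem access on behalf of the client


def charge_blocks(usage, limits, cred, blocks):
    """Charge `blocks` to the user the filesystem operation is done for.

    Keying on cred.fsuid rather than cred.euid means a server running
    as root cannot carry a client past its quota. Users without an
    entry in `limits` are treated as unlimited.
    """
    uid = cred.fsuid
    used = usage.get(uid, 0)
    if uid != 0 and used + blocks > limits.get(uid, float("inf")):
        return False                   # would exceed the quota: refuse
    usage[uid] = used + blocks
    return True
```

If the lookup used cred.euid instead, every write arriving through nfsd would be charged to root and no student's limit would ever be hit, which is the failure described in the report.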

Jes Sorensen replied, "Andrea, if you got your head down in the dquota code (and have an idea about what is going on there) you might want to look at the fact that it is not SMP safe. I have a report from a guy who had it blow up on him like 1-2 times/day when running on SMP, whereas it seems to be fine on UP," but Simon Kirby came in with, "Alan did something in 2.2.2-ac7 that fixed this for us (we got our first SMP machine that had to use quotas a week or so ago and I put on 2.2.2-ac1...it blew up as soon as I turned on quotas and typed "sync"). Upgrading to ac7 on that machine has fixed/avoided the problem from what I can see (it's running fine for a few days now :))."

24. 3dfx Driver Exploit

11 Mar 1999 - 12 Mar 1999 (10 posts) Archive Link: "3dfx - a security hazard?"

Topics: FS, Ioctls, PCI

People: Tigran Aivazian, Nathan Hand, Brian Gerst, Alan Cox, Krzysztof G. Baranowski, Daryll Strauss

Tigran Aivazian asked, "I am probably missing something, but access to both /dev/3dfx and (e.g.) /dev/mem is controlled by filesystem permission rules. Writing garbage to either will crash the entire machine. Why is one a security hazard and another is not?"

Nathan Hand replied, "The 3dfx can lock the pci bus if fed garbage. The /dev/3dfx driver doesn't stop garbage being fed in, and so has no benefit over suid glide libs which tickle the hardware directly." He added, "The proper way is to have an opengl interface which prevents nasty garbage being sent to the 3dfx card. This has numerous other neato benefits too, like hardware independence for 3d developers, and an already well documented API that ties in nicely with X11 if you're looking for that sort of thing."

Brian Gerst came in with, "The only real secure way of handling such devices are with full kernel drivers or at least kernel drivers that do proper sanity checking. However, since 3Dfx won't release anything but the binary-only Glide libs, we've got only the two choices. Both ways of using Glide introduce security risks. Suid-root however has a larger set of security risks that are more easily exploitable." He added, "OpenGL is truly the way to go, but the X server shouldn't have to get involved. It just adds more overhead to a usually speed demanding operation. OpenGL vs. Glide hasn't been much of an issue up until now since 3Dfx was the only hardware that had _any_ support under Linux. Now that the Matrox G200 specs are out there will finally be more supported cards and OpenGL will gain acceptance. However, until 3Dfx releases info, you still need Glide, even to use Mesa, which means user-space access to the hardware."

Meanwhile, Krzysztof G. Baranowski explained that any user with access to the device could write to arbitrary PCI config space of the card; but Alan Cox corrected him with, "It isnt just PCI config space . At least it doesnt appear to be. A simple loop dumping /vmlinuz to the mmio space on a 3dfx card appears to crash the card, the bus and the PC," and added, "If it was just PCI configuration space it would be relatively easy to fix - you make PCI config setup root only and provide a "set3dfx" tool." But Krzysztof corrected in turn, "No, it wouldn't. It's worse than you think. It's not just PCI config setup problem. The doQuery() ioctl is called many times from within Glide. For example, when running simple 'test3Dfx' program (which just initializes the board, displays 3Dfx logo and exit), doQuery() is called 25 times or more," and added, "To make the driver safe, one would have to redesign the whole stuff or... chg gur Tyvqr vagb gur xreary <very evil grin>."

Daryll Strauss started a new thread with the Subject: "3Dfx security issues (http://www.kernelnotes.org/lnxlists/linux-kernel/lk_9903_02/msg00801.html) ", in which he said:

Yes, the device driver for the 3Dfx hardware makes it easily possible to crash the machine. As you noticed, there are a couple of big problems. First, is that it mucks in PCI space. Second, if you send random cruft to the MMIO space you can crash the board. This is why I have made no effort to put this into standard kernels.

So, why did I create it and is it any better than setuid? I figured it was better than setuid, because the applications getting linked against Mesa had no security built into them. Quake would read arbitrary files, and it was running setuid to speak to the 3Dfx hardware. So, this limited the problem to bad programs crashing the machine as opposed to reading and writing arbitrary files.

Second, the crash problem seems too difficult to work around. For example, the original Voodoo Graphics required that you "SNAP" vertices. That means subpixel values had to be a multiple of 1/16. If you passed unsnapped vertices to the Voodoo Graphics you hung your hardware. Glide didn't check for snapping because the performance hit was too large. Having /dev/3dfx check for valid data would require all of Glide to be in the device. This would cause a big performance drop, would bloat the kernel, and can't be done for intellectual property reasons.

It seems that any of the kernel graphics driver projects are moving to similar approaches and will have similar problems. The question is how can this be handled better within the existing constraints?
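The snapping requirement Daryll mentions can be illustrated with a small user-space sketch. This is not Glide's code, and the function names are made up for illustration; it only assumes what the post states, namely that coordinates must be a multiple of 1/16. It also hints at why validation is costly: a check like this would have to run for every vertex.

```python
import math

# From the post: Voodoo Graphics wanted subpixel values on a 1/16 grid.
SUBPIXEL_STEPS = 16


def snap(coord: float) -> float:
    """Round coord to the nearest multiple of 1/16 (halves round up).

    Hypothetical helper; Glide's real snapping code is not shown in the post.
    """
    return math.floor(float(coord) * SUBPIXEL_STEPS + 0.5) / SUBPIXEL_STEPS


def is_snapped(coord: float) -> bool:
    """Return True if coord already lies exactly on the 1/16 grid."""
    return (float(coord) * SUBPIXEL_STEPS).is_integer()


if __name__ == "__main__":
    for x in (10.0, 10.03, 10.0625, 10.1):
        print(x, is_snapped(x), snap(x))
```

A validating driver would have to apply something like is_snapped() to every coordinate it received, which is the per-vertex overhead the post says Glide chose to avoid.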

There followed a small staircase between him and Alan Cox. Alan replied:

It tends to come down to "was the board designed by someone who felt it shouldn't trash your PC". SGI get away with this because their MMIO stuff isnt capable of trashing the machine. I'm hoping Matrox got this right as XFree86 + the PI direct render work + Mesa is going to be really nice if they did.

As regards the 3DFx doodoo driver, I would personally say the situation the driver creates is potentially worse but not if set up carefully. It goes from "root can install something stupid" to "any user can trash the box" in the naive setup. But with your driver and group only access it becomes setgid voodoo and thats yes - definitely a win

Daryll replied, "Yes, if the person installing the device didn't read the README, they could be worse off. I would hope that root installing a new device would be at least as careful as installing a program setuid. The README warned the user that you could crash the box with the device and to restrict access appropriately. That's why I would never want it in the standard kernel, as it would be too easy to just throw it in without understanding the issues," and to Alan's first comment, added, "I think you haven't worked on an SGIs enough. We've seen many applications crash the machine. We've got one at DD currently. A gamma application, when you set the gamma to 0, your Onyx will crash. I suspect SGI gets away with it, because no one has tried writing random data to their boards. :-)" Alan's reply to that last was, "I've been writing random data at the indy without problems."

25. Unfinished Bug Hunt

12 Mar 1999 (5 posts) Archive Link: "ide_set_handler: timer already active"

Topics: Debugging, Disks: IDE

People: Geert Uytterhoeven, Mark Lord, Gadi Oxman

Geert Uytterhoeven said, "Just wondering: anyone else on a non-m68k platform who ever saw the message 'ide_set_handler: timer already active'? This is related to the thing that made my box crash under heavy IDE activity for more than a year, and I just can't believe this bug (_if_ it happens on ia32) was never seen on ia32."

He added that he's been seeing the message several times a day.

Mark Lord replied, "I have *never* seen this message on Intel-Architecture-32bit (ia32). And my systems have a *lot* of IDE devices/activity." He added, "The message is only there "temporarily" anyway, just to check for bugs seen (but not understood) on other architecture(s)."

Gadi Oxman said, "This error is very strange -- del_timer() is performed on all entry points to ide_do_request(). Geert, can you log each add_timer() and del_timer() performed by the IDE driver, and verify that indeed two add_timer() were called in a row? We can also try to force an oops at that place and look at the stack trace for some hints."

End of thread. They must have taken it to private email or something.
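The check Gadi proposed (log every add_timer() and del_timer(), then look for two adds in a row) could be done offline along the following lines. This is only a sketch: the log format, the "expire" event and the location labels are assumptions made for illustration, since the thread never shows what such a log would look like.

```python
from typing import Iterable, List, Tuple

# Hypothetical log format: one (event, location) pair per call.
#   "add"    - add_timer() was called
#   "del"    - del_timer() was called
#   "expire" - the timer fired (assumed here to leave it inactive again)
Event = Tuple[str, str]


def find_double_adds(events: Iterable[Event]) -> List[Tuple[str, str]]:
    """Return (earlier_add, later_add) location pairs for every add_timer()
    logged while a previously added timer was still pending."""
    problems = []
    pending = None  # location of the add not yet cancelled or expired
    for kind, where in events:
        if kind == "add":
            if pending is not None:
                problems.append((pending, where))
            pending = where
        elif kind in ("del", "expire"):
            pending = None
        else:
            raise ValueError(f"unknown event {kind!r}")
    return problems


if __name__ == "__main__":
    log = [
        ("add", "ide_set_handler #1"),
        ("del", "ide_do_request"),
        ("add", "ide_set_handler #2"),
        ("add", "ide_set_handler #3"),  # no del in between: suspicious
    ]
    print(find_double_adds(log))
```

Any pair this reports would point at the two call sites worth examining, which is the same information a forced oops and stack trace would give for a single occurrence.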

26. Advansys ABP940UW Shaky In 2.2.x

12 Mar 1999 - 13 Mar 1999 (3 posts) Archive Link: "2.2.3-ac1 (and other 2.2.x) advansys doesn't work"

People: Rik van Riel

Scott James Remnant's Advansys ABP940UW card was giving him trouble after he upgraded from 2.0.36 to 2.2: running "modprobe advansys" would find the first drive and then lock up the console. Rik van Riel confirmed the problem, but added that under 2.2.2 and 2.2.3 the situation seemed to have cleared up (though he could see the card breaking again in the future).

27. Writing Drivers In An Open Source World

12 Mar 1999 (3 posts) Archive Link: "frames for drivers (net(-protocol stack)/scsi/sound/etc. modules etc.)"

Topics: I2O

People: Alan Cox, Folkert van Heusden

Folkert van Heusden confessed to laziness, and asked if there were any driver templates, so folks like him could write drivers without getting too diverted from hardware issues.

Olivier Tharan pointed him to /usr/src/linux/drivers/net/skeleton.c, and Alan Cox said, "The general approach with a GPL driver is to cut bits out of other drivers and put them together with your hardware code. The i2o scsi driver for example I did by cutting up the symbios 53c416 driver and then adding pieces from the qlogicfc driver. Very little of it is "new" code."

28. Storing Kernel .config Information In The Kernel

12 Mar 1999 - 13 Mar 1999 (6 posts) Archive Link: "[Patch] support for /proc/.config (against 2.2.3)"

People: Tigran Aivazian

This has always been a very controversial issue, but Tigran Aivazian said calmly, "Many people (on various wishlists) requested this feature so I implemented it." This issue was covered briefly in Kernel Traffic Issue #2, Section #3 (14 Jan 1999: Tracking Kernel Patches And Configurations). The main feature of Tigran's patch is: "correct size of /proc/.config file and binary identical to /usr/src/linux/.config:"

The primary reaction to it this time around was, "good idea, it won't get in the main tree." It also managed not to balloon up into the usual flame war.

29. Direction Of Stack Growth Under Linux

12 Mar 1999 - 15 Mar 1999 (9 posts) Archive Link: "which stack direction?"

People: Larry McVoy, Oliver Xymoron

Larry McVoy said, "I know of a new high end machine, under gentleman's NDA, which might be running Linux some day. The compiler people for this machine want to know if the Linux folks care which way the stack grows."

There was a bit of discussion, and Oliver Xymoron summed up with, "it just doesn't matter on a 64 bit system. Current 64-bit systems only have 40-some bits of real address space. There is zero danger of running out of virtual address space. By the time we're in danger of exhausting 64 bits of physical memory we will have all long since moved to 128 bit systems. Put everything wherever you want. Force all code addresses to have zero bytes to discourage stack exploits. Have the stack grow up and give it a nice address prefix so it's easy to read. Use one stack for addresses, another for variables. Give the heap its own address prefix too. Whatever's easiest to read."

30. Sleeping While Holding A Spinlock

14 Mar 1999 - 18 Mar 1999 (7 posts) Archive Link: "sleeping while holding a rwspinlock?"

People: Ingo Molnar, Tigran Aivazian

Tigran Aivazian asked if it was possible to sleep while holding a read-locked rwspinlock. Ingo Molnar replied, "The Linux rule is that you must not sleep while holding any spinlock. spinlocks are 'light and short operations' (in theory), and the Linux locking architecture relies on us not scheduling during held locks. (we'd get into deadlocks easily if we allowed this, semaphores should be used in places where complex locking has to be done)"
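The rule Ingo states can be modelled with a toy user-space sketch. Nothing here is kernel code: the class and method names are invented for illustration, and the model only counts held locks and refuses to "sleep" while that count is nonzero.

```python
class SleepWhileLockedError(RuntimeError):
    """Raised when the toy model sees a sleep with a spinlock held."""


class ToyContext:
    """Hypothetical stand-in for one code path; counts spinlocks it holds."""

    def __init__(self) -> None:
        self.spinlocks_held = 0

    def spin_lock(self) -> None:
        self.spinlocks_held += 1

    def spin_unlock(self) -> None:
        if self.spinlocks_held == 0:
            raise RuntimeError("unlock without a matching lock")
        self.spinlocks_held -= 1

    def sleep(self) -> str:
        """Stand-in for anything that may schedule, e.g. waiting on a semaphore."""
        if self.spinlocks_held:
            raise SleepWhileLockedError(
                f"sleep attempted with {self.spinlocks_held} spinlock(s) held"
            )
        return "slept"


if __name__ == "__main__":
    ctx = ToyContext()
    ctx.spin_lock()
    try:
        ctx.sleep()            # breaks the rule
    except SleepWhileLockedError as err:
        print("refused:", err)
    ctx.spin_unlock()
    print(ctx.sleep())         # fine once the lock is released
```

In this model, the fix for a path that must sleep is the one Ingo names: drop the spinlock first, or use a semaphore for the longer, more complex critical section.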

31. ext2 Under Windows NT

14 Mar 1999 - 15 Mar 1999 (8 posts) Archive Link: "Q: ext2fs on other OS's"

Topics: FS: ext2, Microsoft

People: Bo Branten, Theodore Y. Ts'o, Michal Jaegermann

Bo Branten said, "AFAIK there exists at least two proprietary ext2fs drivers for Windows NT. I would like to ask if anyone here know about any open source project going on to produce such a driver?" Theodore Y. Ts'o replied, "Part of the problem is that in order to build an installable filesystem for Windows NT, you need to get the Microsoft IFS SDK, which is horribly expensive (approx USD $1000). So an Open Source project is pretty hard; you wouldn't be able to recompile from sources without paying $1000 to Microsoft." The next day, Bo said joyfully, "I am happy to inform you that there are an option, GNU ntifs.h is available from: http://www.acc.umu.se/~bosse/ntifs.h," and Michal Jaegermann suggested, "Are you guys talking about something like http://uranus.it.swin.edu.au/~jn/linux/Explore2fs.htm by any chance?"

32. fs/proc/array.c Fixes

14 Mar 1999 - 15 Mar 1999 (9 posts) Archive Link: "[patch] bugfix for fs/proc/array.c FIXME tasklist_lock"

People: Tigran Aivazian, Jeremy Fitzhardinge

Tigran Aivazian posted a patch and explained:

I went through the FIXMEs in fs/proc/array.c and fixed them. However, I did not fix the one in read_maps() for two reasons:

a) I do not know how to do it most efficiently.

b) I am really unsatisfied with read_maps() function and your idea (Bruno's implementation) of handling >1page buffers, but, at the moment, I don't have any better replacement for it (I will try and think hard if it can be done better).

I think read_maps() violates the ancient tradition of UNIX - "a regular file is just a stream of bytes". I tried to read(2) from /proc/<pid>/maps 49 bytes at a time and I get sometimes 49 and sometimes 3 bytes - i.e. a record structure is imposed, where, imho, it should NOT have been.

So, I left read_maps() alone as is.

Jeremy Fitzhardinge replied, "The ancient tradition of Unix is that read() is allowed to return as many bytes as it wants (0 < bytes read <= bytes asked for). Mostly it can return the exact number you ask for when you read files from disk, but sometimes it doesn't. Programs just have to deal with it." He added, "Note that /proc/<pid>/maps *does* have a record-oriented structure, and that's just the nature of the beast."

Tigran agreed that the current system was not broken, but he still felt it could be improved (though he didn't know how to do it himself).
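Jeremy's point, that callers have to cope with read() returning fewer bytes than requested, is usually handled with a loop. The following is a minimal user-space sketch of that idiom; read_full is a name chosen here for illustration, and the pipe merely stands in for a file such as /proc/<pid>/maps.

```python
import os


def read_full(fd: int, count: int) -> bytes:
    """Read up to count bytes from fd, looping over short reads.

    Returns fewer than count bytes only if end-of-file is reached first.
    """
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = os.read(fd, remaining)  # may return fewer bytes than asked
        if not chunk:                   # empty result means end-of-file
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"abcdefgh")
    os.close(w)
    print(read_full(r, 5))  # first five bytes
    print(read_full(r, 5))  # the remaining three, then EOF
    os.close(r)
```

With a wrapper like this, a program reading 49 bytes at a time sees full 49-byte blocks until the end of the file, regardless of how the kernel splits the individual read() results.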


Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.