Kernel Traffic #138 For 22 Oct 2001

By Zack Brown

Mailing List Stats For This Week

We looked at 2013 posts in 8538K.

There were 651 different contributors. 326 posted more than once. 187 posted last week too.

The top posters of the week were:

1. Status Of linmodem Support; New User-Space /dev File Callback Handler
28 Sep 2001 - 15 Oct 2001 (16 posts) Archive Link: "[ANNOUNCE] FUSD v1.00: Framework for User-Space Devices"
Topics: Modems, USB
People: Jeremy Elson, Pavel Machek, Jeff Garzik, Tim Jansen, Rogier Wolff, Eric W. Biederman

Jeremy Elson of Sensoria Corporation unwittingly solved a large piece of the WinModem problem under Linux. He announced FUSD version 1.00, explaining, "FUSD lets you write user-space daemons that can respond to device-file callbacks on files in /dev. These device files look and act just like any other device file from the point of view of a process trying to use them. When the FUSD kernel module receives a file callback on a device being managed from user-space, it marshals the arguments into a message (including data copied from the caller, if necessary), blocks the caller, and sends the message to the daemon managing the device. When the daemon generates a reply, the process happens in reverse, and the caller is unblocked." He gave a link to the FUSD homepage (http://www.circlemud.org/~jelson/software/fusd).
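The round trip Jeremy describes can be simulated in a few lines. What follows is a minimal Python sketch, not FUSD's actual API or message format. The names `Request`, `proxy_call`, and `daemon_loop` are hypothetical, and a thread-safe queue stands in for the kernel-to-daemon channel.

The caller's request is marshalled into a message, and the caller blocks on an event. A daemon thread services the message and then releases the caller.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Request:
    """A marshalled device-file callback (hypothetical layout, not FUSD's)."""
    op: str                  # "read" or "write"
    path: str                # the /dev file the caller used
    data: bytes = b""        # data copied from the caller, if any
    length: int = 0          # bytes wanted, for a read
    reply: bytes = b""       # filled in by the daemon
    done: threading.Event = field(default_factory=threading.Event)


# Stand-in for the channel between the kernel module and the daemon.
to_daemon = queue.Queue()


def proxy_call(req):
    """Caller side: send the marshalled request, then block until answered."""
    to_daemon.put(req)
    req.done.wait()          # the calling process is blocked here
    return req.reply


def daemon_loop(storage):
    """User-space daemon: service requests until it receives None."""
    while True:
        req = to_daemon.get()
        if req is None:
            break
        if req.op == "write":
            storage[req.path] = storage.get(req.path, b"") + req.data
            req.reply = str(len(req.data)).encode()
        elif req.op == "read":
            req.reply = storage.get(req.path, b"")[: req.length]
        req.done.set()       # the reverse path: unblock the caller


if __name__ == "__main__":
    store = {}
    daemon = threading.Thread(target=daemon_loop, args=(store,))
    daemon.start()
    print(proxy_call(Request("write", "/dev/fake0", data=b"hello")))  # b'5'
    print(proxy_call(Request("read", "/dev/fake0", length=4)))        # b'hell'
    to_daemon.put(None)
    daemon.join()
```

The caller never sees the daemon directly. It only sees a call that returns once the reply arrives, which is the property that lets FUSD devices look like ordinary device files.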

After some bug reports that showed FUSD had been prematurely released as 1.00 (possibly due to its corporate sponsorship), Pavel Machek suggested forwarding the announcement to the linmodem mailing list (http://www.linmodems.org/), adding, "Killing all those binary-only modem drivers from kernel modules would be good thing... Hmm, and maybe we can just hack telephony API over ltmodem and be done with that. That would be good." Jeremy asked, "Perhaps I don't understand how linmodems work to understand well enough how FUSD would apply - do you talk to linmodems through the serial driver? If so, sounds like a good application - but we might still have the same problem with binary-only drivers as the user-to-kernel message format used by FUSD may change over time. (Indeed, it's already changing relative to v1.0 in response to some of the mail I've gotten in the past few days.)" Pavel confirmed, regarding whether linmodems used the serial port, "Yep. And linmodem driver does signal processing, so it is big and ugly. And up till now, it had to be in kernel. With your patches, such drivers could be userspace (where they belong!)." But he and others were not happy to hear that the interface would be changing over time. Jeremy explained:

FUSD's user-kernel interface won't change spuriously, but it sometimes will need to change as features are added. Some such changes are already in the works.

The fact that FUSD provides a semi-stable binary interface for servicing device-file callbacks isn't really FUSD's design goal as much as it is an accidental side effect. Making a stable binary interface for kernel device drivers is the objective of, say, UDI (I think). The purpose of FUSD is just to be able to proxy the callbacks to userspace.

Elsewhere, Eric W. Biederman asked for a fuller explanation of how FUSD could be useful to linmodems, and Jeff Garzik replied:

My best guess for a Linux winmodem solution for Linux is three pieces:

  1. The existing Lucent (and other) hardware work (by Pavel/Richard/Jamie/others?)
  2. Rogier Wolff's user space serial driver code, and
  3. A work called "modem" by a now-deceased scientist at SGI (IIRC).

Alan pointed me to the last piece. 'modem' handles up to 14.4k speed, and supports some error correcting protocols we all remember from the BBS days.

Just need someone to glue those pieces together... and you have a winmodem driver with the proper portions in userspace, and the proper portions in kernel space.

Pavel added, "One of students here was/is working on the glue; it is non-trivial as 'modem' is obfuscated with coroutines." And Tim Jansen gave a link to the (somewhat stale) 'modem' page (http://perso.enst.fr/~bellard/linmodem.html). He also added, "This is also important for USB modems. As Intel requests PC vendors to stop including serial ports in 2002 and linux-compatible USB modems are quite hard to find it will be really difficult to get an external modem for new computers. Almost every new USB modem uses either the ST7554 or the Conexant HCF chipset, and at least the ST7554 is controllerless."

2. Identifying Kernels Linked With Undebuggable Code
10 Oct 2001 - 11 Oct 2001 (37 posts) Archive Link: "Tainted Modules Help Notices"
Topics: BSD, Compression, Legal Issues, Networking, Patents
People: David Woodhouse, Keith Owens, Alan Cox, Andreas Dilger, Andreas Ferber, Pekka Pietikeinen, Alexander Viro, Henning P. Schmiedehausen

Morgan Collins noticed that his kernel was being marked as 'tainted' with non-free modules, because the PPP compression module carried the BSD license. Alan Cox and David Woodhouse said this was a bug. As David put it, "Any code which is distributed as part of the kernel source tree has a sane, if not 100% compatible, licence and shouldn't taint your kernel." But Keith Owens put in, "Any license not listed in include/linux/module.h is not GPL compatible." He gave this list (from 2.4.11) as:

The BSD license on its own was not part of the list, so (he implied) the PPP compression module should mark the kernel tainted. David felt it was silly not to include the BSD-NAC (no-advertisement-clause) license, and Alexander Viro felt the LGPL should be included as well. Keith said the current list had been created by Alan Cox, and Alan also said:

If you hold a patent on the BSD code you can't GPL it nor is it GPL compatible.

The problem we have is that "BSD without advertisement" can be claimed by almost any binary only module whose author doesn't include source or let it out of their company ever

This didn't make sense to David, who pointed out that a patent could also restrict distribution of GPLed code. He added, "Either way, I didn't think that a political stance against patents was the point of the kernel tainting code - I thought it was about maintainability." He went on:

But if we're not going to allow BSD-licensed modules to be loaded without tainting the kernel, we shouldn't mark any of the code distributed with the kernel as BSD-licensed - we should make it all "Dual BSD/GPL" instead.

It might also be useful to have a 'Dual GPL/Other' option, for covering the other randomly dual-licensed code (like JFFS2).

An unidentified person suggested that modules could simply lie about their licensing and bypass these safeguards altogether, but Alan replied, "under the DMCA thats probably a criminal offence with five years in jail. The truth however is that if you want to lie about licensing or run a modutils that doesn't do it nobody stops you. Its there primarily to deal with bug filtering from people who don't know better. Folks who know enough to subvert the mechanism generally also know better than to post Nvdriver bugs to l/k." Andreas Dilger pointed out that all anyone had to do was edit their ksymoops output to not display the "tainted" flag, and added, "I don't think we need to be mucking with "GPL vs. BSD" or anything, but rather "source available or not" as the criterion for a tainted module. Heaven forbid that using some driver currently in the kernel sources marks your kernel as tainted, it would make the whole thing useless."

At one point Andreas Ferber suggested, "What about simply adding "BSD (included in kernel)" as a possible "untainted" MODULE_LICENSE()?" Alan liked the idea, and Henning P. Schmiedehausen suggested "BSD (included in kernel source)" as showing source availability. Pekka Pietikeinen added, "Or even something like "BSD (unmodified source freely available)", which would cover 3rd party drivers as well." Close by, someone suggested simply including a flag to indicate source availability, but Alan said, "Available under what terms, NDA'd, subject to unacceptable other rules etc.. Its not as simple as it looks."

The discussion petered out around there, with no firm decisions either way.
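Underneath the argument, the tainting rule is an exact-match whitelist lookup on the MODULE_LICENSE() string. In 2.4 the check is done by modutils when a module is loaded, which is why Alan notes that a modutils that skips it is not stopped.

The Python sketch below only illustrates the idea. Its whitelist is hypothetical: it is not the contents of include/linux/module.h in any release. Andreas Ferber's "BSD (included in kernel)" string is included only to show where the proposal would slot in.

```python
# Hypothetical whitelist for illustration only; not the kernel's actual list.
GPL_COMPATIBLE = {
    "GPL",
    "Dual BSD/GPL",
    "BSD (included in kernel)",   # Andreas Ferber's proposal from the thread
}


def taints_kernel(module_license):
    """Return True if loading a module with this license string taints.

    The match is exact, so a plain "BSD" string taints even though
    "Dual BSD/GPL" does not. A module with no MODULE_LICENSE() at all
    (None here) is treated as not GPL-compatible in this sketch.
    """
    if module_license is None:
        return True
    return module_license not in GPL_COMPATIBLE


def kernel_tainted(loaded_licenses):
    """The flag is sticky: one tainting module marks the whole kernel."""
    return any(taints_kernel(lic) for lic in loaded_licenses)


if __name__ == "__main__":
    print(taints_kernel("BSD"))                    # True
    print(taints_kernel("Dual BSD/GPL"))           # False
    print(kernel_tainted(["GPL", "Proprietary"]))  # True
```

The exact-string match is what makes the PPP compression case trip the flag. It is also why Alan objects to a bare "source available" flag: a string cannot say under what terms the source is available.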

3. Comparing The Two Virtual Memory Subsystems
10 Oct 2001 - 14 Oct 2001 (12 posts) Archive Link: "[CFT][PATCH] smoother VM for -ac"
Topics: Virtual Memory
People: Rik van Riel, Andrea Arcangeli, John L. Males, Lorenzo Allegrucci

Rik van Riel announced:

over the last week I've created a small patch which seems to drastically improve VM performance and interactivity for 2.4.10-ac{9,10}. Initial test results mostly seem to suggest that the system runs lots smoother for desktop use and doesn't get into thrashing until the working set _really_ exceeds the size of RAM.

People have already asked to have this patch integrated into the -ac kernel, but it would be nice to have a few more test results from this combined eatcache + stophog patch before having it integrated ...

The patch implements the following things:

  1. bypass page aging entirely for unused objects in the cache
  2. increase the distance between inactive_shortage and inactive_plenty, so kswapd should spend less time shuffling random pages around ... shouldn't make a difference for most loads, but should add some robustness in worst cases
  3. does page aging _before_ the zone_inactive_plenty() test, so old referenced bits get cleared [not a big cpu eater, since the code won't run unless we have a free or inactive shortage somewhere]
  4. in page_alloc.c, the "slowdown" reschedule has been made stronger by turning it into a try_to_free_pages(), under memory load, this results in allocators calling try_to_free_pages() when the amount of work to be done isn't too bad yet and pretty much guarantees them they'll get to do their allocation immediately afterwards ... statistics make sure that the memory hogs are slowed down much more than well-behaved programs

Please test this patch and tell Alan and me how it works for you and whether there are loads where the system performs worse with this patch than without...
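Item 2 of Rik's list is a hysteresis argument. Reclaim starts when the inactive list drops below one threshold and stops only once it has been refilled to a higher one, so a wider gap means the reclaim daemon is woken less often.

The toy Python model below illustrates only that effect. The thresholds, the steady demand, and the function name are invented for the example, and this is not the 2.4 VM's code.

```python
def count_reclaim_runs(demand, shortage, plenty, start):
    """Count how often a reclaim daemon wakes under a given demand pattern.

    The daemon wakes when the inactive-page count drops below `shortage`
    and refills it to `plenty` in one batch before sleeping again.
    """
    inactive = start
    runs = 0
    for consumed in demand:
        inactive -= consumed
        if inactive < shortage:
            runs += 1
            inactive = plenty
    return runs


if __name__ == "__main__":
    steady = [10] * 1000   # 10 pages consumed per tick, for 1000 ticks
    print(count_reclaim_runs(steady, shortage=100, plenty=120, start=120))  # 333
    print(count_reclaim_runs(steady, shortage=100, plenty=300, start=300))  # 47
```

With a 20-page gap the daemon wakes every third tick. With a 200-page gap it wakes every 21st tick, which is the "less time shuffling pages around" that Rik is after. The cost, not modelled here, is that each wakeup has more work to do.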

Several folks were interested in this, and at one point Andrea Arcangeli said, "Later, if you've some time to test, I'd also be very interested in a comparison with 2.4.12aa1." A little later, John L. Males said:

I had a chance to do some testing with the unofficial SuSE 2.4.12 Kernel that I believe is based on your 2.4.12aa1.

I used:

ftp://ftp.suse.com/pub/people/mantel/next/RPM/k_i386-2.4.12-0.i386.rpm

I have an AMD K6-2-500 and find that the Pentium or "default" SuSE kernels hang using the AMD K6-2. I was unable to compile the kernel from the source using:

ftp://ftp.suse.com/pub/people/mantel/next/linux-2.4.12.SuSE-0.tar.bz2

due to some unknown technical problems. One was that "make xconfig" would not build, and using "make oldconfig" and the usual make commands to build a kernel caused an error with a module a ways down. The kernel image compiled fine.

Ok, enough of those side issues. The meat of the testing results is that the k_i386-2.4.12-0.i386.rpm kernel seems to fare very well with the testing I have done. The tests take about 20 minutes to complete with the k_i386-2.4.12-0.i386.rpm kernel. The test mix is a bit interesting, so I will only suggest it might be nice to shorten the elapsed time of the test, but that may not be possible due to I/O being the bottleneck.

With respect to comparing the same test but using the 2.4.10-ac12 kernel that appears to have the both of Rik van Riel's patches:

http://lwn.net/2001/1011/a/cache-reclaim.php3

http://lwn.net/2001/1011/a/smooth.php3

The results were not great. The "exact" same test takes a little over 3 hours to complete.

The part of the test that seems to cause problems with the 2.4.10-ac12 involves a 300MB working set. This 300MB working set was on top of a basic 60MB (combined System, shared, cache and buffer) after the initial system start up. The system used for testing has 256MB RAM and 256MB Swap file. It seems the 2.4.10-ac12 ends up with extra memory (overhead??) allocated during this part of the test to basically have next to nil shared+cache+buffer whilst having both RAM and the cache full to the brim. If that is not enough the 2.4.10-ac12 seems to vary at times back and forth +-25MB while the working set is still trying to be processed by the kernel. Along the way the 2.4.10-ac12 kernel also tends to kill or cause a signal 9 to some of the working set applications, but despite this the system seems to churn on this 300MB working set for just about 3 hours (other part of test brings total to just over 3 hours).

By comparison the k_i386-2.4.12-0.i386.rpm test during this same 300MB working set showed little extra overhead. Hence 300 + 60 did cause RAM to fill up and next to nil of share+cache+buffer, but the swap file took the balance in a more expected manner, i.e. about 120 MB into the swap file. Hence the swap file was never pressured to its full limit as would be expected.

In terms of workstation responsiveness, it was not great with the k_i386-2.4.12-0.i386.rpm kernel, but was extremely, extremely to ignoring workstation activity or taking in the order of 5+ minutes to respond to simple things like launching qps, or changing directories with kruiser and many big time problems getting into the screen saver to unlock the workstation.

I do need to refine my tests a bit. One thing I am going to do is move the detailed system information I am trying to log during the test to a different physical drive than where the swap files are located. I suspect this should ease the contention that may be ensuing between the system's need to page to the swap file and the need to keep logging the metric information.

I also need to find a way to collect certain metric information during the test. Not being a developer (I am a QA/Testing Person) it may take me a bit of effort to cut and carve what I need from qps and xosview to get the metric information that is lacking for these tests. I suspect I will not be able to look into the cutting and carving of code until next weekend. I may try this test on the 2.4.9-ac18, maybe even the 2.2.19 for a feel if they are greatly different in results to the two kernels tested earlier today.

Lorenzo Allegrucci also ran his own tests:

qsbench results,

Linux-2.4.10-ac9:

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
seed = 140175100
71.370u 2.560s 3:17.94 37.3% 0+0k 0+0io 11773pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
seed = 140175100
71.760u 3.170s 4:02.93 30.8% 0+0k 0+0io 15487pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
seed = 140175100
71.090u 3.080s 4:07.94 29.9% 0+0k 0+0io 15856pf+0w
kswapd CPU time: 0:23

Linux-2.4.10-ac9 + Rik's smooth patch:

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
seed = 140175100
71.090u 6.260s 3:21.65 38.3% 0+0k 0+0io 12868pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
seed = 140175100
72.460u 6.030s 3:58.10 32.9% 0+0k 0+0io 14637pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
seed = 140175100
71.630u 7.400s 4:00.86 32.8% 0+0k 0+0io 14894pf+0w
kswapd CPU time: 0:21
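The three runs per kernel are easier to compare as averages. The Python sketch below parses the lines quoted above and computes those averages. It adds no new measurements.

It assumes the lines come from the csh/tcsh built-in `time` in its default format: user time, system time, elapsed time, CPU percentage, memory, I/O, then page faults and swaps. That is an assumption based on the shell prompt shown.

```python
import re
from statistics import mean

# Default csh/tcsh `time` line, e.g.
# "71.370u 2.560s 3:17.94 37.3% 0+0k 0+0io 11773pf+0w"
TIME_LINE = re.compile(
    r"(?P<user>[\d.]+)u (?P<sys>[\d.]+)s (?P<elapsed>[\d:.]+) "
    r"(?P<cpu>[\d.]+)% .* (?P<pf>\d+)pf\+\d+w"
)


def elapsed_seconds(text):
    """Convert an 'M:SS.ss' (or 'H:MM:SS') elapsed time to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def summarize(lines):
    """Average elapsed time, system time and page faults over several runs."""
    runs = [TIME_LINE.match(line).groupdict() for line in lines]
    return {
        "elapsed": mean(elapsed_seconds(r["elapsed"]) for r in runs),
        "sys": mean(float(r["sys"]) for r in runs),
        "pf": mean(int(r["pf"]) for r in runs),
    }


baseline = [  # Linux-2.4.10-ac9
    "71.370u 2.560s 3:17.94 37.3% 0+0k 0+0io 11773pf+0w",
    "71.760u 3.170s 4:02.93 30.8% 0+0k 0+0io 15487pf+0w",
    "71.090u 3.080s 4:07.94 29.9% 0+0k 0+0io 15856pf+0w",
]
smooth = [    # Linux-2.4.10-ac9 + Rik's smooth patch
    "71.090u 6.260s 3:21.65 38.3% 0+0k 0+0io 12868pf+0w",
    "72.460u 6.030s 3:58.10 32.9% 0+0k 0+0io 14637pf+0w",
    "71.630u 7.400s 4:00.86 32.8% 0+0k 0+0io 14894pf+0w",
]

if __name__ == "__main__":
    for name, lines in (("2.4.10-ac9", baseline), ("+ smooth patch", smooth)):
        s = summarize(lines)
        print(f"{name}: {s['elapsed']:.1f}s elapsed, {s['sys']:.2f}s sys, {s['pf']:.0f} pf")
    # 2.4.10-ac9: 229.6s elapsed, 2.94s sys, 14372 pf
    # + smooth patch: 226.9s elapsed, 6.56s sys, 14133 pf
```

On these three runs each, the patched kernel finishes slightly faster on average but spends roughly twice as much system time. With so few runs and a 40-second spread between them, that is suggestive rather than conclusive.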

4. Status Of 2.4, 2.4-ac, and 2.5
10 Oct 2001 - 11 Oct 2001 (20 posts) Archive Link: "2.4.11 oops"
Topics: Security, USB, Virtual Memory
People: Alan Cox, Rudi Sluijtman, David S. Miller, Russell King, Linus Torvalds, Marco Colombo, Christoph Rohland, Jeff Garzik, Tim Waugh, Greg KH

Bob Matthews reported an oops under 2.4.11; Linus Torvalds identified the problem and asked if anyone could come up with a patch. Alan Cox replied, "Ingo did patches for -ac a long time back. I've not submitted them since it simply didnt seem an important matter when prioritising patches. If you want them I can isolate them tomorrow."

In a separate thread, under the Subject: [patch] .version, newversion in Makefile (http://www.uwsg.indiana.edu/hypermail/linux/kernel/0110.1/0807.html), Rudi Sluijtman reported, "Due to a change in the main Makefile the .version file is overwritten by a new empty one since at least 2.4.10-pre12, so the version becomes or remains 1 after each recompile." Russell King replied that there was a patch in the -ac series to fix that, and David S. Miller added, "I've also independently just sent Linus a patch to fix this. I was not aware of the -ac fix, sorry." Russell replied, "It was sent around 20 September to Alan, Linus and lkml. Alan accepted it, Linus dropped it, and hardly anyone noticed on lkml. ;(" and Alan added (in reply to David), "Maybe he'll notice this time. Russell sent him a fix, I sent him Russell's fix and now you've sent him a fix 8)"
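The .version file is just a build counter: each build reads it, adds one, and writes it back. The minimal Python sketch below shows why recreating the file empty before every build pins the counter at 1. Its `bump_build_number()` helper is hypothetical, not the kernel Makefile's shell logic.

```python
import os
import tempfile


def bump_build_number(path):
    """Read the counter in `path`, add one, and write it back atomically.

    A missing or empty file counts as 0, so the first build is number 1.
    """
    try:
        with open(path) as f:
            current = int(f.read().strip() or 0)
    except FileNotFoundError:
        current = 0
    new = current + 1
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(f"{new}\n")
    os.replace(tmp, path)   # swap the new file in with a single rename
    return new


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        version = os.path.join(d, ".version")
        print([bump_build_number(version) for _ in range(3)])  # [1, 2, 3]

        # The reported bug: .version is recreated empty before each bump.
        buggy = []
        for _ in range(3):
            open(version, "w").close()
            buggy.append(bump_build_number(version))
        print(buggy)                                           # [1, 1, 1]
```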

That was the end of that thread. Elsewhere, under the Subject: Uhhuh.. 2.4.12 (http://www.uwsg.indiana.edu/hypermail/linux/kernel/0110.1/0932.html), Linus announced:

2.4.11 had a fix for a symlink DoS attack, but sadly that fix broke the creation of files through a dangling symlink rather badly (it caused the inode to be created in the very same inode as the symlink, with unhappy end results).

Happily nobody uses that particular horror - or _almost_ nobody does. It looks like at least the SuSE installer (yast2) does, which causes a nasty unkillable inode as /dev/mouse if you use yast2 on 2.4.11.

("debugfs -w rootdev" + "rm /dev/mouse" will remove it, although I suspect there are other less drastic methods too if your fsck doesn't seem to notice anything wrong with it. Only one report of this actually happening so far).

So I made a 2.4.12, and renamed away the sorry excuse for a kernel that 2.4.11 was.

final:

  • Greg KH: USB update (fix UHCI timeouts, serial unplug)
  • Christoph Rohland: shmem locking fixes
  • Al Viro: more mount cleanup
  • me: fix bad interaction with link_count handling
  • David Miller: Sparc updates, net cleanup
  • Tim Waugh: parport update
  • Jeff Garzik: net driver updates
He replied to himself shortly afterwards, having noticed more breakage, adding, "On the other hand, the good news is that I'll open 2.5.x RSN, just because Alan is so much better at maintaining things ;)"

Marco Colombo asked, "will Alan release 2.4.13 asap with Rik's VM? - (sorry, couldn't resist)" Alan replied, "I think 2.4.13 will be a Linus release."
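The case 2.4.11 broke is opening a file with O_CREAT through a dangling symlink. The expected behaviour is that the file is created at the symlink's target and the symlink itself stays a symlink. The Python sketch below checks that behaviour on a POSIX system; it is meant to be run on a current kernel, not on 2.4.11. The file names are arbitrary.

```python
import os
import tempfile


def create_through_dangling_symlink(directory):
    """Create a file by opening a dangling symlink with O_CREAT.

    Returns (link_is_still_symlink, target_is_regular_file).
    """
    target = os.path.join(directory, "target")
    link = os.path.join(directory, "link")
    os.symlink("target", link)   # relative link; "target" does not exist yet
    if os.path.exists(link):     # exists() follows the link
        raise RuntimeError("symlink is not dangling")
    fd = os.open(link, os.O_WRONLY | os.O_CREAT, 0o644)
    os.write(fd, b"created via symlink\n")
    os.close(fd)
    return os.path.islink(link), os.path.isfile(target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(create_through_dangling_symlink(d))   # (True, True)
```

Adding O_EXCL changes the outcome. POSIX requires open() with O_CREAT | O_EXCL to fail with EEXIST when the path names a symlink, which Python reports as FileExistsError.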

5. Speeding Up diff Of Kernel Trees
14 Oct 2001 - 18 Oct 2001 (22 posts) Archive Link: "Making diff(1) of linux kernels faster"
People: Linus Torvalds, Marcelo Tosatti, Willy Tarreau, Nick Craig-Wood, Wojtek Pilorz, Horst von Brand, Paul Gortmaker

Paul Gortmaker found that patching the diff program to read all files into the cache before comparing them made comparing kernel trees about five times faster. With the files already in the cache, his patch took a slight performance hit. Linus Torvalds suggested reading in just one directory at a time, to prevent too much memory use. He added, "I've for a long time thought about adding a "readahead()" system call. There are just too many uses for it, it has come up in many different areas.." He suggested submitting the patch to the diff maintainer, saying, "This change seems small and simple enough that they might accept it, and I'd love to see it. I'll probably do this in my copy anyway, but it would be nicer to not have to patch it specially.."
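Linus's directory-at-a-time variant can be sketched in Python: walk both trees one directory at a time, hint the kernel to read that directory's files ahead, then compare them. No readahead() call existed when this thread took place. The sketch uses posix_fadvise(POSIX_FADV_WILLNEED) as a stand-in, which modern Linux and Python's os.posix_fadvise provide.

The function names and structure are illustrative and are not Paul's patch.

```python
import filecmp
import os
import tempfile


def prefetch(paths):
    """Hint the kernel to start reading these files into the page cache.

    Uses posix_fadvise(POSIX_FADV_WILLNEED) where available (Linux);
    elsewhere it does nothing and the comparison simply runs unhinted.
    """
    if not hasattr(os, "posix_fadvise"):
        return
    for path in paths:
        fd = os.open(path, os.O_RDONLY)
        try:
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        finally:
            os.close(fd)


def diff_trees(old, new, rel=""):
    """Return sorted relative paths of files that differ between two trees."""
    a_dir, b_dir = os.path.join(old, rel), os.path.join(new, rel)
    a = set(os.listdir(a_dir)) if os.path.isdir(a_dir) else set()
    b = set(os.listdir(b_dir)) if os.path.isdir(b_dir) else set()
    differing, subdirs, files = [], [], []
    for name in sorted(a | b):
        pa, pb = os.path.join(a_dir, name), os.path.join(b_dir, name)
        if os.path.isdir(pa) or os.path.isdir(pb):
            subdirs.append(name)
        elif name in a and name in b:
            files.append(name)
        else:
            differing.append(os.path.join(rel, name))   # only on one side
    # Read ahead only this directory's files, bounding memory use.
    prefetch([os.path.join(d, n) for n in files for d in (a_dir, b_dir)])
    for name in files:
        if not filecmp.cmp(os.path.join(a_dir, name),
                           os.path.join(b_dir, name), shallow=False):
            differing.append(os.path.join(rel, name))
    for name in subdirs:
        differing.extend(diff_trees(old, new, os.path.join(rel, name)))
    return sorted(differing)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        old, new = os.path.join(top, "old"), os.path.join(top, "new")
        for root, body in ((old, "int x;\n"), (new, "long x;\n")):
            os.makedirs(os.path.join(root, "kernel"))
            with open(os.path.join(root, "Makefile"), "w") as f:
                f.write("VERSION = 2\n")
            with open(os.path.join(root, "kernel", "sched.c"), "w") as f:
                f.write(body)
        print(diff_trees(old, new))   # ['kernel/sched.c']
```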

Marcelo Tosatti recalled seeing a USENIX 2001 paper on this subject, entitled Design and Implementation of a Predictive File Prefetching Algorithm. He added, "They have a Linux implementation of their complex prediction algo, but I think directory readahead itself makes sense for most stuff."

Elsewhere, Willy Tarreau described an incantation to get very fast diff results:

I personally use hard links between kernels to make the effective data set smaller, and I'd like to explain here how I proceed since there are often people who seem completely amazed by this method which I learned here on LKML a few years ago:

# cd /usr/src
# tar Ixf anydir/linux-2.4.12.tar.bz2
# cp -dRflp linux linux-2.4.12

this way, only dir entries are duplicated, so very little overhead

# (cd linux && bzcat anydir/patch-2.4.13pre1.bz2|patch -Np1)
# cp -dRflp linux linux-2.4.13pre1

now, only the files affected by the patch are duplicated. Then you can work inside the linux dir, and construct your patches very quickly since only a few files effectively differ between your new tree and the old ones.

Be very careful not to modify a multi-linked file, or it will be damaged in all trees and won't be seen by diff. Your editor must unlink before saving.

I hope it will help someone as it has helped me for a while now. I nearly always have sub-second diffs, even with not-so-much RAM.
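Willy's trick works because `cp -l` gives both trees the same inode for every file the patch did not touch. A comparison can skip any pair that resolves to the same inode without reading it, so only the few patched files cost real I/O.

The Python sketch below shows both halves of the idea. Its helper names are hypothetical, and `link_tree` handles regular files only, unlike `cp -dRflp`.

```python
import filecmp
import os
import tempfile


def link_tree(src, dst):
    """Roughly `cp -Rl src dst` for regular files: new dirs, hard-linked files."""
    for dirpath, _dirnames, filenames in os.walk(src):
        target_dir = os.path.join(dst, os.path.relpath(dirpath, src))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            os.link(os.path.join(dirpath, name), os.path.join(target_dir, name))


def changed_files(old, new):
    """Files in `new` that differ from `old`; shared inodes are skipped unread."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(new):
        rel_dir = os.path.relpath(dirpath, new)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            a, b = os.path.join(old, rel), os.path.join(new, rel)
            if os.path.exists(a) and os.path.samefile(a, b):
                continue   # same inode: identical without reading any data
            if not os.path.exists(a) or not filecmp.cmp(a, b, shallow=False):
                changed.append(rel)
    return sorted(changed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        old, new = os.path.join(top, "linux-old"), os.path.join(top, "linux-new")
        os.makedirs(os.path.join(old, "kernel"))
        for rel in ("Makefile", os.path.join("kernel", "sched.c")):
            with open(os.path.join(old, rel), "w") as f:
                f.write(f"original {rel}\n")
        link_tree(old, new)
        # "Patch" one file by removing the link and writing a fresh file,
        # so the old tree keeps its own inode and contents.
        patched = os.path.join(new, "kernel", "sched.c")
        os.remove(patched)
        with open(patched, "w") as f:
            f.write("patched\n")
        print(changed_files(old, new))   # ['kernel/sched.c']
```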

Horst von Brand pointed out that most editors did not unlink before saving. As far as he knew, only jed did this. Nick Craig-Wood pointed out, "emacs does mv file file~ before saving file so the edited file will not be linked but the backup file will be. You can stop it doing this by setting backup-by-copying-when-linked." Wojtek Pilorz, also apparently using a similar method to Willy, offered:

To be sure it is not possible to modify original tree files, I do
chown -R root.root original_tree

before copying it (via cp -lR) to new one, which will be modified with whatever tools by me, logged in as a regular user. For those having root access to a box this might be a useful way of preventing accidents ... (this of course also assumes sane file permissions)

6. More Discussion Of The VM Changes In 2.4
15 Oct 2001 - 17 Oct 2001 (34 posts) Archive Link: "VM"
Topics: Virtual Memory
People: Linus Torvalds, Alan Cox, Luigi Genoni, Rik van Riel, Andrea Arcangeli, Patrick McFarland, Robert Love

Patrick McFarland asked why Linus Torvalds had put Andrea Arcangeli's "simple" version of the virtual memory subsystem into the 2.4 kernel, when Rik van Riel's VM code seemed to be kicking ass in Alan Cox's tree. Linus replied, ""complex" != "smart". The benchmarks I've seen says that the simple VM performs better - both in terms of repeatability and in terms of absolute performance. Search this list yourself if you don't believe me." Elsewhere, Alan also replied to Patrick, "I've not reached any final conclusions on the VM - there are things that Rik's VM shows up that look like the VM algorithm is right but it triggers other stuff, and there are a couple of hackish bits left in still. Smart is often good - especially given how slow disk seeks are. But smart is not always best for any algorithm."

Elsewhere, Luigi Genoni remarked, "I do not care which VM is simpler, nor which is faster. I look for predictability, since this is the most important thing on the servers I am administering. Under a special situation I need something maybe less predictable, but smarter to manage a stressed system." Rik replied:

This is a different approach to the situation. Most of the time in the early 2.4 kernels we were much too busy stopping machines from crashing to care about performance.

Only in more recent -ac kernels have I actually had time to look at performance and it seems to be relatively easy to get the VM to perform better.

Andrea seems to have optimised his VM for performance under low to medium loads from the beginning ... but in Linux 2.2 we've seen how impossible it is to tune such a simplistic VM to not fall apart under very high loads, so I won't be going that way ;)

Robert Love pointed out that Alan's tree was more stable. Patrick asked why he thought so, and Robert said that Alan's code was not modified as extensively as Linus'. Elsewhere, Rik also said:

Note that Linus hasn't been up to date on my VM since about 2.4.5. And before you blame me for not sending patches, I did send them but Linus didn't apply them for unknown reasons.

The VM in Alan's kernel pretty much has been the only option for a reliable 2.4 kernel since 2.4.7.

7. Some Difficulty Tracking Non-GPL-Compatible Modules
16 Oct 2001 - 18 Oct 2001 (20 posts) Archive Link: "GPLONLY kernel symbols???"
People: Keith Owens, Ben Greear, Christoph Lameter

Christoph Lameter noticed that 2.4.11 wouldn't load the loop driver, giving an unresolved symbol error with the message, "modules without a GPL compatible license cannot use GPLONLY_ symbols". This made no sense at all to him, since the loop driver came with the kernel and was properly licensed. Keith Owens explained, "If a symbol has been exported with EXPORT_SYMBOL_GPL then it appears as unresolved for modules that do not have a GPL compatible MODULE_LICENCE string. So when a module without a GPL compatible MODULE_LICENCE gets an unresolved symbol, I print that message as a hint to the user. I thought the response was obvious, but looks like I need to expand the hint text even further." Christoph reiterated that the loop driver was GPL-compatible, and Keith replied, "In 2.4.11 loop.c has no MODULE_LICENCE. It will take a while for all modules to be correctly flagged." A few posts down the line, Ben Greear suggested, "Can't you just make it a warning for now and give ppl a few months to clean things up? It strikes me that any code that serves no technical purpose and actively decreases functionality of the kernel is highly suspect. Or maybe even wait till 2.5..."
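Keith's explanation describes a symbol-resolution rule rather than an outright refusal to load. Symbols exported with EXPORT_SYMBOL_GPL are hidden from modules whose license string is not GPL-compatible, so the load fails with an unresolved symbol and the hint is appended. A module with no license string at all, like loop.c in 2.4.11, falls on the wrong side of that rule.

The toy Python model below illustrates the rule. The whitelist, the symbol table, and the `resolve` helper are hypothetical.

```python
GPL_COMPATIBLE = {"GPL", "Dual BSD/GPL"}   # illustrative subset only

# Hypothetical export table: symbol name -> exported with EXPORT_SYMBOL_GPL?
EXPORTS = {
    "printk": False,
    "example_gpl_only_helper": True,
}


class UnresolvedSymbol(Exception):
    pass


def resolve(needed, module_license):
    """Return the symbols a module links against, or raise UnresolvedSymbol.

    `module_license` is None for a module with no MODULE_LICENSE() string,
    which is treated as not GPL-compatible, so GPL-only exports are hidden.
    """
    gpl_ok = module_license in GPL_COMPATIBLE
    missing = [s for s in needed
               if s not in EXPORTS or (EXPORTS[s] and not gpl_ok)]
    if missing:
        # As Keith describes: any unresolved symbol in a non-GPL module
        # gets the hint appended, whatever the actual cause was.
        hint = "" if gpl_ok else (" (modules without a GPL compatible license"
                                  " cannot use GPLONLY_ symbols)")
        raise UnresolvedSymbol(", ".join(missing) + hint)
    return list(needed)


if __name__ == "__main__":
    print(resolve(["printk", "example_gpl_only_helper"], "GPL"))
    # ['printk', 'example_gpl_only_helper']
    try:
        resolve(["printk", "example_gpl_only_helper"], None)
    except UnresolvedSymbol as err:
        print("unresolved:", err)
```

Under this rule, adding the missing MODULE_LICENSE("GPL") line to a driver is enough to make the same load succeed. That is the "correctly flagged" work Keith says will take a while.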

We Hope You Enjoy Kernel Traffic

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License, version 2.0.