Kernel Traffic #132 For 10 Sep 2001

By Zack Brown

Introduction

The EFF has issued a call for action regarding Dmitry Sklyarov, who faces up to 25 years in prison for violating the DMCA. I urge everyone to participate in the letter-writing campaign (http://www.eff.org/alerts/20010808_eff_sklyarov_alert.html) currently in effect. Please do what you can to help prevent what threatens to become a terrible tragedy. For more information on the case, see http://www.eff.org/IP/DMCA/US_v_Sklyarov/. For more information on the DMCA, see http://www.eff.org/IP/DMCA/. To join the mailing list surrounding this issue, see http://zork.net/mailman/listinfo/free-sklyarov/.

Mailing List Stats For This Week

We looked at 1269 posts in 5775K.

There were 458 different contributors. 214 posted more than once. 234 posted last week too.

 

1. Filesystem Comparisons
27 Aug 2001 - 30 Aug 2001 (18 posts) Archive Link: "Journal Filesystem Comparison on Netbench"
Topics: Disks: IDE, Disks: SCSI, FS: JFS, FS: ReiserFS, FS: XFS, FS: ext3
People: Andrew Theurer, Randy Dunlap, Yves Rougy, Roberto Nibali, Andrew Morton

Andrew Theurer from IBM announced, "I recently starting doing some fs performance comparisons with Netbench and the journal filesystems available in 2.4: Reiserfs, JFS, XFS, and Ext3. I thought some of you may be interested in the results. Below is the README from the http://lse.sourceforge.net. There is a kernprof for each test, and I am working on the lockmeter stuff right now. Let me know if you have any comments." Randy Dunlap replied:

I am doing some similar FS comparisons, but using IOzone (http://www.iozone.org) instead of Netbench.

Some preliminary (mostly raw) data are available at: http://www.osdlab.org/reports/journal_fs/ (updated today).

I am using a Linux 2.4.7 on a 4-way VA Linux system. It has 4 GB of RAM, but I have limited it to 256 MB in accordance with IOzone run rules.

However, I suspect that this causes IOzone to measure disk subsystem or PCI bus performance more than it does FS performance. Any comments on this?

Default configurations for all filesystems were used.

Future:

  • measure operations/second
  • kernel profiling
  • measure CPU utilization for each FS
  • make graphs more readable
  • do some FS comparison graphs

Andrew replied:

You are definitly exceeding what the kernel will cache and writing to disk on some tests. I guess it depends on what is more important to you. I think both are valid things to test, and you may want to try not limiting memory to get just FS performace in memory for large files. However, writing to disk is important, especially for things like bounce-buffer. Did you have himem support in your kernel? If so, did you have a bounce-buffer elimination patch as well?

Does the storage system/controller have a disk cache? What size?

Also, does IOzone default to num procs=num cpus? I didn't see any options in your cmdline for num_procs.

Randy explained, "I'm interested in filesystem performance. I'm not trying to document IDE vs. SCSI vs. FC performance/price tradeoffs, benefits, etc." Randy had some trouble answering the questions in Andrew's second paragraph. He said, "The FC host controller is a QLogic 2200. It is attached to an IBM FAStT controller/drive array -- one controller with 10 attached drives. I've been looking at the IBM FAStT OS console interface, but I can't see much cache info there. There is one item: cache/processor sizes: 88/40 MB." And to Andrew's final question, Randy replied, "No, IOzone doesn't default to num_processes = num_cpus. That's a command-line option that I didn't use, although I expect to do some testing with that option also."

Elsewhere, Yves Rougy also announced, "I am also doing such comparisons, with IOZone and Bonnie++ The currents results are available at http://www.pingouin.org/linux/fsbench/ More results are to come, especially to see the notail option impact of Reiserfs with iozone and bonnie++."

Elsewhere, Roberto Nibali took a look at Andrew's comparisons. He gave a link to one of Andrew's pages (http://lse.sourceforge.net/benchmarks/netbench/results/august_2001/filesystems/raid1e/ext3/4p/droppped_packets.txt) and asked, "Why is ext3 the only tested journaling filesystem that showed dropped packets during the test and how do you explain it?" Andrew explained, "Dropped packets are usually a side effect of the interrupt delay option in the e1000 driver. I choose 256 usec delay (default is 64) for all these tests, and usually there is a very small % of dropped packets, which usually shows up as 0.00%, since I only show 1/100's of a percent in that output. The other tests do have dropped packets, and I should change that script to have more significant digits to show that. I'm not sure why ext3 shows more than the others. Does ext3 have any spin locks with interrupts disabled?" And Andrew Morton replied, "No. But raid1 does."
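
Andrew's remark about small drop rates showing up as 0.00% is a matter of output precision. The following is only a sketch of that effect, not his reporting script: the function names and the packet counts used with it are hypothetical.

```c
#include <stdio.h>

/* Percentage of dropped packets; returns 0.0 when no packets were counted. */
double drop_percent(unsigned long dropped, unsigned long total)
{
    if (total == 0)
        return 0.0;
    return 100.0 * (double)dropped / (double)total;
}

/*
 * Format the drop percentage into buf with the requested number of
 * decimal places. Returns what snprintf returns (characters needed).
 */
int format_drop_percent(char *buf, size_t len, unsigned long dropped,
                        unsigned long total, int decimals)
{
    return snprintf(buf, len, "%.*f%%", decimals,
                    drop_percent(dropped, total));
}
```

With a made-up count of 30 dropped packets out of 1,000,000, two decimal places yield "0.00%", while four decimal places yield "0.0030%", which is why extra digits would make the non-ext3 runs show their drops as well.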

 

2. ext2-to-reiserfs Conversion
29 Aug 2001 - 5 Sep 2001 (5 posts) Archive Link: "ext2 -> reiserfs conversion?"
Topics: FS: ReiserFS, FS: ext2
People: Andreas Dilger, Hans Reiser, Roy Sigurd Karlsbakk

Roy Sigurd Karlsbakk asked if there were any plans to create a utility to convert ext2 filesystems to reiserfs. Andreas Dilger replied:

It is probably more dangerous and difficult than it is worth. Use a backup/restore, that way you also have a backup in case there is a problem with the conversion.

Since you would ALWAYS do a backup before performing such an operation (right????) then doing the restore to the newly formatted reiserfs partition would probably take less time than any kind of conversion would take (and be a LOT more robust, as well as doing a "defrag"), so you are way better off to do it that way.

And Hans Reiser added, "Yes, it was the fear of a long debugging cycle that made me decide that tar over VFS was the most reliable conversion method, and to not attempt to do more. If someone was to write a tar plus resize based script, that might be reliable, and I would be interested to see it."

 

3. 2.4 SMP Register Corruption Under Intel
29 Aug 2001 - 30 Aug 2001 (3 posts) Subject: "[PATCH] 2.4.x i386 SMP interrupts can corrupt registers"
Topics: SMP
People: John Byrne, Linus Torvalds

John Byrne posted a patch and reported, "Currently, the SMP interrupt code generated by the macros BUILD_SMP_INTERRUPT and BUILD_SMP_TIMER_INTERRUPT push the positive interrupt vector number on the stack. If the correct signal is pending on the process and %eax happens to have the correct value, do_signal() can be spoofed into adjusting %eax and %eip with almost certainly bad results." Linus Torvalds replied, "Wow. Good catch - that's just incredibly broken, and I wonder how come the SMP interrupt build stuff didn't get the right code copied from BUILD_IRQ.. How the h*ll did you happen to actually notice this?" John replied, "Some combination of blind luck, curiosity, pride, and Obsessive Compulsive Disorder..." He explained, "The bug came into existence in 2.3.14, when the file arch/i386/kernel/irq.h became include/asm-i386/hw_irq.h. The file was moved and changed at the same time, but the bug was missed because the diff would have shown the entire file being deleted in one place and added in another. If the file had been moved first, and then the changes made, the bug almost certainly would have been caught."
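
To see why a positive value on the stack matters, here is a user-space sketch of the kind of check John describes. It is not the kernel's code: the struct, the function names, and the assumption that the fix pushes the vector minus 256 (so the saved value is always negative, as Linus's reference to BUILD_IRQ suggests) are all illustrative.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the saved register frame, not struct pt_regs. */
struct saved_regs {
    long eax;       /* return value slot */
    long eip;       /* saved instruction pointer */
    long orig_eax;  /* syscall number, or whatever the interrupt stub pushed */
};

#define ERESTARTSYS 512  /* kernel-internal "restart the syscall" code */

/*
 * Simplified restart logic: a non-negative orig_eax is taken to mean
 * "we got here from a system call", so the call is set up to be reissued.
 */
bool maybe_restart_syscall(struct saved_regs *regs)
{
    if (regs->orig_eax >= 0 && regs->eax == -ERESTARTSYS) {
        regs->eax = regs->orig_eax;
        regs->eip -= 2;  /* back up over the two-byte "int $0x80" */
        return true;
    }
    return false;
}

/*
 * Value left in the orig_eax slot by an interrupt stub.
 * Buggy form: the positive vector number. Assumed fixed form: vector - 256.
 */
long pushed_orig_eax(int vector, bool fixed)
{
    return fixed ? (long)vector - 256 : (long)vector;
}
```

With the buggy push, an interrupted process whose %eax happens to hold -ERESTARTSYS has its %eax and %eip rewritten even though no system call was in progress. With a negative pushed value, the check falls through and the registers are left alone.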

 

4. IBM Keeps Specs Private
29 Aug 2001 - 3 Sep 2001 (10 posts) Subject: "lcs ethernet driver source"
Topics: Networking
People: Ulrich Weigand, Arjan van de Ven, Alan Cox

Arjan van de Ven was unable to find the sources for the LCS and QETH ethernet drivers for IBM's S390 architecture. Ulrich Weigand of IBM replied, "Sorry, at this point we are not allowed to publish the source code of the lcs and qeth drivers (due to the use of confidential hardware interface specifications). We make those modules available only in binary form on our developerWorks web site." Arjan replied:

I actually believed that IBM took Open Source and the community seriously. Not releasing key parts of the S/390 architecture port make me very disappointed in IBM given the very massive and public claims of support.

It makes all the fuss IBM is making about how great Linux is look, well, fake. I know that large parts of IBM actually do take Open Source seriously but it's regrettable that an influential part of the S/390 division doesn't.

Elsewhere, Alan Cox asked if there were any plans to change that policy, and Ulrich said (speaking for himself and not IBM), "This is not something we (the Linux for S/390 development team) can decide; it's up to the hardware groups that 'own' the LCS / QDIO specifications whether they allow to make these public. As I said, at this point, we are not allowed to open the specs; while it is conceivable that this might change in the future, I'm not aware of any specific plan."

 

5. Hardware Detection Tool Announced
30 Aug 2001 - 1 Sep 2001 (9 posts) Archive Link: "[ANNOUNCE] Hardware detection tool 0.2"
Topics: FS: devfs, Hot-Plugging, USB
People: Carlos E Gorges, Tim Jansen

Carlos E Gorges announced:

Hardware detection tool 0.2

The main idea is keep a unified database of modules and create a good tool for hardware configurators.

This version supports detection of PCI, ISA PnP and USB (hotplug) devices.

ftp://ftp.techlinux.com.br/pub/people/carlos/kernel/hwd/hwd-0.2.tar.bz2
ftp://ftp.techlinux.com.br/pub/people/carlos/kernel/hwd/hwd-0.2-linux248-ac11.patch.bz2

Tim Jansen replied, "The next version of the device registry patch (http://www.tjansen.de/devreg) will contain a similar feature. In the current release bus drivers (like PCI, USB..) register their devices in the registry and the devices are then displayed in a generic, bus-independent form in the /proc/devreg directory. In the upcoming version those drivers with devreg support register themselves on initialization and also register each driver instance (an instance handles a single physical device) that they create. The instance will then be connected to the device, devfs nodes will be connected to the driver instance and you can get a pretty good graph of the relations between drivers, driver instances, devfs nodes/minor numbers and the physical devices."

 

6. Tracking Non-Free Kernel Modules Loaded At Runtime
30 Aug 2001 - 31 Aug 2001 (9 posts) Archive Link: "Linux 2.4.9-ac5"
People: Keith Owens, Alan Cox

Alan Cox announced 2.4.9-ac5, and Keith Owens replied (regarding a patch of his that had gotten into Alan's tree):

__module_license needs to be static. Otherwise we get problems when MODULE_LICENSE() is used in two objects which are linked into the same module. Given the legal requirements for copyright etc., I expect people to put MODULE_LICENSE in every source file, not just one.

What do you need for licence support in modutils? Obviously modinfo needs to print it, but what about insmod? Should insmod issue warning messages for proprietary modules? What about ksymoops? IOW, what was the reason for adding MODULE_LICENSE?

Alan replied:

My goal is to eventually include the info tucked away on oops report lines so that I can automatically dump bug reports with binary drivers, including the growing number of people who lie about nvdriver and think that this will get their bug cured.

insmod warnings is something I want to stay out of. I think thats up to vendors and the like. I want to tell if people loaded crud I dont want to tell them not to...

Keith said:

Then we have a problem. The modinfo and modstring sections are not loaded into kernel space, they are processed by insmod then discarded.

Solution: /proc/sys/kernel/tainted. Set to 0 on boot, set to 1 by insmod when it finds a non-GPL module, printed by panic, extracted by ksymoops. Any load of a proprietary module taints the kernel, even if it is later removed. The kernel code for that sysctl only allows taint to be set, not to be cleared.

Not perfect, really malicious users can hack the kernel. Or they can simply edit the taint flag in the oops report. But it will catch 90%+ of the problem case.

Alan thought that would be a fine solution. There were a few more comments, and the thread ended.
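
The "set but never clear" behaviour Keith proposes can be sketched in a few lines. This is only an illustration of the idea in user space; the variable and function names are hypothetical and this is not the sysctl handler itself.

```c
/* Hypothetical taint flag: 0 at "boot", sticky once set. */
static int tainted;

/*
 * Handle a write to the flag. Any non-zero value sets the taint;
 * a write of 0 is ignored, so the flag can never be cleared.
 * Returns the resulting flag value.
 */
int taint_write(int value)
{
    if (value != 0)
        tainted = 1;
    return tainted;
}

/* Read the flag, as a panic or oops report would. */
int taint_read(void)
{
    return tainted;
}
```

Once taint_write(1) has been called, a later taint_write(0) leaves taint_read() returning 1, which matches the rule that loading a proprietary module taints the kernel even if the module is later removed.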

 

7. ext3 Oops Under 2.4
30 Aug 2001 - 3 Sep 2001 (7 posts) Archive Link: "ext3 oops under moderate load"
Topics: FS: ext3
People: Andrew Morton, Stephen C. Tweedie, Peter Braam

Someone reported an oops using ext3 under 2.4.9, and Andrew Morton replied, "Yours is the third report of this - it's definitely a bug in ext3. I still need to work out how you managed to get a page attached to the inode which has not had its buffers fed through journal_dirty_data(). There seem to be several ways in which this can happen." He asked if the original poster might have run out of disk space on the relevant partition before it died, but the poster said no, there was plenty of space on the drive. Stephen C. Tweedie said:

I've just been able to reproduce it, using large symlinks.

The killer seems to be a situation when you have a revoked buffer-cache buffer and we then start allocating, and deallocating, the same buffer from the page cache. Large symlinks work from the page cache and satisfy this condition nicely.

Running a few parallel tasks writing to the end of large sparse files and truncating them (to create and delete lots of indirect blocks, populating the buffer cache with revoked data), then adding large symlink create/delete activity in the same directory, I was able to reproduce the oops in a few minutes.

I suspect that the same sort of effect is causing the revoke oops Peter Braam saw with discretionally journaled files.

He went on, "the issue is that whenever we create any journaled data, we need to cancel all previous indications of the revoke, even if the old revoke was in the buffer cache but the new block is in the page cache. That implies we effectively need the same as unmap_underlying_metadata, but for our own specific piece of metadata. Indeed, unmap_underlying_metadata already does the required lookup of the old cached buffer_head."

 

8. Some Explanation Of Module Organization
31 Aug 2001 - 1 Sep 2001 (3 posts) Archive Link: "Why is tulip in its own directory (at least to 2.4.8) ?"
Topics: Networking
People: Ken Moffat, Keith Owens, Jeff Garzik

Ken Moffat noticed, "I've just changed the NIC on my main box from a natsemi to a tulip. The natsemi module was in /lib/modules/`uname -r`/kernel/drivers/net along with the ppp modules, but the tulip module is in a tulip subdirectory." Keith Owens replied, "The module install directory tree follows the source tree structure. All the module tools know about this directory structure. Do not hard code pathnames in insmod, just 'insmod tulip' and let the tools do their job." And Jeff Garzik said that the current situation was sane, saying, "At least two more tulip-alikes are moving into that directory in 2.5, even though they will remain separate drivers."

 

9. Mini-Bug In 2.4.9 USB Device Versioning
31 Aug 2001 (6 posts) Archive Link: "[PATCH] usb fix"
Topics: USB
People: Andries Brouwer, Alan Cox, Matthew Dharm, Linus Torvalds

Andries Brouwer posted a one-liner and said, "Wondering why my USB Compact Flash cardreader works with 2.4.7 but not with 2.4.9, I noticed that my name was added and some constant changed. Changing it back revived my CF reader." Matthew Dharm blamed himself for a possible merging error, and told Linus Torvalds and Alan Cox that the patch looked good. Alan replied to Andries' initial post with, "Yes you added the entry, someone changed the constant as it didnt work for them, now you change it back. I suspect both constants should be in 8)" But Matthew replied:

That doesn't sound right, Alan...

The constant in question is an upper-limit to the range of device versions what get accepted. Narrowing the range can only break things -- making it wider may not (necessarily) fix anything, but it does increase the scope of the entry.

I'm guessing that someone meant to change it from something smaller than either Andries' or the current value to where it is now, but the larger value (i.e. Andries') is the proper one.

Alan agreed, and the thread ended.
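
Matthew's argument, that an upper limit on accepted device versions can only break things when narrowed, can be sketched as a simple range match. The struct, the function, and every ID and version number used with it are hypothetical; this is not the driver's actual table or matching code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table entry: a device matches if its version is in range. */
struct version_range_entry {
    uint16_t vendor;
    uint16_t product;
    uint16_t bcd_min;  /* lowest accepted device version */
    uint16_t bcd_max;  /* highest accepted device version */
};

/* True when the IDs match and the device version is within [bcd_min, bcd_max]. */
bool entry_matches(const struct version_range_entry *e, uint16_t vendor,
                   uint16_t product, uint16_t bcd_device)
{
    return e->vendor == vendor &&
           e->product == product &&
           bcd_device >= e->bcd_min &&
           bcd_device <= e->bcd_max;
}
```

Every device accepted by an entry with a smaller bcd_max is still accepted after the limit is raised, so widening the range cannot drop a device that used to match. Lowering the limit, on the other hand, silently excludes devices with newer version numbers, which is the kind of breakage Andries ran into.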

 

10. Status Of Adaptec ASR2100s Support
1 Sep 2001 (4 posts) Archive Link: "Adaptec ASR2100s support?"
People: Alan Cox, Roy Sigurd Karlsbakk

Roy Sigurd Karlsbakk asked if there was support for the Adaptec ASR2100s controller, and Alan Cox replied, "DPT wrote drivers and after some cycles of them cleaning them up they are in the 2.4.9-ac kernel and targetted for Linus tree." Juan Pablo Abuyeres also gave a link to http://adaptec2100s.tecnoera.com/.

 

11. Big Kernels
1 Sep 2001 (6 posts) Archive Link: "is bzImage container large enough?"
Topics: Kernel Build System
People: Keith Owens, Samium Gromoff, Mike Castle

Samium Gromoff asked if the bzImage resulting from turning on all feasible config options would be small enough to boot, and Keith Owens replied, "No, it is far too big. BTW, if you want to test compiles against various combinations of config, there are kbuild patches that add make allyes, make allno, make allmod and make randconfig." Mike Castle found the idea of a 'randconfig' a bit sroirie, but Keith explained, "The config is random but valid, it passes the CML1 validation checks. randconfig is useful for finding errors in the CML1 checks, it also finds errors in code which assume that a feature is always present."

 

12. Status Of Reiserfs Endianness
1 Sep 2001 - 3 Sep 2001 (3 posts) Archive Link: "when will reiserfs for big-endian machines be added to the kernel?"
Topics: FS: ReiserFS
People: Alan Cox

Someone asked when patches to make Reiserfs endian-safe would be added to the kernel, and Alan Cox replied, "They are in the -ac tree if you want to try them." Someone asked Alan privately how stable they were, and Alan replied, "I've had no complaints, and a small patch for parisc (queued) which also wanted the alignment stuff." He added that he felt the patch was a candidate for inclusion in the official sources as well.

 

13. Status Of 2.5
2 Sep 2001 (2 posts) Archive Link: "2.5?"
People: Daniel Phillips, Roy Sigurd Karlsbakk

Roy Sigurd Karlsbakk asked where to find a list of what was planned for 2.5, and Daniel Phillips gave links to http://www.usenix.org/events/kernel01/summit.pdf, http://osdn.com/conferences/kernel/, and http://lwn.net/2001/features/KernelSummit/.

 

14. Status Of iSCSI Support For Linux
5 Sep 2001 (6 posts) Archive Link: "iSCSI support for Linux??"
Topics: Disks: SCSI
People: Nitin Dhingra, Ben Greear

Ben Greear gave a link to the IETF draft (http://www.globecom.net/ietf/draft/draft-ietf-ips-iscsi-02.html) for iSCSI, and asked if there were plans to support it. Chmouel Boudjnah gave a link to a SourceForge project (http://sourceforge.net/projects/linux-iscsi/). Nitin Dhingra also replied to Ben, saying:

That is a pretty old iscsi draft that you have pointed to. The latest iscsi draft ver 7 is available from ietf.org (http://search.ietf.org/internet-drafts/draft-ietf-ips-iscsi-07.txt) I have about 5 different code's for iScsi

  1. by Cisco : I checked the code I guess this one is working on both client and server Code
  2. by Intel : I checked the code faked on the server side and was based on iscsi draft ver 3.
  3. by UNH : I checked the code faked on the server side and was based on iscsi draft ver 3.
  4. by Chris Loveland : I checked the code faked on the server side I don't remember right now where I got this one's code from
  5. by Ashish A. Palekar : I checked the code I guess this one is working on both client and server Code and was based on iscsi draft ver 3. I don't remember where I got this one's code from

I guess cisco's code has also implemented authentication & security. I think someone gave you the links and you must have d/l by now. I guess by the end this year end there will be support for iScsi in Linux Kernel.

 


We Hope You Enjoy Kernel Traffic
 

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License, version 2.0.