Kernel Traffic #222 For 10 Jul 2003

By Zack Brown

Mailing List Stats For This Week

We looked at 2025 posts in 9923K.

There were 525 different contributors. 285 posted more than once. 224 posted last week too.

1. Synaptics TouchPad Driver
10 Jun 2003 - 26 Jun 2003 (47 posts) Archive Link: "[PATCH] Synaptics TouchPad driver for 2.5.70"
People: Joseph Fannin, Vojtech Pavlik, Peter Osterlund, Andrew Morton

Joseph Fannin said:

Here is a driver for the Synaptics TouchPad for 2.5.70. It is largely based on the XFree86 driver. This driver operates the touchpad in absolute mode and emulates a three button mouse with two scroll wheels. Features include:

  • Multi finger tapping.
  • Vertical and horizontal scrolling.
  • Edge scrolling during drag operations.
  • Palm detection.
  • Corner tapping.

The only major missing feature is runtime configuration of driver parameters. What is the best way to implement that? I was thinking of sending EV_MSC events to the driver using the /dev/input/event* interface and define my own codes for the different driver parameters.
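The EV_MSC idea raised above can be pictured with a minimal sketch that packs a single input event the way a userspace tool might before writing it to an event device. The parameter code and value are hypothetical, and the layout assumes the native `struct input_event` (a `struct timeval` followed by type, code, and value); nothing here reflects what the driver actually accepts.

```python
import struct

EV_MSC = 0x04  # event type for miscellaneous events in <linux/input.h>

# Hypothetical parameter code, invented for illustration only.
PARAM_TAP_TIME = 0x10

# Assumed native layout of struct input_event: struct timeval (two longs),
# then __u16 type, __u16 code, __s32 value.
EVENT_FORMAT = "llHHi"

def pack_param_event(code, value):
    """Pack one EV_MSC event carrying a driver parameter."""
    # Timestamp fields are left at zero in this sketch.
    return struct.pack(EVENT_FORMAT, 0, 0, EV_MSC, code, value)

event = pack_param_event(PARAM_TAP_TIME, 180)
print(len(event), struct.unpack(EVENT_FORMAT, event))
```

Writing such a record to a `/dev/input/event*` node would be the next step; whether the driver should interpret it is exactly the open design question in the post.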

In a later post, he added, "Please note that I did not write this driver -- Peter Osterlund <petero2@telia.com> did. I meant only to forward this here, since the original sender seems to have problems getting through to the list."

Andrew Morton really liked the patch, and gave some small technical suggestions. Vojtech Pavlik also said to Joseph (or whoever was the right person):

you may want to put these nice features into the mousedev.c driver for now, so that the touchpad works with standard XFree without the event based driver.

Also, I'm attaching Jens Taprogge's synaptics work, which you may want to integrate ...

To Jens: Sorry for me not using your driver. It's very good, too. Hopefully you'll be able to work together with Peter to bring the best out of the two to the kernel.

Peter Osterlund replied that there was no need to include anything in the mousedev.c driver, as "There is now a working XFree86 driver here: http://w1.894.telia.com/~u89404340/touchpad/index.html." He posted an incremental patch to get it fully working. Vojtech was very happy with this, and they went over some implementation details.

 

2. Some Memory Management Enhancements Planned For 2.7
24 Jun 2003 - 2 Jul 2003 (30 posts) Archive Link: "[RFC] My research agenda for 2.7"
Topics: FS: ext2, FS: ext3
People: Daniel Phillips

Daniel Phillips said:

This note describes several items on my research agenda for the 2.7 kernel development timeframe. First a little history.

In 2.3/2.4, the dominant change to Linux's memory management subsystem was the unification of the page and buffer cache, in the sense that most double storage was eliminated and copying from the buffer to page cache was no longer required on each file write. In 2.5, the dominant change was the addition of reverse page mapping. I participated actively in the latter, being motivated by the belief that with reverse mapping in place, a number of significant evolutionary improvements to the memory management subsystem would become possible, or indeed, easy. Here, I list the three main projects I hope to complete in the 2.7 timeframe, for comment:

  1. Active memory defragmentation
  2. Variable sized page cache objects
  3. Physical block cache

These are ordered from least to most controversial. I believe all three will prove to be worthwhile improvements to Linux's memory management subsystem, and hopefully, I support that belief adequately in the text below. Of course, working code and benchmarks are the final arbiter of goodness, and will appear in due course.

  1. Active memory defragmentation

    I doubt anyone will deny that this is desirable. Active defragmentation will eliminate higher order allocation failures for non-atomic allocations, and I hope, generally improve the efficiency and transparency of the kernel memory allocator.

    The purpose of active memory defragmentation is to catch corner cases, rather than to be a complete replacement for the current allocation system. The most obvious and problematic corner case is the one where all physical memory units of a given order are used up, in which case the allocator only has two options: wait or fail. Active defragmentation introduces a third option, which should eliminate nearly all instances of the former and all of the latter, except when physical memory is genuinely exhausted for some reason (i.e., bona fide OOM).

    The idea is to run a defragmentation daemon that kicks in whenever availability of some allocation order falls below a threshold. The defrag daemon first scans memory for easily movable pages that can form new, free units of the needed order. If this pass fails, the daemon could go as far as quiescing the system (a technique already used in RCU locking) and move some not-so-easy-to-move pages.

    In order to move a page of physical memory, we need to know what is pointing at it. This is often easy, for example in the common case when the only pointer to a page is held by the page cache and the page is not under IO. We only need to hold the page cache lock and the page table lock to move such a page.

    Moving anonymous memory is pretty easy as well, with the help of reverse page mapping. We need to hold the appropriate locks, then walk a page's reverse map list, updating pointers to a new copy of the page. (If we encounter nasty corner cases here, we could possibly fall back to a quiescing strategy.)

    Some difficult situations may be dealt with by creating a new internal kernel API that provides a way of notifying some subsystem that page ownership information is wanted, or that certain pages should be reallocated according to the wishes of the defragmentation daemon. Obviously, there is plenty of opportunity for over-engineering in this area, but equally, there is opportunity for clever design work that derives much benefit while avoiding most of the potential complexity.

    Physical memory defragmentation is an enabler for variable-sized pages, next on my list.
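The first pass of the proposed daemon, scanning for easily movable pages that can form a free unit of the needed order, can be illustrated with a toy model. This is only a sketch under simplifying assumptions: memory is a flat list of frames, each frame is free, movable, or pinned, and "moving" a page is just relabelling two frames. The names are invented for illustration and do not correspond to kernel code.

```python
FREE, MOVABLE, PINNED = "F", "M", "P"

def defragment(frames, order):
    """Try to free one aligned block of 2**order frames.

    Returns the start index of the freed block, or None if no window
    can be cleared by moving only easily movable pages.
    """
    size = 1 << order
    for start in range(0, len(frames) - size + 1, size):
        window = range(start, start + size)
        if any(frames[i] == PINNED for i in window):
            continue  # a pinned page blocks this window
        to_move = [i for i in window if frames[i] == MOVABLE]
        # Free frames outside the window can receive the moved pages.
        spare = [i for i, state in enumerate(frames)
                 if state == FREE and not (start <= i < start + size)]
        if len(spare) < len(to_move):
            continue
        for src, dst in zip(to_move, spare):
            frames[dst] = MOVABLE  # copy the page, then repoint its users
            frames[src] = FREE
        return start
    return None

memory = list("MFPFMFFM")
print(defragment(memory, 2), "".join(memory))
```

In the real design, the relabelling step is where the reverse map would be walked to update every pointer to the page.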

  2. Variable sized page objects

    This item will no doubt seem as controversial as the first is uncontroversial. It may help to know that my prototype code, done under 2.4, indicates that the complete system actually gets smaller with this feature, and possibly slightly faster as well. Essentially, if we have variable sized pages then we can eliminate the messy buffer lists that are (sometimes) attached to pages, and indeed, eliminate buffers themselves. Traditional buffer IO and data operations can be trivially reexpressed in terms of pages, provided page objects can take on the same range of sizes as buffers currently do. The block IO library also gets shorter, since much looping and state handling becomes redundant.

    For my purpose, "variable sized" means that each struct page object can reference a data frame of any binary size, including smaller than a physical page. To keep the implementation sane, all pages of a given address_space are the same size. Also, page objects smaller than a physical page are only allowed for file-backed memory, not anonymous memory. Rules for large pages remain to be determined, however since there is already considerable work being done in this area, I will not dwell on it further.

    The most interesting aspect of variable sized pages is how subpages are handled. This could get messy, but fortunately a simple approach is possible. Subpage page structs do not need to reside in the mem_map; instead they can be dynamically allocated from slab cache. The extra bookkeeping needed inside the page cache operations to keep track of this is not much, and particularly, does not add more than a sub-cycle penalty to the case where subpages are not used (even so, I expect this penalty to be more than made up by shorter, straighter paths in the block IO library).

    One benefit of subpages that may not be immediately obvious is the opportunity to save some space in the mem_map array: with subpages it becomes quite attractive to use a larger PAGE_CACHE_SIZE, i.e., a filesystem that must use a small block size for some reason won't cause a lot of additional internal fragmentation.

    But to my mind, the biggest benefit to subpages is the opportunity to eliminate some redundant state information that is currently shared between pages and buffer_heads. To be sure, I've been less than successful to date at communicating the importance of this simplification, but in this case, the code should be the proof.

    Variable-size pages will deliver immediate benefits to filesystems such as Ext2 and Ext3, in the form of larger volume size limits and more efficient transfers. As a side effect, we will probably need to implement tail merging in Ext2/3 to control the resulting increased internal fragmentation, but that is another story, for another mailing list.

    Variable size pages should fit together nicely with the current work being done on large (2 and 4 MB) page handling, and especially, it will be nice for architectures like MIPS that can optimize variable sized pages in hardware.

    Some bullet points:

    • Rationalize state info: state represented only in struct page, not struct page + list of struct buffer_head
    • 1K filesystems aren't a special case any more
    • More efficient IO path, esp for 1K fs
    • Net removal of code by simplifying the vfs block IO library (new code is added to page cache access functions).
    • Makes the page locking unit match the filesystem locking unit for most filesystems
    • Generalizes to superpages
    • Performance. It's a little more efficient. Eliminates one class of objects, allowing us to concentrate more on the remaining class.
    • Large file blocks (multiple physical pages)
    • Eliminate buffer heads. Final use of buffer heads is as data handle for filesystem metadata (IO function of buffer heads will be taken over by BIO). Basically, we can just substitute struct page for struct buffer_head. Can make this transition gradual, after establishing one buffer head per page.
    • Memory pressure now acts on only one class of object, making balancing more even.

    Relies on:

    • Active defragmentation

    How it works:

    • Page size is represented on a per-address space basis with a shift count. In practice, the smallest is 9 (512 byte sector), could imagine 7 (each ext2 inode is separate page) or 8 (actual hardsect size on some drives). 12 will be the most common size. 13 gives 8K blocksize for, e.g., alpha. 21 and 22 give 2M and 4M page size, matching hardware capabilities of x86, and other sizes are possible on machines like MIPS, where page size is software controllable
    • Implemented only for file-backed memory (page cache)
    • Special case these ops in page cache access layer instead of having the messy code in the block IO library
    • Subpage struct pages are dynamically allocated. But buffer_heads are gone so this is a lateral change.
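The shift counts listed above translate to sizes as in the following sketch. The helper names are invented for illustration, and the 4K physical page (shift 12) is an assumption matching x86.

```python
def page_size(shift):
    """Size in bytes of a page object with the given shift count."""
    return 1 << shift

def subpages_per_physical_page(shift, physical_shift=12):
    """How many page objects of this size fit in one physical page.

    Returns 0 when the object is larger than a physical page.
    physical_shift=12 (4K) is an assumption matching x86.
    """
    if shift > physical_shift:
        return 0
    return 1 << (physical_shift - shift)

for shift in (9, 10, 12, 13, 21, 22):
    print(shift, page_size(shift), subpages_per_physical_page(shift))
```

A 1K filesystem (shift 10) would thus carry four subpage objects per 4K physical page, which is the case the dynamically allocated subpage structs are meant to cover.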

  3. Physical block cache

    This item is not strictly concerned with memory management, but as it impacts the same subsystems, I have included it in this note.

    In brief, a physical block cache lets the vfs efficiently answer the question: "given a physical data block, tell me if and where it is mapped into any address_space on the same volume". This need not be a big change to the existing strategy: the normal lookup and other operations remain available. However, the vfs gets the additional responsibility of maintaining a special per-volume address_space coherently with the multiple file-backed address_spaces on the volume.

    In fact, we already have such per-volume address spaces, and there really isn't that much work to do here, in order to test the effects of making the two types of address_space coherent with one another. One way of looking at this is, full coherence in this area would complete the work of unifying the page and buffer caches, started some years ago.

    Having discussed this idea with a few developers, I've been warned that difficult issues will arise with some of the more complex filesystems, such as Ext3. Fortunately, a prototype physical block cache can be adequately tested with Ext2 alone, which doesn't have a lot of issues. If this proves to deliver the performance benefits I expect, further work would be justified to extend the functionality to other filesystems.
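The question a physical block cache answers ("given a physical data block, tell me if and where it is mapped") can be modelled with two dictionaries. This is a toy sketch, not the proposed kernel design: the class, method names, block number, and address space label are all hypothetical.

```python
class PhysicalBlockCache:
    """Toy per-volume cache keyed by physical block number."""

    def __init__(self):
        self.data = {}      # physical block -> cached contents
        self.mapping = {}   # physical block -> (address space name, page index)

    def readahead(self, block, contents):
        # Cache the block before its owning address space is known.
        self.data[block] = contents

    def claim(self, block, address_space, index):
        # Later, enter the already cached block into its proper address space.
        if block not in self.data:
            raise KeyError(f"block {block} is not cached")
        self.mapping[block] = (address_space, index)

    def where(self, block):
        # Report the mapping for a physical block, or None if unmapped.
        return self.mapping.get(block)

cache = PhysicalBlockCache()
cache.readahead(4711, b"example contents")
print(cache.where(4711))   # cached, but not yet mapped anywhere
cache.claim(4711, "file-A", 3)
print(cache.where(4711))
```

The point of the sketch is that claiming a block is a cheap bookkeeping step once the data is already in memory.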

    So what are the anticipated performance benefits? I've identified two so far:

    1. Physical readahead. That is, we can load a block into the page cache before we know which address_space, if any, it actually belongs to. Later, when we do know, additionally entering it into its proper address space is efficient. This will help with the traversal of many small files case, which Linux currently handles poorly.
    2. Raid 5. The biggest performance problem with Raid 5 stems from the fact that for small, isolated writes it is forced to read a few blocks to compute every new parity block, and in the process suffers large amounts of rotational latency. A big cache helps with this a great deal; however, the size of cache we could expect to find in, e.g., a high end scsi drive, is not adequate to eliminate the bad effects, and in any event, bus saturation becomes a very real problem. We could also have a separate, physical block cache, but this wastes memory and causes unnecessary copying. Being able to implement the big cache directly in the page cache is thus a big advantage in terms of memory savings, and reduced data copying. There is also a latency advantage,
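The Raid 5 small-write cost mentioned above comes from the parity arithmetic: overwriting one data block requires the old data and the old parity before the new parity can be computed. A minimal sketch, assuming simple XOR parity over a three-block stripe with made-up contents:

```python
def xor_blocks(a, b):
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def updated_parity(old_parity, old_data, new_data):
    """Parity after overwriting one data block in a stripe."""
    # Both old_parity and old_data must be read (or found in a cache) first.
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)

# Made-up stripe contents, for illustration only.
stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
parity = b"\x00\x00"
for block in stripe:
    parity = xor_blocks(parity, block)

new_block = b"\xff\x00"
parity = updated_parity(parity, stripe[1], new_block)
stripe[1] = new_block
print(parity.hex())
```

If the old data and parity are already cached, the two extra reads disappear, which is the saving a large cache is expected to provide.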

Summary

Please note that all of the above is unofficial, experimental work. However, I do believe that all three of these items have the potential to deliver substantial improvements in terms of reliability, efficiency and obvious correctness.

Thank you for your indulgence in reading all the way down to here. The timeframe for this work is:

  • Starts as soon as 2.5 closes
  • Prototypes to be posted shortly thereafter

 

3. nf-hipac Packet Filtering
25 Jun 2003 - 2 Jul 2003 (17 posts) Archive Link: "[ANNOUNCE] nf-hipac v0.8 released"
Topics: BSD: OpenBSD, Networking, SMP
People: Michael Bellion, Thomas Heinz, Daniel Egger, Pekka Savola, Roberto Nibali, Folkert van Heusden

Michael Bellion and Thomas Heinz announced:

We have released a new version of nf-hipac. We rewrote most of the code and added a bunch of new features. The main enhancements are user-defined chains, generic support for iptables targets and matches and 64 bit atomic counters.

For all of you who don't know nf-hipac yet, here is a short overview:

nf-hipac is a drop-in replacement for the iptables packet filtering module. It implements a novel framework for packet classification which uses an advanced algorithm to reduce the number of memory lookups per packet. The module is ideal for environments where large rulesets and/or high bandwidth networks are involved. Its userspace tool, which is also called 'nf-hipac', is designed to be as compatible as possible to 'iptables -t filter'.

The official project web page is: http://www.hipac.org
The releases can be downloaded from: http://sourceforge.net/projects/nf-hipac

Features:

  • optimized for high performance packet classification with moderate memory usage
  • completely dynamic: data structure isn't rebuilt from scratch when inserting or deleting rules, so fast updates are possible
  • very short locking times during rule updates: packet matching is not blocked
  • support for 64 bit architectures
  • optimized kernel-user protocol (netlink): improved rule listing speed
  • libnfhipac: netlink library for kernel-user communication
  • native match support for:

    • source/destination ip
    • in/out interface
    • protocol (udp, tcp, icmp)
    • fragments
    • source/destination ports (udp, tcp)
    • tcp flags
    • icmp type
    • connection state
    • ttl

  • match negation (!)
  • iptables compatibility: syntax and semantics of the userspace tool are very similar to iptables
  • coexistence of nf-hipac and iptables: both facilities can be used at the same time
  • generic support for iptables targets and matches (binary compatibility)
  • integration into the netfilter connection tracking facility
  • user-defined chains support
  • 64 bit atomic counters
  • kernel module autoloading
  • /proc/net/nf-hipac/info:

    • dynamically limit the maximum memory usage
    • change invocation order of nf-hipac and iptables

  • extended statistics via /proc/net/nf-hipac/statistics/*

We are currently working on extending the hipac algorithm to do classification with several stages. The hipac algorithm will then be capable of combining several classification problems in one data structure, e.g. it will be possible to solve routing and firewalling with one hipac lookup. The idea is to shorten the packet forwarding path by combining fib_lookup and iptables filter lookup into one hipac query. To further improve the performance in this scenario the upcoming flow cache could be used to cache recent hipac results.

Folkert van Heusden was very happy to see this, and asked if there were any chance of a 2.5 port. Thomas replied, "It should not be that hard to port nf-hipac to 2.5 since most of the code (the whole hipac core) is not "kernel specific". But since we are busy planning the next hipac extension we don't have the time to do this ourselves. Maybe some volunteer is willing to implement the port."

Elsewhere, Daniel Egger asked, "Is this library actually usable for applications which need to control the firewall or is it equally braindead to libiptables?" And Michael said:

The library _is_ intended to be used by other applications than the nf-hipac userspace tool, too. It hides the netlink communication from the user who is only required to construct the command data structure sent to the kernel which contains at most one single nf-hipac rule. This is very straightforward and the kernel returns detailed errors if the packet is misconstructed.

Taking a look at nfhp_com.h and possibly nf-hipac.c gives you some clue on how to build valid command packets.

Elsewhere, Pekka Savola asked for some performance statistics, and Michael replied:

We have done some performance tests with an older release of nf-hipac. The results are available on http://www.hipac.org/

Apart from that Roberto Nibali did some preliminary testing on nf-hipac. You can find his posting to linux-kernel here: http://marc.theaimsgroup.com/?l=linux-kernel&m=103358029605079&w=2

Since there are currently no performance tests available for the new release we want to encourage people interested in firewall performance evaluation to include nf-hipac in their tests.

Pekka asked, "One obvious thing that's missing in your performance and Roberto's figures is what *exactly* are the non-matching rules. Ie. do they only match IP address, a TCP port, or what? (TCP port matching is about a degree of complexity more expensive with iptables, I recall.)" Roberto Nibali replied:

When I did the tests I used a variant of the following simple script (http://www.drugphish.ch/~ratz/genrules.sh).

There you can see that I only used a src port range. In an original paper I wrote for my company (announced here: http://www.ussg.iu.edu/hypermail/linux/kernel/0203.3/0847.html) I did create rules that only matched IP addresses, the results were bad enough ;).

Meanwhile I should revise the paper as quite a few things have been addressed since then: For example the performance issues with OpenBSD packet filtering have mostly been squashed. I didn't continue on that matter because I fell severely ill last autumn and first had to take care of that.

Close by, Pekka said:

We've also conducted some tests with bridging firewall functionality, and we're very pleased with nf-hipac's performance! Results below.

In the measurements, tests were run through a bridging Linux firewall, with a netperf UDP stream of 1450 byte packets (launched from a different computer connected with gigabit ethernet), with a varying amount of filtering rules checks for each packet.

I don't have the specs of the Linux PC hardware handy, but I recall they're *very* highend dual-P4's, like 2.4Ghz, very fast PCI bus, etc. Shouldn't be a factor here.

1. Filtering based on source address only, for example:
   $fwcmd -A $MAIN -p udp -s 10.0.0.1   -j DROP
   ...
   $fwcmd -A $MAIN -p udp -s 10.0.3.255 -j DROP
   $fwcmd -A $MAIN -p udp               -j ACCEPT

  Results:
  rules     | plain NF               | NF-HIPAC
            | sent       | got thru  | sent       | got thru  |
      (n.o) |   (Mbit/s) |  (Mbit/s) |   (Mbit/s) |  (Mbit/s) |
  -------------------------------------------------------------
          0 |     956,00 |    953,24 |     956,00 |    953,24 |
        512 |     956,00 |    800,68 |     956,46 |    952,81 |
       1024 |     956,00 |    472,78 |     956,46 |    952,81 |
       2048 |     955,99 |    170,52 |     956,46 |    952,86 |
       3072 |     956,00 |     51,97 |     956,46 |    952,85 |

2. Filtering based on UDP protocol's source port, for example:
   $fwcmd -A $MAIN -p udp --source-port 1    -j DROP
   ...
   $fwcmd -A $MAIN -p udp --source-port 1024 -j DROP
   $fwcmd -A $MAIN -p udp                    -j ACCEPT

  Results:
  rules     | plain NF               | NF-HIPAC
            | sent       | got thru  | sent       | got thru  |
      (n.o) |   (Mbit/s) |  (Mbit/s) |   (Mbit/s) |  (Mbit/s) |
  -------------------------------------------------------------
          0 |     955,37 |    954,33 |     956,46 |    952,85 |
        512 |     980,68 |    261,41 |     956,46 |    951,92 |
       1024 |        N/A |       N/A |     956,47 |    952,86 |
       2048 |        N/A |       N/A |     956,46 |    952,85 |
       3072 |        N/A |       N/A |     956,46 |    952,85 |

N/A = Netfilter bridging can't handle this at all, no traffic can pass the bridge.

So, plain Netfilter can tolerate about a couple of hundred rules checking for addresses and/or ports on a gigabit line.

With HIPAC Netfilter, packet loss is very low, less than 0.5%, even with the maximum number of rules tested, the same amount as without filtering at all.
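The loss figures can be checked against the source-address table above. The sketch below copies those rows (decimal commas written as points) and computes the share of offered traffic that did not get through; the function and variable names are invented for illustration.

```python
# (rules, sent Mbit/s, got through Mbit/s), copied from the
# source-address table above with decimal commas written as points.
PLAIN_NF = [(0, 956.00, 953.24), (512, 956.00, 800.68), (1024, 956.00, 472.78),
            (2048, 955.99, 170.52), (3072, 956.00, 51.97)]
NF_HIPAC = [(0, 956.00, 953.24), (512, 956.46, 952.81), (1024, 956.46, 952.81),
            (2048, 956.46, 952.86), (3072, 956.46, 952.85)]

def loss_percent(sent, got):
    """Share of offered traffic that did not make it through, in percent."""
    return 100.0 * (sent - got) / sent

for name, rows in (("plain NF", PLAIN_NF), ("NF-HIPAC", NF_HIPAC)):
    for rules, sent, got in rows:
        print(f"{name:9} {rules:5d} rules: {loss_percent(sent, got):6.2f}% lost")
```

Every NF-HIPAC row stays below 0.5% loss, while plain Netfilter loses well over 90% of the traffic at 3072 rules.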

Michael said, "Great, thanks a lot. Your tests are very interesting for us as we haven't done any gigabit or SMP tests yet." He and Pekka went over more ideas for benchmarks to run.

 

4. Explanations Of Various Kernel Trees
25 Jun 2003 - 28 Jun 2003 (6 posts) Archive Link: "Is their an explanation of various kernel versions/brances/patches/? (-mm, -ck, ..)"
Topics: Clustering, Forward Port, Virtual Memory
People: Peter C. Ndikuwera, Brian Jackson, Samuel Flory, William Lee Irwin III, Hugh Dickins, Stephen Hemminger, Con Kolivas, Rik van Riel, Alan Cox, Andrea Arcangeli, Chris Wright, Dave Jones, Andrew Morton

Orion Poplawski noticed that a lot of people had their own kernel trees, ranging from Alan Cox's -ac tree, to Andrew Morton's -mm tree. Orion asked if there was any information about what all these different trees were for.

Peter C. Ndikuwera said, "http://kernelnewbies.org/faq/index.php3#trees has a few of them. Maybe you could alert the web maintainers to the entries in this thread? :-)"

Brian Jackson also replied to Orion, saying:

here goes my knowledge of the different patchsets:

for the most part all of them are testing grounds for patches that someday hope to be in the vanilla kernel

  • mm - Andrew Morton - vm related testing ground for dev tree
  • ck - Con Kolivas - desktop/interactivity patches
  • kj - Kernel Janitors - testing ground for kernel cleanups on development trees
  • mjb - Martin J Bligh - scalability stuff
  • wli - William Lee Irwin - other vm related stuff for dev tree that Andrew Morton may not have time for
  • ac - Alan Cox - lately it's been a testing ground for new ide
  • lsm - Chris Wright - Linux Security Modules, provides a lightweight, general purpose framework for access control
  • osdl - Stephen Hemminger, ? maybe enterprise stuff
  • laptop - Hanno Böck - unproven laptop type patches
  • aa - Andrea Arcangeli - stable series vm stuff
  • dj - Dave Jones - cleanups/AGP
  • rmap - Rik van Riel - reverse mapping vm for 2.4
  • pgcl - William Lee Irwin - ?

Others? Oh yes. Maybe this is something that should be tracked on a webpage somewhere.

Samuel Flory pointed out that Alan's -ac tree "is often the test ground for new 2.4 fixes, and features." And William Lee Irwin III said that his pgcl tree was for "Page clustering. A vague attempt at a forward port of Hugh Dickins' 2.4.7 patch for the same purpose, WIP. I'd say it's more of one patch than a patch set." And someone else also pointed out the existence of -dis, for laptop-related patches, and -jp, for security and performance issues.

 

5. Status Of Serial ATA In 2.4
26 Jun 2003 - 28 Jun 2003 (3 posts) Archive Link: "Serial ATA driver for 2.4.18."
Topics: Disks: IDE, Serial ATA
People: Adarsh Daheriya, Alan Cox, Andre Hedrick

Adarsh Daheriya asked where to find "the siimage SATA driver for 2.4.18 kernel," and Alan Cox said, "The current one depends on the 2.4.20/2.4.21 IDE rework. I have no plans to backport it although if you desperately need it you could I guess pay someone." And Andre Hedrick put in, "I have one for sale but you will pay a price for my time and work in the past to get it. Nothing is free in this economy today."

 

6. Checklist For Submitting Patches
26 Jun 2003 (3 posts) Archive Link: "Kernel patch release checklist available"
People: Peter Chubb, David Mosberger, Willy Tarreau

Peter Chubb announced:

After being burnt a few times in forgetting something that I should have done when releasing a patch against the kernel, I've created a Kernel Patch Release Checklist at

http://www.gelato.unsw.edu.au/IA64wiki/PatchReleaseChecklist

If you want to you can add new stuff I haven't thought of to this list: but you need to register on the Wiki to do so.

David Mosberger was very happy to see this, and suggested some additional information to add to the document. And Willy Tarreau also suggested it go in the Documentation directory of the kernel sources.

 

7. QEMU 0.4 Released
26 Jun 2003 (1 post) Archive Link: "[ANNOUNCE] QEMU 0.4 release"
People: Fabrice Bellard

Fabrice Bellard announced:

The QEMU x86 CPU emulator version 0.4 is available at http://bellard.org/qemu. QEMU can now launch an unpatched(*) Linux kernel and give correct execution performances by using dynamic compilation.

QEMU requires no host kernel patches and no special privilege in order to run a Linux kernel. QEMU simulates a serial console and a NE2000 network adapter. X11 applications such as xterm have been launched.

QEMU can be used for example for faster kernel testing/debugging and virtual hosting.

 

8. New SnoopyPro Logfile Dumper
27 Jun 2003 (1 post) Archive Link: "[Announce] Linux command line Snoopy Pro logfile dumper"
Topics: USB, Version Control
People: Michael Still

Michael Still said:

I had two maths exams last week. This of course means that I had to find something to distract me. That thing was whipping up a SnoopyPro logfile dumper for the command line. This was motivated by generalised frustration with the SnoopyPro user interface.

For those wondering, SnoopyPro is a SourceForge-hosted USB traffic dumper for Windows. It's useful when reverse engineering USB device drivers.

This version of the dumper only implements the URB types which I immediately needed. Adding additional URBs isn't hard, but I didn't have any samples. Feel free to mail me usblogs, and I'll add them to the decoder.

The only really cool feature in this version is that it implements "repeated URB sequence suppression", so if the Windows driver says to the USB device "hey, you still there" every second for 60 seconds, and there is no other traffic between the machine and that device, then the output will only show one of those interactions, and let you know it hid 59 more. This feature can be turned on and off with the -r command line option.
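The suppression idea can be sketched as a run-length collapse. This is not the tool's code: it only handles runs of one identical interaction rather than longer repeating sequences, and the interaction labels are made up for illustration.

```python
from itertools import groupby

def suppress_repeats(interactions):
    """Collapse runs of identical consecutive interactions.

    Yields (interaction, hidden) pairs, where hidden is the number of
    further repeats that were left out of the output.
    """
    for interaction, run in groupby(interactions):
        yield interaction, sum(1 for _ in run) - 1

# Hypothetical log: the same poll 60 times, then other traffic.
log = ["GET_STATUS"] * 60 + ["BULK_OUT", "GET_STATUS"]
for interaction, hidden in suppress_repeats(log):
    note = f" (hid {hidden} more)" if hidden else ""
    print(interaction + note)
```

With the sample log, the sixty identical polls are shown once with a note that 59 more were hidden, matching the behaviour described in the post.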

You can get the GPL'ed CVS version of the source code from:
http://www.stillhq.com/extracted/usblogdump.tgz

There is sample output et cetera at:
http://www.stillhq.com/cgi-bin/getpage?area=usblogdump

The next step is to modify the display of the URBs so that they're closer to the Linux data structures.

 

9. ATA-Over-SCSI Driver Update
30 Jun 2003 - 2 Jul 2003 (7 posts) Archive Link: "ata-scsi driver update"
Topics: Disks: IDE, Disks: SCSI, Serial ATA, Version Control
People: Jeff Garzik, Jurgen Kramer

Jeff Garzik announced:

maintenance update, nothing terribly new or exciting. mostly error handling improvements and cleanups (and some bug fixes just for fun).

GNU diff, versus 2.4.21 release: ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.4/2.4.21-atascsi1.patch.bz2

BK repos: bk://kernel.bkbits.net/jgarzik/atascsi-2.[45]

The 2.5 repo is a bit out of date WRT the latest scsi api, but the ata-scsi driver itself is 100% in sync with its 2.4 counterpart. (due to the large number of changes in 2.5 scsi, the 2.5 driver is a fork of the 2.4 driver)

detailed changes:

  • add autogenerated docbook docs
  • add atapi (ifdef'd out, due to lack of err handling)
  • better ata probing, including better err handling during probe
  • more piix pci ids. bump up ich5 sata max speed to udma6.
  • beginnings of SYNCHRONIZE CACHE support for ATA drives
  • better SCSI emulation for ATA drives
  • cleanups, simplifications, minor bug fixes
  • a huge search-n-replace job, s/ata_host/ata_port/

A couple new host drivers coming next, along with atapi error handling...

Jurgen Kramer tried it out and (after a few twists) got it working. But he said, "my DVD-ROM doesn't show here. It should be on scsi1 (or is ATAPI support not in yet?) What also shows is that ata1 is not being configured for maximum possible speed. Ata1 should be set to UDMA/100. The SATA drive is configured properly though." Jeff replied, "Correct, ATAPI isn't supported yet." And he added, "Both ATAPI and PATA cable detection should be working in the next release (a week or two from now)."

 

10. Preferred GCC Version For Kernel Compilation
2 Jul 2003 (3 posts) Archive Link: "gcc 2.95.4 vs gcc 3.3 ?"
People: Adrian Bunk, Alan Cox, Robert L. Harris

Robert L. Harris asked if it was OK to use GCC 3.3 for kernel compilation, and Adrian Bunk replied:

gcc 3.3 is relatively new and _much_ less tested than 2.95. A new gcc might either contain bugs or it might unleash bugs in the kernel that weren't visible before (e.g. via better optimizations).

Usually gcc 3.3 works fine (and my PC at home runs a 2.4.21 compiled with 3.3) but if you want stability in production environments 2.95 (or the unofficial 2.96 >= 2.96-74) is the recommended compiler.

Alan Cox also told Robert that GCC 3.3 would probably successfully compile the kernel itself, but "some drivers still don't build with it."

 

11. Linux In Film-Making
2 Jul 2003 (2 posts) Archive Link: "Linus goes to Hollywood!"
People: Bill Huey, Andre Hedrick

Andre Hedrick gave a link to an eWeek article (http://www.eweek.com/article2/0,3959,1159212,00.asp) telling how Sinbad: The Legend Of The Seven Seas (http://us.imdb.com/Title?0165982) was the first film ever created using only Linux. Bill Huey remarked, "Yeah, it's a pretty neat thing that movies are being produced by Linux clusters/workstations these days."

 

12. Benchmarks Comparing ext2 And ext3
2 Jul 2003 (1 post) Archive Link: "ext2 vs ext3"
Topics: Big Memory Support, FS: ext2, FS: ext3, Virtual Memory
People: Martin J. Bligh

Martin J. Bligh reported:

Andrew asked for updated numbers ... is about the same on kernbench, still significantly slower on SDET (about 1/4 of the speed of ext2), though much better than it was.

Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)

                              Elapsed      System        User         CPU
               2.5.73-mm3       45.36      111.71      565.71     1493.75
          2.5.73-mm3-ext3       45.59      114.12      565.72     1489.50
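
The kernbench setup and the two timing rows above can be sketched in a few lines of Python. This is only an illustration: the helper names (kernbench_jobs, kernbench_command, percent_change) are hypothetical and not part of any kernbench tooling, and it assumes that the plain 2.5.73-mm3 row is the ext2 baseline.

```python
import os

def kernbench_jobs(num_cpus, jobs_per_cpu=16):
    """Job count for 'make -j N': N = 16 x num_cpus, as stated in the report."""
    return jobs_per_cpu * num_cpus

def kernbench_command(num_cpus):
    """Build the make invocation as an argument list (it is not executed here)."""
    return ["make", "-j", str(kernbench_jobs(num_cpus)), "vmlinux"]

def percent_change(baseline, other):
    """Relative change of 'other' against 'baseline', in percent."""
    return 100.0 * (other - baseline) / baseline

# os.cpu_count() can return None, so fall back to 1 CPU in that case.
cpus = os.cpu_count() or 1
print(" ".join(kernbench_command(cpus)))

# Elapsed and System times copied from the table above.
# Assumption: the first row (2.5.73-mm3) is the ext2 baseline.
print(f"elapsed: {percent_change(45.36, 45.59):+.2f}%")
print(f"system:  {percent_change(111.71, 114.12):+.2f}%")
```

With the numbers from the table, ext3 comes out roughly 0.5% slower in elapsed time and about 2% higher in system time, which matches the "about the same on kernbench" summary.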

       853     5.2% total
       570    11.4% default_idle
        72     3.6% page_remove_rmap
        58   580.0% fd_install
        38   292.3% __blk_queue_bounce
        24     1.7% do_anonymous_page
        23     4.5% __copy_to_user_ll
        20    13.1% __wake_up
        14   700.0% __find_get_block_slow
        13     6.6% do_page_fault
        13     9.2% __down
        12     8.6% kmem_cache_free
        12     0.0% journal_add_journal_head
        11    26.8% __fput
        10     0.0% find_next_usable_block
        10     0.0% do_get_write_access
...
       -12   -21.8% copy_page_range
       -21    -6.0% __copy_from_user_ll
       -28   -68.3% may_open
       -58  -100.0% generic_file_open

DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered trademarks of the Standard Performance Evaluation Corporation. This benchmarking was performed for research purposes only, and the run results are non-compliant and not-comparable with any published results.

SDET 128 (see disclaimer)

                           Throughput    Std. Dev
               2.5.73-mm3       100.0%         0.1%
          2.5.73-mm3-ext3        23.1%         4.4%

    168834   222.4% total
    142610   375.4% default_idle
     10901     0.0% .text.lock.transaction
      3674     0.0% do_get_write_access
      3345     0.0% journal_dirty_metadata
      3227  5867.3% __down
      1548   710.1% schedule
      1514  1916.5% __wake_up
      1306     0.0% start_this_handle
      1268     0.0% journal_stop
       831     0.0% journal_add_journal_head
       627  2985.7% __blk_queue_bounce
       522     0.0% journal_dirty_data
       441     0.0% ext3_get_inode_loc
       305  30500.0% prepare_to_wait_exclusive
       277   513.0% __find_get_block_slow
       265     0.0% ext3_journal_start
       238     0.0% find_next_usable_block
       213   116.4% __find_get_block
       209     0.0% ext3_do_update_inode
       157  15700.0% unlock_buffer
       147     0.0% journal_cancel_revoke
       141     0.0% ext3_orphan_del
       136     0.0% ext3_orphan_add
       130     0.0% ext3_reserve_inode_write
       128   209.8% generic_file_aio_write_nolock
       126     0.0% journal_unmap_buffer
       123  12300.0% blk_run_queues
       120    94.5% __brelse
       108     0.0% ext3_new_inode
...
      -102   -22.1% remove_shared_vm_struct
      -104    -8.1% copy_page_range
      -108  -100.0% generic_file_open
      -110   -31.9% free_pages_and_swap_cache
      -113   -92.6% .text.lock.highmem
      -115   -49.8% follow_mount
      -151   -69.6% .text.lock.dcache
      -182   -59.3% .text.lock.dec_and_lock
      -182  -100.0% ext2_new_inode
      -194   -11.6% zap_pte_range
      -196   -32.8% path_lookup
      -223   -34.7% atomic_dec_and_lock
      -237  -100.0% grab_block
      -262   -22.9% __d_lookup
      -283   -27.5% release_pages
      -843   -21.6% page_add_rmap
     -2259   -26.3% page_remove_rmap
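
The profile listings in this report appear to give, for each kernel symbol, the change in profiler ticks between the two runs, followed by that change as a percentage of the baseline count. The script that produced them is not shown, so the following Python sketch is only one plausible reading. The function diff_profile is hypothetical, the tick counts are back-calculated from four rows of the kernbench listing rather than taken from the report, and reporting 0.0% for symbols with no baseline ticks is an assumption based on how the ext3-only functions appear.

```python
def diff_profile(baseline, patched):
    """Return (delta, percent, symbol) rows, largest increase first."""
    rows = []
    for symbol in set(baseline) | set(patched):
        old = baseline.get(symbol, 0)
        new = patched.get(symbol, 0)
        delta = new - old
        if delta == 0:
            continue
        # Assumption: a symbol with no baseline ticks is reported as 0.0%,
        # matching how ext3-only functions appear in the listings.
        percent = 100.0 * delta / old if old else 0.0
        rows.append((delta, percent, symbol))
    rows.sort(key=lambda row: (-row[0], row[2]))
    return rows

# Hypothetical tick counts, back-calculated from four kernbench rows
# (for example, +58 ticks at 580.0% implies a baseline of 10 ticks).
ext2_ticks = {"fd_install": 10, "__find_get_block_slow": 2,
              "generic_file_open": 58}
ext3_ticks = {"fd_install": 68, "__find_get_block_slow": 16,
              "journal_add_journal_head": 12}

for delta, percent, symbol in diff_profile(ext2_ticks, ext3_ticks):
    print(f"{delta:10d} {percent:7.1f}% {symbol}")
```

Read this way, a row such as "-58 -100.0% generic_file_open" means the function accounted for 58 ticks on ext2 and none on ext3, while the many 0.0% rows in the SDET listing are journalling functions that never ran in the ext2 baseline.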

 

13. kexec For 2.5.74 Released
2 Jul 2003 (1 post) Archive Link: "[KEXEC][ANNOUNCE] kexec for 2.5.74 available"
Topics: Kexec
People: Andy Pfiffer

Andy Pfiffer announced:

An UNTESTED patch for kexec for 2.5.74 is now available. This patch is based upon the stable 2.5.{6[789],7*} versions.

I will be away fragging for a few days at http://www.pdxlan.com/ and not responding to email.

More info here:
http://developer.osdl.org/~andyp/bloom/Code/Linux/Kexec/

Unified full kexec patch for 2.5.74 is here:
http://developer.osdl.org/~andyp/kexec/2.5.74/kexec2-2.5.74-full.patch

Source tarball of the matching user-mode utility for kexec 2.5.74:
http://developer.osdl.org/~andyp/kexec/2.5.74/kexec-tools-1.8-2.5.74.tgz

Unstable 2.5.69 kexec patches from Eric Biederman are available here:
http://www.xmission.com/~ebiederm/files/kexec/

 

We Hope You Enjoy Kernel Traffic
 

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License, version 2.0.