Kernel Traffic #182 For 1 Sep 2002

By Zack Brown

Mailing List Stats For This Week

We looked at 1514 posts in 7271K.

There were 423 different contributors. 221 posted more than once. 164 posted last week too.

1. Config Policy

6 Aug 2002 - 24 Aug 2002 (128 posts) Archive Link: "Linux 2.4.20-pre1"

Topics: Disks: SCSI, Kernel Build System

People: Greg Banks, Peter Samuelson, Sam Ravnborg, Linus Torvalds

In the course of discussion, Linus Torvalds made some comments (quoted below) about his requirements for a config system. In the middle of the thread, Peter Samuelson had posted a patch against 2.4.20-pre1, to fix up the kconfig language, which, as Greg Banks said in his reply, had been "held together with spit and string." But Greg felt Peter's changes might be too invasive for the 2.4 tree even if the patch was great. Peter replied that his changes were "trivial enough, and easy enough to test, that I think it could go in 2.4, yes. Obviously xconfig would need to be dealt with in sync with the others, which I'm not doing during the prototyping / idea-mongering stage." But Greg said, "I think you're underestimating the Gordian knot that is the CML1 corpus."

The patch itself (among other things) made the need for '$' in front of dependency names optional instead of required. At one point Peter explained, "The main motivation for dropping the '$' was to make possible the "" == "n" semantics." To this, Greg said, "Changing the existing semantics, regardless of how broken we all agree they are, is asking for a world of trouble." He went on:

To pick an example, in 2.5.29 drivers/ide/Config.in:17 is

dep_tristate ' SCSI emulation support' CONFIG_BLK_DEV_IDESCSI $CONFIG_ATAPI $CONFIG_BLK_DEV_IDE $CONFIG_SCSI

But at this point in the menu tree for 14 of 17 architectures, CONFIG_SCSI has not yet been defined. The result is that CONFIG_BLK_DEV_IDESCSI only works in "make config" and "make allyesconfig" precisely because of the semantic that you wish to change.
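To illustrate Greg's point, here is a minimal Python sketch (not from the thread) of the two semantics. It assumes a simplified model of dep_tristate in which an option is capped by its weakest dependency, and in which an undefined '$CONFIG_...' expands to an empty string that the shell silently drops from the argument list. The function names are hypothetical.

```python
# Ordering of tristate values: n < m < y.
ORDER = {"n": 0, "m": 1, "y": 2}

def limit_current(dep_values):
    """Existing behaviour (simplified): an undefined dependency expands to
    an empty string, which the shell drops, so it restricts nothing."""
    deps = [v for v in dep_values if v != ""]
    return min(deps, key=ORDER.get, default="y")

def limit_proposed(dep_values):
    """Proposed '"" == "n"' behaviour: an undefined dependency counts as n."""
    deps = ["n" if v == "" else v for v in dep_values]
    return min(deps, key=ORDER.get, default="y")

# CONFIG_ATAPI=y, CONFIG_BLK_DEV_IDE=y, CONFIG_SCSI not yet defined.
deps = ["y", "y", ""]
print(limit_current(deps))   # option may still be set to y
print(limit_proposed(deps))  # option is forced to n
```

Under this model, any option that depends on a symbol defined later in the menu tree silently changes from selectable to disabled once the semantics change.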

Peter said this was a bug in his code, and if Greg posted a list of all the ones he found, Peter would go through and patch them. Greg said there were thousands of occurrences, spread throughout 17 architectures. Sam Ravnborg suggested, "How about extending the language (side effect) to automagically append (EXPERIMENTAL) or (OBSOLETE) to the menu line, if dependent on those special tags?"

Peter and Greg both agreed this was a good idea, and Greg pointed out that CML2 had used that implementation as well. But Greg added, "The trouble is actually achieving that in shell-based parsers where shell code cannot tell whether $CONFIG_EXPERIMENTAL has been used in a condition." Sam said, "Remembering the CML2 war there were no serious objections about shifting away from shell based parsers (but certainly a lot about the alternative selected)." He asked, "Where comes the requirement that we shall keep the existing shell based config parsers?" And Linus replied:

I use them exclusively.

It is far and away the most convenient parsing - just to do "make oldconfig" (possibly by making changes by hand to the .config file first).

As far as I'm personally concerned, the shell parsers are the _only_ parser that really matter. So if you want to replace them with something else, that something else had better be pretty much perfect and not take all that long to build.

2. Generating Random Numbers

17 Aug 2002 - 23 Aug 2002 (73 posts) Archive Link: "[PATCH] (0/4) Entropy accounting fixes"

Topics: PCI, Random Number Generation, SMP, USB

People: Oliver Xymoron, Linus Torvalds, Alan Cox

Oliver Xymoron announced:

I've done an analysis of entropy collection and accounting in current Linux kernels and founds some major weaknesses and bugs. As entropy accounting is only one part of the security of the random number device, it's unlikely that these flaws are compromisable, nonetheless it makes sense to fix them.

Net effect: a typical box will claim to generate 2-5 _orders of magnitude_ more entropy than it actually does.

Note that entropy accounting is mostly useful for things like the generation of large public key pairs where the number of bits of entropy in the key is comparable to the size of the PRNG's internal state. For most purposes, /dev/urandom is still more than strong enough to make attacking a cipher directly more productive than attacking the PRNG.

The following patches against 2.5.31 have been tested on x86, but should compile elsewhere just fine.

I've tried to cover some of the issues in detail below:

Broken analysis of entropy distribution

(I know the topic of entropy is rather poorly understood, so here's a couple useful pieces of background for kernel folks:

Cryptanalytic Attacks on Pseudorandom Number Generators Kelsey, Schneier, Wagner, Hall www.counterpane.com/pseudorandom_number.pdf

Cryptographic Randomness from Air Turbulence in Disk Drives D. Davis, R. Ihaka, P.R. Fenstermacher http://world.std.com/~dtd/random/forward.ps)

Mathematically defining entropy

For a probability distribution P of samples K, the entropy is:

E = sum (-P(K) * log2 P(K))

For a uniform distribution of n bits of data, the entropy is n. Anything other than a uniform distribution has less than n bits of entropy.
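As a quick check of that definition (an illustration, not part of Oliver's post), the following Python sketch evaluates the formula for a uniform and a skewed distribution over 2-bit samples. The function name and the example probabilities are chosen here for illustration only.

```python
from math import log2

def shannon_entropy(probs):
    """E = sum(-P(K) * log2 P(K)) over all outcomes with non-zero probability."""
    return sum(-p * log2(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # every 2-bit value equally likely
skewed = [0.5, 0.25, 0.125, 0.125]   # same four values, unevenly likely

print(shannon_entropy(uniform))  # 2.0 bits, the full width of the sample
print(shannon_entropy(skewed))   # 1.75 bits, less than the sample width
```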

Non-Uniform Distribution Of Timing

Unfortunately, our sample source is far from uniform. For starters, each interrupt has a finite time associated with it - the interrupt latency. Back to back interrupts will result in samples that are periodically spaced by a fixed interval.

A priori, we might expect a typical interrupt to be a Poisson process, resulting in a gamma-like distribution. It would also have zero probability up to some minimum latency, have a peak at minimum latency representing the likelihood of back-to-back interrupts, a smooth hump around the average interrupt rate, and an infinite tail.

Not surprisingly, this distribution has less entropy in it than a uniform distribution would. Linux takes the approach of assuming the distribution is "scale invariant" (which is true for exponential distributions and approximately true for the tails of gamma distributions) and that the amount of entropy in a sample is in relation to the number of bits in a given interrupt delta.

Assuming the interrupt actually has a nice gamma-like distribution (which is unlikely in practice), then this is indeed true. The trouble is that Linux assumes that if a delta is 13 bits, it contains 12 bits of actual entropy. A moment of thought will reveal that binary numbers of the form 1xxxx can contain at most 4 bits of entropy - it's a tautology that all binary numbers start with 1 when you take off the leading zeros. This is actually a degenerate case of Benford's Law (http://mathworld.wolfram.com/BenfordsLaw.html), which governs the distribution of leading digits in scale invariant distributions.

What we're concerned with is the entropy contained in digits following the leading 1, which we can derive with a simple extension of Benford's Law (and some Python):

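    from math import log, log2

    def entropy(l):
        # Shannon entropy, in bits, of a list of probabilities
        s=0
        for pk in l:
            if pk: s=s+(-pk*log2(pk))
        return s

    def benford(digit, place=0, base=10):
        # Benford probability of seeing `digit` at position `place`
        if not place:
            s=log(1+1.0/digit)
        else:
            s=0
            for k in range(base**(place-1), (base**place)):
                s=s+log(1+1.0/(k*base+digit))

        return s/log(base)

    for b in range(3,16):
        l=[]
        # as posted, the range stops one short of the last digit (base-1)
        for k in range(1,(2**(b-1))-1):
            l.append(benford(k,0,2**(b-1)))
        print("%2d %6f" % (b, entropy(l)))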

Which gives us:

3 1.018740
4 2.314716
5 3.354736
6 4.238990
7 5.032280
8 5.769212
9 6.468756
10 7.141877
11 7.795288
12 8.433345
13 9.059028
14 9.674477
15 10.281286

As it turns out, our 13-bit number has at most 9 bits of entropy, and as we'll see in a bit, probably significantly less.

All that said, this is easily dealt with by lookup table.
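One way such a lookup table might look is sketched below in Python (an illustration, not code from the patch). It reuses the values printed above, rounds each down to whole bits, and makes two assumptions that are not in the post: deltas shorter than 3 bits are credited nothing, and deltas longer than 15 bits are credited as if they were 15 bits. The names are hypothetical.

```python
# Entropy estimates (in bits) from the table above, keyed by the number of
# significant bits in the interrupt delta.
BENFORD_ENTROPY = {
    3: 1.018740, 4: 2.314716, 5: 3.354736, 6: 4.238990, 7: 5.032280,
    8: 5.769212, 9: 6.468756, 10: 7.141877, 11: 7.795288, 12: 8.433345,
    13: 9.059028, 14: 9.674477, 15: 10.281286,
}

def entropy_credit(delta):
    """Whole bits of entropy to credit for one timing delta."""
    bits = abs(delta).bit_length()
    if bits < 3:
        return 0                      # assumption: too short to credit
    bits = min(bits, 15)              # assumption: clamp to the table's range
    return int(BENFORD_ENTROPY[bits]) # round down to be conservative

print(entropy_credit(0x1000))  # a 13-bit delta is credited 9 bits, not 12
```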

Interrupt Timing Independence

Linux entropy estimate also wrongly assumes independence of different interrupt sources. While SMP complicates the matter, this is generally not the case. Low-priority interrupts must wait on high priority ones and back to back interrupts on shared lines will serialize themselves ABABABAB. Further system-wide CLI, cache flushes and the like will skew -all- the timings and cause them to bunch up in predictable fashion.

Furthermore, all this is observable from userspace in the same way that worst-case latency is measured.

To protect against back to back measurements and userspace observation, we insist that at least one context switch has occurred since we last sampled before we trust a sample.
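The gating rule can be modelled in a few lines of Python (a sketch only, not the patch itself). It assumes the caller can supply a running count of context switches; the class and parameter names are hypothetical.

```python
class TrustGate:
    """Trust a sample only if a context switch happened since the last one."""

    def __init__(self):
        self.switches_at_last_sample = None

    def trusted(self, context_switches):
        # `context_switches` is an assumed monotonically increasing counter.
        unchanged = context_switches == self.switches_at_last_sample
        self.switches_at_last_sample = context_switches
        return not unchanged

gate = TrustGate()
print(gate.trusted(100))  # True: first sample
print(gate.trusted(100))  # False: back to back, no context switch between
print(gate.trusted(101))  # True: a switch occurred since the last sample
```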

Questionable Sources and Time Scales

Due to the vagarities of computer architecture, things like keyboard and mouse interrupts occur on their respective scanning or serial clock edges, and are clocked relatively slowly. Worse, devices like USB keyboards, mice, and disks tend to share interrupts and probably line up on USB clock boundaries. Even PCI interrupts have a granularity on the order of 33MHz (or worse, depending on the particular adapter), which when timed by a fast processor's 2GHz clock, make the low six bits of timing measurement predictable.

And as far as I can find, no one's tried to make a good model or estimate of actual keyboard or mouse entropy. Randomness caused by disk drive platter turbulence has actually been measured and is on the order of 100bits/minute and is well correlated on timescales of seconds - we're likely way overestimating it.

We can deal with this by having each trusted source declare its clock resolution and removing extra timing resolution bits when we make samples.
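The arithmetic behind the "low six bits" figure, and the proposed fix of discarding them, can be sketched in Python as follows (an illustration under assumptions, not the patch's code). The function names are hypothetical, and rounding the bit count up is a conservative choice made here.

```python
import math

def predictable_low_bits(cpu_hz, source_hz):
    """Low timestamp bits that only reflect the source's slower clock."""
    if cpu_hz <= source_hz:
        return 0
    # Each source tick spans cpu_hz / source_hz CPU cycles; round up.
    return math.ceil(math.log2(cpu_hz / source_hz))

def strip_resolution(timestamp, bits):
    """Drop the low bits that the source cannot actually resolve."""
    return timestamp >> bits

bits = predictable_low_bits(2_000_000_000, 33_000_000)
print(bits)                                     # 6, matching the post
print(strip_resolution(0b101101011111, bits))   # 45, i.e. 0b101101
```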

Trusting Predictable or Measurable Sources

What entropy can be measured from disk timings are very often leaked by immediately relaying data to web, shell, or X clients. Further, patterns of drive head movement can be remotely controlled by clients talking to file and web servers. Thus, while disk timing might be an attractive source of entropy, it can't be used in a typical server environment without great caution.

Complexity of analyzing timing sources should not be confused with unpredictability. Disk caching has no entropy, disk head movement has entropy only to the extent that it creates turbulence. Network traffic is potentially completely observable.

(Incidentally, tricks like Matt Blaze's truerand clock drift technique probably don't work on most PCs these days as the "realtime" clock source is often derived directly from the bus/PCI/memory/CPU clock.)

If we're careful, we can still use these timings to seed our RNG, as long as we don't account them as entropy.

Batching

Samples to be mixed are batched into a 256 element ring buffer. Because this ring isn't allowed to wrap, it's dangerous to store untrusted samples as they might flood out trusted ones.

We can allow untrusted data to be safely added to the pool by XORing new samples in rather than copying and allowing the pool to wrap around. As non-random data won't be correlated with random data, this mixing won't destroy any entropy.
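A minimal Python model of that idea follows (a sketch, not the kernel's data structure). The 32-bit sample width and the class name are assumptions made for illustration.

```python
class XorRing:
    """Ring buffer that wraps and XORs new samples into existing slots."""

    def __init__(self, size=256):
        self.slots = [0] * size
        self.pos = 0

    def add(self, sample):
        # XOR instead of overwrite: an old sample is mixed with the new one
        # rather than flooded out when the ring wraps.
        self.slots[self.pos] ^= sample & 0xFFFFFFFF  # assumed 32-bit samples
        self.pos = (self.pos + 1) % len(self.slots)

ring = XorRing(size=4)
for s in (0xAAAA, 0x1111, 0x2222, 0x3333, 0x00FF):
    ring.add(s)
print(hex(ring.slots[0]))  # 0xaa55: the first sample XORed with the fifth
```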

Broken Pool Transfers

Worst of all, the accounting of entropy transfers between the primary and secondary pools has been broken for quite some time and produces thousands of bits of entropy out of thin air.

Linus Torvalds was skeptical. To Oliver's claim of two to five orders of magnitude more entropy, Linus replied:

On the other hand, if you are _too_ anal you won't consider _anything_ "truly random", and /dev/random becomes practically useless on things that don't have special randomness hardware.

To me it sounds from your description that you may well be on the edge of "too anal". Real life _has_ to be taken into account, and not accepting entropy because of theoretical issues is _not_ a good idea.

Quite frankly, I'd rather have a usable /dev/random than one that runs out so quickly that it's unreasonable to use it for things like generating 4096-bit host keys for sshd etc.

In particular, if a machine needs to generate a strong random number, and /dev/random cannot give that more than once per day because it refuses to use things like bits from the TSC on network packets, then /dev/random is no longer practically useful.

Theory is theory, practice is practice. And theory should be used to _analyze_ practice, but should never EVER be the overriding concern.

So please also do a writeup on whether your patches are _practical_. I will not apply them otherwise.

Oliver replied, "My box has been up for about the time it's taken to write this email and it's already got a full entropy pool. A 4096-bit public key has significantly less than that many bits of entropy in it (primes thin out in approximate proportion to log2(n))." He went on:

Let me clarify that 2-5 orders thing. The kernel trusts about 10 times as many samples as it should, and overestimates each samples' entropy by about a factor of 10 (on x86 with TSC) or 1.3 (using 1kHz jiffies).

The 5 orders comes in when the pool is exhausted and the pool xfer function magically manufactures 1024 bits or more the next time an entropy bit (or .1 or 0 entropy bits, see above) comes in.

He concluded, "The patches will be a nuisance for anyone who's currently using /dev/random to generate session keys on busy SSL servers. But" [...] "with the old code, they were fooling themselves anyway. /dev/urandom is appropriate for such applications, and this patch allows giving it more data without sacrificing /dev/random."

Linus took a look at the code, and said, "No, it appears to be a nuisanse even for people who have real issues, ie just generating _occasional_ numbers on machines that just don't happen to run much user space programs." He said that Oliver's code threw out a lot of entropy sources that should have been kept. He spent another twenty minutes looking at the code and replied to himself, saying:

Hmm.. After more reading, it looks like (if I understood correctly), that since network activity isn't considered trusted -at-all-, your average router / firewall / xxx box will not _ever_ get any output from /dev/random what-so-ever. Quite regardless of the context switch issue, since that only triggers for trusted sources. So it was even more draconian than I expected.

Are you seriously trying to say that a TSC running at a gigahertz cannot be considered to contain any random information just because you think you can time the network activity so well from the outside?

Oliver, I really think this patch (which otherwise looks perfectly fine) is just unrealistic. There are _real_ reasons why a firewall box (ie one that probably comes with a flash memory disk, and runs a small web-server for configuration) would want to have strong random numbers (exactly for things like generating host keys when asked to by the sysadmin), yet you seem to say that such a user would have to use /dev/urandom.

If I read the patch correctly, you give such a box _zero_ "trusted" sources of randomness, and thus zero bits of information in /dev/random. It obviously won't have a keyboard or anything like that.

This is ludicrous.

Alan Cox interjected:

The current policy has always been not to trust events that are precisely externally controllable. Oliver hasn't changed the network policy there at all.

Its probably true there are low bits of randomness available from such sources providing we know the machine has a tsc, unless the I/O APIC is clocked at a divider of the processor clock in which case our current behaviour is probably much saner.

Oliver also replied to Linus, saying Linus' points were not false, but that anyone left with zero trusted sources of entropy under his patch would have had the same problem before; the patch only made that explicit. But Linus said:

Be realistic. This is what I ask of you. We want _real_world_ security, not a completely made-up-example-for-the-NSA-that-is-useless-to-anybody- else.

All your arguments seem to boil down to "people shouldn't use /dev/random at all, they should use /dev/urandom".

Which is just ridiculous.

But elsewhere, he qualified, "I suspect that Oliver is 100% correct in that the current code is just _too_ trusting. And parts of his patches seem to be in the "obviously good" category."

3. SCTP Under Linux

19 Aug 2002 - 22 Aug 2002 (4 posts) Subject: "no subject"

People: David S. Miller, Henning P. Schmiedehausen

Jack Bloch asked if there were any plans to do an SCTP (Stream Control Transmission Protocol) implementation as described in RFC 2960 (http://www.faqs.org/rfcs/rfc2960.html) under Linux. David S. Miller replied, "It's done, I'm going to merge it in the next week or so into 2.5.x Search the list archives for the SCTP project site as I don't have the URL handy." Henning P. Schmiedehausen gave a link to http://www.sctp.de/, and Philipp Matthias gave a link to the SCTP project page (http://www.sf.net/projects/lksctp) on Sourceforge.

4. Hyperthreading

21 Aug 2002 - 26 Aug 2002 (20 posts) Archive Link: "Hyperthreading"

Topics: Hyperthreading, SMP

People: James Bourne, Hugh Dickins, Kelsey Hudson, Alan Cox, Ingo Molnar

Timothy A Reed asked what kernel configuration options were needed in order to make use of hyperthreading. James Bourne replied:

As long as you have a P4 and use the P4 support you will get hyperthreading with 2.4.19 (CONFIG_MPENTIUM4=y). 2.4.18 you have to also turn it on with a lilo option of acpismp=force on the kernel command line.

You might want to balance IRQs across the cpus. Ingo Molnar has created patches for this, which I've put on my website at http://www.hardrock.org/kernel/.

hyperthreading will give you some performance boostes, but *only* if you have many runable processes a majority of the time, or very heavily threaded applications running on the system. (an example would be running 4 setiathome clients on a dual processor system).

Hugh Dickins added, "You do need CONFIG_SMP and a processor capable of HyperThreading, i.e. Pentium 4 XEON; but CONFIG_MPENTIUM4 is not necessary for HT, just appropriate to that processor in other ways." James said he'd been under the impression that the P4 XEON was the only processor capable of hyperthreading. Kelsey Hudson replied, "This is currently correct, although I believe Intel has plans to release a Hyperthreading-capable version of its desktop P4." There followed some speculation that other processors were capable of hyperthreading, but had it disabled. Alan Cox remarked at one point:

If you want to know the full HT capabilities of the processor you need to read cpuid 1 and check ebx bits 16-23.

There has been some interesting speculation as to whether you can enable HT by undocumented mtrrs on cpus that have "ht" but claim not to be doing HT. Clearly the value returned is settable somewhere but I've seen no proof yet than you can enable HT on non PIV Xeons this way.
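The bit extraction Alan describes can be sketched in Python as follows (an illustration, not from the thread). It assumes the EBX value from CPUID leaf 1 has already been obtained by some other means; the sample value below is made up. Intel documents this field as the number of logical processors per physical package.

```python
def ebx_bits_16_23(ebx):
    """Return bits 16-23 of the EBX value reported by CPUID leaf 1."""
    return (ebx >> 16) & 0xFF

sample_ebx = 0x00020800  # hypothetical register value for illustration
print(ebx_bits_16_23(sample_ebx))  # 2
```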

5. ALSA Update For 2.5

21 Aug 2002 - 23 Aug 2002 (3 posts) Archive Link: "[PATCH] ALSA 0.9.0rc3"

Topics: I2C, Ioctls, PCI, SMP, Sound: ALSA, USB, Version Control

People: Jaroslav Kysela, Christoph Hellwig, David S. Miller

Jaroslav Kysela announced:

Linus, please, apply these patches with latest ALSA code to 2.5 tree:

Plain patch:
------------

ftp://ftp.alsa-project.org/pub/kernel-patches/alsa-1.489.1.1.patch.gz
ftp://ftp.alsa-project.org/pub/kernel-patches/alsa-1.501.patch.gz

BK send/receive format (including nice per file comments/changelogs):
---------------------------------------------------------------------

ftp://ftp.alsa-project.org/pub/kernel-patches/alsa-1.489.1.1.bk.gz
ftp://ftp.alsa-project.org/pub/kernel-patches/alsa-1.501.bk.gz

BK linux-sound repository
-------------------------

bk pull http://linux-sound.bkbits.net/main

ChangeSets: 1.489.1.1 and 1.501

Web: http://linux-sound.bkbits.net

Description:
------------

ALSA 0.9.0rc3 release

* fixes for x86-64
* fixed ioctl32 wrapper
* removed compatibility __NO_VERSION__ defines
* C99-like structure initializers for all code
* Control interface
  - fixed read() behaviour
* PCM interface
  - added support for more PCM formats (up to 512)
    - added 24-bit formats for USB audio
  - removed drain call from the snd_pcm_close() function, data are
    always dropped
  - added support for Scatter-Gather DMA
  - added SBUS DMA support by David S. Miller <davem@redhat.com>
* Timer interface
  - fixed kmod behaviour for system timers
  - fixed read() behaviour
* RawMidi interface
  - fixed read() behaviour
* Sequencer interface
  - reset the timer at continue if not initialized yet
  - check the possible infinite loop in priority queues
  - fixed deadlock at snd_seq_timer_start/stop
* intel8x0 driver
  - fixed pci id of AMD8111
* via686 driver
  - added Scatter-Gather DMA support
  - fixes in AC97 codec initialization
* opti92x/93x driver
  - overall fixes
* via8233 driver
  - fixed playback of mono samples
  - added Scatter-Gather DMA support
* EMU10K1/Audigy driver
  - added Scatter-Gather DMA support for playback
  - added workaround for capture (ring buffer pointer)
* NM256 driver
  - fixed the lock up on NM256 ZX (Dell Latitude Cpt)
* CS46xx driver
  - added support for new DSP images
  - experimental rear and SPDIF outputs
* added snd-usbaudio and snd-usbmidi driver
* ymfpci driver
  - fixed GPIO read/write
  - added snd_rear_switch option
* ice1712 driver
  - fixed SMP dead-lock (CS8427 I2C code)
* HDSP driver
  - overall code improvement
* CS4236 driver
  - new ISA PnP ID
* CS4281 driver
  - added the power management code
* PPC Keywest
  - fixed the initialization of driver
* serial-u16550
  - added support for generic adapter
* renamed dt0197h -> dt019x driver
* ac97 code
  - added VIA and Conexant codecs
  - added AD1981A codec from Analog Devices
  - added the ids for ITE chips
  - separated codec specific initialization

Christoph Hellwig asked, "Any chance you could stop that BK megachangeset and instead do one changeset per cvs commit?" Jaroslav replied, "I'll do more frequent syncing with the kernel tree in the future (I assume per week), but creating changesets per CVS commit is too overkill from the maintaince view. Everybody interested in ALSA development might watch our CVSLOG mailing list (archived) or use our CVS."

6. Preventing Multiple Oopsen From Overwriting Each Other

22 Aug 2002 (3 posts) Archive Link: "[PATCH] printk simultaneous oops disentangling"

Topics: SMP

People: David Howells, Benjamin LaHaise

David Howells posted a patch and said, "Here's a patch to stop multiple simultaneous oopses on an SMP system from interleaving with and overwriting bits of each other. It only permits lock breaking if the printk lock is held by the same CPU." Benjamin LaHaise objected, "This is still wrong. It should attempt to acquire the locks with a timeout before trampling on them, as there may be a printk or other console output in progress on the other cpu." But he thought better of it a few minutes later, and said instead, "The patch is actually right, but bust_spinlocks still blindly stops on locks that may not need to be stomped on."

7. First-Come-First-Served Locking For POSIX And Flock Locks

22 Aug 2002 (1 post) Archive Link: "[PATCH] An option to make fcntl & flock locks fair"

Topics: POSIX

People: Matthew Wilcox, Shlomi Fish

Matthew Wilcox posted a patch and explained, "Shlomi Fish asked about including first-come, first-served style locking for posix and flock locks. After some back-and-forth, we came up with the following patch which seems unintrusive enough to bother including. Personally, I doubt the utility of this, but someone might have an application for it, and the code's already written."

8. Header Files Touched During Compilation

22 Aug 2002 (3 posts) Archive Link: "is kernel compilation supposed to change header file timestamps?"

Topics: Kernel Build System

People: Chris Friesen, Sam Ravnborg, George Anzinger

Chris Friesen said, "I noticed the other day that on a kernel compile, the timestamps of some files are changed. The funny thing is that all the changed ones are header files, but not all header files are modified. Is this expected behaviour?" Sam Ravnborg replied, "I assume you are compiling a 2.4 kernel, in which case this is expected behaviour. For the 2.5 kernel kbuild has been changed such that header files are no longer 'touched' during the compile process." And George Anzinger added that in the 2.4 case, "it has to do with how dependencies are propagated from header file to header file (i.e. where a header file includes another)."

9. Serial Driver Maintainership And Status

22 Aug 2002 - 23 Aug 2002 (2 posts) Archive Link: "serial driver maintaner"

Topics: Maintainership

People: Stuart MacDonald, Russell King

Someone asked about the serial driver. The maintainer hadn't updated the web page in a long time, and the poster had some hardware he/she wanted to support. But he/she didn't want to do the work unless there was some chance it might be accepted into the driver. Without a maintainer, that looked doubtful. Stuart MacDonald replied, "Ted doesn't seem to be maintaining it anymore. If you look in the linux-kernel archive you'll find that Russell King is doing a rewrite for 2.5/6 anyway." He added, "Update the driver, make a patch and send it to the list. If it's good likely it will be included. You may want to check out linux-serial also."

10. Relaxing ext3 Error Handling To Avoid False Positives

22 Aug 2002 - 23 Aug 2002 (3 posts) Archive Link: "[Patch 1/2] 2.4.20-pre4/ext3: Handle dirty buffers encountered unexpectedly."

Topics: Bug Tracking, FS: ext3

People: Stephen Tweedie

Stephen Tweedie posted a patch and explained:

Ext3's internal debugging has always assumed that it was illegal for there to be parallel IO on a buffer-head which it is trying to modify. That's reasonable --- if there is an IO collision, we end up with IOs hitting disk out-of-order wrt the journal, so we lose recovery guarantees.

However, there are two cases where the test is a little over-zealous. If user space is performing inherently non-transactional writes (eg. tune2fs adding a label to a live filesystem and writing to the buffered device superblock location) then we can hit the ext3 assertion.

More seriously, since 2.4.11 the page cache can lock a buffer_head for read even if the bh is already under journal control. The tune2fs bug is very rare: there have been no reports of it in Bugzilla or ext3-users lists, and only one on 2.5 on linux-kernel. But now, a dump(8) on a live filesystem can also give rise to the same condition, and in testing, dump + fs activity reproduces the assertion-failure VERY rapidly.

This patch changes the jbd get-write-access code to take the buffer_head lock before testing the uptodate and dirty state of a bh, and relaxes the handling of unexpectedly-dirty buffers to be a printk warning, not a fatal error. The lock will cure the dump(8) interaction, and the warning means that we will still spot out-of-order writes, while not taking the whole kernel down if we collide with a tune2fs(8).

The patch also removes a small potential hole in the recovery guarantees. It is not safe for a transaction to steal buffers from checkpoint mode until after that transaction has committed. Otherwise, a reboot at the wrong moment might find the old copy of the buffer in the journal had been removed from the recovery set before the new copy was written.

11. New IPMI Driver For 2.4 And 2.5

22 Aug 2002 (1 post) Archive Link: "[patch] New version of IPMI driver"

People: Corey Minyard

Corey Minyard announced:

I've split up the driver, creating working 2.4.19 and 2.5.31 versions of the driver (and even tested them!) and split the emulation code into a separate patch.

I also noticed that 2.5.31 timer interrupts occur at 1ms instead of 10ms, so it should provide acceptable speed without high-res timers. 2.4 without high-res timers or interrupts will still be very slow.

I have not yet tested interrupts, because I don't have a card that supports them (it's on its way). However, that's pretty straightforward.

The web page is http://home.attbi.com/~minyard

Please, try it out and tell me what you think. Again, I'm shooting for getting this in the mainstream kernel.

12. Submitting Documentation Patches

23 Aug 2002 (5 posts) Archive Link: "Documentation Maintainer"

Topics: FS

People: Alan Cox, Alexander Viro

Vincent Hanquez wanted to submit a documentation patch for some filesystem code, and asked who the current maintainer was. Alan Cox said, "Generally the maintainer of the code the documentation covers, or the author on the file. If you aren't sure send it to the list." Someone else said that for filesystem docs, Alexander Viro was the place to send patches.

13. Status Of DRM Driver In 2.4 And 2.5

23 Aug 2002 (5 posts) Archive Link: "[PATCH] Intel 830m backport (2.5 -> 2.4)"

Topics: Virtual Memory

People: Federico Di Gregorio, Christoph Hellwig, Marcelo Tosatti, Alan Cox, Rik van Riel

Federico Di Gregorio posted a patch and announced:

this is my first try at a kernel patch, i hope i am doing everything right; if not, please just tell me. (i sent this patch to both the drm maintainer and the linux-kernel ML. should i send 2.4 patches directly to marcelo? mm..)

anyway, this is just a backport of the 2.5 DRM driver for Intel 830M to the 2.4 series. It is against 2.4.19 but, consisting only of added files it should work clean on later kernels (tested on 2.4.20pre). The patch is quite big (67252 bytes) and can be downloaded from:

http://people.initd.org/fog/linux-2.4.19-i830.diff

But Christoph Hellwig said:

Please don't do this. The 2.5 drm code is a piece of shit and even crappier than the one in 2.4.

Alan, is there any chance you could send marcelo the -ac drm code?

Alan Cox invited Christoph to untangle the drm code from its rmap macro dependencies and send it to Marcelo Tosatti himself. But Rik van Riel said that those dependencies had been merged into 2.4 months before. Christoph said:

I've uploaded a patch that updates the mainline drm code to -ac, fixes all compiler warnings and removes the remaining LINUX_VERSION_CODE checks after most have already been removed in -ac.

It's at http://verein.lst.de/~hch/misc/linux-2.4.20-pre4-drm.patch

14. BitKeeper License Change

23 Aug 2002 - 24 Aug 2002 (4 posts) Archive Link: "RFC: BK license change"

Topics: Patents, Version Control

People: Larry McVoy, David Parsons, Sam Ravnborg

Larry McVoy announced:

No, we're not GPLing it but we are making a few adjustments and wanted to make sure that it was an improvement, not a regression, in the eyes of the free users. Sorry for the intrusion, I'll be as brief as possible.

You can read the new license at http://www.bitkeeper.com/bkl.txt but I'll summarize the changes here.

3(a) Propagation to openlogging.org. The old license insisted that you log your changes within 7 days; several people pointed out that they are spending their dotcom dollars sitting on an island hacking the kernel and they may not have connectivity every 7 days. Or something. We upped the limit to 21 days, that should be enough, I have to believe that you check your mail every three weeks if you are doing work.

3(c) Maintaining Open Source. Our intent was that the free use of BitKeeper was for the purpose of helping the open source community; it was not to provide commercial users a free product. We have had a number of cases where managers up to VPs have told their engineers "just don't put anything useful in the checkin comments and then we can use it for free". Not what we had in mind. So we're adding a clause which says that we reserve the right to insist that you make your repositories available on a public port within 15 days of the request.

We understand that lots of legit open source users have very good reasons for not wanting their changes made public, e.g., they are working on a security fix. We are absolutely not going to ask these sorts of repositories be forced out in the open and if you are concerned about that we can work out some sort of written agreement to that effect. We're very much committed to supporting open source development, in particular the Linux kernel and even more specifically Linus, he's a critical resource.

The only people we're going after are those people who are clearly not part of the open source community. We thought about saying we would only enforce this if they were working on source which did not have an open source license and rejected it for the following reason: there are commercial companies working on open source, using BitKeeper to do so, and not sharing their changes for as long as they can to get a competitive edge in the marketplace. There is nothing wrong with that under the terms of the GPL, but we don't have to support what we view as commercial activity for free. Open means open, it's about sharing, not money, in our opinion.

It's a hard nut to crack, you can't just say "it's free if you are doing everything out in the open" because there are legit reasons for hiding. There are also commercial reasons for hiding and our view is that if that is what you are doing, you should be paying for the tools. BK is free as a way to help people help each other.

4.4. Remove the $20,000 support clause. We had a clause that said that we could shut you down if you cost us more than $20K in support. This was a widely hated clause and we're aware of that. It was there as a way to try and shut down those people who were really commercial. Since the previous change will effectively do that, we don't need this clause. That removes the fear that we'll shut down bkbits or the kernel's use of BK.

That's it on the licensing stuff. Since I'm here, here's some BK status.

We're in discussions with a very Linux friendly hosting service (4000 Linux servers hosted) to move bkbits.net and openlogging.org to their site in exchange for BK licenses. This should make the bkbits.net service have more bandwidth and the benefit of an extremely well connected and well run hosting environment. We don't need the bandwidth, BK is super stingy with bandwidth, but it's cool to have bkbits.net in an air conditioned, UPSed, multi peered environment instead of my office. We're psyched about this, it's a good thing.

We're working on closing the first commercial deal which we can tie to the use of BK by the kernel team. If this actually happens, I'm going to take $25K of the deal and "give" it to Linus as "BK bucks" which he can spend. What that means is that he has $25K to spend on BK features that he wants. This is above and beyond stuff that we're doing already, it's a way to give him the power to insist that we do some work that we wouldn't do otherwise. In general, we'd like to make a policy of doing this sort of thing. To date, we can't credit the open source use of BK with any commercial business. If that changes, that's good for us but it should also be good for the kernel.

David Parsons said of item 3(c), "This addendum is somewhat [1] annoying, because I switched over to BK for _everything_ a couple of years ago and now I've got a moderately large body of stuff that is NOT open source (my resume, my dns, little proofs of concept projects that I did for people. I've not made one red penny off any of this [particularly since the economy has gone south and put me out of work for the past year. But I'm still not opensourcing my resume.]) that's under bitkeeper. If I upgrade to a bk that uses the new license, then I get to play the exciting game of ``break the new license and defraud my former employers'' [2], which is about as appealing as Linus's alternative approach to resolving software patent issues."

Sam Ravnborg had no comments about the license change, but did say:

I have a feature request. The view of changesets on bkbits is useful, but the sorting does not give the full picture.

Follow this example:
bk pull http://linux.bkbits.net/linux-2.5

Linus does a bk pull from my repository

When accessing bkbits via the web interface, the changes are listed sorted by the time I made the modifications, not by when Linus actually did the bk pull, so they may be preceded by maybe 100 csets.

Is it possible somehow to sort the cset(s) according to the time they were applied to the local tree, and not when they were originally committed?

Larry replied:

If this is a correct statement of what you want, we're building it:

Instead of seeing events in time order of creation, you want to see the events in order of arrival in a particular repository.

I agree that the current view is useless when what you want to know is when did this change finally make it into the tree?

We're working on a "stack" of incoming events. BK/Web will use this to give you the display you want and bk undo will be able to use this to roll your repository backwards by "popping" the stack. You could do

        while true
        do      bk undo -sf
        done

and when it gets done, you'll have no repository, it will have popped it away. bk unpull will just become a special case of popping the stack.
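The difference between the two orderings, and the idea of undoing by popping arrivals, can be illustrated with a small model. This is only a sketch under stated assumptions: the `Changeset` and `Repository` classes, the integer clock, and the changeset names are all hypothetical and are not taken from BitKeeper itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Changeset:
    name: str
    created: int  # hypothetical integer clock: when the author committed it


@dataclass
class Repository:
    # Each entry is one batch of changesets, in the order the batches arrived.
    arrivals: list = field(default_factory=list)

    def changesets(self):
        """All changesets, flattened in arrival order."""
        return [c for batch in self.arrivals for c in batch]

    def pull(self, incoming):
        """Push the changesets we do not have yet as one arrival event."""
        have = set(self.changesets())
        new = [c for c in incoming if c not in have]
        if new:
            self.arrivals.append(new)

    def by_creation(self):
        """The ordering Sam describes: sorted by original commit time."""
        return sorted(self.changesets(), key=lambda c: c.created)

    def by_arrival(self):
        """The ordering Sam asks for: oldest arrival first."""
        return self.changesets()

    def undo(self):
        """Pop the most recent arrival event, or return None when empty."""
        return self.arrivals.pop() if self.arrivals else None


linus = Repository()
linus.pull([Changeset("linus-1", 10), Changeset("linus-2", 30)])
linus.pull([Changeset("sam-1", 20)])  # committed earlier, pulled later

print([c.name for c in linus.by_creation()])
print([c.name for c in linus.by_arrival()])

# Popping until nothing is left mirrors the "while true; bk undo" loop.
while linus.undo() is not None:
    pass
print(linus.changesets())
```

In this toy model "sam-1" sorts into the middle of the history by creation time but comes last by arrival, which is the discrepancy Sam reported.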

15. Some Benchmarks From 2.4 And 2.5

24 Aug 2002 (1 post) Archive Link: "dbench test"

People: Paolo Ciarrocchi

Paolo Ciarrocchi reported:

I've just run a few dbench-based tests against:

Ok, I know that dbench is not a "good" test, but it should be at least a good stress test. I got neither oops nor BUG().

Each test was run twice. Here are the results:

2.4.18
Instances Throughput
8        25.1022
16       20.3833
24       18.0078
32       13.6657

2.4.18 -0.24pre3 (64MiB of cc)
Instances Throughput
8        28.5116
16       27.5003
24       24.6963
32       16.423

2.4.19
Instances Throughput
8        25.5343
16       20.7133
24       16.2473
32       14.2351

2.5.31
Instances Throughput
8        30.6827
16       18.2236
24       14.6759
32       12.7659
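To compare the kernels at a glance, the relative change between any two of the tables above can be computed directly. The sketch below only restates the posted figures; the dictionary layout and the `percent_change` helper are illustrative names, and the throughput unit is left unspecified because the post does not state it.

```python
# Throughput figures copied from the tables above, indexed by instance count.
INSTANCES = (8, 16, 24, 32)
THROUGHPUT = {
    "2.4.18": (25.1022, 20.3833, 18.0078, 13.6657),
    "2.4.18 -0.24pre3": (28.5116, 27.5003, 24.6963, 16.423),
    "2.4.19": (25.5343, 20.7133, 16.2473, 14.2351),
    "2.5.31": (30.6827, 18.2236, 14.6759, 12.7659),
}


def percent_change(baseline, candidate):
    """Per-instance-count change of candidate relative to baseline, in percent."""
    pairs = zip(THROUGHPUT[baseline], THROUGHPUT[candidate])
    return [(new - old) / old * 100.0 for old, new in pairs]


for count, change in zip(INSTANCES, percent_change("2.4.19", "2.5.31")):
    print(f"{count:>2} instances: {change:+.1f}%")
```

With these numbers, 2.5.31 comes out roughly 20% ahead of 2.4.19 at 8 instances and roughly 10% to 12% behind at 16, 24 and 32 instances.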

16. Status Of khttpd In 2.4 And 2.5

25 Aug 2002 - 26 Aug 2002 (3 posts) Archive Link: "[PATCH] khttpd crash fix, take 3"

Topics: Web Servers

People: Dan Kegel, Christoph Hellwig, Alan Cox

Dan Kegel posted a patch against 2.4 and said:

Alan Cox accepted my recent patch to fix a checker warning in khttpd, but not my earlier patch to fix an oops in khttpd.

That earlier patch must have hit some bogon filter... hmm. Yes. It contained extraneous whitespace and style changes, was complex, and had a poor description. So, here's a cleaner one with a better description.

This patch fixes four problems in khttpd:

  1. An oops in DecodeHeader where Buffer[CPUNR] is NULL, happened whenever a worker thread was restarted after being stopped. (The worker thread frees its buffer on exit, but the manager thread neglected to allocate a buffer for the worker thread when restarting it.)
  2. A bug that caused worker threads to be spuriously restarted once on startup (this made the previous bug much worse).
  3. The end-user had to do a "sleep 1" after stopping the daemon before restarting it. This was not documented, and was rather confusing.
  4. There was no entry in /usr/src/linux/Documentation for khttpd, and beginning users sometimes could not find the documentation.

(An earlier version that fixes only the first two is at http://marc.theaimsgroup.com/?l=linux-kernel&m=102068445316516&w=2 ) I can separate this patch into three or four pieces if need be.

Please let me know if this patch gets blocked by any bogon filters...
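The first problem in Dan's list is a lifecycle mismatch: the worker frees its buffer when it exits, and the manager restarts the worker without allocating a new one. The following is a toy model of that pattern, not khttpd code. The `Manager` class, the buffer size, and the `allocate` switch are hypothetical; only the names `Buffer[CPUNR]` and `DecodeHeader` come from the post.

```python
BUFFER_SIZE = 4096  # hypothetical per-worker buffer size


class Manager:
    """Stands in for the manager thread that starts and stops workers."""

    def __init__(self, nr_workers):
        # Plays the role of Buffer[CPUNR]: one slot per worker.
        self.buffers = [None] * nr_workers

    def start_worker(self, cpu, allocate=True):
        # allocate=False reproduces the bug: restart without a new buffer.
        if allocate:
            self.buffers[cpu] = bytearray(BUFFER_SIZE)

    def stop_worker(self, cpu):
        # The worker frees its buffer on exit.
        self.buffers[cpu] = None

    def decode_header(self, cpu, request):
        """Copy the request into the worker's buffer and return its first line."""
        buf = self.buffers[cpu]
        if buf is None:
            # In the kernel this would be a NULL dereference, i.e. an oops.
            raise RuntimeError(f"worker {cpu} has no buffer")
        buf[:len(request)] = request
        return bytes(buf[:len(request)]).split(b"\r\n")[0]


mgr = Manager(nr_workers=2)
mgr.start_worker(0)
print(mgr.decode_header(0, b"GET / HTTP/1.0\r\n\r\n"))

mgr.stop_worker(0)
mgr.start_worker(0, allocate=False)   # buggy restart
try:
    mgr.decode_header(0, b"GET / HTTP/1.0\r\n\r\n")
except RuntimeError as err:
    print("buggy restart:", err)

mgr.start_worker(0)                   # fixed restart allocates again
print(mgr.decode_header(0, b"GET / HTTP/1.0\r\n\r\n"))
```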

Christoph Hellwig asked, "BTW: would you step up as khttpd maintainer? It seems no one else cares for it and it's always good to have someone to drop patches/complaints at," but there was no reply. Elsewhere, in a different thread under the Subject: Linux 2.4.20-pre4-ac2, Christoph remarked, "khttpd is gone in 2.5."

17. Enhancements To md Multipath In 2.4

26 Aug 2002 (1 post) Archive Link: "[PATCH] Enhancement to md multipath in 2.4"

Topics: Disk Arrays: RAID, Ioctls

People: Lars Marowsky-Bree, Jens Axboe

Lars Marowsky-Bree posted a patch and announced:

Jens Axboe did most of the work on this; I only stressed it a bit and fixed some bugs in it. As he is currently on vacation, I would still like to present it to you and solicit comments on it.

It compiles and passes my test script, so it can't be all wrong, I hope ;-) It certainly isn't worse than the current code.

I've also done a small patch to mdadm to allow access to the new functionality provided.

The enhancements include:

Patch attached.

Of course, this is still subject to the general comments about the block device error handling in 2.4.

18. Allow Loop Devices To Fail On Demand In 2.4

26 Aug 2002 (1 post) Archive Link: "[PATCH] loop device failing on demand (2.4)"

People: Lars Marowsky-Bree, Jens Axboe

Lars Marowsky-Bree posted a patch and said:

The attached small patch allows to "fail" a loop device on demand. Any further request to the loop device will simply fail.

Even though it of course doesn't simulate the failures one might see in the field, it is kind of handy for automated tests, for example for multipath I/O.

Done by Jens Axboe. The reason why I am sending it: I need it most and he is on vacation.
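The testing idea Lars describes, failing a device on demand so that a multipath layer has to fall back to another path, can be sketched in user space. Everything below is an assumption for illustration: `FailableDevice` and `Multipath` are hypothetical classes, and nothing here reflects the actual loop driver patch or the md multipath code.

```python
import errno


class FailableDevice:
    """A toy block device that can be told to fail all further requests."""

    def __init__(self, backing, sector_size=512):
        self.backing = backing          # shared bytearray standing in for the disk
        self.sector_size = sector_size
        self.failed = False

    def fail(self):
        """Mark the device as failed; every later request raises EIO."""
        self.failed = True

    def read(self, sector, count=1):
        if self.failed:
            raise OSError(errno.EIO, "device was failed on demand")
        start = sector * self.sector_size
        return bytes(self.backing[start:start + count * self.sector_size])


class Multipath:
    """Tries each path in order and returns the first successful read."""

    def __init__(self, paths):
        self.paths = list(paths)

    def read(self, sector, count=1):
        last_error = None
        for path in self.paths:
            try:
                return path.read(sector, count)
            except OSError as err:
                last_error = err
        if last_error is None:
            raise OSError(errno.ENODEV, "no paths configured")
        raise last_error


disk = bytearray(1024)
disk[0:4] = b"data"
path_a, path_b = FailableDevice(disk), FailableDevice(disk)
mp = Multipath([path_a, path_b])

path_a.fail()                      # simulate losing the first path
print(mp.read(0)[:4])              # still served through the second path
```

An automated test can then fail the remaining path as well and check that the error is reported to the caller instead of being swallowed.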

19. Status Of Tux2

26 Aug 2002 - 28 Aug 2002 (7 posts) Archive Link: "TUX2 fiulesystem"

Topics: FS, Patents, Web Servers

People: Frederic Roussel, Daniel Phillips, Hank Leininger

Frederic Roussel asked:

Mr Daniel Phillips started the TUX2 filesystem project some time ago. The links to `tux2' are either dead or quite old.

Does any kernel developer know about the status of that project ?

Daniel Phillips replied:

It's well down my list of priorities because of uncertainties due to the U.S. patent system.

Does anybody want to know if patent chill exists, and is it hurting open source? The answer is yes.

Someone asked what the patent issues were surrounding Tux2, and Hank Leininger replied:

I'm guessing Daniel is referring to NetApp/WAFL issues. NetApp's WAFL filesystem (IIRC) implements something which is kinda sorta if-you-squint-your-eyes philosophically similar to tux2's phase tree. Only

  1. It isn't *really* all that similar
  2. Daniel has prior art going back to the 1980's
  3. NetApp has more lawyers on staff than Daniel does

Daniel, did I get it vaguely right?

Daniel Phillips replied, "That about sums it up."

20. Hotplug Scripts Updated

26 Aug 2002 (2 posts) Archive Link: "[ANNOUNCE] 2002-08-26 release of hotplug scripts"

Topics: Disks: SCSI, Hot-Plugging, USB, Virtual Memory

People: Greg KH, David Brownell

Greg KH announced:

I've just packaged up the latest Linux hotplug scripts into a release, which can be found at:

http://sourceforge.net/project/showfiles.php?group_id=17679

Or from your favorite kernel.org mirror at:

kernel.org/pub/linux/utils/kernel/hotplug/hotplug-2002_08-26.tar.gz (ftp://ftp.us.kernel.org/pub/linux/utils/kernel/hotplug/hotplug-2002_08-26.tar.gz)

I've also packaged up a Red Hat 7.3 based rpm:

kernel.org/pub/linux/utils/kernel/hotplug/hotplug-2002_08-26-1.noarch.rpm (ftp://ftp.us.kernel.org/pub/linux/utils/kernel/hotplug/hotplug-2002_08-26-1.noarch.rpm)

The source rpm is available if you want to rebuild it for other distros at:

kernel.org/pub/linux/utils/kernel/hotplug/hotplug-2002_08_26-1.src.rpm (ftp://ftp.us.kernel.org/pub/linux/utils/kernel/hotplug/hotplug-2002_08_26-1.src.rpm)

The main web site for the linux-hotplug project can be found at:

http://linux-hotplug.sf.net/

which contains lots of documentation on the whole linux-hotplug process. There are also links to kernel patches, not currently in the main kernel tree, that provide hotplug functionality to new subsystems (like CPU, SCSI, Memory, etc.)

The main changes in this release are the following:

Here's the changes (and who made them) from the last release:

Changes from David Brownell

  • load_drivers(): variables are local, and doesn't try usbmodules unless the $DEVICE file exists (it'd fail)
  • update hotplug.8 manpage to mention Max's patch
  • patch from Max Krasnyanskiy, now usb hotplugging also searches /etc/hotplug/usb/*.usermap

Changes from Fumitoshi UKAI

  • etc/hotplug/hotplug.functions: grep -q redirect to /dev/null closes: debian Bug#145484

21. Hyperthreading In 2.5

26 Aug 2002 - 28 Aug 2002 (4 posts) Archive Link: "[patch] "fully HT-aware scheduler" support, 2.5.31-BK-curr"

Topics: Hyperthreading, SMP, Version Control

People: Ingo Molnar, Rusty Russell, Jun Nakajima, Linus Torvalds

Ingo Molnar posted a patch and announced:

symmetric multithreading (hyperthreading) is an interesting new concept that IMO deserves full scheduler support. Physical CPUs can have multiple (typically 2) logical CPUs embedded, and can run multiple tasks 'in parallel' by utilizing fast hardware-based context-switching between the two register sets upon things like cache-misses or special instructions. To the OSs the logical CPUs are almost undistinguishable from physical CPUs. In fact the current scheduler treats each logical CPU as a separate physical CPU - which works but does not maximize multiprocessing performance on SMT/HT boxes.

The following properties have to be provided by a scheduler that wants to be 'fully HT-aware':

the attached patch (against 2.5.31-BK-curr) implements all the above HT-scheduling needs by introducing the concept of a shared runqueue: multiple CPUs can share the same runqueue. A shared, per-physical-CPU runqueue magically fulfills all the above HT-scheduling needs. Obviously this complicates scheduling and load-balancing somewhat (see the patch for details), so great care has been taken to not impact the non-HT schedulers (SMP, UP). In fact the SMP scheduler is a compile-time special case of the HT scheduler. (and the UP scheduler is a compile-time special case of the SMP scheduler)

the patch is based on Jun Nakajima's prototyping work - the lowlevel x86/Intel bits are still those from Jun, the sched.c bits are newly implemented and generalized.

There's a single flexible interface for lowlevel boot code to set up physical CPUs: sched_map_runqueue(cpu1, cpu2) maps cpu2 into cpu1's runqueue. The patch also implements the lowlevel bits for P4 HT boxes for the 2/package case.

(NUMA systems which have tightly coupled CPUs with a smaller cache and protected by a large L3 cache might benefit from sharing the runqueue as well - but the target for this concept is SMT.)

some numbers:

compiling a standalone floppy.c in an infinite loop takes 2.55 seconds per iteration. Starting up two such loops in parallel, on a 2-physical, 2-logical (total of 4 logical CPUs) P4 HT box gives the following numbers:

2.5.31-BK-curr: - fluctuates between 2.60 secs and 4.6 seconds.

BK-curr + sched-F3: - stable 2.60 sec results.

the results under the stock scheduler depend on pure luck: which CPUs get the tasks scheduled. In the HT-aware case each task gets scheduled on a separate physical CPU, all the time.

compiling the kernel source via "make -j2" [under-utilizes CPUs]:

2.5.31-BK-curr: 45.3 sec

BK-curr + sched-F3: 41.3 sec

ie. a ~10% improvement. The tests were the best results picked from lots of (>10) runs. The no-HT numbers fluctuate much more (again the randomness effect), so the average compilation time in the no-HT case is higher.

saturated compilation "make -j5" results are roughly equivalent, as expected - the one-runqueue-per-CPU concept works adequately when the number of tasks is larger than the number of logical CPUs. The stock scheduler works well on HT boxes in the boundary conditions: when there's 1 task running, and when there's more than nr_cpus tasks running.

the patch also unifies some of the other code and removes a few more #ifdef CONFIG_SMP branches from the scheduler proper.

(the patch compiles/boots/works just fine on UP and SMP as well, on the P4 box and on another PIII SMP box as well.)
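The shared-runqueue idea can be modeled in a few lines to show why it places two tasks on separate physical CPUs. This is a toy sketch, not the patch's sched.c logic: the `Scheduler` class, the least-loaded placement rule, and the assumption that logical CPUs (0, 1) and (2, 3) are siblings are all illustrative. Only the name and argument order of `sched_map_runqueue(cpu1, cpu2)` come from Ingo's description.

```python
class Scheduler:
    """Toy model: every logical CPU points at a runqueue, possibly a shared one."""

    def __init__(self, nr_cpus):
        self.runqueues = [[] for _ in range(nr_cpus)]
        self.cpu_rq = list(range(nr_cpus))  # logical cpu -> runqueue index

    def sched_map_runqueue(self, cpu1, cpu2):
        """Map cpu2 into cpu1's runqueue, moving over any queued tasks."""
        old, new = self.cpu_rq[cpu2], self.cpu_rq[cpu1]
        if old == new:
            return
        self.runqueues[new].extend(self.runqueues[old])
        self.runqueues[old] = []
        # Repoint every CPU that used the old runqueue, not only cpu2.
        self.cpu_rq = [new if rq == old else rq for rq in self.cpu_rq]

    def enqueue(self, task):
        """Place the task on the least-loaded runqueue (lowest index on ties)."""
        active = sorted(set(self.cpu_rq))
        target = min(active, key=lambda rq: len(self.runqueues[rq]))
        self.runqueues[target].append(task)
        return target

    def cpus_of(self, rq):
        """Logical CPUs that may run tasks from the given runqueue."""
        return [cpu for cpu, r in enumerate(self.cpu_rq) if r == rq]


# Stock behaviour: four independent logical CPUs.
stock = Scheduler(4)
print([stock.cpus_of(stock.enqueue(t)) for t in ("gcc-a", "gcc-b")])

# HT-aware behaviour: siblings share one runqueue per physical package.
ht = Scheduler(4)
ht.sched_map_runqueue(0, 1)
ht.sched_map_runqueue(2, 3)
print([ht.cpus_of(ht.enqueue(t)) for t in ("gcc-a", "gcc-b")])
```

In the unmapped case the second task can land on the sibling of the first, so both compete for one physical CPU. With one runqueue per package, balancing across runqueues is automatically balancing across physical CPUs.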

Rusty Russell was happy to see this, as it meant he wouldn't have to do the implementation himself. But in response to Ingo's statement that "Tasks should attempt to 'stick' to physical CPUs, not logical CPUs," Rusty replied:

Linus disagreed with this before when I discussed it with him, and with the current (stupid, non-portable, broken) set_affinity syscall he's right.

You don't know if someone said "schedule me on cpu 0" because they really want to be scheduled on CPU 0, or because they really *don't* want to be scheduled on CPU 1 (where something else is running). You can't just assume they are equivalent if they are the same physical CPU.

My modified set_affinity syscall (which takes a "include/exclude" flag) allows the arch to make this decision (eventually) since you know what the user wants (it also means that you know what to do if they give you a short bitmap, or a new cpu comes online/goes offline).

Ingo replied that he didn't make assumptions on why a particular CPU was chosen to receive a process. He said, "There's also a fair amount of code in the kernel that relies on binding threads to particular CPUs, the patch does not break that in any way." As for Linus Torvalds' opinion, Ingo countered, "actually, affinity still works just fine, users can bind tasks to logical CPUs as well. What i meant was the affinity logic of the scheduler (ie. affinity decisions done by the scheduler), not the externally visible affinity API." This made sense to Rusty, who promised to go read the patch more carefully.
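Rusty's point about an "include/exclude" flag can be made concrete with a small model of how such a flag might resolve a short bitmap and CPUs that come online later. This is one possible reading only: the `allowed_cpus` function and its semantics are hypothetical, and the thread does not show the actual interface of his modified syscall.

```python
def allowed_cpus(bitmap, include, online):
    """Resolve a possibly short CPU bitmap against the CPUs currently online.

    include=True:  run only on the listed CPUs; unlisted CPUs are not allowed.
    include=False: run anywhere except the listed CPUs; unlisted CPUs are allowed.
    """
    listed = {cpu for cpu in online if cpu < len(bitmap) and bitmap[cpu]}
    return listed if include else set(online) - listed


online = range(4)

# "Schedule me on CPU 0": a short bitmap covering only CPUs 0 and 1.
print(sorted(allowed_cpus([1, 0], include=True, online=online)))

# "Keep me off CPU 1": the same bitmap length, but excluding instead.
print(sorted(allowed_cpus([0, 1], include=False, online=online)))

# A fifth CPU comes online: only the exclude request should pick it up.
online = range(5)
print(sorted(allowed_cpus([1, 0], include=True, online=online)))
print(sorted(allowed_cpus([0, 1], include=False, online=online)))
```

On a two-CPU machine, "only CPU 0" and "anything but CPU 1" look identical as plain masks, which is the ambiguity Rusty describes. The flag lets the two intents diverge once more CPUs exist.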

22. uClinux Update For 2.5

27 Aug 2002 (1 post) Archive Link: "[PATCH]: linux-2.5.31uc1 (MMU-less patches)"

People: Greg Ungerer

Greg Ungerer announced:

A new 2.5.31 MMU-less patch, linux-2.5.31uc1. Just a minor update, couple of small fixes.

Get it at the usual place:

http://www.uclinux.org/pub/uClinux/uClinux-2.5.x/

23. Some Developer Interaction

27 Aug 2002 (8 posts) Archive Link: "[PATCH] XFree v4.2.x DRM/DRI Support for 2.4.20-pre4"

People: Christoph Hellwig, Willy Tarreau, David Lang, Randy Dunlap, Marc-Christian Petersen

Marc-Christian Petersen posted a patch, and Christoph Hellwig said:

Don't do this. Alan already has a sane version in his tree which I've made ready for and sent to Marcelo. It wouldn't hurt if you read lkml..

The patch you posted is the crap directly from the XFree repo and backs out kernel changes. It might be enough for a random collection of junk patches but certainly does not meet the quality criteria for official kernels.

Willy Tarreau replied:

why do you always feel the need to discourage people who offer their contribution ? Your two first sentences are quite enough to let Marc-Christian understand that his patch isn't as good as YOURS. The rest of the mail is pure gratuitous insults, just like every other mail you send these times (except those in which you compliment yourself). Since a few weeks, each time I see a mail from you, before opening it, I ask to myself "well, who is he killing today ?".

Perhaps you're fed up with crap in the kernel, but IMHO that's not the way you'll get rid of it. This list is a developers' list, so it tends to be constructive by nature. So please be a little more tolerant with other people, particularly when they are contributing.

Christoph replied, "It's not MY patch. It's Alan & Arjan's work, and I stated that clearly in the thread a few days ago, where someone posted a patch to bring the XFree crap in. I expect from someone who thinks of himself as kerneltree maintainer that he at least follows lkml, and watching the most important secondary tree (-ac) won't hurt either." David Lang said:

for crying out loud, earlier this week we had a post from some of the network maintainers chastising someone because they only sent the patch to the kernel list and not to the network list because many of the developers don't read the kernel list.

if core kernel developers are telling people they don't read L-K then a new person sending in a patch and not reading L-K all the time is very reasonable. you can't have it both ways.

as for the -ac being the most important secondary tree, that's a matter of opinion, in many cases it is, but in many cases a lot of stuff shows up in it that never will make it to the main tree as well.

And Willy also said to Christoph, "I'm sorry not to agree with you, but with the high number of messages, not everyone has the time to catch them all. It has happened that I missed a thread for several days, and noticed it while being quite advanced in the discussion (OK, I'm not a kernel tree maintainer, but I'm interested in what's being done). And I didn't notice this XFree patch either, and I read nearly all messages. You're lucky if you have all this time to spare here, really." But Randy Dunlap remarked, "Yes, Christoph must spend as much time per day as Alan does on lk email and patches, but that's a good thing. I certainly don't spend as much time as they do."

24. XFS Update For 2.5

27 Aug 2002 (1 post) Archive Link: "[PATCH] XFS core for 2.5.32"

Topics: FS: XFS, MAINTAINERS File, POSIX, Version Control

People: Christoph Hellwig

Christoph Hellwig announced:

This patch includes only the core functionality of the SGI XFS filesystem for Linux 2.5.32. It does NOT include changes for Posix ACLs, dmapi, kdb or other code included in the XFS CVS tree.

The patch adds the self-contained XFS code and makes almost no modifications to existing kernel code. Diffstat output with new files stripped:

 Documentation/Changes              |   16
 Documentation/filesystems/00-INDEX |    2
 MAINTAINERS                        |    8
 fs/Config.help                     |   66
 fs/Config.in                       |    9
 fs/Makefile                        |    1
 include/linux/sched.h              |    1
 include/linux/sysctl.h             |    2
 kernel/ksyms.c                     |    1

Please send any comments to the patch or xfs code to linux-xfs@oss.sgi.com. We know that there are still issues left that need addressing, but feel free to add your items.

The patches can be found at:

ftp://ftp.kernel.org/pub/linux/kernel/people/hch/patches/v2.5/2.5.32/linux-2.5.32-xfs.patch.gz
ftp://ftp.kernel.org/pub/linux/kernel/people/hch/patches/v2.5/2.5.32/linux-2.5.32-xfs.patch.bz2

25. Kernel 2.5 Status For August 28

27 Aug 2002 (1 post) Archive Link: "[STATUS 2.5] August 28, 2002"

Topics: FS: NFS

People: Guillaume Boissiere

Guillaume Boissiere posted his latest 2.5 Status summary (http://www.kernelnewbies.org/status/Status-28-Aug-2002.html), adding, "Much action this week with the inclusion of Asynchronous IO, the beginning of NFS v4 and the core of the new input layer."

26. Various Consolidated Patches

27 Aug 2002 - 28 Aug 2002 (7 posts) Archive Link: "2.5.32-mm1"

Topics: Big Memory Support, FS: ext3, Kernel Build System, Virtual Memory

People: Andrew Morton

Andrew Morton announced:

URL: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.32/2.5.32-mm1/

Since 2.5.31-mm1:

func-fix.patch
  gcc-2.91.66 does not support __func__

ext3-htree.patch
  Indexed directories for ext3

misc.patch
  page_alloc.c fixlets

tlb-speedup.patch
  Reduce typical global TLB invalidation frequency by 35%

buffer-slab-align.patch
  Don't align the buffer_head slab on hardware cacheline boundaries

zone-rename.patch
  Rename zone_struct->zone, zonelist_struct->zonelist.  Remove zone_t,
  zonelist_t.

per-zone-lru.patch
  Per-zone page LRUs

per-zone-lock.patch
  Per-zone LRU list locking

l1-max-size.patch
  Infrastructure for determining the maximum L1 cache size which the kernel
  may have to support.

zone-lock-alignment.patch
  Pad struct zone to ensure that the lru and buddy locks are in separate
  cachelines.

put_page_cleanup.patch
  Clean up put_page() and page_cache_release().

anon-batch-free.patch
  Batched freeing and de-LRUing of anonymous pages

writeback-sync.patch
  Writeback fixes and tuneups

ext3-inode-allocation.patch
  Fix an ext3 deadlock

ext3-o_direct.patch
  O_DIRECT support for ext3.

discontig-paddr_to_pfn.patch
  Convert page pointers into pfns for i386 NUMA

discontig-setup_arch.patch
  Rework setup_arch() for i386 NUMA

discontig-mem_init.patch
  Restructure mem_init for i386 NUMA

discontig-i386-numa.patch
  discontigmem support for i386 NUMA

cleanup-mem_map-1.patch
  Clean up lots of open-coded uses of mem_map[].  For ia32 NUMA

zone-pages-reporting.patch
  Fix the boot-time reporting of each zone's available pages

enospc-recovery-fix.patch
  Fix the __block_write_full_page() error path.

fix-faults.patch
  Back out the initial work for atomic copy_*_user()

spin-lock-check.patch
  spinlock/rwlock checking infrastructure

copy_user_atomic.patch

kmap_atomic_reads.patch
  Use kmap_atomic() for generic_file_read()

kmap_atomic_writes.patch
  Use kmap_atomic() for generic_file_write()

throttling-fix.patch
  Fix throttling of heavy write()rs.

dirty-state-accounting.patch
  Make the global dirty memory accounting more accurate

rd-cleanup.patch
  Cleanup and fix the ramdisk driver (doesn't work right yet)

discontig-cleanup-1.patch
  i386 discontigmem coding cleanups

discontig-cleanup-2.patch
  i386 discontigmem cleanups

writeback-thresholds.patch
  Downward adjustments to the default dirty memory thresholds

buffer-strip.patch
  Limit the consumption of ZONE_NORMAL by buffer_heads

daniel-rmap-speedup.patch
  Hashed locking for rmap pte_chains

rmap-speedup.patch
  rmap pte_chain space and CPU reductions

wli-highpte.patch
  Resurrect CONFIG_HIGHPTE - ia32 pagetables in highmem

27. Advanced Tracing API Intended To Replace ptrace

28 Aug 2002 (1 post) Archive Link: "advanced tracing API"

People: David Howells

David Howells announced:

I've written an advanced tracing API as a potential replacement for ptrace. It isn't quite complete yet, but sufficient functionality should exist to implement strace.

It works by adding a new system call that deals with file descriptors with "special" files attached (much as sysvipc shm does). The fds are, however, exposed and can be polled. Each fd manages a thread group.

It has full support for threads created with CLONE_THREAD.

Documentation is included in the trace-2532 patch.

Comments would be appreciated.

It is available as a pair of patches to 2.5.32 plus a test/demo program:

ftp://infradead.org/pub/people/dwh/orn-2532.diff.bz2
ftp://infradead.org/pub/people/dwh/trace-2532.diff.bz2
ftp://infradead.org/pub/people/dwh/trctl2.c

Apply the orn-2532 and then the trace-2532 patches to a 2.5.32 kernel, build and install. The trctl2 program needs access to the header files from the patched kernel at the moment.

Run trctl2 under the patched kernel. It will fork off an "inferior" process and begin trapping and displaying certain events from it. The inferior process will then create a set of threads which will then also be managed by the "debugger". These threads can be hit with signals to make events happen.

28. Benchmark Comparing IPv4 And IPv6

28 Aug 2002 (1 post) Archive Link: "IPV4 and IPV6 tcp_stream comparison"

Topics: Networking, SMP

People: Mala Anand

Mala Anand announced, "I did a comparison test of IPV4 and IPV6 using 2.4.17 kernel for IPV4 and 2.4.17 kernel+USAGI-linux24-s20020415-2.4.17.diff patch running netperf3, tcp_stream 1 adapter, 2 adapters test on UNI, SMP kernels using a 2-way machine. The test setup/results/profile can be found at: http://www-124.ibm.com/developerworks/opensource/linuxperf/netperf/results/may_02/netperf3_ipv6_2.4.17resutls.htm"

29. Linux 2.4.20-pre5

28 Aug 2002 (3 posts) Archive Link: "Linux 2.4.20-pre5"

Topics: FS: EFS, FS: devfs, FS: ext3, Hot-Plugging, I2O, Ioctls, Modems, Networking, PCI, Power Management: ACPI, Real-Time, USB, Web Servers

People: Marcelo Tosatti, Hugh Dickins, Scott Feldman, Tom Rini, David S. Miller, Tim Waugh, Rusty Russell, James Morris, Rob Radez, Alan Cox, Neil Brown, Hanna Linder, Jeff Garzik, Alexey Kuznetsov, Brian Beattie

Marcelo Tosatti announced 2.4.20-pre5 and said:

Mainly merging with Alan and others.

Summary of changes from v2.4.20-pre4 to v2.4.20-pre5
============================================

<andersen@codepoet.org>:
  o 2.4.20-pre[234] hosed /proc/partitions fix

<bhavesh@avaya.com>:
  o Fix scheduler's RT behaviour

<danc@mvista.com>:
  o PPC32: Add support for SBS Palomar 4 board

<davem@pizda.ninka.net>:
  o SPARC64: Initial Cheetah-plus support, not fully debugged yet

<dwmw2@redhat.com>:
  o Another JFFS2 oops fix

<dz@cs.unitn.it>:
  o latest version of i8k module

<engebret@us.ibm.com>:
  o Re: [PATCH] PPC64 update to 2.4.19-rc1

<hch@lst.de>:
  o Merge ETHTOOL_GDRVINFO support for several pcmcia net drivers
  o update drm to XFree 4.2 version
  o use -iwithprefix to find gcc headers
  o fix theoretical race init pagefault init survive path

<james@cobaltmountain.com>:
  o drivers_usb_usb-uhci.c, typo: the the, missing 'of'
  o drivers_usb_auerswald.c, typo: the the
  o net/ipv4/netfilter/ip_conntrack_core.c: Fix comment typo
  o net/ipv4/netfilter/ip_nat_core.c: Fix comment typo

<jani@iv.ro>:
  o tridentfb bitdepths in Config.in

<jgarzik@tout.normnet.org>:
  o Correct xdr_shift_buf prototype in inc/linux/sunrpc/xdr.h to match
implementation (s/unsigned int/size_t/).

<jsiemes@web.de>:
  o net/ipv4/ipconfig.c: Add support for multiple nameservers

<jwoithe@physics.adelaide.edu.au>:
  o Support for Buffalo 40GB USB hard disk

<kisza@sch.bme.hu>:
  o net/ipv6/netfilter/ip6_tables.c: Fix extension header parsing bugs

<mark@alpha.dyndns.org>:
  o USB: ov511 1.61 for 2.4

<paulus@au1.ibm.com>:
  o PPC32: add support for the IBM "Spruce" reference platform
  o PPC32: clean up the interrupt handling on the APUS platform

<sct@redhat.com>:
  o 2.4.20-pre4/ext3: Handle dirty buffers encountered
  o 2.4.20-pre4/ext3: Fix "buffer_jdirty" assert failure
  o 2.4.20-pre4/ext3: Fix the "dump corrupts filesystems"
  o 2.4.20-pre4/ext3: Fix buffer alias problem
  o 2.4.20-pre4/ext3: Truncate leak fix
  o 2.4.20-pre4/ext3: Fix out-of-inodes handling
  o 2.4.20-pre4/ext3: fsync optimisation
  o 2.4.20-pre4/ext3: Fix truncate restart error
  o 2.4.20-pre4/ext3: Performance fix for O_SYNC behaviour

<solar@openwall.com>:
  o net/unix/af_unix.c: Set ATIME on socket inode

Alan Cox <alan@lxorguk.ukuu.org.uk>:
  o SBUS: extern->static inline
  o these were wrong - they've been right in -ac for ages
  o add config.in for new synclink mp
  o parisc config.in
  o note the initrd vanishing bug and block size issue
  o docs for isapnp update in pre4
  o make synclink vars static
  o fix wrap handling in ieee1394
  o fix warning in i2o
  o set DMA mask in i2o
  o typo fixes for aic7xxx
  o ixj wrong definition
  o zorro proc should use loff_t too
  o hppa also needs a weird kstat
  o only egcs had this problem so dont pad on 2.95+
  o cache align the irq stat
  o sparc64 fix pcibios for changes in pre4
  o new dmi entries
  o long standing khttpd fix
  o generic part of rw trylocks
  o update parport ifdefs for HPPA
  o resend - HIL input bus
  o down_write_trylock
  o fix EFS on cd crash
  o add hppa to fbcon data
  o quieten the latency message
  o ppc64 missing ioctl32 gunk
  o hppa like ia64 doesnt use the old ipc structs
  o new sem_getcount means this can go
  o more typo fixes
  o typo fixes ctd
  o fix the via rhine
  o fix bttv_read type error
  o fix detected_devices type error
  o isdn gcc warning fixer
  o vt.c clean up ifdefs
  o update /proc description
  o journalling docs
  o PCI fixes
  o docs for ldm update
  o ps2esdi - wrong bit
  o driver for AMD watchdog
  o add synclink_mp
  o saner error return for hotplug
  o i2o typo fix
  o e1000 - return without code
  o decruft smodem
  o fix pci_release/request_regions bugs
  o fix __FUNCTION__ in irda-usb

Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>:
  o arch/i386/lib/checksum.S: Handle zero length

Brian Beattie <beattie@beattie-home.net>:
  o patch for 2.4 scanner.h add device ids

David S. Miller <davem@nuts.ninka.net>:
  o arch/sparc64/defconfig: Update
  o include/linux/sunrpc/svcsock.h: Make sk_flags a long
  o include/linux/sunrpc/svcsock.h: sk_flags must be a long for bitops
  o SPARC: Update for changed pcibios_enable_device args
  o include/linux/sunrpc/xdr.h: Kill xdr_zero_buf decl, fix xdr_shift_buf args
  o arch/sparc64/mm/ultra.S: Fix branch condition in __cheetah_flush_tlb_range
  o include/asm-sparc/types.h: No need to make dma64_addr_t 64-bits on sparc32
  o SPARC64: Fix obscure cheetah+ hangs
  o TIGON3: Add missing udelay when clearing SRAM stats/status block
  o SPARC64: Fix DRM to use new not old drivers
  o net/unix/af_unix.c: Set msg_namelen in unix_copy_addr properly, define MODULE_LICENSE
  o net/ipv4/tcp_diag.c: Avoid unaligned accesses to tcpdiag_cookie
  o SPARC64:setup_arch Flush correct I-cache line when patching irqsz_patchme
  o SPARC64: Ultra-III+ bug fix and better bad trap logging

Greg Kroah-Hartman <greg@kroah.com>:
  o USB: documentation updates
  o USB: ov511 driver update to the latest version
  o USB: pegasus driver update to the latest version
  o microtek driver update to the latest version
  o wacom driver update to fix incorrect data problem
  o USB: minor cleanups and __FUNCTION__ fixes
  o USB: fix some USB 2.0 hub bugs
  o update to latest version of rtl8150 driver
  o minor printer driver fixes
  o stv680 driver update to latest version
  o USB: usb-ohci bug fix for slow machines and cardbus bug fix
  o USB: uhci incorrect bit operations and FSBR timeout fixes
  o added Configure.help entry for the ACPI PCI Hotplug driver
  o PCI Hotplug: fixed oops when accessing pcihpfs

Hanna Linder <hannal@us.ibm.com>:
  o path_lookup for 2.4.20-pre4

Hugh Dickins <hugh@veritas.com>:
  o M386 flush_one_tlb invlpg

James Morris <jmorris@intercode.com.au>:
  o [NETFILTER]: ip{,6}_queue.c cleanups and fixes

Jeff Garzik <jgarzik@mandrakesoft.com>:
  o Fix 8139cp 64-bit DMA support
  o Update e1000 net driver for two small ethtool fixes

Marcelo Tosatti <marcelo@plucky.distro.conectiva>:
  o Revert broken cpqarray statistics change in previous -pre
  o Readded context_swtch to kernel_stat structure
  o Changed EXTRAVERSION to -pre5

Neil Brown <neilb@cse.unsw.edu.au>:
  o SUNRPC 1 of 3 - The new "sk_flags" word in struct svc_sock
  o SUNRPC 2 of 3 - Fix two problems with multiple concurrent
  o SUNRPC 3 of 3 - Call svc_sock_setbufsize when socket

Rob Radez <rob@osinvestor.com>:
  o SPARC32: Sparc32 compile fixes with CONFIG_PCI enabled

Rusty Russell <rusty@rustcorp.com.au>:
  o [PATCH] duplicate declarations #2
  o 2.5: kconfig missing OBSOLETE (2_3) again
  o Documentation_filesystems_devfs_README, typo: the the
  o Trivial Patch to SonyCD535 documentation
  o drivers_net_rcpci45.c, typo: the the
  o drivers_net_pcmcia_xircom_cb.c, typo: the the,
  o Re: pci_alloc_consistant gfp flag fix
  o drivers_net_winbond-840.c, typo: the the
  o list_for_each_entry

Scott Feldman <scott.feldman@intel.com>:
  o e100 net driver update 1/3
  o e100 net driver update 2/3
  o e100 net driver update 3/3
  o e1000 net driver update 1/5
  o e1000 net driver update 2/5
  o e1000 net driver update 3/5
  o e1000 net driver update 4/5
  o e1000 net driver update 5/5

Tim Waugh <twaugh@redhat.com>:
  o 2.4.20-pre4: parportbook thinko

Tom Rini <trini@kernel.crashing.org>:
  o PPC32: separate finding and parsing the info from the boot wrapper
  o PPC32: implement hooks for extra PCI fixups needed on some platforms
  o PPC32: Add hooks for Abatron BDI2000 debugger, extra compile flags

30. uClinux Update For 2.5

28 Aug 2002 (1 post) Archive Link: "[PATCH]: linux-2.5.32uc1 (MMU-less patches)"

People: Greg Ungerer

Greg Ungerer announced:

A new MMU-less patch, linux-2.5.32uc1. You can get it at:

http://www.uclinux.org/pub/uClinux/uClinux-2.5.x/

A few minor fixes:

31. i386 Individual CPU Selection

28 Aug 2002 (1 post) Archive Link: "[PATCH 3 / 4] i386 individual CPU selection"

Topics: SMP

People: Luca Barbieri

Luca Barbieri posted a patch and explained, "This patch changes the CPU selection mechanism so that each CPU is an independent y/n choice. The advantage of this is that the user knows exactly and has full control over the range of CPUs supported by the kernel. Without this patch it's not clear, for example, how to build a kernel that will work on both K6s and WinChips. In addition to the processor selection, a choice is added for the CPU that the kernel should be optimized, which is used for the -mcpu switch."
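
To make the idea concrete, here is a minimal sketch of how independent per-CPU choices plus a separate "optimize for" choice might translate into compiler flags. It is not taken from Luca's patch: the CPU names, the CPU_BASELINE table, the build_flags() helper and the rule of using the lowest common baseline for -march are all assumptions made for illustration. Only the use of -mcpu for the optimization target comes from the description above.

```python
# Hypothetical table: CPU name -> generic x86 baseline level (3 = i386 ... 6 = i686).
# These names and levels are illustrative assumptions, not taken from the patch.
CPU_BASELINE = {
    "i386": 3,
    "i486": 4,
    "pentium": 5,
    "k6": 5,
    "winchip": 5,
    "pentiumpro": 6,
    "athlon": 6,
}


def build_flags(selected, optimize_for):
    """Return compiler flags for a kernel that must run on every CPU in `selected`.

    `selected` models the independent y/n choices (the set of CPUs answered 'y').
    `optimize_for` models the separate choice of which CPU to tune for.
    """
    if not selected:
        raise ValueError("at least one CPU must be selected")
    for cpu in list(selected) + [optimize_for]:
        if cpu not in CPU_BASELINE:
            raise ValueError("unknown CPU: %s" % cpu)

    # Assumption: the instruction set is limited to the lowest baseline
    # among the selected CPUs, so the kernel runs on all of them.
    level = min(CPU_BASELINE[cpu] for cpu in selected)

    # The tuning target only affects scheduling, not which CPUs can run the code.
    return ["-march=i%d86" % level, "-mcpu=%s" % optimize_for]


if __name__ == "__main__":
    # The K6 + WinChip case mentioned in the announcement.
    print(" ".join(build_flags({"k6", "winchip"}, "k6")))
```

Under these assumptions, selecting both K6 and WinChip keeps the instruction set at their common baseline while still tuning for whichever CPU the user picks, which is the kind of explicit control the announcement describes.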

32. i2c Updates For 2.4

28 Aug 2002 (1 post) Archive Link: "[patch 1/5] 2.4.20-pre5 i2c updates"

Topics: I2C

People: Albert Cranford

Albert Cranford posted a patch and said:

Attached are i2c patches that bring the kernel to the latest released and tested version. Updates include:

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.