Kernel Traffic #119 For 21 May 2001

By Zack Brown

Mailing List Stats For This Week

We looked at 1227 posts in 4904K.

There were 423 different contributors. 193 posted more than once. 161 posted last week too.

The top posters of the week were:

1. Fast User-Space Web Server

27 Apr 2001 - 9 May 2001 (19 posts) Archive Link: "X15 alpha release: as fast as TUX but in user space"

Topics: Microkernels, PCI

People: Ingo Molnar, Fabio Riccardi, Davide Libenzi

In response to the recent announcement of the X15 web server, a user-space server that outperformed the in-kernel TUX, Ingo Molnar said:

This should put the accusations to rest that Linux got the outstandingly high SPECweb99 scores only because the webserver was in kernel-space. It's the 2.4 kernel's high performance that enabled those results; having the web-server in kernel-space didn't have much effect. TUX was and remains a testbed to test high-performance webserving (and FTP serving), without the API-exporting overhead of userspace.

[i suspect the small performance advantage of X15 is due to subtle differences in the SPECweb99 user-space module: eg. while the mmap()-enabled TUXAPI_alloc_read_objectbuf() code was written, tested and ready to use, it wasn't actually enabled. I sent Fabio a mail explaining how to enable it; perhaps he can do some tests to confirm this suspicion?]

doing a TUX 2.0 SPECweb99 benchmark on the latest -ac kernels, 86% of time is spent in generic parts of the kernel, 12% of time is spent in the user-space SPECweb99 module, and only 2% of time is spent in TUX-specific kernel code.

doing the same test with the original TUX 1.0 code shows that more than 50% of CPU time was spent in TUX-specific code.

what does this mean? In the roughly 6 months since TUX 1.0 was released, we moved many of the TUX 1.0-only improvements into the generic kernel (most of which were made available to user-space as well), and TUX itself became smaller and smaller (and used more and more generic parts of the kernel). So in effect X15 is executing 50% TUX code :-)

(there are still a number of performance improvement patches pending that are not integrated yet: the pagecache extreme-scalability patch and the smptimers patch. These patches speed both X15 and TUX up.)

(there is one thing though that can never be 'exported to user-space': to isolate possibly untrusted binary application code from the server itself, without performance degradation. So we always have to be mentally open to the validity of kernel-space services.)

Fabio Riccardi, the primary developer of X15, replied:

Linux 2.4 is surely one of the most advanced OSs ever, especially from the optimization point of view and for the admirable economy of concepts on which it rests. I definitely hope that X15 helps reinforce the success of this amazing system.

TUX has definitely been my performance yardstick for the development of X15, but I had many sources of inspiration for the X15 architecture. Maybe the most relevant are the Flash Web Server (Pai, Druschel, Zwaenepoel), several of Linus' observations on this list about (web) server architecture and kernel services, and the Hennessy & Patterson architecture books. Last but not least, aside from some heated discussions, research in microkernel architecture has taught us many lessons on how to achieve an efficient model of interaction across separate address spaces.

If I had to make an educated guess and point to where the current bottleneck lies for web server performance, I would say that it is somewhere between the memory subsystem and the PCI bus.

With zero-copy sendfile, data movement is no longer an issue, asynchronous network IO allows for really inexpensive thread scheduling, and system call invocation adds negligible overhead in Linux. What we are left with now is purely wait cycles: the CPUs and the NICs are contending for memory and bus bandwidth. It would be really interesting to see where the network shifts now that faster machines are becoming available.

On my wish list for future kernel developments I would definitely put disk asynchronous IO and a more decent file descriptor passing implementation. I'll detail this in subsequent messages.

I'll surely check out the impact of Ingo's patches on TUX performance sometime this week.

I'd also like to reiterate my request for help for testing X15 on higher end server architectures.

X15 is still very young alpha code and I can surely improve its performance in many ways.

Later, Fabio explained the binary-only licensing of X15:

Our intention is to release X15 with an open source license.

This will happen as soon as the codebase stabilizes a bit, that is when we go beta (in two to three weeks).

At the moment we just don't have the time...

The reason why I released the alpha binary version is that several people would not believe that a user-space server with this level of performance would be possible at all and several statements that I made on this list were challenged.

Besides I really appreciate the feedback that I received so far from Ingo and others, and I'd be very curious to know if anybody did any performance evaluation at all.

Ingo found a bug in X15: it would keep serving cached copies of files after they had changed, while TUX would serve the new file immediately. Fabio replied that this was a known problem, but that he'd been too lazy to fix it. A couple of days later he'd fixed it with no performance penalty, and gave a link to the new version (http://www.chromium.com/X15-Alpha-3.tgz), but Ingo now said:

i noticed another RFC anomaly in X15. It ignores the "Connection: close" request header passed by a HTTP/1.1 client. This behavior is against RFC 2616, a server must not override the client's choice of non-persistent connection. (there might be HTTP/1.1 clients that do not support persistent connections and signal this via "Connection: close".)

the rule is this: a request is either keepalive or non-keepalive. HTTP/1.0 requests default to non-keepalive. HTTP/1.1 requests default to keepalive. The default can be overridden via the "Connection: Keep-Alive" or "Connection: close" header fields.

if you fix this, does it impact SPECweb99 performance in any way?
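The keepalive rule Ingo spells out can be sketched in a few lines of C. This is a minimal illustration, not X15 or TUX code: the function name `request_is_keepalive` and its parameters are hypothetical, and the sketch assumes the Connection header carries a single token (RFC 2616 allows a comma-separated list, which a real server would need to split).

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>  /* strcasecmp (POSIX): header tokens are case-insensitive */

/*
 * Hypothetical helper: decide whether the connection stays open after
 * answering a request.
 *   http_minor - minor HTTP version (0 for HTTP/1.0, 1 for HTTP/1.1)
 *   connection - value of the "Connection:" header, or NULL if absent
 * Assumes a single-token header value for simplicity.
 */
bool request_is_keepalive(int http_minor, const char *connection)
{
    /* Version-dependent default: 1.0 closes, 1.1 persists. */
    bool keepalive = (http_minor >= 1);

    if (connection != NULL) {
        if (strcasecmp(connection, "close") == 0)
            keepalive = false;  /* client explicitly asked to close */
        else if (strcasecmp(connection, "keep-alive") == 0)
            keepalive = true;   /* client explicitly asked to persist */
    }
    return keepalive;
}
```

For example, an HTTP/1.1 request carrying "Connection: close" yields false, which is the case Ingo reported X15 ignoring; an HTTP/1.0 request with "Connection: Keep-Alive" yields true.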

Fabio thanked Ingo heartily for all the feedback, and was really impressed by Ingo's bug-hunting skills. He fixed the problem, and reported no change in the SPECweb99 results. Elsewhere, Ingo reported:

yet another anomaly i noticed. X15 does not appear to handle pipelined HTTP/1.1 requests properly, it ignores the second request if two requests arrive in the same packet.

SPECweb99 does not send pipelined requests, but a number of RL web clients do. (Mozilla, apt-get, etc.)
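To illustrate what handling pipelining involves, here is a minimal C sketch that walks a receive buffer and processes every complete request in it, instead of stopping after the first one. It is not taken from X15 or TUX; the helper names are hypothetical, and it assumes body-less requests such as GET, each ending with a blank line (CRLF CRLF). Requests with a body would also need their Content-Length consumed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: length of the first complete header block in buf,
 * up to and including the terminating "\r\n\r\n", or 0 if the block is
 * not complete yet.
 */
static size_t header_block_len(const char *buf, size_t len)
{
    for (size_t i = 0; i + 3 < len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    }
    return 0;
}

/*
 * Count the complete pipelined requests in one receive buffer. A server
 * would parse and answer each buf[off .. off+n) slice in order at the
 * marked spot; here we only count them. Assumes body-less requests.
 */
size_t count_pipelined_requests(const char *buf, size_t len)
{
    size_t count = 0;
    size_t off = 0;

    while (off < len) {
        size_t n = header_block_len(buf + off, len - off);
        if (n == 0)
            break;  /* trailing partial request: wait for more data */
        /* ... handle the request at buf + off, length n ... */
        count++;
        off += n;
    }
    return count;
}
```

Two GET requests arriving in the same packet give a count of 2 here; the bug Ingo describes corresponds to handling only the first and discarding the rest of the buffer.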

Fabio pleaded ignorance on what pipelined requests were, and Davide Libenzi gave a link to a page (http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html). Several days later Fabio replied:

I have uploaded a new release of X15 that hopefully solves all the RFC bugs. I say hopefully because I haven't had the opportunity to fully test the request pipelining. Is there anything to automate such tests?

From what I could measure X15 is still a good 5% faster than TUX.

You can find the file at: http://www.chromium.com/X15-Alpha-4.tgz

BTW: Next release (in a week or so) will be a beta and it will include source code!

2. Problems With Via Chipsets

8 May 2001 - 9 May 2001 (7 posts) Archive Link: "Question: Status of VIA chipsets and 2.2 kernels"

People: Robert Cohen, Gerhard Mack

Robert Cohen asked, "What with all the various problem reports flying around for VIA chipsets, I've lost track of the state of play as regards VIA northbridges and southbridges. I am thinking of buying a machine with a VIA chipset and I want to know how stable it is likely to be with Linux. I would appreciate it if someone who knows what's going on can give a report on the state of play as regards all the problems and their current status with 2.2 kernels (and 2.4 if they're feeling energetic)." Gerhard Mack made a wry face, saying, "Ugh, why VIA? They have been a constant source of trouble for me on both Linux and Windows. I have my doubts about their ability to get a chipset right in the first place." He suggested the Asus A7M266 (AMD chipset) and the Asus A7A266 (ALi chipset). Robert replied:

I'm wary of using an ALi chipset. They are even less common than VIA, so they just haven't had the exposure to root out problems.

Also the main feature I'm looking for is a machine with 768 Meg or 1 Gig of RAM at a reasonable price. Hence I want to use 256 Meg DIMMs. I can't use an i815 chipset as it tops out at 512 Meg. The Apollo Pro board is one of the few that has 4 DIMM slots, allowing 1 Gig of memory. The Athlon boards only have 3 DIMM slots, so they top out at 768 Meg.

I'm wary of using DDR DRAM. The chipsets haven't been around long enough to have much of a track record, and the RAM is too expensive. Also the A7M266 uses a VIA 686b southbridge anyway, which I thought was the source of the problems. In any case these boards only tend to have 2 DDR DIMM slots, and the biggest DDR DIMM that Crucial sells is 256 Meg, so I would be limited to 512 Meg.

Maybe I have to bite the bullet and go with 512 Meg DIMMs. They only appear to be available as registered ECC parts, which makes them cost about twice as much per meg, and I wasn't sure that all boards support them. What motherboards/chipsets do people recommend for machines with 1 Gig+ of RAM?

3. Status Of Linux Kernel License

9 May 2001 - 13 May 2001 (9 posts) Archive Link: "Nasty Requirements for non-GPL Linux Kernel Modules?"

Topics: BSD, Patents

People: Linus Torvalds, Scott C. Karlin, Alan Cox

Scott C. Karlin, referring to the discussion covered in Issue #85, Section #9 (6 Sep 2000: Non-GPLed Drivers), specifically drew attention to Linus' quote, "whenever it's not GPL'd, all the module restrictions kick in. So it's going to be "legal" the same way any binary only module is "legal" - assuming all the nasty requirements are met." Scott asked what was meant, specifically, by "nasty requirements". He asked:

If I don't hear from Linus directly, can someone point me to a document, file, or mailing list thread where this might be spelled out.

Alan Cox replied:

If you want to do binary only then it depends solely on how your lawyers intend to interpret the concept of 'linking'. Linus' comments on the matter have no impact since the kernel isn't all his copyright and he has linked in code by bodies who are most definitely opposed to binary modules.

The same applies for source code under 'additional restrictions', as the GPL calls terms that disallow things it allows.

If you are releasing modules with source under terms that are at least as free as the GPL (eg BSD without advertising clause) then nobody has any cares. We probably wouldn't merge it with the mainstream kernel due to the lack of patent trap protection in the BSD license but I suspect you don't want that anyway.

4. 2.4.4 Intentionally Breaks Source Compatibility With 2.4.3

11 May 2001 (13 posts) Archive Link: "Source code compatibility in Stable series????"

People: Rogier Wolff, David S. Miller, Andi Kleen

Rogier Wolff reported:

It seems that in 2.4.4 the function "skb_cow" suddenly no longer returns the modified skb, but returns an integer for success/failure.

This means that for networking modules requiring this function, there is no source code compatibility between 2.4.3 and 2.4.4.

David S. Miller replied, "And skb_datarefp went away too, in fact a ton of things changed. Just deal with it."

Elsewhere, Rogier mentioned, "it's always been said that source code compatibility would be maintained. I'm a bit pissed that people just go about changing public source-level interfaces." David replied:

"when possible", we've made no such total source level compat. guarantee. And more such changes are coming, for example the quota bugs can't be fixed without breaking source level compat. for the filesystems.

You may think and argue otherwise, but our ability to break source level compatibility is one of our strengths (see solaris rsh root owned socket bug of yesteryear for one example as to why).

At around this point Andi Kleen remarked, "2.4.4 is basically like 2.5.0 as far as networking is concerned, it includes major fundamental changes to the stack." David said, "Andi, please. Get over it. That code is 6 months old."

5. Alan Moves 2.4 -ac Patches and 2.2 To New Server

11 May 2001 (7 posts) Archive Link: "Linux 2.4.4-ac7"

Topics: Modems

People: Alan Cox

Alan Cox posted the latest -ac patch and announced, "Please note change of ftp site. ftp.kernel.org switched to using ECN and it seems NTL's cablemodem folks have problem firewalls between their Inktomi cache and the world. The -ac patches and future 2.2.20pre will be distributed from ftp.linux.org.uk until further notice." There was no discussion.

6. Porting User-Mode Linux

11 May 2001 (2 posts) Archive Link: "User-mode Linux ported to ppc"

Topics: User-Mode Linux, Version Control

People: Chris Emerson, Jeff Dike

Chris Emerson announced:

User-mode Linux is now booting on PPC Linux - it can boot with a Debian root floppy image with init=/bin/sh and poke around. It mostly works, although there are still a few problems.

The current patch is available from http://www.tartarus.org/~chris/user-mode-linux/, made against recent UML CVS (see http://user-mode-linux.sourceforge.net).

Jeff Dike replied:

First off, I'd like to thank Chris for volunteering to undertake the first port of UML and seeing it through to the point where it's basically working. It's a nice demonstration, if any were needed, that UML isn't i386-only.

Based on what I've learned from this port, I'm writing up what amounts to a UML porting guide. It will be found at http://user-mode-linux.sourceforge.net/arch-port.html when I have something ready. It will be incomplete at first - I'll be filling it in as I go through the existing code and as I finish integrating Chris's code into my pool.

So, if anyone wants to port UML to another arch, have a look at that page (and continue looking as I fill it in :-). You'll see that it's not a huge amount of work. UML is fairly portable.

7. New SCSI Driver For the NCR Dual 700 Microchannel Card

12 May 2001 - 14 May 2001 (10 posts) Archive Link: "[NEW SCSI DRIVER] for 53c700 chip and NCR_D700 card against 2.4.4"

Topics: Disks: SCSI

People: James Bottomley, Alan Cox, Richard Hirst, Andries Brouwer

James Bottomley announced:

Attached is a driver for the NCR Dual 700 Microchannel card. Since the chip engine of this card is the 53c700-66, which appeared in quite a few other SCSI cards as well, I've abstracted the chip function (in much the same way as the 8390 chip function is abstracted in network cards) so that it should be easy to link it into any other SCSI card using it. As you can see, the actual board specific code is about 150 lines.

The chip driver is full featured (sync (where supported), disconnects and tag command queueing). It will drive both single ended and differential interfaces and uses the new SCSI error handler.

I know we have two drivers that claim to do these chips (53c7xx and 53c7,8xx) but if you actually compile them for this chip, they are completely broken. The chip itself is extremely primitive (not having the table indirect mode, which is the backbone of most of the later drivers) so it makes much more sense to give it its own driver.

The chip driver is currently I/O mapped (because the only cards I know using the chip are I/O mapped), but could easily be made memory mapped as well, just let me know.

Andries Brouwer was very happy to hear about this development, and mentioned that he thought Richard Hirst had also done some work on this. Alan Cox replied, "He did 53c710+. The 700 and 700/66 are much less capable devices. According to http://www.murphy.nl/~ard/systems/pws/pws/node18.html the NCR 53c700/66 is mapped at 0xCC0-0xCFF." Richard Hirst added, "I did 53c700 as well, in the parisc-linux tree. Sounds like James' driver is more featureful than mine though." Alan replied, "I'll skip feeding the driver on to Linus until the two of you figure out the best path then."

8. Linux Support For Microsoft Dynamic Disks

12 May 2001 - 14 May 2001 (12 posts) Archive Link: "Linux support for Microsoft dynamic disks?"

Topics: FS: NTFS, Microsoft

People: Andries Brouwer, Anton Altaparmakov, Jeff V. Merkey

Anton Altaparmakov asked if anyone was working on support for Windows 2000's dynamic disk format. Andries Brouwer replied, "I once collected some stuff from the Microsoft Knowledge Base. In http://www.win.tue.nl/~aeb/partitions/partition_types-1.html (hint: additions and corrections are welcome!) you find partition type 42 that marks a partition table as legacy. Unfortunately I do not have Windows 2000. (But I have DOS 4.01 :-)"

Jeff V. Merkey asked what Anton specifically wanted to know, and Anton replied:

What is the on-disk layout of Win2k's dynamic disk, i.e. the Logical Disk Manager (LDM) database structures? - The article "Inside Storage Management, Part 2" by Mark Russinovich in Windows 2000 magazine (full text available freely at: http://www.win2000mag.com/Articles/Index.cfm?ArticleID=8303) describes in detail the logical layout of the LDM database, but it doesn't cover enough detail to go off and implement it in Linux (without a certain amount of reverse engineering).

Linux needs to understand the LDM database in order to support dual-boot Win2k (or XP) and Linux configurations where there are one or more dynamic disks present in the system and the user wants to access their NTFS partitions residing on the dynamic disk(s) from Linux.

Just saying "Don't use dynamic disks if you want to use Linux" is IMHO a Bad Thing(TM) as a user might have bought a computer with Win2k preinstalled on a dynamic disk or, even worse, might have been using Win2k only previously, and then the user wants to also install Linux on it. In these cases the user would have to reformat the whole system and start from scratch unless Linux supports dynamic disks...

There was some more discussion, but nothing conclusive.

9. Status Of "Linux Device Drivers" Upcoming Publication

15 May 2001 (4 posts) Archive Link: "Linux kernel programming for beginners"

People: Jonathan Corbet, Eli Carter

Bohdan Vlasyuk asked about resources for beginning kernel hackers, and Jonathan Corbet replied, "if you can wait just a little longer, O'Reilly tells me that the second edition of Linux Device Drivers should hit the shelves on June 28. We're still working on the right license for the online release - if people have suggestions, I would be glad to receive them privately." Eli Carter asked Jonathan to please make an announcement on linux-kernel when the book became available.

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.