Table Of Contents
|1.||14 Apr 1999 - 30 Apr 1999||(80 posts)||Responses To Mindcraft|
|2.||14 Apr 1999 - 29 Apr 1999||(11 posts)||Quota Bug Fixes|
|3.||23 Apr 1999 - 27 Apr 1999||(17 posts)||Filesystem For Audio CDs|
|4.||23 Apr 1999 - 26 Apr 1999||(3 posts)||Files Larger Than A Single Volume In CODA|
|5.||23 Apr 1999 - 27 Apr 1999||(5 posts)||ACPI For Linux|
|6.||24 Apr 1999 - 27 Apr 1999||(8 posts)||Dynamically Changing RAMdisk Size|
|7.||18 Apr 1999 - 29 Apr 1999||(19 posts)||Performance Finagling|
|8.||24 Apr 1999 - 27 Apr 1999||(5 posts)||'make xconfig' Problem|
|9.||24 Apr 1999 - 26 Apr 1999||(5 posts)||Debian Problem|
|10.||24 Apr 1999 - 26 Apr 1999||(5 posts)||Purpose Of ioremap()|
|11.||25 Apr 1999 - 26 Apr 1999||(3 posts)||Floating Point Inside The Kernel|
|12.||27 Apr 1999 - 29 Apr 1999||(6 posts)||Bug In A Fix|
|13.||29 Apr 1999||(2 posts)||Minor Legacies|
|14.||29 Apr 1999||(6 posts)||Possible Bug In TCP Stack|
|15.||29 Apr 1999 - 30 Apr 1999||(7 posts)||Fixes For Rarely Touched Code|
|16.||30 Apr 1999 - 4 May 1999||(5 posts)||Sangoma Wanpipe Still Not Ready|
|17.||29 Apr 1999 - 30 Apr 1999||(3 posts)||First Inkling Of 2.3 Coming Up|
Mailing List Stats For This Week
We looked at 849 posts in 3183K.
There were 375 different contributors. 149 posted more than once. 127 posted last week too.
Responses To Mindcraft
14 Apr 1999 - 30 Apr 1999 (80 posts) Archive Link: "NT reported faster than Linux 2.2.2 SMP"
Topics: Disk Arrays: RAID, Microsoft, Networking, Patents, SMP, Samba, Version Control
People: Alan Cox, Robert G. Savage, Ingo Molnar, Richard Gooch, Dean Gaudet, Stephen C. Tweedie, Greg Lindahl, Rik van Riel, Mitchell Blank Jr
There was some linux-kernel activity in response to the Mindcraft study. Not as much as might be expected, considering the general outrage on the net at such an obviously rigged test. Now that linux-kernel's Mindcraft threads are all pretty much dead, it seems that their main upshot was that documentation on Linux tuning will probably be forthcoming. In other words, Microsoft helped us out by reporting a documentation bug. Thanks, Bill! ;-)
Under the Subject: NT reported faster than Linux 2.2.2 SMP (http://www.kernelnotes.org/lnxlists/linux-kernel/lk_9904_02/msg00908.html), Tom Holroyd gave links to a NewsAlert article (http://www.newsalert.com/bin/story?StoryId=CnXlbqbWbu0zuvta3o&FQ=linux&SymHdl=1&Nav=na-search-&StoryTitle=linux) and the study itself (http://www.mindcraft.com/whitepapers/nts4rhlinux.html). He pointed out that Mindcraft had tuned NT but not Linux. Alan Cox replied, "they carefully went back to 2.2.2 which is the one with the known performance interactions with NT. I guess someone at MS did their research 8)"
Robert G. 'Doc' Savage said, "Mindcraft also used the v0.92 MegaRAID driver. An SMP race condition was fixed in v0.93 which was almost certainly available from the AMI web site long before the Mar 10-13 test."
Under the Subject: Linux vs. tNT ;-) Can sb explain this? (http://www.kernelnotes.org/lnxlists/linux-kernel/lk_9904_02/msg00920.html), Carlos Costa Portela pointed to the Mindcraft study and to a parody of it (http://www.gcs.bc.ca/bem/editorials/nts4rhlinux.shtml) that he hadn't realized was a parody. (As of KT press time the page seems dead, but they give the location of a temporary copy (http://linuxdvd.netpedia.net/bem/editorials/nts4rhlinux-test.shtml).) Mitchell Blank Jr pointed that out to him, and pointed to the LWN page covering Mindcraft (http://lwn.net/1999/features/MindCraft.phtml). Mitchell mistakenly called it the Netcraft study (a typo; he corrected himself a few minutes later). Ingo Molnar also corrected him and, as an aside, added that the April Netcraft study (http://www.netcraft.com/Survey/Reports/199904/platform.html) "shows that NT based server usage is starting to drop drastically after months of stagnation. [NT platform relative usage is down to 24.97%, thats a -0.66% drop from a month ago, pretty large considering the millions of hosts involved." He also pointed out that Microsoft's leaked Halloween II (http://www.opensource.org/halloween2.html) memo "admitted that their own NT Performance Group found Linux _2.0_ to be on equal footing regarding Samba performance :)" BTW, the Halloween documents live at http://www.opensource.org/halloween.html. If you haven't read them yet, go now.
Regarding the Netcraft study, Richard Gooch pointed out that MS market share has dipped and recovered before now. He pointed to the current Netcraft graphs (http://www.netcraft.com/Survey/Reports/current/graphs.html) and said, "what I find interesting from the graphs is that it looks like the M$ installed base is growing because of the general growth in the industry. Their market share has been static for 6+ months and they've had no significant growth for over a year. They're just riding the wave." He added, "If it was any other company, their market share would already have declined. The interesting question is when will they start to experience consistent declines, month after month?"
Under the Subject: Linux Tuning (http://www.kernelnotes.org/lnxlists/linux-kernel/lk_9904_03/msg00415.html) , someone pointed out the ZDNet reaction (http://www.zdnet.com/zdnn/stories/news/0,4586,2242246,00.html) to the Mindcraft study, which agreed that Linux tuning information is difficult to find. The poster agreed, and suggested that some kind of discussion on centralizing Linux tuning information would be a good idea. brent verner announced that he has started such a project at http://www.linux1.org/.
Tony Gale suggested that the document, Tuning Compaq Tru64 UNIX for Internet Servers (http://www.unix.digital.com/internet/tuning.htm) , would be a good thing to imitate. Greg Lindahl added that a way to automate such tuning information would be a good thing, and volunteered to do it once the information was documented. Various folks offered to host these projects. Rik van Riel set up a CVS tree for the potential documentation and pointed to a manual (http://www.nl.linux.org/~riel/CVS/CVS-linuxperf.txt) for its use. There followed a bit of discussion about how the project should be organized and what sorts of things it should include.
Under the Subject: 2.2.5 optimizations for web benchmarks? (http://www.kernelnotes.org/lnxlists/linux-kernel/lk_9904_03/msg00146.html), someone wanted some tuning help and there was a bit of discussion. Then (a day later) Stephen C. Tweedie gave a link to The Linux Scalability Project (http://www.citi.umich.edu/projects/citi-netscape/), which is trying to tune the kernel for high server loads. He also reiterated one of the points in the prior discussion: that Apache itself is where most of the benefits of tuning will be gained.
Apache developer Dean Gaudet replied, "Uh I dunno. Unless by tuning you mean "replace apache with something that's actually fast" ;)" and added "Really, with the current multiprocess apache I've never really been able to see more than a handful of percentage improvement from all the tweaks. It really is a case of needing a different server architecture to reach the loads folks want to see in benchmarks."
He went on to give an interesting recent history of apache:
That said, people might be interested to know that we're not dolts over at Apache. We have recognized the need for this... we're just slow. I did a pthread port last year, and threw it away because we had portability concerns. I switched to Netscape's NSPR library to solve the portability concern. That was last spring... then summer happenned and I found other things to do. In the interim IBM joined the apache team, and showed us how the NPL sucks (patent clause), and that using NSPR would be a bad thing.
Months went on in this stalemate... we finished 1.3.0, .1, ... We kept hoping netscape's lawyers would see the light and fix the NPL. That didn't look hopeful -- so IBM started up a small team to redo the threaded port, using everything I'd learned (without looking at my code... 'cause it was NPL tainted), and port to pthreads. Their goal: beat their own webserver (Go). This port is called apache-apr, and as of today someone posted saying they'd served 2.6 million hits from apache-apr over a 4 day period. Not a record or anything, but an indication of stability.
Oh, netscape fixed the patent clause. Or they're supposed to be releasing the fix. But we're down the road far enough now we won't turn back.
At this point apache-apr isn't in a state where we want zillions of people using it, because it's probably still full of bugs. But if you really want it, visit http://dev.apache.org/ and dig around in our cvs stuff... just don't expect hand holding.
Oh, to forestall anyone saying "apache should be a poll/event-loop style server to go the fastest"... yes, you're bright (but probably wrong, and if I digress any further I'll make myself puke). Apache will never be the fastest webserver, because that isn't our goal. Our goal is correctness, and useability. Performance at this level is mostly a marketing gimick.
Stephen asked if there was any Apache tuning info, and Dean pointed to http://www.apache.org/docs/misc/perf-tuning.html. Greg Lindahl replied that the document didn't give any tips on how to tune Linux, and Dean replied, ""Tuning linux" is even more a black art and I wasn't about to write up everything. Plus it changes every couple kernel versions and libc versions anyhow. It's a nightmare to keep up to date any documentation surrounding linux internals." He went on, "I'm saying this from the point of view of the person who answers the apache bugdb mail regarding linux problems. 90% of what I end up having to say to people is "well gee it works fine on all machines I have access to, maybe it's your distribution/kernel version/libc version/phase of the moon/colour of your hair/..." there are just too many variables that make linux inconsistent. Take it as a light flame." Greg gently reminded him that the point of the discussion was to get something written up, and that Dean's input would be appreciated.
Quota Bug Fixes
14 Apr 1999 - 29 Apr 1999 (11 posts) Archive Link: "Bugs in quota code (long) (new)"
People: Jan Kara, Andre Couture, Linus Torvalds, Mike Galbraith, Chris Evans
Jan Kara found and fixed a lot of bugs in the quota code. Chris Evans asked if the patches would make it into 2.2.7; Jan said he planned to submit them if no problems turned up. He then posted a new patch, and Andre Couture reported total success with it. Jan posted another patch, which emptied his bug list and left only new features in his TODO file. Mike Galbraith pounded on it with sadistic fury, but his single processor machine stubbornly refused to crash. Jan said he'd wait a few more days for bug reports before submitting the code to Linus Torvalds, and the thread was over.
Filesystem For Audio CDs
23 Apr 1999 - 27 Apr 1999 (17 posts) Archive Link: "audio fs emulation"
People: Riley Williams
Senko Rasic wrote a patch (against 2.2.x) to emulate a filesystem on audio CDs (http://fly.cc.fer.hr/~ptolomei/audiofs/) , showing each track as a separate file. He said it was slow and imperfect, but had no known bugs.
There followed some discussion of rounding out the features: Riley Williams asked about dual mode CDs: could the driver be made to handle both the data and audio tracks? He suggested that the 'mount' command could either show the audio tracks as existing in a separate directory, or be used once to mount the data tracks, and a second time to mount the audio.
Senko replied that his filesystem could currently not access data tracks, only display them. He dismissed the idea of using 'mount' twice, pointing out that VFS locks the mounted device. Bypassing VFS, he added, would be extremely ugly. As far as having mount show a separate directory for audio files, Senko wasn't too hopeful about that either: if it were even possible, he felt it would require a complete redesign of the isofs code.
Riley suggested just making isofs detect (rather than actually handle) the audio tracks; and then do a hidden loopback, using Senko's audiofs filesystem to access the tracks as files. Senko replied that the isofs implementation couldn't handle mounting audiofs on top of it via loopback. He did say it might be possible to mount isofs on top of audiofs.
There was a bit more discussion about naming conflicts and multi-session audio CDs, and the thread was over.
Files Larger Than A Single Volume In CODA
23 Apr 1999 - 26 Apr 1999 (3 posts) Archive Link: "Coda & big files"
Topics: Big File Support
People: Pavel Machek, Matti Aarnio
Pavel Machek wondered how CODA would handle files that were too big to fit on a single hard drive. His perusal of the code showed no support for such files at all.
Matti Aarnio asked if this was only an implementation problem, or a fundamental design issue; and someone replied that it was a design issue: clients can connect and disconnect at will, so files are stored whole.
ACPI For Linux
23 Apr 1999 - 27 Apr 1999 (5 posts) Archive Link: "Re: ATX Power off & SMP-Kernel"
Topics: Power Management: ACPI, SMP
People: Alan Cox
In the course of discussion, someone asked if ACPI (Advanced Configuration and Power Interface) (http://www.teleport.com/~acpi/) would be implemented for Linux. Alan Cox replied, "I don't know. It is an open (but very hard to read spec)."
In fact, there is an ACPI4Linux (http://phobos.fachschaften.tu-muenchen.de/acpi/index.html) project in the works; but according to the web page, they could really use a hand.
Dynamically Changing RAMdisk Size
24 Apr 1999 - 27 Apr 1999 (8 posts) Archive Link: "initrd/ramdisk problems, differences 2.2.1 vs. 2.2.5"
People: Dave Cinege, Alan Cox, Linus Torvalds, Richard Gooch
Frank Bernard had a problem that turned out to be his fault, as he figured out eventually and explained to the list. In the course of the discussion, Dave Cinege gave a pointer to his initrd-archive (ftp://ftp.psychosis.com/linux/initrd-arch/) patch, which allows ramdisk size to change dynamically. Richard Gooch was impressed, and asked if Dave had submitted the patch to Linus Torvalds. Dave replied, "A few times since Jan 1998. Never heard back. I finally got a 'no' from Alan Cox for inclusion in 2.0.37 and 2.2.2 not long ago. (Code OK, but not interesting enough. : P)"
Performance Finagling
18 Apr 1999 - 29 Apr 1999 (19 posts) Archive Link: "cache killer memory death test - 2.0 vs 2.2 vs arca - programs inside"
Topics: Virtual Memory
People: Andrea Arcangeli, Harvey J. Stein, Stephen C. Tweedie
Harvey J. Stein did a fairly repeatable test of several kernels under heavy load, and found that the 2.2.5arca12 performed worse than vanilla 2.2.5 (2.0.36 also performed poorly). Andrea Arcangeli was thrilled to have a test case, and said, "I just know _exactly_ why arca12 has a worse worst-case and bad interactive feel while larging tons of data to disk. The culprit is buffer.c and the reason 2.2.5 is not stalling is due the set_writetime misfeature that on the other side is harming a lot performances in the stock kernel." But he replied to himself a couple hours later, saying "in the last minutes I've found also some stupid design in my shrink_mmap() of "all" 2.2.5_arca*.bz2 that may lead to bad performances and maybe stalls. Unfortunately I was using the bad design of shrink_mmap() as a feature :(. So with the design fixed, the system runs less smoothly under swap, but if you are not swapping heavily this new patch should perform better than arca12." He posted a 2.2.6_arca1 patch (ftp://e-mind.com/pub/linux/arca-tree/2.2.6_arca1.bz2) .
At this point another thread started, with the Subject: [big performances boost for DataBases] Re: cache killer memory death test - 2.0 vs 2.2 vs arca - programs inside (http://www.kernelnotes.org/lnxlists/linux-kernel/lk_9904_04/msg00399.html) . Andrea summed up his own tests, and said, "In your program you are killing the caching behaviour that my buffer code does on dirty buffers (the thing that doesn't work on the stock kernel due flushtime mess). The fact is that you are doing many gdbm_sync in the middle of your proggy." He posted a 2.2.6_andrea2 patch (ftp://e-mind.com/pub/andrea/kernel/2.2.6_andrea2.bz2) (sic).
While downloading 2.2.6_andrea2, Harvey replied with results for arca1 (it was faster than 2.2.6, but not by much), discussed his results and asked about the change from "arca" to "andrea". Andrea replied with much detail on the technical issues of the arca1 results, and explained the name change, saying, "Because you all call me Andrea ;). All my friends instead call me `arca' since I was at mid school (from the surname obviously ;). And I also seen that there's just some software company called arca and so I preferred to call it andrea to avoid any kind of mess. But the name won't make differences at all, the patch-set is the same."
Harvey tried out 2.2.6_andrea2, and said, "Smokin! 2.2.6andrea2 was so incredibly fast that I didn't really have much time to test it interactively, but it was great." Andrea replied, "I am very happy to hear that ;). Now it would be also interesting to see a DBMS benchmark."
Stephen C. Tweedie came in, with, "Do you have an idea of what exactly makes the difference? It looks as if the buffer write scheduling is making a large difference," and Andrea replied, "It depends what you are benchmarking. If you are benchmarking a piece of code that rewrite in the same place many times, yes, definitely my redesign of the flushtime handling is the the big difference (it will completly avoids tons of not needed I/O). But all my other VM ideas/code are very connected with performances."
(Andrea also added, "BTW, I have news about RB-trees. I did some benchmark and they are _definitely_ slower in the buffer cache for query (now find_buffer is far from the top of the profiling output). Now I am using the hash function of the stock kernel, but I'll move to the mul method shortly." )
In a different reply to Harvey's test of 2.2.6_andrea2, Andrea found some other problems, and said, "I also discovered an interesting bug in my page-mapping handling in 2.2.6_andrea2 (I simply forgot to increase the map_count of the page at fork time... ;). The bug can't lead to corruption/crash or any kind of other harm because I designed the map_count handling in a safe way, but the bug was obviously harming performances :(. So if 2.2.6_andrea2 was fast 2.2.6_andrea3 can be an order of magnitude faster in shrink_mmap." He included a 2.2.6_andrea3 patch (ftp://e-mind.com/pub/andrea/kernel/2.2.6_andrea3.bz2). Much later, he replied to himself, saying 2.2.6_andrea3 had a bug. He posted a small patch to fix it, and said he would release 2.2.6_andrea4 shortly.
This thread then died, and two days later the conversation continued on the original thread. Harvey had tried 2.2.6_andrea4, and found it to be slightly worse than 2.2.5arca12; but he still found 2.2.6_andrea2 to beat them all. He concluded, "It seems like a2 [andrea2] is the best at both avoiding disk activity & wisely using the disk when necessary. A4 seems to be slightly more efficient when using memory but much worse at using the disk when necessary."
Andrea replied, "With changes between A2 and A4 I did penalized the mmapped memory in favour of caching memory. This allowed the dbase program to alloc the buffer he needed, but this mean that it swapped out far more data to disk," and added, "I'll revert to the andrea2 behaviour (thanks for the information)."
He went on, "I also released a 2.2.7_andrea1.bz2 but don't try it!!! because since I merged 2.2.7 my VM tree is unstable under high swap load. I don't think it's a 2.2.7 issue (since I overviewed all the patch and I couldn't find anything wrong in it) but maybe a missed bit in the merging or some more subtle bug. I hope to have a clue shortly." About 7 hours later he replied to himself, with, "I fixed the bug. I forgot one of my lru_refile_unmapped_cache() in unuse_pte ;). So when delete_from_swap_cache() was running, it was doing a list_del(page->lru) it had page->lru pointing to the old ->lru not-valid-anymore values..." He added, "Now everything seems rock solid also after a swapoff -a," and posted a new 2.2.7_andrea2 patch (ftp://e-mind.com/pub/andrea/kernel/2.2.7_andrea2.bz2) .
'make xconfig' Problem
24 Apr 1999 - 27 Apr 1999 (5 posts) Archive Link: "make xconfig problem in 2.2.6"
People: Andre Couture, Andrzej Krzysztofowicz
Andre Couture said, "It seem that using "make xconfig" in 2.2.6 does not allow to select/unselect the Mach64 Frame Buffer. The line is not shaded but also not selectable." Andrzej Krzysztofowicz replied, "Xconfig is written basing on silent assumption that each variable should be defined in only one place. This assumption is broken in the drivers/video/Config.in file, however...For i386 architecture you can try my xconfig patch: ftp://rudy.mif.pg.gda.pl/pub/People/ankry/linux-patches/2.2/xconfig/patch-xconfig-990407.gz. It fixes this problem partially ignoring most of options which cannot be ever sellected for current architecture. However, for ppc/m68k/sparc/sparc64 the problem is more serious and cannot be removed in this way."
Debian Problem
24 Apr 1999 - 26 Apr 1999 (5 posts) Archive Link: "2.2.6 breaks one-way cable modem (sb1000)"
Topics: Modems, Networking
People: Illuminatus Primus, Christoph Lameter, Steven N. Hirsch
Illuminatus Primus said, "After upgrading from 2.2.0 to 2.2.6, it seems that packets that arrive on an interface using the same address as another previously-existing interface get discarded." Christoph Lameter said to do an "echo 0 >/proc/net/ipv4/xxx/rp-filter" to switch the discarding off. Steven N. Hirsch couldn't find the problem in 2.2.6ac1, and asked if the default had changed in the vanilla kernels.
Illuminatus posted again, having tracked the culprit to a snippet in /etc/init.d/netbase in Debian Slink. Steven hadn't seen the problem in ac1 because he used Red Hat, which left rp_filter alone.
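For readers who want to check the setting discussed in this thread: as far as we know, reverse-path filtering is controlled per interface by a file named rp_filter under /proc/sys/net/ipv4/conf/ (the path in Christoph's quote is abbreviated). The Python sketch below reads and writes that file. The helper names and the optional root parameter are ours, purely for illustration, and writing to the real file needs root privileges.

```python
from pathlib import Path

# Assumed standard sysctl location for per-interface IPv4 settings.
PROC_CONF = Path("/proc/sys/net/ipv4/conf")


def rp_filter_path(interface: str, root: Path = PROC_CONF) -> Path:
    """Return the rp_filter control file for one interface directory."""
    return root / interface / "rp_filter"


def read_rp_filter(interface: str, root: Path = PROC_CONF) -> int:
    """Read the current setting; 0 means reverse-path filtering is off."""
    return int(rp_filter_path(interface, root).read_text().strip())


def set_rp_filter(interface: str, value: int, root: Path = PROC_CONF) -> None:
    """Write a new setting (the scripted equivalent of 'echo 0 > .../rp_filter')."""
    rp_filter_path(interface, root).write_text(f"{value}\n")
```

Calling read_rp_filter() for each interface before and after an upgrade is a quick way to see whether a distribution's init scripts have changed the value, which is what bit Illuminatus here.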
Purpose Of ioremap()
24 Apr 1999 - 26 Apr 1999 (5 posts) Archive Link: "When to use ioremap?"
People: Philip Blundell, Alan Cox, Jeff Garzik
Jeff Garzik was exploring the docs, and found that in Documentation/IO-mapping.txt, it said that reading from IO space used readl(), while writing used ioremap(). He wanted to know why writing required ioremap(). Philip Blundell replied, "0xc0000 is in the ISA memory space. This area is special in that (on PC machines at least) you can access it without ioremap, and in fact ioremap will just return the original address. It's still good practice to use ioremap for this sort of memory though."
Alan Cox added, "Using ioremap for all cases is almost certainly going to be required in 2.3.x. The fact its not always needed right now is one of the biggest obstacles to supporting a more sane range of memory sizes."
Floating Point Inside The Kernel
25 Apr 1999 - 26 Apr 1999 (3 posts) Archive Link: "Double or float in kernel modules"
People: Richard B. Johnson
Ravi Wijayaratne asked if and how he could use a double or float in a kernel module. Richard B. Johnson said:
This is becoming a FAQ.
You can't use the floating point unit inside the kernel. The reason is that the state of the FPU is undefined within the kernel. Of course, in principle, anything is possible, however, you would have to:
- Lock the kernel.
- Save the current FPU state (this also does 'finit').
- Do the math.
- Restore the FPU state.
- Unlock the kernel.
I don't have the ix86 book here, but it's about 130 or so clocks to save the FPU state and slightly less to restore it. You can do a lot of integer math in those clocks.
It is also difficult to have the 'C' compiler do what you want when you want it, so to make sure the FPU state was safe, you probably would have to do everything in inline-asm. There is the additional problem of handling FPU exceptions within the kernel.
So, as you can see, you don't want to use the FPU within the kernel. Also, some ix86 processors don't have FPU units. When encountering a FPU opcode, the kernel traps to an exception handler which emulates the FPU.
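Richard's remark that "you can do a lot of integer math in those clocks" points at the usual workaround: scaled-integer (fixed-point) arithmetic. The sketch below is only an illustration of the arithmetic, written in Python for brevity; real kernel code would be C, where the intermediate product in a multiply needs a wider integer type to avoid overflow. The 16.16 format and all helper names are our assumptions, not anything from the thread.

```python
FRAC_BITS = 16          # 16.16 fixed point: 16 fractional bits
ONE = 1 << FRAC_BITS    # the value 1.0 in fixed-point form


def to_fixed(whole: int, numerator: int = 0, denominator: int = 1) -> int:
    """Build whole + numerator/denominator using integer operations only."""
    return (whole << FRAC_BITS) + (numerator << FRAC_BITS) // denominator


def fx_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values; shift away the extra fractional bits."""
    # In C the product a * b would need a 64-bit intermediate.
    return (a * b) >> FRAC_BITS


def fx_div(a: int, b: int) -> int:
    """Divide two fixed-point values, pre-shifting the dividend to keep precision."""
    return (a << FRAC_BITS) // b


def to_milli(x: int) -> int:
    """Convert to truncated thousandths, handy for printing without floats."""
    return (x * 1000) >> FRAC_BITS


# 2.5 * 1.5 computed without touching the FPU.
product = fx_mul(to_fixed(2, 1, 2), to_fixed(1, 1, 2))
print(to_milli(product))   # thousandths of the result
```

Addition and subtraction need no helpers at all, since two values with the same scale can be added directly.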
Bug In A Fix
27 Apr 1999 - 29 Apr 1999 (6 posts) Archive Link: "Linux 2.0.37pre11"
Topics: Debugging, SMP
People: Stephen C. Tweedie, Alan Cox, Patrick J. LoPresti
Alan Cox posted the latest 2.0 patch, and Patrick J. LoPresti said he was getting memory problems on both of his SMP machines after upgrading. He confirmed that the problem did not exist with 2.0.37pre10. Alan traced the problem to a patch from Stephen C. Tweedie that aimed at fixing an mm race condition, and said he'd back it out for the next release.
Minor Legacies
29 Apr 1999 (2 posts) Archive Link: "2.2.7: USB subsystem"
People: Ulrich Windl, Linus Torvalds
Ulrich Windl said, "Viewing the USB patches, I get the impression that shell scripts like linux/drivers/usb/stopusb should go to the scripts (sub)directory... (Or include them in the documentation)"
Linus Torvalds replied, "Well, they should probably be removed, they're really only there to help early development.."
Possible Bug In TCP Stack
29 Apr 1999 (6 posts) Archive Link: "Behaviour of OOB in TCP ?"
Topics: BSD, Networking
People: Oren Laadan, Andi Kleen, Craig Milo Rogers
Oren Laadan wanted some comments on Out Of Band (OOB) data:
Consider the following scenario:
A and B have a TCP connection. A sends B four bytes, then an OOB byte, then another four bytes, and finally another OOB. Only then process B reads the data. (without OOB inline).
Under BSD the first OOB byte is lost forever, and process B will read 8 bytes before stopping at the second OOB, and nothing else (but naturally the second OOB itself with MSG_OOB flag in recv()).
Under Linux, the first OOB byte is inserted into the stream (!) and thus process B will get 9 bytes before the second OOB. This would still happen EVEN IF process B ALREADY read the first OOB byte but did not yet consume the data prior to it (thus - the first OOB is received twice: once as OOB and once in stream).
The reason for the difference is the following: In BSD, whenever an OOB byte arrives it is immediately taken out of the stream (unless OOB_INLINE is specified) so there is no need for special handling in the regular receive routine. This byte is saved in a special field (so it can be retrieved later by the process). In Linux, it is not removed from the stream, but instead a special field holds its sequence number, and the receive subroutine takes care of not copying this byte to the user. Thus, when a second OOB arrives before the entire data prior to the first OOB is consumed, the field is overwritten by the new value, and the old byte remains as a regular byte in the stream (EVEN IF IT WAS CONSUMED BY THE PROCESS, as long as data prior to it in the stream is still buffered).
It seems like the behaviour of the TCP stack in Linux is broken (or I missed something in the RFC). In that case, the fix would naturally be to change the code to either (1) work like BSD and remove the byte from the stream or (2) keep multiple OOB pointers (which is expensive and complicated).
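To make Oren's scenario concrete, here is a toy Python model of the two receive-side strategies as he describes them. It is not kernel code and does not use real sockets: the function names, the (payload, urgent) segment representation, and the use of a buffer index in place of a TCP sequence number are all our assumptions. It also ignores the detail that a read stops at the urgent mark, and the case where the process has already fetched the first OOB byte.

```python
# Each segment is (payload, urgent); urgent payloads are a single byte.
SCENARIO = [(b"abcd", False), (b"X", True), (b"efgh", False), (b"Y", True)]


def bsd_receive(segments):
    """BSD as described: an OOB byte is pulled out of the stream on arrival."""
    stream = bytearray()
    oob_slot = None
    for payload, urgent in segments:
        if urgent:
            oob_slot = payload      # one slot only; an earlier OOB byte is lost
        else:
            stream += payload
    return bytes(stream), oob_slot


def linux_receive(segments):
    """Linux as described: the byte stays in the stream, one field marks it."""
    stream = bytearray()
    urg_seq = None
    for payload, urgent in segments:
        if urgent:
            urg_seq = len(stream)   # overwrites the previous urgent position
        stream += payload
    if urg_seq is None:
        return bytes(stream), None
    # The normal read path skips only the byte at the remembered position.
    normal = stream[:urg_seq] + stream[urg_seq + 1:]
    return bytes(normal), bytes(stream[urg_seq:urg_seq + 1])


for name, receive in (("BSD", bsd_receive), ("Linux", linux_receive)):
    data, oob = receive(SCENARIO)
    print(name, len(data), data, oob)
```

In this model the BSD variant delivers 8 ordinary bytes and the Linux variant 9, with the first OOB byte ("X") showing up in the middle of the Linux stream, matching the counts in Oren's report.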
Andi Kleen replied, "TCP has no real out-of-band data. It has urgent data. If you see it as urgent data the Linux behaviour makes sense. I don't think it is worth the effort to change it to be 100% BSD compatible. The bug is really that the sockets API maps urgent data to something called out-of-band data," and added, "I'll document the behaviour in my network man pages."
Elsewhere, Craig Milo Rogers offered the following historical digression:
The problem is that TCP was developed using the concept of "urgent" data, rather than "out of band" data. The "urgent" flag was intended to mark the presense of urgent data, not to delimit the urgent data; that was supposed to take place in the stream itself. The BSD protocol designers violated that design criterion.
The BSD designers (Bill Joy et al.) were working on a kernel inteface that could encompass many different network transport layers, such as TCP/IP and ISO/OSI. They designed portable applications to run on top of that interface, and they wanted to use "out-of-band" data as a unifying concept, not "urgent" data. They even went so far as to implement OOB data for TCP in the BSD network stack, using nonstandard TCP options, and released a version of Unix with it, without coordinating with the official IP/TCP developers, first. This move was widely viewed with disfavor in the IP/TCP development community.
In response, the BSD developers said "we have found a way to implement our software on TCP without using our nonstandard options". The fact that they did this by subtly breaking TCP itself was not immediately understood by the IP/TCP developers.
Of course, BSD could have achieved OOB data by using a second TCP port (or something even more extreme, such as a UDP port). This would have worked well with the TCP spec, but would have consumed a second port; even in 1981, it was recognized that the 16-bit port space was a bit tight if you made frequent connections. Or, if rsh/rlogin had used a count or escape sequence to delimit out-of-band data... but I believe that they were trying to avoid the inefficiences of that approach.
Fixes For Rarely Touched Code
29 Apr 1999 - 30 Apr 1999 (7 posts) Archive Link: "[PATCH] missing lock_kernel() and assorted races"
Topics: Version Control
People: David S. Miller, Alan Cox, Alexander Viro
Alexander Viro posted a patch to fix a number of problems. David S. Miller replied, "the architectures where you made the most changes don't even compile in 2.2.7 and haven't been updated in nearly half of a year :-)" and Alan Cox added, "and some of the bugs appear to be in the relevant CVS trees too."
Sangoma Wanpipe Still Not Ready
30 Apr 1999 - 4 May 1999 (5 posts) Archive Link: "2.2.6 and Sangoma WANPIPE"
People: Alan Cox
Camm Maguire had some problems with the Sangoma wanpipe S508 cards on the routers he was using: the line would freeze after a day or so. Alan Cox replied, "This is why the code is still in -ac and I've not sent it to Linus. You aren't the only people with this problem and the new code. Once I get updates from Sangoma that survive testing then it'll go on to Linus."
First Inkling Of 2.3 Coming Up
29 Apr 1999 - 30 Apr 1999 (3 posts) Archive Link: "IDE chipset support"
Topics: Disks: IDE
People: Andre Hedrick
Andre Malafaya asked if and when the Aladdin M1543 IDE chipset would be supported by Linux, and Andre Hedrick said:
I have just discussed this issue with Linus yesterday. In order to get support for the new Super Socket 7's, there was a need to perform some exotic pre-init. This may be moved to the pci-quirk list, but I just asked Martin M. about it late last night.
Unless I can reduce the exotic nature of the new code and mainstream it into what is required..........2.3.X is the introduction in the very near future. I will not specify a time scale, as I have not asked if I could disclose that information. It is not a great secret, but I do not have the authority to make such announcements.
I am trying to find time to restructure the code, but I have very little. I proposed my work as a candidate for pre-patch-2.2.8-1/2/3, but I may have to withdraw that request for the next pre-patch series.
Since It is first on the list for 2.3.0, and with success and stablity proven, it will be back ported.
We Hope You Enjoy Kernel Traffic
Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License, version 2.0.