Kernel Traffic

Kernel Traffic #114 For 16 Apr 2001

By Zack Brown


Last week I was pretty laid up with a cold, coupled with various new-job type tasks, and the combination made it pretty tough to get KT out. I had some illusions almost to the day of publication, but eventually I had to face facts. I was going to miss another issue. Long-time KT readers may remember that I missed an issue in August 1999, when I had to go to Linux World in San Jose. I also seem to have missed an issue in November of 1999, but I don't remember what was up at that time. The next missed issue was in January 2000, since I was working on a Linuxcare web site relaunch. So counting last week's, I've missed four issues since starting KT in January 1999.

Mailing List Stats For This Week

We looked at 1597 posts in 6541K.

There were 615 different contributors. 278 posted more than once. 153 posted last week too.


1. 64-Bit Major/Minor Device Numbers And PID Allocations

24 Mar 2001 - 4 Apr 2001 (95 posts) Archive Link: "Larger dev_t"

Topics: Clustering, Disk Arrays: LVM, Disks: IDE, Disks: SCSI, FS: devfs, Ioctls, Networking, USB

People: Linus Torvalds, Mitchell Blank Jr, Albert D. Cahalan, H. Peter Anvin, Alan Cox, Andre Hedrick, Wichert Akkerman, Jamie Lokier, Ulrich Drepper, John Byrne, Andries Brouwer

Andries Brouwer argued in favor of changing the dev_t variable to have 64 bits instead of 16. He pointed out that this was already the case in user-space, because glibc used a 64-bit dev_t. One of the main uses for dev_t was to hold the major and minor numbers of device files in /dev. Linus Torvalds replied vehemently:

The fact that glibc is a quivering mass of bloat, and total and utter crap makes you suggest that the Linux kernel should try to be as similar as possible?

Not a very strong argument.

There is no way in HELL I will ever accept a 64-bit dev_t.

I _will_ accept a 32-bit dev_t, with 12 bits for major numbers, and 20 bits for minor numbers.

If people cannot fit their data in that size, they have some serious problems. And for people who think that you should have meaningful minor numbers where the bit patterns get split up some way, I can only say "get a frigging clue". That's what you have filesystem namespaces for. Don't try to make binary name-spaces.

And I don't care one _whit_ about the fact that Ulrich Drepper thinks that it's a good idea to make things too large.
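As a rough illustration of the split Linus proposes, the sketch below packs a major and a minor number into one 32-bit value. The quote only fixes the widths (12 bits and 20 bits); placing the major in the high bits, and the helper names make_dev, dev_major and dev_minor, are assumptions made for this example and not a description of any actual kernel layout.

```python
# Illustrative sketch only. The widths come from the quote above; the bit
# positions (major in the high bits) and all names are assumptions.
MAJOR_BITS = 12
MINOR_BITS = 20


def make_dev(major, minor):
    """Pack a (major, minor) pair into a single 32-bit device number."""
    if not 0 <= major < (1 << MAJOR_BITS):
        raise ValueError("major does not fit in 12 bits")
    if not 0 <= minor < (1 << MINOR_BITS):
        raise ValueError("minor does not fit in 20 bits")
    return (major << MINOR_BITS) | minor


def dev_major(dev):
    """Recover the major number from a packed device number."""
    return dev >> MINOR_BITS


def dev_minor(dev):
    """Recover the minor number from a packed device number."""
    return dev & ((1 << MINOR_BITS) - 1)


if __name__ == "__main__":
    dev = make_dev(8, 17)
    print(hex(dev), dev_major(dev), dev_minor(dev))
    # Ratio of 12-bit majors to 8-bit majors.
    print((1 << MAJOR_BITS) // (1 << 8))
```

With these widths there are 4096 possible majors, small enough to index a flat lookup table, and a little over a million minors per major.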

John Byrne suggested getting rid of major and minor numbers entirely, and Linus replied that this would eventually be the case inside the kernel, but that user-space still required the concept of major and minor device numbers, just to accommodate all the legacy UNIX features that expected it. There was no reply to this, but Andries did reply to Linus' first post. He argued that a large dev_t would really make things a lot simpler, because the whole question of dev_t wrapping back to zero would really be moot. A 64-bit value would just take a HUGE amount of time to wrap, and so a lot of kernel code could ignore that possibility entirely. Wichert Akkerman replied that eventually, a 64-bit value would wrap, and so Andries' solution would only delay the problem. Mitchell Blank Jr pointed out, "64 bits is enough to fork 1 million processes per second for over 500,000 years. I think that's putting the problem off far enough." But Jamie Lokier pointed out that a million processes per second, or even five hundred million, would be a pretty small number, given the various clustering projects likely to be underway in the near future, and that on such a system even a 64-bit process ID space could wrap within a couple of weeks.
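Mitchell's figure is easy to check with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation: the function name years_to_wrap and the use of a 365.25-day year are choices made for this example.

```python
# Back-of-the-envelope check of how long an ID counter takes to wrap.
# The 365.25-day year is an assumption; the exact year length barely matters.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_wrap(bits, ids_per_second):
    """Years until a counter of the given width wraps at a steady rate."""
    return (2 ** bits) / ids_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # 64-bit counter, one million new IDs per second.
    print(years_to_wrap(64, 1e6))
    # 32-bit counter at the same rate, expressed in minutes.
    print(years_to_wrap(32, 1e6) * SECONDS_PER_YEAR / 60)
```

At a million IDs per second a 64-bit counter lasts roughly 585,000 years, consistent with the "over 500,000 years" in the quote, while a 32-bit counter at the same rate wraps in a little over an hour. Other rates can be plugged in the same way.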

Elsewhere, Linus also replied to Andries' idea that a big dev_t would simplify kernel code. He said this would force programs to "carry along the overhead of having a 64-bit field." He went on:

The fact is, that there are programs out there that use "int" for pids.

It's equally true that changing "pid_t" will require that you recompile every single app that might have a kernel interface to the current 32-bit pid_t.

AND you just created tons of problems for things like the non-obvious stuff like

ioctl(fd, FASETOWN, arg);

because "arg" is defined to be a single word.

In short, you've just broken existing binaries in ways that will be _damn_ hard to debug (they magically start breaking only after the pid-space has wrapped the first 32 bits).

And that's a DOCUMENTED interface. Never mind all the undocumented stuff that assumes (for all the reasonable historical reasons) that "pid" fits in an "int". Tell me there aren't applications like that, and I'll laugh in your face.

In short, both your arguments are totally bogus. Your "simpler" function is in fact a horrible rats nest and a source of subtle bugs that you apparently never even thought about.

And that's without ever actually mentioning the word "bloat" and "data cache usage".

Andries pointed out that in user space, dev_t was already 64 bits because of glibc. So, he implied, Linus' bug scenario was bogus, since user-space already used the 64-bit solution. Albert D. Cahalan crowed, "In your dreams!!!!" and pointed out tons of user-space code that truncated dev_t. Linus also said to Andries, "Now you're back to the argument that "glibc is bloated, so we might as well be too". The fact is, that I don't like that argument. I don't buy into that kind of philosophy. If somebody else made a mistake, that doesn't force me to do the same mistake." He added, "I have a holy crusade. I dislike waste. I dislike over-engineering. I absolutely detest the "because we can" mentality. I think small is beautiful, and the guideline should always be that performance and size are more important than features." He also went into some depth:

The fact is, that there are two uses for dev_t's:

And let's take a look at /dev. Do a "ls -l /dev" and think about it. Every device needs a unique number. Do you ever envision seeing that "ls -l" taking about 500 billion years to complete? I don't. I don't think you do. But that's how ludicrous a 64-bit device number is.

So in /dev, there are two problems: we are getting painfully close to major numbers with 8 bits, and we've run out of minors several times. In fact, a lot of the reason for the dearthness of major numbers is the fact that we use multiple majors for some stuff that really wants many minors.

So 8 bits for major is actually fairly close to perfectly livable - or at least would be if we had more minors. And there is no question about it: you need a lookup table for major numbers. Which means that 32 bits of major numbers is ridiculous. As is 20. Which is why I suggested 12. A nice size, that is reasonable in real life, and that can easily be used for table lookups. It's also sixteen times larger than what we have today, which would probably be acceptable in itself.

For minors, we have the problem of "dynamic" devices. The main one probably being pty's, in fact. It's easily conceivable to have thousands of pty's - I suspect that for various other reasons most system administrators would prefer to farm things out so that it isn't _needed_, but clearly we want at _least_ 16 bits here. 20 bits is reasonable.

And remember: for the future, what we want to move towards is _name_ lookup, not device number lookup. Stupid SCSI people have wanted to partition the minor number for a long time, and that's always been idiotic. If you have a sparse name-space, you should use names, not numbers.

So people who want to see /dev/scsi/bus0/dev12/lun4/part0, use devfs or something, don't try to make the number space be sparse. Sparse numbers are a stupid idea for _anything_ but maybe CPU design (I'm willing to concede that a 256-bit address space might be useful on a CPU level, because a CPU really cannot afford to do name lookups when looking up addresses, even if it has been tried).

In short, a 64-bit dev_t is unnecessary. And according to the maxim of "don't go overboard just because you _can_", I don't want to see it.

Also, I have looked at your argument for "simplicity", and I dismiss it. I do not believe that the cases you claim are "simpler" are really simpler. I showed that your pid_t example was completely unrealistic, and as far as I can tell your "dev_t" example absolutely _hinges_ on the fact that it makes anonymous dev_t allocation simpler.

And that falls flat on its face simply because it's _already_ so simple that it doesn't matter.

H. Peter Anvin piped up at this point, saying that the real problem was having a 64-bit value that was very dense. Trying to pack larger numbers into a small space, he said, was a big problem, while having a sparse number space would not be a problem. He added, "The IPv4->IPv6 transition people have looked at the issues of number spaces and how much harder they get to keep dense when the size of the numberspace grows, because your lookup operation becomes so much more painful. Any time you have to take a larger number space and squeeze it into a smaller number space, you get some serious pain." He also went on, "Part of the reason we haven't -- quite -- run out of 8-bit majors yet is because I have been an absolute *bastard* with registrants lately. It would cut down on my workload if I could assign majors without worrying too much about whether or not that particular driver is really going to be made public." He concluded, "64 bits is obviously excessive, but I really don't feel comfortable saying that only 12 bits of major is sufficient. 16 I would buy, but I don't think 16 bits of minor is sufficient. Given that, it seems to me -- especially since dev_t isn't exactly the most accessed data type in the universe -- that the conceptual simplicity of keeping the major and minor separate in individual 32-bit words really is just as well. YES, it's overengineering, but the cost is very small; the cost of underengineering is having to go through yet another painful transition. Unfortunately, the Linux community seems to have some serious problems with getting system-wide transitions to happen, especially the ones that involve ABI changes. This needs to be taken into account." Linus replied:

The real problem, in my opinion, is needing device numbers in the first place, for stuff that really shouldn't use them.

I don't want to make allocation easy. In fact, I want to make it _harder_. I like it being painful, because it should not be done.

I've seen _way_ too many instances of "let's create a special device" for no good reason. For example, all the crap about mice was (and is) a mistake. And that's the least of the problems. Some devices on the device list are there mainly as just a way to hook in an ioctl or something. It's sad, and it's wrong.

And I'm sorry, but I do NOT want to envision a future where you can say "ok, majors in the range 512-576 are PPC-specific, and you can go wild". Yes, it would make your job easier. But it would make for a BAD SYSTEM, which is what _I_ care about.

We should encourage people to not need major numbers. It's easy. The driver exports a /proc entry in /proc/driver/xxx or similar. Or the driver writer says "if you want to use this device, use devfs", and exports the name there.

Don't get the issue of "it would make my life easier" override the issue of "it's the wrong thing to do".

Another example: all the stupid pseudo-SCSI drivers that got their own major numbers, and wanted their very own names in /dev. They are BAD for the user. Install-scripts etc used to be able to just test /dev/hd[a-d] and /dev/sd[0-x] and they'd get all the disks. Deficiencies in the SCSI layer made it impossible for a driver writer to be nice to the user, so instead they got their own major numbers.

But again, you're arguing for _more_ badness. While I'm of the opinion that we _already_ have too many major numbers, and we should realize that, and not make it worse.

A 64-bit dev_t only makes it _easier_ to continue to be stupid about things.

And that, btw, is the hallmark of "bloat". Bloat is not about being big. Bloat is about being slow and stupid and not realizing that it's because of design mistakes.

The discussion veered off a bit here. Alan Cox voted for a 32-bit dev_t, and also disagreed with some of Linus' points. Among them was Linus' statement that "Deficiencies in the SCSI layer made it impossible for a driver writer to be nice to the user, so instead they got their own major numbers." To this, Alan replied, "Not deficiencies in the SCSI layer, there is no way the scsi layer can handle high end raid controllers. In fact one of the reasons we can beat NT with some of these controllers is because NT does exactly what you suggest with scsi miniport driver hacks and it _sucks_. Its an ugly hack." Linus disagreed, and said it could be done easily and quickly, with no performance hit. He explained, "With a simple "queue" mapping for the SCSI majors. Just look up which queue to use for requests to which major, and you're done. The actual IO may by-pass the SCSI layer altogether."

Linus added, "I'm absolutely not advocating using the SCSI layer for the high-end-disks. Rather the reverse. I'm advocating the SCSI layer not hogging a major number, but letting low-level drivers get at _their_ requests directly." Andre Hedrick asked, "Am I hearing you state you want dynamic device points and dynamic majors?" Linus replied:

Yes and no.

We need static structures for user space - from a user perspective it makes a ton more sense to say "I want to see all disks" than it does to know that you have to do /dev/hd*, /dev/sd* plus all the extra magic combinations that can happen (USB etc).

So in a sense what I'm arguing for is for _stricter_ device numbers to the outside world.

But internally, it would be reasonably easy to make a mapping from those user-visible numbers to a much looser version.

One example of this is going to happen very early in 2.5.x: the whole "partitioning" stuff is going to go away from the driver, and into the ll_rw_block layer as just another disk re-mapping thing. We already do those kinds of re-mappings for LVM reasons anyway, and partitioning is not something a disk driver should know about, really.

And that kind of partitioning mapping automatically means that we'd need to remap minor numbers, and do it on a per-major basis (because the partitioning mapping right now is not actually the same between SCSI and IDE: IDE uses six bits of partitioning, while SCSI uses just four bits). And once you do that, you might as well start "remapping" major numbers too.

So let's say that you have two separate SCSI controllers - they would both show up on major #8, and different minor numbers. Right now, for example, controller 1 might have one disk, with minors 0-15 (for the whole disk and 15 partitions), and controller 2 might have two disks using minors 16-47.

As it stands now, the SCSI layer needs to do the remapping, and because the SCSI layer does the remapping, nothing but SCSI layer devices can use major #8.

But once you start doing partition mapping in ll_rw_block.c, you might as well get rid of the notion that "SCSI is major 8". You could easily have many different drivers, with many different queues, and remap them all to have major 8 (and different minors) so that it looks simple for a user that just wants to see SCSI disks.

Which is not to say that the same disk might not show up somewhere else too, if anybody wants it to. The _driver_ should just know "unit x on queue y", and then the driver might do whatever it wants (it might be, for example, that the driver actually wants to show multiple controllers as one queue, if the driver really wants to for some reason). And it should be possible to have two drivers that really have no idea at ALL about each other to just share the same major numbers.
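The remapping Linus describes can be pictured as a small table lookup from a user-visible (major, minor) pair to "unit x on queue y". The sketch below is a toy model, not kernel code: the Target type, the DISKS table and the queue names are hypothetical, and major 3 for IDE is the conventional number and is not taken from the quote. The partition widths (four bits for SCSI, six for IDE) and the minor ranges for the two SCSI controllers follow the example in the text.

```python
from typing import NamedTuple


class Target(NamedTuple):
    """Where a request ends up after remapping (hypothetical structure)."""
    queue: str      # driver queue that services the request
    unit: int       # disk number on that queue
    partition: int  # 0 means the whole disk


# Partition bits per user-visible major, as quoted: SCSI four, IDE six.
# Major 8 is from the text; major 3 for IDE is an assumption of this sketch.
PARTITION_BITS = {8: 4, 3: 6}

# Disks shown under each major, in minor-number order. Two unrelated
# SCSI drivers share major 8; the queue names are made up.
DISKS = {
    8: [("controller1", 0), ("controller2", 0), ("controller2", 1)],
    3: [("ide0", 0), ("ide0", 1)],
}


def remap(major, minor):
    """Translate a user-visible device number into (queue, unit, partition)."""
    bits = PARTITION_BITS[major]
    disk_index = minor >> bits
    partition = minor & ((1 << bits) - 1)
    queue, unit = DISKS[major][disk_index]
    return Target(queue, unit, partition)


if __name__ == "__main__":
    print(remap(8, 0))    # whole first disk, on controller 1
    print(remap(8, 16))   # whole second disk, first unit on controller 2
    print(remap(8, 47))   # last partition of the third disk
```

The point of the design is that user space keeps seeing one tidy major, while the table is free to send different minors to entirely different drivers.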

The discussion meandered along for a while and then petered out.

2. Linux Use In Business Environment

29 Mar 2001 - 2 Apr 2001 (9 posts) Archive Link: "Linux connectivity trashed."

Topics: FS: NFS

People: Richard B. Johnson, J. A. Magallon, Jesse Pollard, J.A. Magallon, Roger Larsson

Richard B. Johnson reported:

This is for information only.

Last week a standard RH distribution of Linux was rooted from what looks like a Russian invasion. The penetration used the method taught in the CERT Advisory CA-2000-17.

The intruder(s) then attempted to perform additional penetrations from this site. One of the sites attacked was alleged to be Raytheon. Raytheon makes products for national security such as guided missiles.

I was told that Raytheon is now suing this company. Therefore all Linux machines are being denied access to the Internet.

The penetration occurred because somebody changed our firewall configuration so that all of the non-DHCP addresses, i.e., all the real IP addresses had complete connectivity to the outside world. This meant that every Linux and Sun Workstation in this facility was exposed to tampering from anywhere in the world. This appears to be part of a plan to remove all non-DHCP machines by getting them trashed. In other words, we were set up to take a hard fall because no machine that allows NFS mounts can be safely exposed to the outside world without blocking portmap.

There is a concerted effort to eliminate both Sun Workstations and Linux machines as tools in this facility. This happens as the "yuppies", who have never, ever, contributed to product development are Peter-Principled into positions of authority.

The email addresses of those who have declared that only Windows machines will be allowed access to the outside world are:

Thor T. Wallace
David Pothier

David Pothier was a beta tester for Windows/NT. Of course he wants all machines to be Windows and, naturally, under his control.

Thor Wallace is our new "security" administrator so I am told.

The only Linux advocate in a position of authority is:

Alex Shekhel

So, now I hooked up my lap-top, installed Windows.... and here I am. Only windows machines are allowed to access the outside world.

J.A. Magallon replied that the network administrators shouldn't have "spent their time configuring a firewall to MAKE HOLES where there are not any..." And Jesse Pollard drew attention to the fact that the admins had not told anyone that they were doing this either. Roger Larsson felt that if Raytheon could sue a company for what happened, that implied that a company could sue another for forwarding a virus as well. Several folks suggested that Richard find another job. Later, Richard said, "I have now gotten three linux machines back "on-the-air". The security people insist on doing "NAT", so these machine are now using a phony internal address, but we are up. Another crisis created and resolved."

3. Still Chasing Bad Bugs In 2.4

29 Mar 2001 - 2 Apr 2001 (4 posts) Archive Link: "EXT2-fs error"

Topics: FS: ext2

People: Alan Cox, Rogier Wolff

Someone reported seeing errors of the form, "EXT2-fs error (device ide2(33,3)): ext2_free_blocks: bit already cleared for block 1048576", while rm-ing a big directory. The person asked if these were serious, and Rogier Wolff replied that yes, they were very, very serious. Rogier's first guess was that the poster's hard disk, or more likely, his RAM, was simply broken. But Alan Cox said, "if it was 2.2 I'd believe it. 2.4 is still showing these kind of problems in software on many VIA chipset athlon boards and under extreme loads on other boxes too."

4. Cleaning Up The Kernel...Developers

1 Apr 2001 - 2 Apr 2001 (17 posts) Archive Link: "New directions for kernel development"

Topics: BSD, Networking

People: Linus Torvalds, George Bonser, Wayne Brown, Chris Meadors, Alan Cox, David Riley

At 5 minutes past midnight on April 1, Linus Torvalds finally confessed:

Recently, I've been thinking a lot about where Linux development should head now that 2.4 is out. Specifically, I've been thinking about how we ought to make some cultural changes as well as technical changes. Now I'm not *entirely* sure what directions we should head in as we move towards 3.0, but I'd like to point out a few areas that need to be addressed as well as propose some possible solutions. Nothing is set in stone yet, but these are definitely issues we need to work on.

First off, I don't like a lot of the elitism that goes on among Linux hackers. Just because you can tell what the following script does without executing it, doesn't mean that you're some kind of god.

#! /usr/bin/perl
@k = unpack "a"x5,'x_,d@';@o = unpack "a"x19,'Q8>tUxLm\@`Y%N@cIq]';
while ($i<19){print chr((ord($o[$i])-ord($k[$i++%5])+91)%91+32);}

Learning to hack Un*x is an impressive accomplishment, but it's closer kin to solving a Rubik's cube than scaling Everest. If you think using Un*x makes you some kind of super genius who should be feared by mere mortals and end users, either get over it or start using *BSD. *BSD users (and developers) are all complete jackasses, so you'll fit right in.

Secondly, I'd like to address the issue of cleanliness. Quite frankly, the standards of personal hygiene practiced by many members of this community are simply unacceptable. As you all know, I am a fairly clean cut, well-kempt person (I know, I have a bit of a gut, but compared to Maddog, Nick Petreley or ESR, I'm a modern Adonis.), and in the Linux community that is something of an anomaly. Virtually all users of Linux (and all other forms of Un*x) are unkempt, longhaired, beast-bearded dirty GNU hippies, and I am sick and tired of having to deal with them.

The person I have the greatest problem with is that (in)famous communist RMS. Now, RMS may have been responsible for GNU, the GPL, GCC and many other contributions to the computing community, but his stance, as well as stench, displayed in his essays and actions, nauseates me. I mean, with that filth-ridden beard of his, where does he have room to demand that people refer to Linux as GNU / Linux? When he is as clean-shaven as I, he may claim that right, but until then, he should go back to playing his little flute and dropping acid like there's no tomorrow. Honestly, if he doesn't shut his mouth and go back to reading Marx, I'm going to shut it for him. I am sorry to sound so harsh, but a little hygiene every once in a while is a Good Thing(TM). Makes me wish I'd gone with a closed source license back in the day.

Next in line of dirty scuzz-balls I have to deal with, and probably the worst thorn in my side, is Alan Cox, the primary coder of my kernel's TCP/IP stack (ha, what a joke!) and all around dirty GNU hippy. Alan views toothpaste the same way a vampire views garlic. The man's wife (who I spent a few years with at the University of Helsinki) often calls me crying in the middle of the night to complain of the rank, unbearable stench the man exudes after sex. On several occasions at trade shows, exhibitions and beer bashes, I have nearly fainted from the torrent of rotten odor that pours from every inch of his toxic person. Along with the typical GNU hygiene (mis)habits he practices, he also bitches and whines about... well, everything. He lies a lot too; evidence for this can be seen in the fact he almost always wears cheap black sunglasses when talking to people he knows are better than him (such as myself).

And then we come to ESR. I won't reiterate the sewer-dweller like cleansing habits he practices as well, but I would like to focus on his general lifestyle. I like to refer to ESR as AGB or "Arrogant Gas Baron." The man's flatulence is legendary. I honestly believe that given a meal of refried beans and a match, he could reach low earth orbit. If you have to meet with ESR for any reason, arrange for the meeting to be outdoors and try to stay upwind. And his flatulence isn't limited to his posterior either. Frequently it comes out his mouth or even out of his keyboard. (Those of you who have read "The Cathedral and the Bazaar" or "Meditations of Sudden Wealth" will know exactly what I'm talking about here.) Additionally, he is a complete hillbilly. You know, the kind that goes to inner-city computer stores and buys 386s to set up as servers all over his house, with cigarette smoke-stained 14" monitors piled high upon his kitchen table. He has neither grace nor charm and can't last 15 seconds in conversation with educated company without drifting into a tirade on gun rights or the best methods for tanning road kill. Couple the above facts with his ruddy complexion (from drinking Jagermeister like it's water) and his child-molester mustache and you've got the makings of one more person who pisses me off.

Well, that's it for now. Hopefully with these feelings off my chest and into the Open Source community, things will change for the better. I'd like just once to talk to a Linux user or advocate who washes and changes their clothes at least weekly. Until then, I will be rejecting patches from anyone whose grooming standards do not measure up.

Also, I have submitted this to slashdot with the title "A Proposed Remedy Involving Lingering Fud and Organizational Objections to Linux Systems." Be on the lookout for it.

After weighing the various issues, George Bonser replied:

So according to the below, Linux development should be considered an environmental hazard? Maybe it is in the best interest of Planet Earth that we should provide some safe place, possibly some South Pacific island, where we could set up a quarantine camp for hardcore linux developers. All of their needs would be provided for, of course, since it is not their fault they are afflicted with this problem. Someplace near an undersea fiber link would most likely be best since their activities will need to be closely monitored at all times and instant global communication of their "progress" is vital.

Judging from the pasty faces of many Linux developers, sunshine might be the perfect remedy.

I say the UN should get right on this problem straight away!

There was no reply to this, but a dismayed Wayne Brown spoke for many when he replied to Linus' original post, "I just wish you'd bothered to make all this public sooner. Here I am, having spent the last couple of months growing a beard to enhance my hackerhood, and now, just when it's starting to look good, you want me to shave it off! That's the whole problem with Linux. It's about time you stopped using these undocumented hygiene interfaces and published some solid standards! (And don't give me that old excuse about having your pictures right in front of us -- "A real hacker wouldn't need documentation, he'd just read my face.")"

Several querulous troublemakers refused to face facts, and accused Linus of joking, or even of not being himself. David Riley traced the email to Washington state, and pointed out that it had been written in MS Outlook. He speculated that Linus was not even the author. Chris Meadors clung to this thin reed as well, saying, "Yeah, the quality of these jokes has really gone down hill. Last year we had forged headers and composed with Pine. This year we have someone with a dialup account using Outlook, with all it's ^Os and long lines of text. Bah."

5. 2.4.3 Compile Problems On Alpha And IA64

2 Apr 2001 (4 posts) Archive Link: "can not compile 2.4.3 on alpha"

People: Gustavo Niemeyer, Jeff Garzik, Andrea Arcangeli

Andrejs Dubovskis got several compilation errors when trying to compile 2.4.3 on an Alpha machine. Gustavo Niemeyer reported, "This is happening on ia64 as well. The interface seems to have changed but some architectures were forgotten." Jeff Garzik clarified, "The interface changed and other architectures have not caught up yet :) Other archs pretty much always play catch-up to the x86 port." Andrea Arcangeli suggested trying a patch, but there was no reply.

6. Linux Terminal Type Documentation

3 Apr 2001 - 5 Apr 2001 (6 posts) Archive Link: ""linux" terminal type"

People: Ralf Baechle, James Simmons, Andries Brouwer

Mark Lehrer asked for information on the Linux terminal type, and Ralf Baechle replied, "Maybe cryptic but the most complete documentation of the linux terminal and it's relatives are probably /etc/termcap and the ncurses terminfo database. Aside of the code itself." James Simmons also recommended, "take a look at Since linux tries to emulate the Dec vt100 at this site you will find the vt100 manuals. They are quite good and the esc codes are well described in them." Ralf added that the man-page for 'console_codes' was "finally" available, and Andries Brouwer pointed out that this man-page had been around since 1996.

7. Some Discussion Of Binary-Only Drivers

5 Apr 2001 - 9 Apr 2001 (11 posts) Archive Link: "Proper way to release binary-only driver?"

People: Brendan Miller, Alan Cox, Eric W. Biederman, Christopher Turcksin, Joel Jaeggli

Brendan Miller asked:

I have the need to distribute a binary-only driver (no flames, please), but I am not certain how to build it so that it can be used on multiple kernel versions. (Or is this impossible?)

I didn't find any "HOWTO (or recommendation) for proper binary-only driver release etiquette", so if there are some preferred means, please let me know.

I specifically had issues with the whole MODVERSIONS thing. I can include <linux/version.h> and <linux/config.h> to get the right CONFIG_MODVERSIONS macro definitions, and then include <linux/modversions.h> as appropriate. The end result is a driver with symbols whose names are mangled to match the modversion-enabled mangling of a modversion-enabled kernel. This is good if I release on the same kernel version.

Obviously, if I use a different kernel the module refuses to load. My first guess was to get rid of the module-versioning stuff so that the symbols are not mangled, and this seems to work, except that I must use insmod -f module for kernels with a different version than what I built with.

So, if there are guides that I didn't find, or ones on this list that someone thinks I should use, please let me know.
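The load failure Brendan describes follows from how versioned symbols work: each exported name carries a checksum derived from the symbol's interface, so a module built against one kernel asks for names that another kernel may not export. The toy model below illustrates the idea only. It is not how the kernel's tools compute their checksums; the use of CRC-32 over a prototype string, the "_R" suffix format, and the names versioned_name, can_load and register_dev are all assumptions made for this example.

```python
import zlib


def versioned_name(symbol, signature):
    """Append a checksum of the symbol's signature to its name (toy model)."""
    crc = zlib.crc32(signature.encode("ascii")) & 0xFFFFFFFF
    return "%s_R%08x" % (symbol, crc)


def can_load(module_imports, kernel_exports):
    """A module loads only if every name it needs is exported verbatim."""
    return set(module_imports) <= set(kernel_exports)


if __name__ == "__main__":
    # Hypothetical symbol whose prototype changed between two kernels.
    built_against = versioned_name("register_dev", "int register_dev(int x)")
    running = {versioned_name("register_dev", "int register_dev(u32 x)")}
    print(built_against)
    print(can_load({built_against}, running))
```

Stripping the versioning, as Brendan tried, removes the checksum from the names, which is why the module then needs to be forced in with insmod -f: nothing is left to vouch that the interfaces still match.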

Alan Cox replied that the only recommendation for building binary-only kernel modules was, don't. But he added that to do it successfully, "You must build with the same compiler, same tree and options as the kernel itself." Eric W. Biederman also said to Brendan, "The general recommendation (I have heard) is to include a bit of source that provides a kernel abstraction layer that will stay the same between kernels that your binary only driver can use, and to let users compile that with different kernels." But he added, "The source form driver is the recommended way to release kernel code. And I would highly encourage you to figure out how to do a source release, under the GPL so your driver can be included in the mainstream kernel. Without that your device will be at best only semi supported under linux." Finally, he said, "If what you are after is a way to release a driver that is not a hassle to add to an already working system, you will find a more receptive ear. I have heard some talk, that it would be a good idea to figure out how to standardize how to compile a kernel driver outside the kernel tree, so it could be trivial enough that anyone could do it. To date there are enough people around who don't have problems compiling their own kernel that this hasn't become a major issue." Christopher Turcksin replied:

I am finding myself exactly in this situation, and I've got a feeling that this won't be the last time either.

I expect that every future Linux driver I get involved with will be released under GPL. However, I think that the majority of our customers will be running a distribution that does not yet support a new driver, and even at Linux speeds, it'll take a long enough time that customers cannot afford to wait for the next release that includes the driver.

So the big issue for us appears to be how we support these customers.

There is no way that we can support customers who have custom kernels, but since they are 'in it' enough to compile their own kernel, I guess they're able to apply our patch and recompile it. I actually suspect that there aren't that many who do this anyway.

Where we find we have a problem is the number of different 'standard' kernels out there. We find that we need to support all releases since the last year or so for each distribution. And for each of those, we find that there are many different kernel versions (some bugfixes, some provide half a dozen different kernels with the CDs, aso.). And since we do not expect these customers to compile their own kernel, we see no option but to provide a precompiled binary driver. The numbers multiply quickly and building all of those becomes an interesting problem.

We had hoped that MODVERSIONS would allow us to provide a single (or at most a few) binary driver. Kernels with even minor version numbers are supposed to be stable (even if they are buggy) ie. not have wildly changing kernel interfaces.

In practice, that doesn't work. A driver compiled with 2.2.16 doesn't load with 2.2.16-5.0 (from RedHat 6.2) (just an example).
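The two problems described in this thread (a binary module refusing to load on a differently versioned kernel, and the number of builds multiplying) can be illustrated with a small sketch. This is only a simplified model under stated assumptions, not the actual insmod or MODVERSIONS logic: it assumes a binary module loads only on the exact kernel release string it was built for unless loading is forced, and the distribution, release, and variant names are hypothetical placeholders.

```python
from itertools import product


def can_load(built_for, running, force=False):
    """Simplified model: exact release-string match, or a forced load.

    Real module loading also compares symbol version information;
    this sketch ignores that entirely.
    """
    return force or built_for == running


def binaries_needed(distros, releases, kernel_variants):
    """Count one prebuilt binary per (distro, release, variant) combination."""
    return len(set(product(distros, releases, kernel_variants)))


# The example from the thread: built against 2.2.16, running 2.2.16-5.0.
print(can_load("2.2.16", "2.2.16-5.0"))              # strict match fails
print(can_load("2.2.16", "2.2.16-5.0", force=True))  # forced load, as with insmod -f

# Hypothetical support matrix: 3 distributions, 2 releases, 4 kernel variants.
count = binaries_needed(
    ["distro-a", "distro-b", "distro-c"],
    ["release-1", "release-2"],
    ["up", "smp", "bigmem", "boot"],
)
print(count)
```

Even with these small made-up figures the strict-match model already calls for two dozen separate builds, which is the multiplication the post is describing.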

Alan Cox pointed out that the even-numbered kernels had stable programmer interfaces (APIs), and that binary interfaces (ABIs) were unimportant. He said, "The ABI thing is an irrelevance to free software. avoiding the ABI compatibility mess is one of the great things free software lets you do." Joel Jaeggli also replied to Christopher, suggesting:

take a look at how nvidia delivers the module for their video cards...

you build the module with your current kernel or download the one for your distribution (limited number)

4-front technologies has also done a long-term good job on this with oss

8. Big Raw IO Performance Enhancement For 2.4

6 Apr 2001 (5 posts) Archive Link: "2 times faster rawio and several fixes (2.4.3aa3)"

Topics: Raw IO

People: Andrea Arcangeli

Andrea Arcangeli announced that he'd been able to speed up the raw IO code by 100%, and that there was still room for more improvements. Later, he posted another patch that showed even better results; there was a bit of discussion, and eventually he said the patch was ready for inclusion in 2.4.

9. Status Of aic7xxx Driver

9 Apr 2001 - 11 Apr 2001 (7 posts) Archive Link: "Version 6.1.11 of the aic7xxx driver availalbe"

Topics: BSD: FreeBSD, Disks: IDE

People: Justin T. Gibbs, Wakko Warner, Alan Cox

Justin T. Gibbs announced:

As always, the latest version of this driver is availalbe here:

This site now includes installation instructions, feature set, etc. The page is under construction - comments welcome.

For the impatient:




Wakko Warner asked, "So, what about on an alpha system. I've asked a few times what I could do, but you didn't help nor explain what you meant." Justin explained, "From talking to the maintainer of the QLogic driver, it appears that there is a generic issue with data mapping on the Alpha. The only way to correct this issue will be for someone to debug it." Alan Cox confirmed that this was true on some Alpha boxes, but added that on those boxes, aic7xxx wasn't the only thing to die. IDE DMA wouldn't work either, he said. He suggested, "The real test would be to run Justin's 2.2.19 patch driver and see if that works on Alpha." Elsewhere he added, "If the original aic7xxx driver works on your Alpha and Justin's does not both under 2.2 then I consider it very unlikely the bug is anywhere but Justin's driver." Wakko tried Alan's suggestion and found everything OK. The thread ended there.

10. Major System Slowdown Reproducible Under 2.4.3

10 Apr 2001 - 11 Apr 2001 (5 posts) Archive Link: "kswapd, kupdated, and bdflush at 99% under intense IO"

Topics: Big Memory Support, Disks: SCSI, FS: ext2

People: Jeff Lessem, Phil Oester, Alan Cox, Rik van Riel, Jan Harkes

Jeff Lessem reported that his 8-processor Dell P-III 700MHz with 8GB of memory, with a 12-drawer JBOD with 5 disks in a RAID 5 arrangement attached to an AMI Megaraid 438/466/467/471/493 controller with a total of 145GB of space, had been having some problems under 2.4.3, with the recent bigpatch fix for knfsd and quotas. He reported, "multiple instance of bonnie++ completely kill the machine. Once two or three bonnies are running kswapd, kupdated, and bdflush each jump to using 99% of a cpu and the machine becomes incredibly unresponsive. Even using a root shell at nice -20 it can take several minutes for "killall bonnie++" to appear after being typed and then run. After the bonnies are killed and kswapd, kupdated, and bdflush are given a minute or two to finish whatever they are doing, the machine becomes responsive again." Phil Oester confirmed seeing similar behavior running 2.4.3-ac2 on a Qmail server built from a dual-processor PIII 650 with 1GB of RAM, and a SCSI sym53c895 with dual Quantum 9GB drives. He reported, "Any time I start injecting lots of mail into the qmail queue, *one* of the two processors gets pegged at 99%, and it takes forever for anything typed at the console to actually appear (just as you describe). But I don't see any particular user process in top using a great deal of cpu - just the system itself. In my case, however, I usually have to powercycle the box to get it back - it totally dies." Alan Cox confirmed seeing this case as well, and added, "Its partially still a mystery." He went on, "Under heavy I/O loads the cerberus test suite has been showing real disk corruption on all current trees until Ingo's patch today to fix the ext2 and minix problems combined with the earlier fixes for other races. In your case I suspect its the qmail thousands of files being created/deleted not the corruption but its hard to be sure." Rik van Riel also confirmed, "I've seen it too. It could be some interaction between kswapd and bdflush ... but I'm not sure what the exact cause would be." Jan Harkes gave his take:

Syncing dirty inodes requires in some cases page allocations. The existing code in try_to_free_pages calls shrink_icache_memory during free_shortage. So we are probably stealing the few pages that we managed to free up a bit earlier, exactly around the time that we're already critically low on memory.

The patch I sent you a while ago actually avoids this by triggering an extra run of kupdated but doesn't sync the dirty inodes in the more critical try_to_free_pages path.

I've been running it on machines with 24MB, 64MB and 512MB, haven't had any problems. It is noticeable that the nightly updatedb run flushes the dentry/inode cache. In the morning my email reader has to pull the email related inodes back into memory (maildir format). It doesn't have to do this the rest of the day. As far as I am concerned, this actually shows that the system is now adapting to the kind of usage that occurs.

There was no reply.
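Jan's explanation can be illustrated with a toy model. This is only a sketch of the reasoning in his post, not kernel code: the class, function, and parameter names are all hypothetical, and the assumption that syncing one dirty inode costs exactly one page is made up for illustration. It compares a reclaim pass that syncs dirty inodes inline (spending the pages it has just freed) with one that leaves them alone and requests an extra background flush instead.

```python
from dataclasses import dataclass


@dataclass
class ToyMemory:
    free_pages: int
    dirty_inodes: int
    flush_requests: int = 0  # stands in for "an extra run of kupdated"


def reclaim_pass(mem, freed, sync_inline, pages_per_sync=1):
    """One reclaim pass during a memory shortage (toy model only)."""
    mem.free_pages += freed  # pages this pass managed to free
    if sync_inline:
        # Inline sync: each dirty inode written here costs a page
        # allocation, eating into the pages that were just freed.
        while mem.dirty_inodes > 0 and mem.free_pages >= pages_per_sync:
            mem.free_pages -= pages_per_sync
            mem.dirty_inodes -= 1
    elif mem.dirty_inodes > 0:
        # Deferred: keep the freed pages and ask for a background flush.
        mem.flush_requests += 1
    return mem


inline = reclaim_pass(ToyMemory(free_pages=0, dirty_inodes=8),
                      freed=4, sync_inline=True)
deferred = reclaim_pass(ToyMemory(free_pages=0, dirty_inodes=8),
                        freed=4, sync_inline=False)
print(inline)
print(deferred)
```

In this model the inline variant ends the pass with no free pages at all, while the deferred variant keeps everything it freed and leaves the inode writes for later, which matches the behavior Jan describes for his patch.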


Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.