Kernel Traffic #152 For 28 Jan 2002

By Zack Brown

I'm taking a little vacation to the land that time forgot, so there will be no Kernel Traffic for the next two weeks. I'll try to include all the list traffic in the next issue.

Mailing List Stats For This Week

We looked at 2171 posts in 9467K.

There were 616 different contributors. 302 posted more than once. 206 posted last week too.

1. Fundamental Change In Driver Handling For 2.5

13 Jan 2002 - 21 Jan 2002 (212 posts) Subject: "ISA hardware discovery -- the elegant solution"

Topics: FS: initramfs, FS: ramfs, Kernel Build System

People: Alan Cox, Eric S. Raymond

In the course of discussion, Alan Cox mentioned that "For 2.5 if things go to plan there will be no such thing as a "compiled in" driver. They simply are not needed with initramfs holding what were once the "compiled in" modules." Eric S. Raymond said:

This is something of a bombshell. Not necessarily a bad one, but...

Alan, do you have *any* *freakin'* *idea* how much more complicated the CML2 deduction engine had to be because the basic logical entity was a tristate rather than a bool? If this plan goes through, I'm going to be able to drop out at least 20% of the code, with most of that 20% being in the nasty complicated bits where the maintainability improvement will be greatest. And I can get rid of the nasty "vitality" flag, which probably the worst wart on the language. how soon is this supposed to happen?

Alan replied, "Its something to tackle after the rest of initramfs works. Even if not then the lsmod case can be made to work since its just a matter of putting the names in a segment for the linker to collate." Eric said:

Dang. This will make the CML2 inference engine work better in some funky corner cases, too. And its behavior will be easier to understand all around.

Sign me up. This will be a good change; I like it when I can make things better by taking features *out* of my code.

Alan's statement drew quite a bit of discussion, with various folks in favor and various against. Not everyone seemed to understand the implications and ramifications involved. These were not fully explored in the thread, but I'd imagine it will be a controversial issue for some time to come.
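To illustrate why a tristate is harder to reason about than a bool, here is a minimal sketch of three-valued configuration logic. It assumes the ordering n < m < y with AND as minimum and OR as maximum; the names `Tri`, `tri_and`, `tri_or`, `tri_not` and `clamp_to_dependency` are hypothetical and are not taken from CML2.

```python
from enum import IntEnum


class Tri(IntEnum):
    """Three-valued config symbol, ordered n < m < y (an assumed convention)."""
    N = 0  # not built
    M = 1  # built as a loadable module
    Y = 2  # compiled into the kernel image


def tri_and(a: Tri, b: Tri) -> Tri:
    # A feature can be no "stronger" than the weakest thing it depends on.
    return Tri(min(a, b))


def tri_or(a: Tri, b: Tri) -> Tri:
    return Tri(max(a, b))


def tri_not(a: Tri) -> Tri:
    # y and n swap places, while m stays m.
    return Tri(2 - a)


def clamp_to_dependency(requested: Tri, dependency: Tri) -> Tri:
    """Hypothetical helper: a driver cannot be built in if what it needs is only a module."""
    return tri_and(requested, dependency)
```

If "compiled in" disappears as an option, only two values remain, and these functions reduce to ordinary boolean and, or and not.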

2. Maximum Number Of CPUs On SMP Systems

16 Jan 2002 - 19 Jan 2002 (9 posts) Subject: "how many cpus can linux support for SMP?"

Topics: SMP

People: Thomas Duffy, Ralf Baechle

Barry Wu asked how many CPUs were supported under Linux, and Thomas Duffy explained, "there is a 32bit cpu mask, meaning 32 is the absolute max, although Ralf Baechle has extended it to 64 in order to support SGI origin 2000's, but realistically, linux can only do about 8 before falling on the ground... depends on your workload should be ok with 4 cpus." Ralf Baechle corrected, "Actually Kanoj and me hacked it to work with 128. The scalability was already frightening with 32 and even more so with 128 ..." but he added, "Around 4 procs is certainly the sweet spot currently."
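The hard limit Thomas describes follows from the mask width: with one bit per CPU, a 32-bit mask cannot name a 33rd processor. A minimal sketch of that idea, using hypothetical helper names (`set_cpu_online`, `online_cpus`) rather than anything from the kernel source:

```python
from typing import List

MASK_BITS = 32  # width of the CPU mask described in the thread


def set_cpu_online(mask: int, cpu: int, bits: int = MASK_BITS) -> int:
    """Return mask with the bit for `cpu` set; reject CPUs the mask cannot hold."""
    if not 0 <= cpu < bits:
        raise ValueError(f"cpu {cpu} does not fit in a {bits}-bit mask")
    return mask | (1 << cpu)


def online_cpus(mask: int) -> List[int]:
    """List the CPU numbers whose bits are set in the mask."""
    return [cpu for cpu in range(mask.bit_length()) if mask & (1 << cpu)]
```

Widening the mask (the `bits` parameter here) raises the ceiling, which is the kind of change Ralf describes, but it says nothing about how well the kernel scales once those CPUs are present.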

3. Status Of 2.5

16 Jan 2002 - 24 Jan 2002 (23 posts) Subject: "[STATUS 2.5] January 17, 2001"

Topics: Access Control Lists, Code Freeze, Disk Arrays: EVMS, Disk Arrays: LVM, Disks: IDE, FS: JFS, FS: NFS, FS: ReiserFS, FS: devfs, FS: initramfs, FS: ramfs, Framebuffer, Kernel Build System, Klibc, Networking, Sound: ALSA, USB, User-Mode Linux, Virtual Memory

People: Guillaume Boissiere, Erik Mouw, Jose Luis Domingo Lopez, Hans Reiser, Alan Cox, Jens Axboe, James Simmons, Ingo Molnar, Rik van Riel, Linus Torvalds, Andre Hedrick, Jean Tourrilhes, Greg KH, Jeff Dike, Karim Yaghmour, Patrick Mochel, Russell King, Keith Owens, Ben LaHaise, Dave Jones, Richard Gooch, Robert Love, Andrew Morton

Guillaume Boissiere announced:

I've seen several times on this list people wondering what features were in the works for 2.5 and what the status of the development was. I did some grepping on the archive and put together a list of things that have been discussed / worked on for 2.5 over the past year or so.

It's probably pretty incomplete and full of errors at this point but I'll be happy to update it if you send me email.

  1. [Merged] New scheduler for improved scalability (Ingo Molnar)
  2. [Merged] Rewrite of the block IO (bio) layer (Jens Axboe)
  3. [Merged] New kernel device structure (kdev_t) (Linus Torvalds)
  4. [Merged] Initial support for USB 2.0 (Greg KH, others)
  5. [Ready] Add User-Mode Linux (UML) (Jeff Dike)
  6. [Ready] Add ALSA (Advanced Linux Sound Architecture) (ALSA team)
  7. [Ready] IDE layer update (Andre Hedrick)
  8. [<1 month] New kernel build system (kbuild 2.5) (Keith Owens)
  9. [<1 month] New kernel config system: CML2 (Eric Raymond)
  10. [Beta] New driver API for Wireless Extensions (Jean Tourrilhes)
  11. [Beta] New IO scheduler (Jens Axboe)
  12. [Beta] Add JFS (Journaling FileSystem from SGI) (JFS team)
  13. [Beta] New VM with reverse mappings (Rik van Riel)
  14. [Beta] Add preempt kernel option (Robert Love)
  15. [Beta] Add resheduling points to remove latency (Andrew Morton)
  16. [Beta] Build option for Linux Trace Toolkit (LTT) (Karim Yaghmour)
  17. [Beta] Better event logging for enterprise systems (evlog team)
  18. [Ongoing] Better support of high-end NUMA machines (NUMA team)
  19. [Alpha] Add Asynchronous IO (aio) support (Ben LaHaise)
  20. [Alpha] Integrate EVMS into kernel (EVMS team)
  21. [Started] Rewrite of the framebuffer layer (James Simmons)
  22. [Started] New driver model & unified device tree (Patrick Mochel)
  23. [Started] Rewrite of the console layer (James Simmons)
  24. [Started] More complete NetBEUI and 802.2 net stacks (Arnaldo C de M)
  25. [Draft #2] New lightweight library (klibc) (Greg KH)
  26. [Draft #3] Replace initrd by initramfs (hpa, Al Viro)
  27. [Planning] Change all drivers to new driver model (All maintainers)
  28. [Planning] Add thrashing control (Rik van Riel)
  29. [Planning] Remove all hardwired drivers from kernel (Alan Cox, etc)
  30. [Planning] Porting all input devices over to input API (James Simmons)
  31. [Planning] generic parameter/command line interface (Keith Owens)

I hope this is helpful. Enjoy!

Several folks pointed out that JFS (item 12) was actually from IBM, not SGI. Russell King also pointed out another 2.5 feature, in which 'serial.c' would be undergoing a restructuring. Dave Jones added that CPU clock/voltage scaling would be going into 2.5; regarding the ARM code, Erik Mouw offered, "The basic support is stable. We need to sort out a nicer way to get the memory timing variables, but that's only an initialisation issue we can add later on." Jose Luis Domingo Lopez also replied to Guillaume's initial post:

Great !!!. I think this "todo list" was much needed. Let me suggest a couple of things that should find their way into 2.5.x sometime in the future, and that I consider important:

There are other interesting things I don't follow as closely, and that could be available when 2.5.x is still under development, maybe not.

That is what I think are the most important things that _could_ be part of future 2.6.0, should maintainers consider it as a good idea. Please, this is only informative, not trying to "sell" you nothing.

At this point, Hans Reiser asked, "Have you heard anything about when Linus intends to code freeze? In my planning I am assuming Sept. 30 is way earlier than 2.6 would ship. I remember how long 2.4 took, and I simply assume 2.6 will be the same. At any rate, there is no way we'll be done earlier than September: it is a deep rewrite. Code looks so much better than the old code...., but it is completely new code." Alan Cox replied:

If Linus says september freezes in september and ships for christmas I will be most suprised. If he says september freezes the may after and ships the december after that I'd count it normal

Personally I'd really like to see the block I/O stuff straightened out. The neccessary VM bits done, device driver updates and a September freeze. I think it can be done, and I think the resulting kernel will be way way better for people with 1Gb+ of RAM, so much better that its worth making a clear release at that point.

Hans replied, "Let us encourage him to give us some warning, like 60 days of warning. Let us also encourage him to code freeze VM and VFS first not last (I think he agrees with this fortunately). I am not going to say anything about when I would like that freeze to hit, except that we won't be ready before September/October because I am finally able to take the time to do things right in the design and so I will. If he freezes in ~September, we'll have an experimental Reiser4 for him."

4. Cleaning Up mtrr.c

17 Jan 2002 (3 posts) Subject: "[patch] getting rid of suser/fsuser for good, first part"

People: David Weinehall, Dave Jones

David Weinehall said, "It is after all 2.5-time, and hence time for a spring-cleaning." He posted a patch to fix up various things; one file included in the patch was arch/i386/kernel/mtrr.c, to which Dave Jones said, "This file in particular needs more than just a spring clean imo. As extra support was added for the different MTRR lookalikes, it got messier and messier until it turned into the goop we have now. Doing a real cleanup on this has been on my TODO for months now. Hopefully I'll get around to it in the 2.5 timeframe." David took another look and agreed wholeheartedly.

5. Problems

18 Jan 2002 - 23 Jan 2002 (14 posts) Subject: " problem..."

Topics: Networking

People: H. Peter Anvin, Stephan von Krawczynski, Larry McVoy, Craig I. Hagan, Daniel Phillips

H. Peter Anvin reported: stopped responding to most services some time this morning. Unfortunately, the system management port on the new server was never connected at the time the new server was installed, so we can't debug the problem remotely.

I am trying to reach people at ISC to get the management port connected and the system back to normal.

Elsewhere, under the Subject: now operating in "mirrors only mode", he said:

Since it looks like it is going to take some time to resolve the wiring problem affecting, I have put the system in "mirrors only mode." This means mirror sites can update, but no user traffic is allowed -- except obtaining the list of mirrors.

This wiring problem is an unfortunate consequence of having had to push the new server into service on a lot shorter notice than expected back in December. Once we can get it resolved these kinds of problems should hopefully not recur.

Elsewhere, under the Subject: "Would anyone be willing to host a second site?", he said:

The recent troubles we've had at pretty much highlight the issues with having an offsite system with no easy physical access. This begs the question if we could establish another primary site; this would not only reduce the load on any one site but deal with any one failure in a much more graceful way.

Anyone have any ideas of some organization who would be willing to host a second server? Such an organization should expect around 25 Mbit/s sustained traffic, and up to 40-100 Mbit/s peak traffic (this one can be adjusted to fit the available resources.)

If so, please contact me...

Stephan von Krawczynski offered, "I have a slightly different suggestion, that may be interesting. Years ago we did a DNS-project that allows to spread a domain to several _different_ ip locations based on the dns-_requesting_ ip. You may know such a technique from akamai (MS daughter). In fact we implemented and test-runned it years ago, but did not find any customer interested (in fact only real big customers _can_ be interested at all, and we didn't have the "connections"). Anyway the know-how is still here and can be used to help, if interested. The basic idea is, that this splits costs in running to several locations. These locations can (e.g.) be providers who may have some strategic interests. You may as well come up with a GNU project of spreading mirrors - meaning every provider be it small or big can have its own mirror, and _only_ his customers (depending on their IP or his AS) are using it. So if you have a major breakdown at the primary server, people will get no _new_ pages, but itself looks up throughout all IP-ranges that have a mirror "attached"."
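The core of Stephan's suggestion, answering a DNS query differently depending on who is asking, can be sketched as a longest-prefix lookup. This is only an illustration under stated assumptions: the `MIRRORS` table, the `PRIMARY` address and the `resolve_for` function are hypothetical, and every address is a documentation-range placeholder rather than a real mirror.

```python
import ipaddress

# Hypothetical table: requesting network -> mirror address to hand out.
# All values are documentation-range placeholders, not real mirrors.
MIRRORS = {
    ipaddress.ip_network("192.0.2.0/24"): "198.51.100.10",
    ipaddress.ip_network("203.0.113.0/24"): "198.51.100.20",
}
PRIMARY = "198.51.100.1"  # placeholder for the primary server


def resolve_for(requester_ip: str) -> str:
    """Pick the address a DNS server would return to this requester."""
    addr = ipaddress.ip_address(requester_ip)
    matches = [net for net in MIRRORS if addr in net]
    if not matches:
        # No provider mirror covers this requester: fall back to the primary.
        return PRIMARY
    # Prefer the most specific (longest-prefix) matching network.
    best = max(matches, key=lambda net: net.prefixlen)
    return MIRRORS[best]
```

A requester inside a provider's range is sent to that provider's mirror, and everyone else still reaches the primary site.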

Elsewhere, Larry McVoy suggested to H. Peter, "We've priced this" (H. Peter's bandwidth requirements) "lately and I think the cheapest you are looking at is around $6500/month for a 25Mbit connection. That's not a huge amount of money but it's enough that it shows up on people's radar screens as a line item, it's $80K/year, so there would have to be some justification." H. Peter replied, "No doubt. In this similar vein, it's probably good to point out to people just how much ISC's contribution to is actually worth..." Larry replied:

No kidding. I didn't realize they were hosting it, that's very nice of them.

Do you have any statistics on what percentage of the download traffic is whole kernels versus patches? If most of the traffic is whole kernels, I think I might be able to offer up a fix for that.

H. Peter did not reply to this, but Craig I. Hagan suggested, "this is something that hooking into a cache heirarchy (e.g. NLANR) might help resolve."

Elsewhere, under the Subject: "is back online...", H. Peter said, " is back online, and is now wired for proper remote management so we hopefully should avoid these problems in the future. I have also gotten a fair number of offers for a second-server site; I will need some time to sort through them all figure out which one is best for us. In general, my preference is for another site which looks as much as possible like the existing one, i.e. a dedicated server, preferrably of the same type."

Elsewhere, under the Subject: "setting the record straight", he said:

It has come to my attention there are some unfounded rumours about the outage from this past weekend, and I wanted to set the record straight:

In particular, the outage was *not* caused by any kind of failure in the new Compaq server hardware. It worries me a bit that this particular rumour was circulating, since it reflects badly on a donor who has just provided us with a very nice machine.

The approximate history of the failure is as such:

Back in December, we were already planning to replace the old server hardware after repeated problems, probably age-related. We were originally planning to put the new server in production after the holidays, however, when the old server really started to flake out on us we decided to push it in service early.

ISC was very accomodating and arranged for us to put it in service on short notice, despite several logistics problem. One of those problems was the lack of a configured port for the management card in the new server. As a result, it did not get wired up at that time.

This past Friday morning, the kernel on the server apparently stopped servicing user-space processes; the details aren't known, other than the fact that pings and TCP SYNs received replies, but we couldn't get any actual data across. Futhermore, on and off there was as much as 95% packet loss in pings, although the machine didn't stop responding to pings until it was power cycled on Sunday.

Due to a miscommunication between myself and the staff at ISC we weren't able to get the machine power cycled until Sunday (it did not return from the power cycle) and the management port connected until Monday. Once the management port got connected, it was a 5-minute job to bring the machine back to life.

Finally, I would like to thank the many people and organizations who have offered to host another server in one way or another. I'm going to be evaluating our options, with the goal of getting at least one additional server if at all possible. If so, my preference will definitely be to try to obtain identical hardware with the one we currently have.

Daniel Phillips asked which kernel version had been running on Friday, and H. Peter replied 2.4.16.

End of thread.

6. Reorganization

24 Jan 2002 (1 post) Subject: "2.5.3-pre5 - Configurator updates"

Topics: Kernel Release Announcement

People: Linus Torvalds

Linus Torvalds announced Linux 2.5.3-pre5, saying:

Due to a private flame-war about configuration options and behaviour, I just broke down and did the thing I had asked for from others for a long time - split up the huge unwieldly "" file into many smaller ones that are closer to the option they describe.

The split was largely automated, with one 25-thousand line file being split up into almost 100 smaller files. The placement was also entirely automated, and basically moved each entry to the same subdirectory where the config question for that entry was located (and if the question was duplicated over several places, the help was duplicated too).

This should make it _much_ easier to have things like per-architecture config entries that the rest of the world simply doesn't want to know about and has no interest in.

[ The automated nature of the split also showed that some questions are sometimes oddly placed. Some minimal movement was done to fix the worst offenders, but hopefully we can organize some of it more logically in the future. ]

However, I would ask that maintainers of config tools, drivers and architectures would check that their config help entries still exist, and seem to work. Of the tools, only the basic "make config" script has been updated to know about the change in location, so menuconfig/xconfig simply will not find the help right now.

Oh, non-x86 architectures probably need to fix up their config menu entries too, as I only did a minimal "move architecture-common entries to init/" edit (to not duplicate all the questions/help messages that are common to all architectures), and I'm absolutely positive that the menu structure didn't move correctly.

Other changes tend to get overwhelmed in the patch (happily it's not every day that you clean up the largest file in the whole archive), but ChangeLog appended so you can see what else happened.

There was no reply.
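The automated split Linus describes can be sketched as grouping each help entry under the directory where its config question lives, duplicating the text when a question appears in more than one place. This is not Linus's actual script; the function names and both input shapes are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def split_help(help_entries: Dict[str, str],
               question_dirs: Dict[str, List[str]]) -> Dict[str, Dict[str, str]]:
    """Group help text by the directory of each option's config question.

    help_entries:  option name -> help text (the old monolithic file, parsed)
    question_dirs: option name -> directories whose config files ask about it
    Both input shapes are assumptions made for this sketch.
    """
    per_dir: Dict[str, Dict[str, str]] = defaultdict(dict)
    for option, text in help_entries.items():
        # An option asked about in several places gets its help duplicated.
        for directory in question_dirs.get(option, []):
            per_dir[directory][option] = text
    return dict(per_dir)


def orphaned(help_entries: Dict[str, str],
             question_dirs: Dict[str, List[str]]) -> List[str]:
    """Options whose help would be dropped because no question references them."""
    return sorted(opt for opt in help_entries if not question_dirs.get(opt))
```

A check like `orphaned` is the sort of thing a maintainer could run to confirm that their help entries survived the move.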







Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.