Kernel Traffic

Hurd Traffic #115 For 18 Mar 2002

By Paul Emsley

1. OSKit-Mach booting nits

8 Mar 2002 - 11 Mar 2002 (17 posts) Archive Link: "oskit-mach debugging log"

Topics: Bootloaders

People: Roland McGrath, Douglas Hilton, Daniel Wagner, Marcus Brinkmann

Doug Hilton is trying to build a Mach microkernel that supports SMP. He is using oskit-mach with remote debugging, and posted his most recent crash (related to CPU type assignment). Daniel Wagner advised him that he might need the "--enable-indirect-osenv" argument to OSKit's configure.

Doug was still at a loss as to how to proceed and waited for Roland McGrath to reply, which he did:

I would not recommend either [using gcc-3.0 or disabling the optimization], especially for the oskit. It's not unlikely that the linux driver code in the oskit won't compile properly if you try either one of these. The proven method is to use gcc 2.95.x and -O2.

Re: viewing the registers on the 2nd CPU using gdb: The stub talking on the serial line is only ever running on one CPU at a time. The base_critical_enter/base_critical_leave calls in oskit/kern/gdb_serial.c are there to ensure this stays reasonably sane. Each CPU that hits a trap that should go to gdb will call the oskit function `gdb_trap' (via kernel_trap). If the segments or IDT or whatnot is too fouled up to get to the trap handlers properly (through the locore.S code to kernel_trap in i386/i386/trap.c), then you are SOL. If kernel traps are working on that CPU, then it will get into the gdb_serial.c code and do base_critical_enter. If another CPU is already in that code, the second CPU will spin until the first one finishes and calls base_critical_leave. So once you get a CPU talking to gdb, you will keep talking to the same CPU until gdb tells it to continue running (at which time the other CPU can come in and talk to gdb). (There are hooks into the gdb code that we could use to write SMP-aware code that would report each CPU to gdb as a separate thread so you could use the gdb thread commands to select the CPU. But that would be a bit of hacking, and it would rely on the IPIs working right and such--not really a way to debug the SMP setup itself.)
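The "one CPU at a time" behaviour Roland describes can be pictured as a simple spin lock. The following is only a sketch using C11 atomics; the names stub_enter, stub_try_enter and stub_leave are hypothetical stand-ins, and the real base_critical_enter/base_critical_leave in the OSKit may well do more than this (the source does not say).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: true while some CPU is inside the gdb stub.
   This is an illustration, not OSKit code. */
static atomic_bool stub_busy = false;

/* Try once to become the CPU that talks to gdb; true on success. */
static bool stub_try_enter(void)
{
    /* atomic_exchange returns the previous value: false means we won. */
    return !atomic_exchange_explicit(&stub_busy, true, memory_order_acquire);
}

/* Spin until no other CPU is inside the stub, then take ownership. */
static void stub_enter(void)
{
    while (!stub_try_enter())
        ;  /* busy-wait, like the second CPU in the description above */
}

/* Release the stub so that a waiting CPU can get in. */
static void stub_leave(void)
{
    bool was_held = atomic_exchange_explicit(&stub_busy, false,
                                             memory_order_release);
    assert(was_held);  /* leaving without entering would be a bug */
    (void)was_held;    /* silence unused warning when NDEBUG is set */
}
```

A CPU that traps would call stub_enter(), talk to gdb, and call stub_leave() when gdb tells it to continue; any other CPU trapping in the meantime sits in the loop.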

Doug came back with improved debugging, saying that there is a crash on the third time through, in ipc_port_release_send() (oskit-mach/ipc/ipc_port.c:1126).

Marcus Brinkmann diagnosed this as a bug in the boot script cleanup handler. He also recommended going back and trying to boot using serverboot (the old and more tried and trusted way). Crucially, Marcus added that oskit-mach did not like Doug's GRUB boot configuration.

Indeed, that seemed to be the case. Shortly afterwards, Doug wrote back saying "mission accomplished. ${root} and $(task-create) are NULL."

There was some discussion about the "-T typed" argument, Roland saying "We use -T typed so that root can be set using either the device type, e.g. device:hd0s2 or the part type, e.g. part:2:device:hd0."
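Reading Roland's two examples as TYPE:REST, where REST may itself contain further colons, the split can be sketched as below. The helper name split_typed_name is hypothetical and this is not libstore's own parser; it only shows how "device:hd0s2" and "part:2:device:hd0" would divide at the first colon.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split "TYPE:REST" at the first colon. Copies TYPE into type_buf and
   returns a pointer to REST inside name, or NULL if there is no colon
   or TYPE does not fit. Hypothetical helper, not libstore's parser. */
static const char *split_typed_name(const char *name, char *type_buf,
                                    size_t type_buf_len)
{
    assert(name != NULL && type_buf != NULL);

    const char *colon = strchr(name, ':');
    if (colon == NULL)
        return NULL;

    size_t type_len = (size_t)(colon - name);
    if (type_len + 1 > type_buf_len)
        return NULL;

    memcpy(type_buf, name, type_len);
    type_buf[type_len] = '\0';
    return colon + 1;  /* everything after the first colon */
}
```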

Doug seemed to be happy: "It booted up in multiuser mode just spiffy. Thanks a lot for all the help guys! Whew! (A great weight is lifted) Now onto the important stuff! "

2. I/O permission bitmap

28 Feb 2002 - 11 Mar 2002 (12 posts) Archive Link: "I/O permission bitmap patch for oskit-mach"

People: Marcus Brinkmann, Thomas Bushnell, Roland McGrath

Marcus Brinkmann posted a close-to-final version of his I/O permission bitmap patch for oskit-mach, saying:

Design flaw: Changes don't propagate immediately to other threads in the same task running on another processor. This applies to enable as well as disable operations! Espen Skoglund pointed out that we only need to propagate disable operations, enable operations could be picked up in the fault handler (for extra performance).

In real life, it might be easy for us to make sure that it doesn't happen by enabling the ports before creating other threads, and by never deleting them. This should work fine for the console server, and it should also be fine for single threaded apps like X.
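For readers unfamiliar with the data structure under discussion: on x86 the I/O permission bitmap holds one bit per port, and a clear bit means the port may be accessed. With 65536 ports that comes to the 8192 bytes mentioned later in the thread. The sketch below only manipulates such a bitmap in memory; the struct and function names are hypothetical and this is not Marcus's patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IOPB_PORTS 65536u
#define IOPB_BYTES (IOPB_PORTS / 8)  /* 8192 bytes */

/* Hypothetical in-memory copy of a task's I/O permission bitmap. */
struct iopb {
    uint8_t bits[IOPB_BYTES];
};

/* Start with every port denied (all bits set). */
static void iopb_init(struct iopb *b)
{
    memset(b->bits, 0xff, sizeof b->bits);
}

/* Allow access to ports first..last inclusive by clearing their bits. */
static void iopb_enable(struct iopb *b, unsigned first, unsigned last)
{
    assert(first <= last && last < IOPB_PORTS);
    for (unsigned port = first; port <= last; port++)
        b->bits[port / 8] &= (uint8_t)~(1u << (port % 8));
}

/* Deny access to ports first..last inclusive by setting their bits. */
static void iopb_disable(struct iopb *b, unsigned first, unsigned last)
{
    assert(first <= last && last < IOPB_PORTS);
    for (unsigned port = first; port <= last; port++)
        b->bits[port / 8] |= (uint8_t)(1u << (port % 8));
}

/* True if the bitmap permits access to the given port. */
static bool iopb_allowed(const struct iopb *b, unsigned port)
{
    assert(port < IOPB_PORTS);
    return (b->bits[port / 8] & (1u << (port % 8))) == 0;
}
```

The SMP flaw Marcus describes is about what happens after such an update: another processor may still be running a thread of the same task with the old bitmap loaded.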

Thomas Bushnell picked up on this, wondering why it was so: "Why not just suspend the threads around the operation and then resume them (forcing them to reload processor state)? "

Marcus replied:

Last time it was discussed, we said that we should probably use the same mechanism as is used for page maps, that is inter-CPU interrupts. The only reason I did not do it is that I have only a limited understanding of the CPU synchronization code, and I would not like to touch it. I also don't think that I have the time to learn enough about SMP right now to get this right.

I haven't considered alternatives, like suspending threads. I am not sure if that would work. I guess that if you do it within the I/O bitmap lock, it could be safe, although it seems to me that it is potentially much slower than the other approach (not that it would matter much).

If that approach actually works, we might put it in for the SMP case until we have a better solution?

This was OK for Thomas: " I'm happy to leave it as a documented bug, and not worry about it until SMP is working. ("Documented" means somewhere other than comments in the code and mailing list entries. :))) "

Roland McGrath reviewed the patch, made some minor comments and also added: "For the SMP problem, it would certainly be simple to hack the existing check_io_fault code to notice when the task's iopb has changed, reload it, and retry the instruction. But I am really not concerned about this problem cropping up in practice--certainly it won't before we finish making SMP work at all! :) "

Marcus asked: "Do you think the performance increase is worth the memory overhead of a zone, esp if you preallocate iopb's (each being 8192 bytes huge)?"

Roland replied to this: "What memory overhead are you talking about? The zone_t data structure is just a few words. The question is basically whether or not you have a free list for fast allocation. There is no unnecessary extra memory used, because the free list is returned when memory is tight (ZONE_COLLECTABLE). "
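Roland's point, that a zone is essentially a free list for fast allocation which can be handed back when memory is tight, can be illustrated with a toy allocator. This is a minimal sketch and not Mach's zone code: all names are hypothetical, and malloc/free stand in for whatever the kernel uses to obtain and release memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A freed element is reused as a link in the free list. */
struct zone_elem {
    struct zone_elem *next;
};

/* Hypothetical zone: just a few words of bookkeeping. */
struct zone {
    size_t elem_size;
    struct zone_elem *free_list;
    size_t free_count;
};

static void zone_init(struct zone *z, size_t elem_size)
{
    assert(elem_size >= sizeof(struct zone_elem));
    z->elem_size = elem_size;
    z->free_list = NULL;
    z->free_count = 0;
}

/* Fast path: pop from the free list. Slow path: get fresh memory. */
static void *zone_alloc(struct zone *z)
{
    struct zone_elem *e = z->free_list;
    if (e != NULL) {
        z->free_list = e->next;
        z->free_count--;
        return e;
    }
    return malloc(z->elem_size);
}

/* Freed elements are cached on the free list rather than released. */
static void zone_free(struct zone *z, void *p)
{
    struct zone_elem *e = p;
    e->next = z->free_list;
    z->free_list = e;
    z->free_count++;
}

/* Release every cached element, as a collectable zone would under
   memory pressure. Returns the number of elements released. */
static size_t zone_collect(struct zone *z)
{
    size_t released = 0;
    while (z->free_list != NULL) {
        struct zone_elem *e = z->free_list;
        z->free_list = e->next;
        free(e);
        released++;
    }
    z->free_count = 0;
    return released;
}
```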

Marcus checked in his code.

3. Syslogging and Cores

4 Mar 2002 - 13 Mar 2002 (15 posts) Archive Link: "core file writing"

Topics: FS: procfs

People: Jon Arney, Marcus Brinkmann, Roland McGrath, Neal Walfield, Mark Kettenis

Users of GNU/Hurd who have poked around a bit will know that its syslogging is thin on the ground. Jon Arney suggested a means to improve the Hurd's logging with a "log daemon for the Hurd for the sake of keeping track of hurd translator events such as filesystem errors, auth daemon problems, or other information relating to the operation of hurd daemons".

Neal Walfield was more of a mind to fix up syslog (if necessary).

Jon's worry about using syslog was "if, say, 'pfinet' needed to log something and syslog were configured to write to UDP ports, pfinet would have to write to a UDP port causing perhaps a recursive loop of log calls. I realize that this is ultimately a pathological case, but it's worth considering."

But Marcus Brinkmann pointed out "If sending something over a socket causes the socket server to log a message, then you have a serious problem anyway."

Jon commented: "take an example look at Linux's kernel logging. Obviously a 'printk' inside a file 'write' function is dangerous even in Linux. However, the mechanism, while flawed in the general case is still useful in a wide variety of special cases. I hate to keep picking on Linux for my examples, but it is a useful and instructive O/S. I'd hate to throw the baby out with the bathwater in terms of design approach. "

With the Hurd, things are not so pathological, as Marcus pointed out: "The good thing is that we [Hurd-users] can turn down the server in an emergency, while the rest of the system is hopefully still up and running, and can at least attempt a sane shutdown (this is what init does when it sees the death of a critical system server). That's why I think that logging once and then dying is more useful than logging recursively (and eventual die, with the disk being full or something). "

There was some unresolved discussion about whether the user should also be able to decide what happens in such a situation, giving them at least a chance to regain control.

In passing (and starting a sub-thread), Marcus picked up on the issue of writing core files: "another thing that would certainly be useful to have is support for core files. The crash server works today, we just don't have a sensible core file format, and a function that dumps such a core file (and another function that allows gdb to read it back)."

Roland encouraged someone to implement this saying that it was not very hard to do.

So Jon tried. He had a look at the Linux ELF core writer and thought that the header info could be written using proc_getprocinfo and related functions but needed more info on how to get the user register state and the VM of the process.

Marcus agreed that much core writing code could be shared but warned: "The main difference that comes to mind is that we have native (kernel-level) threads, and the threads state needs to be stored in the core file as well. "

Roland advised:

The proc server is not really much involved. You need to understand the structure of the system and what Mach tasks and threads are about in some detail before attempting to work on this.

Certainly writing an ELF core file is the right thing to do. How to store the memory is clear, and that part of the file format is just the same as for Linux and other systems that use ELF core files. You can look at the `vminfo' program (utils/vminfo.c) to see how to use vm_region to examine the address space of a task, and then you use vm_read to get the data.

The rest of the information about register state and so on is a little less clear. How we will store it is clear: in ELF notes as other systems do. But how many notes and what formats to use for them is up to us. We need to choose those note formats, and modify gdb to understand them. Mark Kettenis is the authority in the Hurd native gdb port, and I will tend to defer to him on what these formats should be.

I suggest that you start out by just doing the memory dump and not worrying about the rest at all (no notes). Rather than hacking on the crash server directly, it will be easiest to debug this in a little standalone program that just uses pid2task to get the task port of a process, suspends it, and dumps its memory as an ELF core file (i.e. a minimal "gcore"). I might get around to whipping up the basic memory-dumping code soon, and then you could see how it's done and debug the code for me.
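The "memory only, no notes" starting point Roland suggests comes down to an ELF file header of type ET_CORE followed by one PT_LOAD program header per memory region. The sketch below fills in those two structures using the definitions from glibc's <elf.h>. It assumes a 32-bit little-endian i386 target and a 4096-byte page size, and the function names are hypothetical. It does not call vm_region or vm_read; on the Hurd those would supply the region addresses and contents.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

/* Fill in the file header of a 32-bit little-endian i386 core file
   whose phnum program headers follow directly after the header. */
static void core_fill_ehdr(Elf32_Ehdr *eh, unsigned phnum)
{
    assert(eh != NULL && phnum < PN_XNUM);
    memset(eh, 0, sizeof *eh);
    memcpy(eh->e_ident, ELFMAG, SELFMAG);
    eh->e_ident[EI_CLASS] = ELFCLASS32;
    eh->e_ident[EI_DATA] = ELFDATA2LSB;
    eh->e_ident[EI_VERSION] = EV_CURRENT;
    eh->e_type = ET_CORE;
    eh->e_machine = EM_386;
    eh->e_version = EV_CURRENT;
    eh->e_phoff = sizeof *eh;
    eh->e_ehsize = sizeof *eh;
    eh->e_phentsize = sizeof(Elf32_Phdr);
    eh->e_phnum = (Elf32_Half)phnum;
}

/* Describe one memory region as a PT_LOAD segment whose contents are
   stored at file_offset. pf_flags is a mix of PF_R, PF_W and PF_X. */
static void core_fill_load_phdr(Elf32_Phdr *ph, Elf32_Addr vaddr,
                                Elf32_Word size, Elf32_Off file_offset,
                                Elf32_Word pf_flags)
{
    assert(ph != NULL);
    memset(ph, 0, sizeof *ph);
    ph->p_type = PT_LOAD;
    ph->p_offset = file_offset;
    ph->p_vaddr = vaddr;
    ph->p_filesz = size;
    ph->p_memsz = size;
    ph->p_flags = pf_flags;
    ph->p_align = 4096;  /* assumed page size */
}
```

A minimal "gcore" along these lines would write the header, then the program headers, then the raw contents of each region at the offsets recorded in p_offset.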

The next day, Jon wrote back saying:


At long last, I have succeeded in getting a program to produce and read a core file under the Hurd.

This is not a done project, but I think enough is working to upload some patches. There are some known issues/limitations which I have tried my best to document below.

I can't take much credit for this work at all since Roland McGrath has done a lot of the heavy lifting. I also want to say thanks for the help he has given me and his patience with me. I now have a much greater appreciation for the chain of events following a dereference of an invalid pointer. :)

Roland seemed pleased that Jon had been working on this and asked him if he had tested the "gcore" command. Jon said that 'generate-core-file' in GDB seemed to work.

Roland followed up with advice about Hurd hacking:

As to the Hurd project in general, there is always more hacking to be done. See the tasks and TODO files, and post here about items you are interested in working on. Marcus might have some opinions about priorities to be of most immediate help.


Finishing off, Roland said:

We now have core dumps! Many thanks to Jon Arney for his hacking on this that spurred me to write some new code, and for helping to debug it.

I have now added a sys/procfs.h header file installed by the hurd package (it's in hurd/include/sys/procfs.h). This defines the data structures used in the note segment in ELF core files. I've added new code to the crash server to write ELF core files using these data structures. The note formats are patterned after those used by Solaris, for which the gdb code to read the formats based on sys/procfs.h types already exists and handles multiple threads.

An old gdb probably ought to be able to read one of these core dumps enough to tell you about the memory regions with `info files'. You can also use objdump and elfdump to see that the contents look sane for the process you dumped.

gdb should understand these core files every bit as well as it does on any other platform if you make sure that the hurd's new sys/procfs.h is installed (there are no other related header changes, so you can just copy it into /include) and rebuild gdb with the following patches. (These patches are against the current gdb in cvs, but they should apply fine to 5.1 as well.) Make sure that when you re-run configure you don't have an old ../config.cache file in your gdb build, since it will contain lies about sys/procfs.h and its contents.

This should also be enough for gdb's "gcore" command to work. It creates a core file with less complete information than the crash server's core files will contain, but it should be enough for gdb to read it back in again.
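The note segment Roland mentions is a sequence of records, each with a small header, a NUL-terminated name and a descriptor, with name and descriptor padded to 4-byte boundaries. The sketch below packs one such record using Elf32_Nhdr from glibc's <elf.h>. The function names are hypothetical, and the payload layouts defined by the Hurd's sys/procfs.h are not reproduced here; the descriptor is treated as opaque bytes.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Round n up to the 4-byte alignment used inside a note segment. */
static size_t note_align(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Bytes one note occupies: header, padded name, padded descriptor. */
static size_t note_size(const char *name, size_t descsz)
{
    return sizeof(Elf32_Nhdr) + note_align(strlen(name) + 1)
           + note_align(descsz);
}

/* Write one note at buf. Returns the number of bytes used, or 0 if
   the note does not fit in buf_len bytes. */
static size_t note_append(unsigned char *buf, size_t buf_len,
                          const char *name, Elf32_Word type,
                          const void *desc, size_t descsz)
{
    assert(buf != NULL && name != NULL);
    assert(desc != NULL || descsz == 0);

    size_t namesz = strlen(name) + 1;  /* n_namesz counts the NUL */
    size_t total = note_size(name, descsz);
    if (total > buf_len)
        return 0;

    Elf32_Nhdr nh;
    nh.n_namesz = (Elf32_Word)namesz;
    nh.n_descsz = (Elf32_Word)descsz;
    nh.n_type = type;

    memset(buf, 0, total);  /* zero the padding bytes */
    memcpy(buf, &nh, sizeof nh);
    memcpy(buf + sizeof nh, name, namesz);
    if (descsz > 0)
        memcpy(buf + sizeof nh + note_align(namesz), desc, descsz);
    return total;
}
```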

4. New libstore

5 Mar 2002 - 12 Mar 2002 (8 posts) Archive Link: "roland_libstore_modules_branch"

Topics: FS: ext2

People: Roland McGrath, Paul Emsley, Marcus Brinkmann

Roland McGrath asked if anyone had tried roland_libstore_modules_branch. Roland had added this branch of libstore (a filesystem library used, for example, by GNU Parted) to the Hurd's CVS last month, saying at the time:

For static linking, you have to make specific reference at link time to each store class you want to have available by name at run time. Any module that you get linked in from libstore.a will automagically go into the standard list (because each one has a `store_std_classes' section pointing to the `struct store_class' it defines). So e.g. if you call `store_gunzip_open' or `store_gunzip_create' in your program, then the "gunzip" store type will be available to store_open et al as well. But if you don't refer to a given type, it won't be linked in. (In practice you wind up always getting the "file" type and the pseudo-types like "typed" and "query" just by virtue of calling store_open or store_parsed_open.) To get a desired set of types into a static link, you can use `-lstore_TYPE' (i.e. `-lstore_gunzip' for the "gunzip" type module). That works because there is a libstore_TYPE.a installed for each standard TYPE, which is actually a linker script with the same effect as the linker switch `-u store_TYPE_class'.

When using the shared library (which is almost always the case), the same link-reference plan applies. However, this is extended to the sections named `store_std_classes' not just of the executable but of each shared object that's loaded. Since still contains all the standard types, this is in the usual case just like the current fixed array. However, the functions that look for a store type by name at run time will also try to load an external module using dlopen if there is no existing type of that name. It dlopen's `' for the type "foo", and uses dlsym to look up `store_foo_class'. Note that once loaded for a successful open, the module is never unloaded. Thereafter, that module (and any other modules you might have dlopen'd independent of libstore) get searched for `store_std_classes' sections. Because of this (and dlopen's own redundancy detection), you could have a single module that defined several classes and symlink together the different names by which it might be first loaded.

It had been my original thinking to take all of the standard modules out and make them individually loaded on demand. But all the modules we have are in fact so tiny that the extra overhead and memory wasted for page alignment and redundant relocs would be just silly. So I left all the existing modules in the main library. For static linking, this means that if you call e.g. store_gunzip_open directly then you don't need -lstore_gunzip -lstore, just -lstore. In the shared library, it means that existing types are in practice found in basically the same way as before.

Obviously, the important thing that the new dynamic features make possible is to have new store type implementation modules that are not part of the hurd package itself and can be developed and maintained separately. For example, nbd could be moved out into a standalone nbd_client package for parity with the Debian package for Linux (along with a simple `nbd-client' shell script to give a work-alike interface). In the case of nbd, that doesn't really buy anything because libstore/nbd.c is simple and self-contained (but it would make sense if e.g. the nbd protocol might change to get the client and server from related interdependent packages instead of the client being part of the basic hurd package).
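The `store_std_classes' section Roland describes relies on a linker feature: every object places a pointer in a named section, and the linker concatenates them into one array. One common way to do this with GCC and GNU ld is sketched below. The reduced struct store_class, the STORE_STD_CLASS macro and find_store_class are hypothetical, and the dlopen fallback for unknown names is not shown; this illustrates the technique only and is not the libstore source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much reduced stand-in for libstore's struct store_class. */
struct store_class {
    const char *name;
};

/* Each class drops a pointer to itself into the store_std_classes
   section; "used" keeps the compiler from discarding the pointer. */
#define STORE_STD_CLASS(cls)                                        \
    static const struct store_class *const cls##_ptr                \
        __attribute__((used, section("store_std_classes"))) = &cls

static const struct store_class store_file_class = { "file" };
STORE_STD_CLASS(store_file_class);

static const struct store_class store_gunzip_class = { "gunzip" };
STORE_STD_CLASS(store_gunzip_class);

/* GNU ld defines __start_NAME and __stop_NAME for any section whose
   name is a valid C identifier. */
extern const struct store_class *const __start_store_std_classes[];
extern const struct store_class *const __stop_store_std_classes[];

/* Look a class up by name among everything that was linked in. */
static const struct store_class *find_store_class(const char *name)
{
    assert(name != NULL);
    for (const struct store_class *const *p = __start_store_std_classes;
         p != __stop_store_std_classes; p++)
        if (strcmp((*p)->name, name) == 0)
            return *p;
    return NULL;
}
```

Only classes whose object files end up in the link appear in the array, which is why a static link has to reference each wanted type explicitly.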

Marcus Brinkmann had a look at it and showed that concatenating files into a single filesystem seemed to work (and actually fixed a couple of bugs). This is what he did:

settrans -ac testimg.node ./ext2fs.static -T concat @file:testimg1@file:testimg2

Here testimg1 and testimg2 are the two parts of a full filesystem image.

(ed. [Paul Emsley] cute, eh?)

Marcus did not test the dynamic loading functionality because module.c was not included in the branch. Roland fixed that but Marcus has apparently not retried it.

Under the subject "gunzip store trouble identified" there was some discussion of the gunzip store, but no apparent resolution yet.

5. argp limitation

10 Mar 2002 - 11 Mar 2002 (15 posts) Archive Link: "argp limitation"

People: Roland McGrath, Marcus Brinkmann, Thomas Bushnell

First an introduction to argp seems appropriate. Roland McGrath had previously described it: "argp is a set of library functions for parsing command line arguments in a highly modular way. It was originally written (by Miles Bader) for the Hurd, and is used extensively in the Hurd libraries and programs. However, there is nothing really Hurd-specific about it. It is now part of GNU libc (2.1 and later, on all platforms) and no longer in the Hurd sources. See the header file argp.h, and there is a section in the GNU libc manual that documents it. "

Marcus was considering using argp in his console server:

I wanted to use a hierarchy of argp parsers in the console server, one in each driver module, one for focus groups, one for consoles, and one for the main program. However, argp parents and childs shade the options of other childs, in the sense that argp calls only one parser for each recognized option. It is not possible to have common options in different parsers this way (I wanted to enable/disable parsers in parent parsers).

Note: pfinet would have the same problem, except that it avoids it by lack of modularity. libstores would have the same problem except it avoids it by lack of flexibility (all arguments for a store are encoded in the store name). I think both are not acceptable work arounds here, because drivers are platform specific and complex. Here is an example potential console command line I have in mind:

/hurd/console --encoding isolat1 --console maincons --output-device vga --vt 1 --encoding utf8 --focus-group mainfocg --focus maincons --input-device pckbd --layout de --input-device mouse --name mouse1 --device /dev/ttyS0 --protocol mouseman

Or, for example, to remove the mouse: fsysopts /node --remove mouse1

Roland quite liked the idea: "I think your conclusions are right. I don't think argp's are intended to be used so you shadow options. If you wanted to do that, you could call another argp's parse_opt function from yours via the children pointers. That latter style is what I would have suggested to begin with, I think--it's simple and clean for a parse_opt to just consume as many arguments as it wants to rather than having argp somehow involved in the control flow of what is conceptually a procedure call at that point in the argument parsing anyway. "
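For readers who have not used argp, the parent/child arrangement under discussion looks roughly like this with glibc's <argp.h>. The option names are borrowed from Marcus's example command line, but the structs and the split between parsers are hypothetical. Each option here belongs to exactly one parser; the limitation Marcus ran into arises when two parsers in the hierarchy want the same option.

```c
#include <argp.h>
#include <assert.h>
#include <stddef.h>

/* Hypothetical settings, loosely modelled on the example command line. */
struct kbd_config {
    const char *layout;
};

struct console_config {
    const char *encoding;
    struct kbd_config kbd;
};

/* Child parser: owns --layout only. */
static error_t kbd_parse_opt(int key, char *arg, struct argp_state *state)
{
    struct kbd_config *kbd = state->input;
    switch (key) {
    case 'l':
        kbd->layout = arg;
        return 0;
    default:
        return ARGP_ERR_UNKNOWN;
    }
}

static const struct argp_option kbd_options[] = {
    { "layout", 'l', "NAME", 0, "Keyboard layout", 0 },
    { 0 }
};

static const struct argp kbd_argp = {
    kbd_options, kbd_parse_opt, 0, 0, 0, 0, 0
};

/* Parent parser: owns --encoding and hands the child its input. */
static error_t console_parse_opt(int key, char *arg,
                                 struct argp_state *state)
{
    struct console_config *cfg = state->input;
    switch (key) {
    case ARGP_KEY_INIT:
        /* This becomes state->input in kbd_parse_opt. */
        state->child_inputs[0] = &cfg->kbd;
        return 0;
    case 'e':
        cfg->encoding = arg;
        return 0;
    default:
        return ARGP_ERR_UNKNOWN;
    }
}

static const struct argp_option console_options[] = {
    { "encoding", 'e', "NAME", 0, "Character encoding", 0 },
    { 0 }
};

static const struct argp_child console_children[] = {
    { &kbd_argp, 0, "Keyboard options:", 0 },
    { 0 }
};

static const struct argp console_argp = {
    console_options, console_parse_opt, 0, 0, console_children, 0, 0
};

/* Parse argv into cfg; ARGP_NO_EXIT makes errors come back as a
   return value instead of terminating the program. */
static error_t parse_console_args(int argc, char **argv,
                                  struct console_config *cfg)
{
    assert(argc >= 1 && argv != NULL && cfg != NULL);
    return argp_parse(&console_argp, argc, argv, ARGP_NO_EXIT, NULL, cfg);
}
```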

Marcus was considering using a subset of the X server protocol to handle the keyboard LEDs.

Roland got grumpy with Thomas Bushnell as they argued about dynamically changing the console server options with a setopts command. The issue seemed unresolved.

6. Memory manager issues

13 Mar 2002 (9 posts) Archive Link: "memory_object_lock_request and memory_object_data_return fnord"

People: Neal Walfield, Thomas Bushnell

This was a conversation between Neal Walfield and Thomas Bushnell on the subject of Mach and memory management. Neal started with:

If a memory manager issues a memory_object_lock_request to gain read/write access to a set of pages (i.e. revoking all of the kernel's access rights), the kernel will eventually return any modified pages in the range using the memory_object_data_return message and end the sequence with a memory_object_lock_completed message. This is similar to the normal eviction path.

In the latter case, the page is returned to the memory manager with the expectation that it will immediately be written to the backing store and freed. The kernel cannot, however, be sure. As such, it changes the association of the page from the manager to the trusted default pager. This way, if the manager fails to free the page in a timely fashion, the kernel can still flush it to swap.

Yet, what happens in the former case? Specifically, how will the returned pages be evicted? Will the kernel send another memory_object_data_return? Unlikely. It has nothing to return. So, the page will be evicted via the default pager. What is the correct way to give the management of the page back to the kernel? Perhaps, we could use the memory_object_data_supply message (and if we modified it, we could supply it as precious). _The Mach 3 Kernel Interfaces_ cautions against supplying data that has not been explicitly requested, however, it does not prohibit it. And yet, what if milliseconds from now we get another message to write to the page? Again, we need to go through the same song and dance -- request the page, write to it and return it to the kernel.

Thomas Bushnell wanted to know what Neal was trying to achieve and replied: "Note that you have provided a *clean* page back to the kernel (even if it was dirty when you got it from the kernel); as a consequence, the kernel might delete the page now, losing any changes that have been made. So you must mark it precious even if you didn't modify it. In other words, the return of the page to you, and you supplying it back, has cleared the dirty bit for the page."

There followed a quick exchange about when the dirty bit was set.

As to his motivation, Neal replied:

I am trying to understand the motivation for having a relatively complex interface to manage page ownership which, in the Hurd, we do not use.

From what I can see, the pager_memcpy function can be extremely slow. Just consider what happens when we just want to replace a page on disk (which has not yet been read in to memory). pager_memcpy causes a page fault. The kernel sends a message to the manager which reads the page from disk (which is completely unnecessary), then, we write to the page and eventually, it is flushed back to the disk. This is even worse if we are writing to multiple pages -- our thread and the manager thread play ping-pong! This could be avoided by acquiring as much of the range up front as possible.

Thomas replied:

Ah, the principal motivation is to allow, for example, a pager to manage pages shared between many "kernels". The reason to demand pages back or require locking, in general, is so that you can hand them up to other "kernels".

In principle, we need to do this already! The glaringest security issue with the Hurd right now is the assumption that all users will just take their pagers and hand them to the kernel with vm_map. But they might play as "kernels" themselves. To deal with this, the pagers need to be able to deal with multiple "kernels", and also have strategies for dealing with recalcitrant "kernels" that aren't behaving properly.

Re: pager_memcpy: What's supposed to be going on behind the scenes is that the kernel should detect that you are faulting the pages in sequentially, and ask for pages from the pager ahead of time, optimizing the sequential access case.

Neal replied:

But the manager knows; why force the kernel to guess? Say we send:

io_read (file, data, vm_page_size * 4, 0, &amount)

By the time the kernel detects a sequential read, it is already too late to be of any use.
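The "too late" argument can be made concrete with a toy model. Suppose, purely as an assumption for illustration, that the kernel starts reading ahead only after it has seen a fixed number of consecutive sequential faults. The detector below is hypothetical and is not Mach's actual policy; it just counts how many pages of a short sequential read are demand-faulted before any prefetching could begin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sequential-access detector. */
struct seq_detector {
    bool have_last;      /* false until the first fault is recorded */
    size_t last_page;    /* page number of the previous fault */
    unsigned run;        /* consecutive sequential faults seen so far */
    unsigned threshold;  /* run length at which readahead would start */
};

/* Record a fault on page. Returns true once the run of sequential
   faults is long enough that readahead would start. */
static bool seq_fault(struct seq_detector *d, size_t page)
{
    assert(d != NULL);
    if (d->have_last && page == d->last_page + 1)
        d->run++;
    else
        d->run = 0;
    d->have_last = true;
    d->last_page = page;
    return d->run >= d->threshold;
}

/* Count the pages of an npages-long sequential read that are
   demand-faulted before the detector fires. Pages after that point
   are assumed to be prefetched. */
static size_t demand_faults(unsigned threshold, size_t npages)
{
    struct seq_detector d = { false, 0, 0, threshold };
    size_t faults = 0;
    for (size_t page = 0; page < npages; page++) {
        faults++;
        if (seq_fault(&d, page))
            break;
    }
    return faults;
}
```

Under this model, a threshold of two sequential faults means that three of the four pages in the io_read above are faulted in one at a time, so the guess arrives after most of the work is done.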

This, according to Thomas, is an old Mach debate.

Correcting Neal, he said that the manager doesn't actually know any better than the kernel: "Only the *user* knows what the access pattern is. (Even if the "user" resides in the same task as the pager.)" Thomas suggested future plans to allow users to declare things about memory regions mapped in their address space.

All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.