<kc version="0.1.0"> 
 
<headquote><a href="http://www.cs.utah.edu/projects/flux/mach4/html/">Mach 
4</a> | <a href="http://www.gnu.org/software/hurd/hurd.html">Hurd Servers</a> 
| <a href="http://www.gnu.org/software/hurd/debian-gnu-hurd.html">Debian Hurd 
Home</a> | <a href="http://www.debian.org/ports/hurd/hurd-faq">Debian Hurd 
FAQ</a> | <a href="http://lists.debian.org/#debian-hurd">debian-hurd 
List Archives</a> | <a 
href="http://mail.gnu.org/pipermail/bug-hurd/">bug-hurd List Archives</a> | 
<a href="http://mail.gnu.org/pipermail/help-hurd/">help-hurd List Archive</a> | 
<a href="http://www.gnu.org/software/hurd/reference-manual.html">Hurd Reference 
Manual</a> | <a href="http://web.walfield.org/pub/people/neal/papers/hurd-installation-guide/english/hurd-install-guide.html">Hurd 
Installation Guide</a> | <a 
href="http://pages.hotbot.com/sf/igorkh/gnumach-cross.txt">Cross-Compiling 
GNUMach</a> | <a 
href="http://www.urbanophile.com/arenn/hacking/hurd/hurd-hardware.html">Hurd 
Hardware Compatibility Guide</a></headquote> 
 
<title>Hurd Traffic</title> 
 
<author contact="mailto:paule@chem.gla.ac.uk">Paul Emsley</author> 
 
<issue num="115" date="18 Mar 2002 23:00:00 -0800" /> 

 
<p>After an extended hiatus, the Hurd summaries are back (I've been
busy).  I can't promise that they will happen every week from now, but
they should be more frequent than they have been in recent months.</p>

<section
  title="OSKit-Mach booting nits"
  subject="oskit-mach debugging log"
  archive="http://lists.debian.org/debian-hurd/2002/debian-hurd-200203/msg00075.html"
  posts="17"
  startdate="08 Mar 2002 22:48:57 -0800"
  enddate="11 Mar 2002 04:30:16 -0800"
>
<topic>Bootloaders</topic>

<mention>Daniel Wagner</mention>
<mention>Marcus Brinkmann</mention>

<p>Doug Hilton is in the process of trying to build a Mach
microkernel that will support SMP.  Doug is trying to use
oskit-mach and is using remote debugging.  He posted his most recent
crash (related to CPU type assignment).  Daniel Wagner advised him
that he may need the "--enable-indirect-osenv" argument to OSKit's
configure.  </p>


<p>Doug was still at a loss as to how to proceed and waited for Roland
McGrath to reply, which he did: </p>

<quote who="Roland McGrath">
   
<p> I would not recommend either [using gcc-3.0 or disabling the
optimization], especially for the oskit.  It's not unlikely that the
linux driver code in the oskit won't compile properly if you try
either one of these.  The proven method is to use gcc 2.95.x and
-O2.</p>

<p>Re: viewing the registers on the 2nd CPU using gdb: The stub talking
on the serial line is only ever running on one CPU at a time.  The
base_critical_enter/base_critical_leave calls in
oskit/kern/gdb_serial.c are there to ensure this stays reasonably
sane.  Each CPU that hits a trap that should go to gdb will call the
oskit function `gdb_trap' (via kernel_trap).  If the segments or IDT
or whatnot is too fouled up to get to the trap handlers properly
(through the locore.S code to kernel_trap in i386/i386/trap.c), then
you are SOL.  If kernel traps are working on that CPU, then it will
get into the gdb_serial.c code and do base_critical_enter.  If another
CPU is already in that code, the second CPU will spin until the first
one finishes and calls base_critical_leave.  So once you get a CPU
talking to gdb, you will keep talking to the same CPU until gdb tells
it to continue running (at which time the other CPU can come in and
talk to gdb).  (There are hooks into the gdb code that we could use to
write SMP-aware code that would report each CPU to gdb as a separate
thread so you could use the gdb thread commands to select the CPU.
But that would be a bit of hacking, and it would rely on the IPIs
working right and such--not really a way to debug the SMP setup
itself.)
  </p>

</quote>  
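<p>The "one CPU talks to gdb at a time" rule Roland describes is essentially a global spin lock around the stub. The following is a minimal sketch of that idea using C11 atomics. The names gdb_stub_enter, gdb_stub_try_enter and gdb_stub_leave are hypothetical, and the sketch ignores the interrupt masking and nesting that OSKit's real base_critical_enter/base_critical_leave may also handle.</p>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the critical section guarding the serial
   gdb stub; this is not OSKit's implementation.  */
static atomic_flag gdb_stub_lock = ATOMIC_FLAG_INIT;

/* Try once to become the CPU that talks to gdb.  */
static bool gdb_stub_try_enter (void)
{
  return !atomic_flag_test_and_set_explicit (&gdb_stub_lock,
                                             memory_order_acquire);
}

/* A CPU that traps while another CPU owns the stub spins here until
   the owner leaves, i.e. until gdb tells the owner to continue.  */
static void gdb_stub_enter (void)
{
  while (!gdb_stub_try_enter ())
    ;  /* spin */
}

/* Release the stub so a waiting CPU can start talking to gdb.  */
static void gdb_stub_leave (void)
{
  atomic_flag_clear_explicit (&gdb_stub_lock, memory_order_release);
}
```

<p>Because the owning CPU keeps the lock until it is told to continue, gdb only ever sees one CPU's registers. Per-CPU thread reporting would need the extra SMP-aware hooks Roland mentions.</p>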

 
<p>Doug came back with improved debugging, reporting a crash on the
third time through, in ipc_port_release_send()
(oskit-mach/ipc/ipc_port.c:1126). </p>


<p>Marcus Brinkmann diagnosed this as a bug in the boot script cleanup
handler.  He also recommended going back and trying to boot using
serverboot (the old and more tried and trusted way).  Crucially,
Marcus added that oskit-mach did not like Doug's GRUB boot
configuration. </p>

 
<p>Indeed, that seemed to be the case: shortly afterwards, Doug wrote
back saying <quote who="Douglas Hilton"> mission accomplished. ${root}
and $(task-create) are NULL. </quote> </p>

<p>There was some discussion about the "-T typed" argument, with Roland
saying <quote who="Roland McGrath">We use -T typed so that root can be
set using either the device type, e.g. device:hd0s2 or the part type,
e.g. part:2:device:hd0.</quote> </p>
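<p>The two root forms Roland mentions nest: a "part" name wraps another typed name. The following minimal sketch parses both forms, assuming only the "TYPE:REST" and "part:N:INNER" shapes shown in the quote. The struct store_spec and parse_typed_name are hypothetical illustrations, not libstore's "typed" class, which opens the store rather than filling in a struct.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing a typed store name.  */
struct store_spec {
  char type[16];    /* outer type: "device", "part", ... */
  int part;         /* partition number for "part", else -1 */
  char device[64];  /* innermost device name */
};

/* Parse "TYPE:REST".  For "part", REST is "N:INNER", where INNER is
   itself a typed name.  Returns 0 on success, -1 if malformed.  */
static int parse_typed_name (const char *name, struct store_spec *spec)
{
  const char *colon = strchr (name, ':');
  size_t typelen = colon ? (size_t) (colon - name) : 0;
  if (typelen == 0 || typelen >= sizeof spec->type)
    return -1;
  memcpy (spec->type, name, typelen);
  spec->type[typelen] = '\0';
  const char *rest = colon + 1;

  if (strcmp (spec->type, "part") == 0)
    {
      /* Read the partition number, then recurse on the inner name.  */
      char *end;
      long n = strtol (rest, &end, 10);
      struct store_spec inner;
      if (end == rest || *end != ':' || n < 1
          || parse_typed_name (end + 1, &inner) != 0)
        return -1;
      spec->part = (int) n;
      strcpy (spec->device, inner.device);  /* same buffer size */
      return 0;
    }

  size_t len = strlen (rest);
  if (len == 0 || len >= sizeof spec->device)
    return -1;
  spec->part = -1;
  memcpy (spec->device, rest, len + 1);
  return 0;
}
```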


<p>Doug seemed to be happy: <quote who="Douglas Hilton">It booted up
in multiuser mode just spiffy. Thanks a lot for all the help guys! 
Whew! (A great weight is lifted) Now onto the important stuff!
</quote> </p>

</section>
 
<section
  title="I/O permission bitmap"
  subject="I/O permission bitmap patch for oskit-mach"
  archive="http://mail.gnu.org/pipermail/bug-hurd/2002-March/006689.html"
  posts="12"
  startdate="28 Feb 2002 16:10:35 -0800"
  enddate="11 Mar 2002 12:03:54 -0800"
>

<p>Marcus Brinkmann posted a close-to-final version of his patch adding
I/O permission bitmaps to oskit-mach, saying: </p>
<quote who="Marcus Brinkmann">

<p>Design flaw: Changes don't propagate immediately to other threads in
the same task running on another processor.  This applies to enable as
well as disable operations!  Espen Skoglund pointed out that we only
need to propagate disable operations, enable operations could be picked
up in the fault handler (for extra performance).
 </p>

<p>In real life, it might be easy for us to make sure that it doesn't happen
by enabling the ports before creating other threads, and by never deleting
them.  This should work fine for the console server, and it should also
be fine for single threaded apps like X.
 </p>

</quote>
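<p>On x86, the I/O permission bitmap has one bit per I/O port. A 0 bit permits IN/OUT on that port and a 1 bit makes the instruction fault. With 65536 ports this gives the 8192 bytes per bitmap that Marcus mentions later in the thread. The following is a minimal sketch of the enable/disable bookkeeping. The names are hypothetical, and the sketch ignores how the bitmap is installed in the TSS.</p>

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IOPB_PORTS 65536u             /* x86 I/O port space */
#define IOPB_BYTES (IOPB_PORTS / 8u)  /* 8192 bytes, one bit per port */

/* x86 semantics: a 0 bit permits access, a 1 bit causes a fault.  */
struct iopb { uint8_t bits[IOPB_BYTES]; };

static void iopb_init (struct iopb *b)
{
  memset (b->bits, 0xff, sizeof b->bits);  /* deny every port */
}

/* Enable or disable COUNT ports starting at FROM.  */
static void iopb_set_range (struct iopb *b, uint32_t from, uint32_t count,
                            bool enable)
{
  for (uint32_t port = from; port < from + count && port < IOPB_PORTS; port++)
    {
      uint8_t mask = (uint8_t) (1u << (port % 8));
      if (enable)
        b->bits[port / 8] &= (uint8_t) ~mask;
      else
        b->bits[port / 8] |= mask;
    }
}

static bool iopb_allowed (const struct iopb *b, uint32_t port)
{
  return port < IOPB_PORTS && !(b->bits[port / 8] & (1u << (port % 8)));
}
```

<p>For example, a console server might enable the VGA register range 0x3c0 to 0x3df once, before creating other threads, which is the workaround Marcus describes for the SMP propagation problem.</p>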

<p>Thomas Bushnell picked up on this and suggested a simpler approach: <quote
who="Thomas Bushnell">Why not just suspend the threads around the
operation and then resume them (forcing them to reload processor
state)?  </quote> </p>
 
<p>Marcus replied:</p>
<quote who="Marcus Brinkmann">
   
<p>Last time it was discussed, we said that we should probably use the
same mechanism as is used for page maps, that is inter-CPU interrupts.
The only reason I did not do it is that I have only a limited understanding
of the CPU synchronization code, and I would not like to touch it.  I also
don't think that I have the time to learn enough about SMP right now to get
this right. </p>

<p>I haven't considered alternatives, like suspending threads.  I am not sure
if that would work.  I guess that if you do it within the I/O bitmap lock,
it could be safe, although it seems to me that it is potentially much
slower than the other approach (not that it would matter much).
 </p>

<p> If that approach actually works, we might put it in for the SMP
case until we have a better solution?  </p>
</quote>


<p>This was OK for Thomas:<quote who="Thomas Bushnell"> I'm happy to
leave it as a documented bug, and not worry about it until SMP is
working.  ("Documented" means somewhere other than comments in the
code and mailing list entries. :)))
</quote> </p>

<p>Roland McGrath reviewed the patch, made some minor comments and
also added: <quote who="Roland McGrath">For the SMP problem, it would
certainly be simple to hack the existing check_io_fault code to notice
when the task's iopb has changed, reload it, and retry the
instruction.  But I am really not concerned about this problem
cropping up in practice--certainly it won't before we finish making
SMP work at all! :) </quote> </p>
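<p>Roland's fault-handler idea, like Espen's earlier remark, amounts to reloading lazily. Each change to a task's bitmap bumps a generation number. When an I/O fault occurs on a CPU holding an older generation, the handler reloads the bitmap and retries the instruction instead of reporting the fault. This is a hypothetical model; check_io_fault_sketch and the structs are illustrative, not oskit-mach's check_io_fault. It also shows why only enables can be handled this way: a disable on another CPU causes no fault there, so that CPU keeps its stale, more permissive bitmap.</p>

```c
#include <stdbool.h>

/* Hypothetical model: the task bumps a generation number whenever its
   I/O permission bitmap changes; each CPU remembers what it loaded.  */
struct task_io { unsigned generation; };
struct cpu_state {
  const struct task_io *loaded_task;
  unsigned loaded_generation;
  int reloads;
};

static void task_iopb_changed (struct task_io *t)
{
  t->generation++;
}

static void cpu_load_iopb (struct cpu_state *cpu, const struct task_io *t)
{
  /* A real kernel would copy the bitmap into this CPU's TSS here.  */
  cpu->loaded_task = t;
  cpu->loaded_generation = t->generation;
  cpu->reloads++;
}

/* Called on an I/O fault.  Returns true if the CPU held a stale bitmap
   (reload done, retry the instruction), false for a genuine fault.  */
static bool check_io_fault_sketch (struct cpu_state *cpu,
                                   const struct task_io *t)
{
  if (cpu->loaded_task == t && cpu->loaded_generation == t->generation)
    return false;
  cpu_load_iopb (cpu, t);
  return true;
}
```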

<p>Marcus asked: <quote who="Marcus Brinkmann">Do you think the
performance increase is worth the memory overhead of a zone, esp if
you preallocate iopb's (each being 8192 bytes huge)?</quote> </p>
 
<p>Roland replied to this: <quote who="Roland McGrath">What memory
overhead are you talking about?  The zone_t data structure is just a
few words.  The question is basically whether or not you have a free
list for fast allocation.  There is no unnecessary extra memory used,
because the free list is returned when memory is tight
(ZONE_COLLECTABLE).  </quote> </p>
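<p>Roland's point is that a zone is essentially a cached free list of fixed-size elements, and a collectable zone can hand that cache back when memory is tight. The following toy version illustrates this with malloc/free underneath. It is a hypothetical sketch, not Mach's zalloc interface.</p>

```c
#include <stdlib.h>

/* Toy zone: fixed-size elements with a free list for fast reuse.  */
struct free_elem { struct free_elem *next; };

struct zone {
  size_t elem_size;
  struct free_elem *free_list;
  size_t cached;
};

static void zone_init (struct zone *z, size_t elem_size)
{
  z->elem_size = elem_size < sizeof (struct free_elem)
                 ? sizeof (struct free_elem) : elem_size;
  z->free_list = NULL;
  z->cached = 0;
}

static void *zone_alloc (struct zone *z)
{
  if (z->free_list != NULL)
    {
      struct free_elem *e = z->free_list;  /* fast path: reuse */
      z->free_list = e->next;
      z->cached--;
      return e;
    }
  return malloc (z->elem_size);            /* slow path */
}

static void zone_free (struct zone *z, void *p)
{
  struct free_elem *e = p;                 /* keep it cached */
  e->next = z->free_list;
  z->free_list = e;
  z->cached++;
}

/* Give every cached element back; returns how many were released.  */
static size_t zone_collect (struct zone *z)
{
  size_t n = 0;
  while (z->free_list != NULL)
    {
      struct free_elem *e = z->free_list;
      z->free_list = e->next;
      free (e);
      n++;
    }
  z->cached = 0;
  return n;
}
```

<p>The only persistent overhead is the struct zone itself. Memory held on the free list is reclaimable, which is why Roland sees no real cost even for 8192-byte iopbs.</p>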

<p>Marcus checked in his code.</p>


</section>

<section
  title="Syslogging and Cores"
  subject="core file writing"
  archive="http://mail.gnu.org/pipermail/bug-hurd/2002-March/006715.html"
  posts="15"
  startdate="04 Mar 2002 10:33:21 -0800"
  enddate="13 Mar 2002 14:15:43 -0800"
>
<topic>FS: procfs</topic>

<mention>Neal Walfield</mention>
<mention>Mark Kettenis</mention>

<p>Users of GNU/Hurd who have poked around it a bit will know that
syslogging is thin on the ground.  Jon Arney suggested a
means to improve the Hurd's logging with a <quote who="Jon Arney">log
daemon for the Hurd for the sake of keeping track of hurd translator
events such as filesystem errors, auth daemon problems, or other
information relating to the operation of hurd daemons</quote>. </p>

<p>Neal Walfield preferred to fix up syslog instead (if necessary).</p>

<p>Jon's worry about using syslog was <quote who="Jon Arney">if, say,
'pfinet' needed to log something and syslog were configured to write
to UDP ports, pfinet would have to write to a UDP port causing perhaps
a recursive loop of log calls.  I realize that this is ultimately a
pathological case, but it's worth considering.</quote> </p>

<p>But Marcus Brinkmann pointed out <quote who="Marcus Brinkmann">If
sending something over a socket causes the socket server to log a
message, then you have a serious problem anyway.</quote> </p>

<p>Jon commented: <quote who="Jon Arney">take an example look at
Linux's kernel logging.  Obviously a 'printk' inside a file 'write'
function is dangerous even in Linux.  However, the mechanism, while
flawed in the general case is still useful in a wide variety of
special cases. I hate to keep picking on Linux for my examples, but it
is a useful and instructive O/S.  I'd hate to throw the baby out with
the bathwater in terms of design approach.  </quote></p>

<p>With the Hurd, things are not so pathological, as Marcus pointed
out: <quote who="Marcus Brinkmann">The good thing is that we
[Hurd-users] can turn down the server in an emergency, while the rest
of the system is hopefully still up and running, and can at least
attempt a sane shutdown (this is what init does when it sees the death
of a critical system server).  That's why I think that logging once
and then dying is more useful than logging recursively (and eventual
die, with the disk being full or something).  </quote> </p>

 
<p>There was some unresolved discussion about whether it should also
be possible to let the user decide what should happen in such a
situation, and at least give the user a chance to regain control of
the system.  </p> 

 
<p>In passing (and starting a sub-thread), Marcus picked up on the issue
of writing core files: <quote who="Marcus Brinkmann">another thing
that would certainly be useful to have is support for core files.  The
crash server works today, we just don't have a sensible core file
format, and a function that dumps such a core file (and another
function that allows gdb to read it back).  </quote> </p>

 
<p>Roland encouraged someone to implement this, saying that it was not
very hard to do.</p>

<p>So Jon tried.  He had a look at the Linux ELF core writer and
thought that the header info could be written using proc_getprocinfo
and related functions but needed more info on how to get the user
register state and the VM of the process.</p>

<p>Marcus agreed that much core writing code could be shared but
warned: <quote who="Marcus Brinkmann">The main difference that comes
to mind is that we have native (kernel-level) threads, and the threads
state needs to be stored in the core file as well.  </quote> </p>

<p>Roland advised: </p>

<quote who="Roland McGrath">
 
<p>The proc server is not really much involved.  You need to
understand the structure of the system and what Mach tasks and threads
are about in some detail before attempting to work on this.  </p>

 
<p>Certainly writing an ELF core file is the right thing to do.  How
to store the memory is clear, and that part of the file format is just
the same as for Linux and other systems that use ELF core files.  You
can look at the `vminfo' program (utils/vminfo.c) to see how to use
vm_region to examine the address space of a task, and then you use
vm_read to get the data.  </p>

 
<p>The rest of the information about register state and so on is a
little less clear.  How we will store it is clear: in ELF notes as
other systems do.  But how many notes and what formats to use for them
is up to us.  We need to choose those note formats, and modify gdb to
understand them.  Mark Kettenis is the authority in the Hurd native
gdb port, and I will tend to defer to him on what these formats should
be.  </p>

 
<p>I suggest that you start out by just doing the memory dump and not
worrying about the rest at all (no notes).  Rather than hacking on the
crash server directly, it will be easiest to debug this in a little
standalone program that just uses pid2task to get the task port of a
process, suspends it, and dumps its memory as an ELF core file (i.e. a
minimal "gcore").  I might get around to whipping up the basic
memory-dumping code soon, and then you could see how it's done and
debug the code for me.  </p>
</quote> 
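<p>As a starting point, Roland suggests a memory-only core with no notes, where each region of the task becomes one PT_LOAD segment. The following minimal sketch lays out the ELF header and program headers for such a file using the standard &lt;elf.h&gt; definitions. It assumes a 32-bit little-endian i386 target. The struct region is hypothetical and stands in for what a vm_region loop would report. The actual vm_region/vm_read calls and the file writing are not shown.</p>

```c
#include <elf.h>
#include <string.h>

/* One region of the dumped task (hypothetical; would come from a
   vm_region loop, with contents fetched by vm_read).  */
struct region {
  Elf32_Addr start;
  Elf32_Word size;
  int readable, writable, executable;
};

/* Fill in the ELF header and one PT_LOAD header per region for a core
   file whose region contents follow the headers back to back.
   Returns the total file size.  */
static Elf32_Off build_core_headers (Elf32_Ehdr *eh, Elf32_Phdr *ph,
                                     const struct region *r, int nregions)
{
  memset (eh, 0, sizeof *eh);
  memcpy (eh->e_ident, ELFMAG, SELFMAG);
  eh->e_ident[EI_CLASS] = ELFCLASS32;
  eh->e_ident[EI_DATA] = ELFDATA2LSB;
  eh->e_ident[EI_VERSION] = EV_CURRENT;
  eh->e_type = ET_CORE;                /* a core file, not an executable */
  eh->e_machine = EM_386;
  eh->e_version = EV_CURRENT;
  eh->e_phoff = sizeof *eh;            /* program headers follow directly */
  eh->e_ehsize = sizeof *eh;
  eh->e_phentsize = sizeof *ph;
  eh->e_phnum = (Elf32_Half) nregions;

  Elf32_Off offset = sizeof *eh + nregions * sizeof *ph;
  for (int i = 0; i < nregions; i++)
    {
      memset (&ph[i], 0, sizeof ph[i]);
      ph[i].p_type = PT_LOAD;
      ph[i].p_offset = offset;         /* where this region's bytes go */
      ph[i].p_vaddr = r[i].start;
      ph[i].p_filesz = r[i].size;
      ph[i].p_memsz = r[i].size;
      ph[i].p_flags = (r[i].readable ? PF_R : 0)
                      | (r[i].writable ? PF_W : 0)
                      | (r[i].executable ? PF_X : 0);
      ph[i].p_align = 1;
      offset += r[i].size;
    }
  return offset;
}
```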

 
<p>The next day, Jon wrote back saying:</p> <quote who="Jon Arney"> 

<p>Hurah! </p>

<p>
   At long last, I have succeeded in getting a program to produce and
   read a core file under the Hurd. </p>

<p>This is not a done project, but I think enough is working to
upload some patches.  There are some known issues/limitations
which I have tried my best to document below.
 </p>

<p>I can't take much credit for this work at all since Roland McGrath has
done a lot of the heavy lifting.   I also want to say thanks
for the help he has given me and his patience with me.  I now have
a much greater appreciation for the chain of events following
a dereference of an invalid pointer. :)
 </p>

 </quote>

 
<p>Roland seemed pleased that Jon had been working on this and asked
him if he had tested the "gcore" command.  Jon said that
'generate-core-file' in GDB seemed to work. </p>

<p>Roland followed up with advice about Hurd hacking:</p>
<quote who="Roland McGrath">

<p>As to the Hurd project in general, there is always more hacking to be done.
See the tasks and TODO files, and post here about items you are interested
in working on.  Marcus might have some opinions about priorities to be of
most immediate help.
 </p>
</quote>

 
<p>Finishing off, Roland said:</p>

<quote who="Roland McGrath">

 
 
<p>We now have core dumps!  Many thanks to Jon Arney for his hacking on this
that spurred me to write some new code, and for helping to debug it.
 </p>

 
<p>I have now added a sys/procfs.h header file installed by the hurd package
(it's in hurd/include/sys/procfs.h).  This defines the data structures used
in the note segment in ELF core files.  I've added new code to the crash
server to write ELF core files using these data structures.  The note
formats are patterned after those used by Solaris, for which the gdb code
to read the formats based on sys/procfs.h types already exists and handles
multiple threads.
 </p>

 
<p>An old gdb probably ought to be able to read one of these core dumps enough
to tell you about the memory regions with `info files'.  You can also use
objdump and elfdump to see that the contents look sane for the process you
dumped. </p>

 
<p>gdb should understand these core files every bit as well as it does on any
other platform if you make sure that the hurd's new sys/procfs.h is
installed (there are no other related header changes, so you can just copy
it into /include) and rebuild gdb with the following patches.  (These
patches are against the current gdb in cvs, but they should apply fine to
5.1 as well.)  Make sure you re-run configure [so that] you don't have an old
../config.cache file in your gdb build, since it will contain lies about
sys/procfs.h and its contents.
 </p>

 
<p>This should also be enough for gdb's "gcore" command to work.  It creates a
core file with less complete information than the crash server's core files
will contain, but it should be enough for gdb to read it back in again.
 </p>

</quote>  
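<p>The register state and other information go into the PT_NOTE segment as ELF notes. Each note is a header (name size, descriptor size, type) followed by the name and the descriptor, each padded to a 4-byte boundary in 32-bit ELF. The following minimal sketch serializes one note. The "CORE" name and the NT_PRSTATUS type are the conventional SysV choices from &lt;elf.h&gt;. The sketch assumes, but does not confirm, that the Hurd's crash server uses the same ones. A dummy int replaces the prstatus_t from sys/procfs.h.</p>

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

static size_t pad4 (size_t n)
{
  return (n + 3) & ~(size_t) 3;
}

/* Serialize one ELF note into BUF, which must hold the returned number
   of bytes.  Pass BUF == NULL to only compute the size.  */
static size_t write_note (unsigned char *buf, const char *name,
                          Elf32_Word type, const void *desc, size_t descsz)
{
  size_t namesz = strlen (name) + 1;  /* the name includes its NUL */
  size_t total = sizeof (Elf32_Nhdr) + pad4 (namesz) + pad4 (descsz);
  if (buf == NULL)
    return total;
  memset (buf, 0, total);             /* padding bytes are zero */
  Elf32_Nhdr nh = { (Elf32_Word) namesz, (Elf32_Word) descsz, type };
  memcpy (buf, &nh, sizeof nh);
  memcpy (buf + sizeof nh, name, namesz);
  memcpy (buf + sizeof nh + pad4 (namesz), desc, descsz);
  return total;
}
```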

</section>


<section
  title="New libstore "
  subject="roland_libstore_modules_branch"
  archive="http://mail.gnu.org/pipermail/bug-hurd/2002-March/006774.html"
  posts="8"
  startdate="05 Mar 2002 13:26:52 -0800"
  enddate="12 Mar 2002 01:17:31 -0800"
>
<topic>FS: ext2</topic>

<mention>Paul Emsley</mention>
<mention>Marcus Brinkmann</mention>

<p>Roland McGrath asked if anyone had tried
roland_libstore_modules_branch, a branch of libstore (the Hurd's
storage library, used for example by GNU Parted) that he had added to
the Hurd's CVS last month, saying at the time:</p>

<quote who="Roland McGrath">

 
<p>For static linking, you have to make specific reference at link time to
each store class you want to have available by name at run time.  Any
module that you get linked in from libstore.a will automagically go into
the standard list (because each one has a `store_std_classes' section
pointing to the `struct store_class' it defines).  So e.g. if you call
`store_gunzip_open' or `store_gunzip_create' in your program, then the
"gunzip" store type will be available to store_open et al as well.  But
if you don't refer to a given type, it won't be linked in.  (In practice
you wind up always getting the "file" type and the pseudo-types like
"typed" and "query" just by virtue of calling store_open or
store_parsed_open.)  To get a desired set of types into a static link,
you can use `-lstore_TYPE' (i.e. `-lstore_gunzip' for the "gunzip" type
module).  That works because there is a libstore_TYPE.a installed for
each standard TYPE, which is actually a linker script with the same
effect as the linker switch `-u store_TYPE_class'.
 </p>

 
<p>When using the shared library (which is almost always the case), the
same link-reference plan applies.  However, this is extended to the
sections named `store_std_classes' not just of the executable but of
each shared object that's loaded.  Since libstore.so still contains all
the standard types, this is in the usual case just like the current
fixed array.  However, the functions that look for a store type by name
at run time will also try to load an external module using dlopen if
there is no existing type of that name.  It dlopen's
`libstore_foo.so.0.2' for the type "foo", and uses dlsym to look up
`store_foo_class'.  Note that once loaded for a successful open, the
module is never unloaded.  Thereafter, that module (and any other
modules you might have dlopen'd independent of libstore) get searched
for `store_std_classes' sections.  Because of this (and dlopen's own
redundancy detection), you could have a single libstore_foo.so.0.2
module that defined several classes and symlink together the different
names by which it might be first loaded.
 </p>

 
<p>It had been my original thinking to take all of the standard modules out
and make them individually loaded on demand.  But all the modules we
have are in fact so tiny that the extra overhead and memory wasted for
page alignment and redundant relocs would be just silly.  So I left all
the existing modules in the main library.  For static linking, this
means that if you call e.g. store_gunzip_open directly then you don't
need -lstore_gunzip -lstore, just -lstore.  In the shared library, it
means that existing types are in practice found in basically the same
way as before.
 </p>

 
<p>Obviously, the important thing that the new dynamic features make
possible is to have new store type implementation modules that are not
part of the hurd package itself and can be developed and maintained
separately.  For example, nbd could be moved out into a standalone
nbd_client package for parity with the Debian package for Linux (along
with a simple `nbd-client' shell script to give a work-alike interface).
In the case of nbd, that doesn't really buy anything because
libstore/nbd.c is simple and self-contained (but it would make sense if
e.g. the nbd protocol might change to get the client and server from
related interdependent packages instead of the client being part of the
basic hurd package).
 </p>

</quote>   
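<p>The `store_std_classes' section trick can be reproduced with GNU toolchains. The linker defines __start_SECNAME and __stop_SECNAME for any section whose name is a valid C identifier, so lookup code can walk every class pointer that any linked module placed there. The following sketch assumes GCC or Clang with GNU ld or lld on an ELF system. The one-field struct store_class and the helpers are hypothetical simplifications. The dlopen fallback is reduced to building the "libstore_foo.so.0.2" and "store_foo_class" names quoted above, without calling dlopen/dlsym.</p>

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical store class.  */
struct store_class { const char *name; };

/* Each module drops a pointer to its class into "store_std_classes".  */
#define DEFINE_STORE_CLASS(var, type_name)                              \
  static const struct store_class var = { type_name };                  \
  static const struct store_class *const var##_ptr                      \
    __attribute__ ((used, section ("store_std_classes"))) = &var

DEFINE_STORE_CLASS (store_file_class, "file");
DEFINE_STORE_CLASS (store_gunzip_class, "gunzip");

/* Provided by the linker for the section above.  */
extern const struct store_class *const __start_store_std_classes[];
extern const struct store_class *const __stop_store_std_classes[];

/* Search every class linked into this executable by name.  */
static const struct store_class *find_std_class (const char *name)
{
  for (const struct store_class *const *p = __start_store_std_classes;
       p < __stop_store_std_classes; p++)
    if (strcmp ((*p)->name, name) == 0)
      return *p;
  return NULL;
}

/* Build the module file and symbol names a dlopen fallback would use.
   Returns 0 on success, -1 if a buffer is too small.  */
static int module_names (const char *type, char *lib, size_t libsz,
                         char *sym, size_t symsz)
{
  int a = snprintf (lib, libsz, "libstore_%s.so.0.2", type);
  int b = snprintf (sym, symsz, "store_%s_class", type);
  return (a > 0 && (size_t) a < libsz && b > 0 && (size_t) b < symsz)
         ? 0 : -1;
}
```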
 
<p>Marcus Brinkmann had a look at it and showed that using concatenated
files as a filesystem seemed to work (he actually fixed a couple of
bugs along the way).  This is what he did: </p>

<p>settrans -ac testimg.node ./ext2fs.static -T concat @file:testimg1@file:testimg2 </p>

<p>Where testimg1 and testimg2 are the two parts of a full filesystem
image - 
<editorialize who="Paul Emsley">cute, eh?</editorialize></p>
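<p>The argument "@file:testimg1@file:testimg2" suggests that the concat store treats the first character of its name as the component separator. This lets the components themselves be typed names containing colons. The following minimal sketch does that split. split_components is a hypothetical helper, not libstore code.</p>

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAME 64

/* Split LIST, whose first character is the separator, into at most MAX
   component names stored in OUT.  Returns the number of components,
   or -1 if LIST is empty, a component is empty or too long, or there
   are more than MAX components.  */
static int split_components (const char *list, char out[][MAX_NAME], int max)
{
  if (list[0] == '\0')
    return -1;
  char sep = list[0];
  const char *p = list + 1;
  int n = 0;
  for (;;)
    {
      const char *end = strchr (p, sep);
      size_t len = end ? (size_t) (end - p) : strlen (p);
      if (len == 0 || len >= MAX_NAME || n == max)
        return -1;
      memcpy (out[n], p, len);
      out[n][len] = '\0';
      n++;
      if (end == NULL)
        return n;
      p = end + 1;
    }
}
```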

<p>Marcus did not test the dynamic loading functionality because
module.c was not included in the branch.  Roland fixed that but Marcus
has apparently not retried it.</p>

<p>Under the subject <a
href="http://mail.gnu.org/pipermail/bug-hurd/2002-March/006872.html">gunzip
store trouble identified</a> there was some discussion of the gunzip
store - but no apparent resolution yet.  </p>


</section>

<section
  title="argp limitation"
  subject="argp limitation"
  archive="http://mail.gnu.org/pipermail/bug-hurd/2002-March/006839.html"
  posts="15"
  startdate="10 Mar 2002 12:35:27 -0800"
  enddate="11 Mar 2002 00:49:06 -0800"
>

<mention>Thomas Bushnell</mention>

<p>First an introduction to argp seems appropriate.  Roland McGrath
had previously described it: <quote who="Roland McGrath">argp is a set
of library functions for parsing command line arguments in a highly
modular way.  It was originally written (by Miles Bader) for the Hurd,
and is used extensively in the Hurd libraries and programs.  However,
there is nothing really Hurd-specific about it.  It is now part of GNU
libc (2.1 and later, on all platforms) and no longer in the Hurd
sources.  See the header file argp.h, and there is a section in the
GNU libc manual that documents it. </quote></p>


 
<p>Marcus was considering using argp in his console server:</p>

 <quote
who="Marcus Brinkmann">

<p>I wanted to use a hierarchy of argp parsers in
the console server, one in each driver module, one for focus groups,
one for consoles, and one for the main program.  However, argp parents
and childs shade the options of other childs, in the sense that argp
calls only one parser for each recognized option.  It is not possible
to have common options in different parsers this way (I wanted to
enable/disable parsers in parent parsers).
 </p>

<p>Note: pfinet would have the same problem, except that it avoids it by lack
of modularity.  libstores would have the same problem except it avoids it by
lack of flexibility (all arguments for a store are encoded in the store
name).  I think both are not acceptable work arounds here, because drivers
are platform specific and complex.  Here is an example potential console
command line I have in mind:
 </p>

<p>/hurd/console --encoding isolat1 --console maincons --output-device vga
  --vt 1 --encoding utf8 
  --focus-group mainfocg --focus maincons --input-device pckbd --layout de
  --input-device mouse --name mouse1 --device /dev/ttyS0 --protocol mouseman
 </p>

<p>Or, for example, to remove the mouse:
fsysopts /node --remove mouse1 </p>
</quote>


 
<p>Roland agreed with Marcus's conclusions: <quote who="Roland McGrath">I think
your conclusions are right.  I don't think argp's are intended to be
used so you shadow options.  If you wanted to do that, you could call
another argp's parse_opt function from yours via the children
pointers.  That latter style is what I would have suggested to begin
with, I think--it's simple and clean for a parse_opt to just consume
as many arguments as it wants to rather than having argp somehow
involved in the control flow of what is conceptually a procedure call
at that point in the argument parsing anyway.  </quote> </p>
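<p>The following minimal sketch shows the "procedure call" style using glibc's argp. The top-level parse_opt decides which scope an option such as --encoding belongs to, and calls a per-console handler directly instead of relying on argp's child dispatch. For simplicity it calls the handler directly rather than going through the children pointers Roland mentions. The option set and config structs are hypothetical, loosely modelled on Marcus's example command line.</p>

```c
#include <argp.h>
#include <errno.h>
#include <stddef.h>

#define MAX_CONSOLES 8

/* Hypothetical config: --encoding applies to the most recent --console,
   or sets the default when no console has been named yet.  */
struct console_cfg { const char *name; const char *encoding; };
struct config {
  const char *default_encoding;
  struct console_cfg consoles[MAX_CONSOLES];
  int n;
};

enum { OPT_CONSOLE = 'c', OPT_ENCODING = 'e' };

static const struct argp_option options[] = {
  { "console",  OPT_CONSOLE,  "NAME", 0, "Start a console section", 0 },
  { "encoding", OPT_ENCODING, "ENC",  0, "Encoding for current section", 0 },
  { 0 }
};

/* Per-console "sub-parser", called directly as a procedure.  */
static error_t console_parse_opt (int key, char *arg, struct console_cfg *c)
{
  switch (key)
    {
    case OPT_ENCODING:
      c->encoding = arg;
      return 0;
    default:
      return ARGP_ERR_UNKNOWN;
    }
}

static error_t parse_opt (int key, char *arg, struct argp_state *state)
{
  struct config *cfg = state->input;
  switch (key)
    {
    case OPT_CONSOLE:
      if (cfg->n == MAX_CONSOLES)
        {
          argp_error (state, "too many consoles");
          return EINVAL;
        }
      cfg->consoles[cfg->n].name = arg;
      cfg->consoles[cfg->n].encoding = cfg->default_encoding;
      cfg->n++;
      return 0;
    case OPT_ENCODING:
      if (cfg->n > 0)  /* inside a console section: delegate */
        return console_parse_opt (key, arg, &cfg->consoles[cfg->n - 1]);
      cfg->default_encoding = arg;
      return 0;
    default:
      return ARGP_ERR_UNKNOWN;
    }
}

static const struct argp console_argp =
  { options, parse_opt, NULL, NULL, NULL, NULL, NULL };
```

<p>With this design, the same --encoding option can appear at several levels, which is exactly what argp's one-parser-per-option rule prevented in Marcus's hierarchy.</p>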


 
<p>Marcus was considering using a subset of the X server protocol to
handle the keyboard LEDs.</p>

<p>Roland got grumpy with Thomas Bushnell as they argued about
dynamically changing the console server options with a setopts
command.  The issue seemed unresolved.  </p> </section>



<section
  title="Memory manager issues"
  subject="memory_object_lock_request and memory_object_data_return fnord"
  archive="http://mail.gnu.org/pipermail/bug-hurd/2002-March/006915.html"
  posts="9"
  startdate="13 Mar 2002 17:27:03 -0800"
  enddate="13 Mar 2002 23:55:22 -0800"
>

<p>This was a conversation between Neal Walfield and Thomas Bushnell
on the subject of Mach and memory management.  Neal started with: </p>

<quote who="Neal Walfield">

<p>If a memory manager issues a memory_object_lock_request to gain
read/write access to a set of pages (i.e. revoking all of the kernel's
access rights), the kernel will eventually return any modified pages
in the range using the memory_object_data_return message and end the
sequence with a memory_object_lock_completed message.  This is similar
to the normal eviction path.
 </p>

 
<p>In the latter case, the page is returned to the memory manager with
the expectation that it will immediately be written to the backing
store and freed.  The kernel cannot, however, be sure.  As such, it
changes the association of the page from the manager to the trusted
default pager.  This way, if the manager fails to free the page in a
timely fashion, the kernel can still flush it to swap.
 </p>

 
<p>Yet, what happens in the former case?  Specifically, how will the
returned pages be evicted?  Will the kernel send another
memory_object_data_return?  Unlikely.  It has nothing to return.  So,
the page will be evicted via the default pager.  What is the correct
way to give the management of the page back to the kernel?  Perhaps,
we could use the memory_object_data_supply message (and if we modified
it, we could supply it as precious).  _The Mach 3 Kernel Interfaces_
cautions against supplying data that has not been explicitly requested;
however, it does not prohibit it.  And yet, what if milliseconds from
now we get another message to write to the page?  Again, we need to go
through the same song and dance -- request the page, write to it and
return it to the kernel.
 </p>
</quote>

 
<p>Thomas Bushnell wanted to know what Neal was trying to achieve and
replied: <quote who="Thomas Bushnell">Note that you have provided a
*clean* page back to the kernel (even if it was dirty when you got it
from the kernel); as a consequence, the kernel might delete the page
now, losing any changes that have been made.  So you must mark it
precious even if you didn't modify it.  In other words, the return of
the page to you, and you supplying it back, has cleared the dirty bit
for the page.</quote></p>
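<p>Thomas's point can be stated as a tiny eviction rule. A Mach-style kernel may simply drop a clean page, but it must return a dirty or precious page to the manager. A page the manager hands back with memory_object_data_supply starts out clean, so only the precious flag keeps the change from being lost. The following toy model covers just these two flags. The struct and function names are hypothetical.</p>

```c
#include <stdbool.h>

/* Toy model of a resident page of a memory object.  */
struct page { bool dirty; bool precious; };

enum evict_action { EVICT_DISCARD, EVICT_RETURN_TO_MANAGER };

/* Clean, non-precious pages are assumed to have an up-to-date copy
   with the manager, so the kernel may drop them.  */
static enum evict_action evict (const struct page *p)
{
  return (p->dirty || p->precious) ? EVICT_RETURN_TO_MANAGER
                                   : EVICT_DISCARD;
}

/* The manager supplies a page back: the kernel sees fresh data, so the
   dirty bit starts clear and only PRECIOUS protects the contents.  */
static struct page supply_page (bool precious)
{
  struct page p = { false, precious };
  return p;
}
```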

<p>There followed a quick exchange about when the dirty bit was set. </p> 



<p>As to his motivation, Neal replied:</p>


<quote who="Neal Walfield">

 
<p>I am trying to understand the motivation for having a relatively
complex interface to manage page ownership which, in the Hurd, we do
not use. </p>
 
<p>From what I can see, the pager_memcpy function can be extremely slow.
Just consider what happens when we just want to replace a page on disk
(which has not yet been read in to memory).  pager_memcpy causes a
page fault.  The kernel sends a message to the manager which reads the
page from disk (which is completely unnecessary), then, we write to
the page and eventually, it is flushed back to the disk.  This is even
worse if we are writing to multiple pages -- our thread and the
manager thread play ping-pong!  This could be avoided by acquiring as
much of the range up front as possible.
 </p>
</quote>


 
<p>Thomas replied: </p>

<quote who="Thomas Bushnell">


 
<p>Ah, the principal motivation is to allow, for example, a pager to
manage pages shared between many "kernels".  The reason to demand
pages back or require locking, in general, is so that you can hand
them up to other "kernels". </p>
 
<p>In principle, we need to do this already!  The glaringest security
issue with the Hurd right now is the assumption that all users will
just take their pagers and hand them to the kernel with vm_map.  But
they might play as "kernels" themselves.  To deal with this, the
pagers need to be able to deal with multiple "kernels", and also have
strategies for dealing with recalcitrant "kernels" that aren't
behaving properly.
 </p>
 
<p>Re: pager_memcpy: What's supposed to be going on behind the scenes
is that the kernel should detect that you are faulting the pages in
sequentially, and ask for pages from the pager ahead of time,
optimizing the sequential access case. </p>
</quote>

<p>Neal replied: </p>
<quote who="Neal Walfield">
 
<p>But the manager knows; why force the kernel to guess?  Say we send:
 </p>

 
<p>        io_read (file, data, vm_page_size * 4, 0, &amp;amount)
 </p>

 
<p>
By the time the kernel detects a sequential read, it is already too
late to be of any use. </p>
</quote>

<p>This, according to Thomas, is an old Mach debate. </p>


<p> Correcting Neal, he said that the manager doesn't actually know
any better than the kernel: <quote who="Thomas Bushnell">Only the
*user* knows what the access pattern is.  (Even if the "user" resides
in the same task as the pager.)  </quote>. Thomas suggested future
plans to allow the user to declare things about memory regions mapped
in their address space. </p>


</section>




</kc>
