Kernel Traffic #269 For 19 Jul 2004

By Zack Brown

Mailing List Stats For This Week

We looked at 1276 posts in 7728K.

There were 379 different contributors. 196 posted more than once. 149 posted last week too.

1. Status Of UML Inclusion In 2.6; Quilt Patch Tool

26 Jun 2004 - 3 Jul 2004 (16 posts) Archive Link: "Inclusion of UML in 2.6.8"

Topics: User-Mode Linux, Version Control

People: Paolo Giarrusso, Andrew Morton, Jeff Dike, Paul Jackson, Andreas Gruenbacher

Paolo Giarrusso asked:

what are the requisites for stable inclusion of the UML update inside 2.6-mm (or directly 2.6.8)? Currently (splitting out a little piece, which should not be included) we have almost all the stuff inside arch/um and include/asm-um, the addition of <linux/ghash.h> and of two filesystems for UML use only, and this little hunk (plus 2 uses of it inside mm/page_alloc.c).

+#ifndef HAVE_ARCH_FREE_PAGE
+static inline void arch_free_page(struct page *page, int order) { }
+#endif

Could it go in as-is? I'm especially worried about having it included soon in 2.6.8, since last time it entered -mm and stayed there just for one release.

The patch correctly applies to 2.6.7 and works; the current code, instead, does not even compile at all, so there is no reason for not applying it (unless you want to remove UML support / but since you never said this, we need this patch applied). However, if you don't want some parts of the code, just tell me; I'm waiting for this before preparing the UML patch to send you

Also, I have some patches managed with your patch-scripts, which I'll send you after you include the UML patch.

About the STATE of the code:

Of the two filesystems, one (hostfs) now should work perfectly with 2.6 (I've just fixed one porting bug to 2.6, related to the force_delete() -> .drop_inode change documented in Documentation/filesystems/vfs.txt); the other maybe has some problems, but I can remove it from the patch (it also will probably be replaced soon by a more generic one, i.e. externfs).
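The arch_free_page() hunk quoted above follows a common kernel idiom: generic code supplies an empty inline default, and an architecture that needs the hook defines HAVE_ARCH_FREE_PAGE and provides its own version. The following is only a user-space sketch of that idiom. The struct page stand-in, the DEMO_ARCH_HAS_HOOK switch, the counter, and free_pages_generic() are hypothetical names invented for illustration; they are not taken from the UML patch or from mm/page_alloc.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
    int freed;
};

/*
 * An architecture that needs the hook defines HAVE_ARCH_FREE_PAGE and
 * supplies its own arch_free_page(). DEMO_ARCH_HAS_HOOK and the counter
 * are assumptions made up for this sketch.
 */
#ifdef DEMO_ARCH_HAS_HOOK
#define HAVE_ARCH_FREE_PAGE
static int arch_hook_calls;
static inline void arch_free_page(struct page *page, int order)
{
    (void)page;
    (void)order;
    arch_hook_calls++;              /* stands in for real per-arch work */
}
#endif

/* The hunk from the patch: everyone else gets an empty inline default. */
#ifndef HAVE_ARCH_FREE_PAGE
static inline void arch_free_page(struct page *page, int order)
{
    (void)page;
    (void)order;
}
#endif

/* Hypothetical generic caller that frees 2^order pages. */
static void free_pages_generic(struct page *page, int order)
{
    assert(page != NULL && order >= 0);
    arch_free_page(page, order);    /* a no-op unless the arch overrides it */
    for (int i = 0; i < (1 << order); i++)
        page[i].freed = 1;
}
```

Because the default is an empty static inline function, callers need no #ifdef of their own, and architectures without the hook pay nothing for it.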

Andrew Morton replied:

I have no problem plopping it into -mm, as long as it doesn't cause me too much pain. It did cause patch management pain last time, but probably whatever it was interacting with has now been merged up so it'll be OK.

But for a merge into mainline we do need to get down and do some work on it - reintroducing ghash.h would not be welcome (I thought Jeff was going to eliminate that?) and last time we looked the patch had some blockdev drivers in it which were doing antiquated 2.4 things.

Generally, UML in 2.6 seems to have fallen behind fairly seriously and at some stage we need to go through the exercise of splitting the patch up, reviewing and fixing all the bits and feeding it in.

Jeff Dike replied:

Yup. I've come to the conclusion that I've painted myself into a corner a bit with BK and my current style of working. I'm looking at quilt, and I'm pondering taking all the changes since the last time Linus merged UML (2.5.69 or something), and breaking them up into sensible patches.

That'll be a lot of work, but I think it's something that needs doing.

The discussion at this point skewed off into a consideration of quilt (http://savannah.nongnu.org/projects/quilt). Paul Jackson said:

Good tool.

It's a bit like a loaded gun with no safety. You will learn a few new ways to shoot your foot off, and become good at first aid. You will want some way to keep personal revision history of your patches, to aid in such repair work. CVS or RCS or local bitkeeper or (for ancient hackers like me) raw SCCS or some such. Quilt handles the patches, but in and of itself has nothing to do with preserving history.

All software is divided into two parts - the concrete and the fluid.

Once something is accepted into the main kernel, it's concrete. You can never go back - you can only layer fixes on top. Bitkeeper rules for this stuff.

But work in progress, for which oneself is still the primary source, is fluid. You can slice and dice and redo it, and indeed you want to, to get the best patch set. Quilt and friends rule for this stuff.

Conclusion - use Quilt (with your favorite personal version control) on top of Bitkeeper.

Question - what tools are available for convenient patch set submission? Composing multiple, related email sets in a GUI emailer is a bit tedious and error prone. It's an obvious candidate for scripting.

Andrew liked Paul's description, adding, "quilt is a grown-up version of patch-scripts, and is tailored to what I do, and to what distributors do: maintain a series of diffs against a monolithic tree which someone else maintains." He added, "I use patch-scripts+CVS in the way which you describe. patch-scripts has the "patch-bomb" script, which would presumably work OK for quilt - it would need a little tweaking. http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.18/" Andreas Gruenbacher said, "Ideas for improvement are always welcome -- they would best be discussed on http://lists.nongnu.org/mailman/listinfo/quilt-dev." He added:

The concepts behind quilt are all stolen from patch-scripts, so it has the same usability problem that patch-scripts has: forgetting to add a file to a patch before modifying it is painful. The `quilt edit' command helps somewhat. I do not have a good idea how to fix this in a more satisfactory way.

Quilt is missing some of the features of patch-scripts: there are no equivalents to export_patch, which renames exported patches so that the filename sort order equals the order of the patches in the series file. Neither is there a way to strip such sequencing prefixes when importing patches. (I consider this obsolete.) There is nothing kernel specific, and nothing specific to version control systems. Also there are no equivalents to patch-scripts's new-kernel, mv-patch, patch-bomb, pstatus, rename-patch, tag-series, unitdiff.py commands.

On the other hand there are lots of small improvements, no more patch control files (that list the files a patch touches in patch-scripts), improved diffing and status inquiry functionality, patch dependency analysis, support for RPM packages. And there is more documentation.

Things I'm currently considering include:

All of the above things will potentially conflict with the goal of keeping the whole thing as policy-free and generally useful as possible.
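To make the sequencing prefixes mentioned above concrete, here is a rough sketch of the idea: prepend a zero-padded index so that sorting filenames reproduces the series order, and strip it again on import. The "NNN-" format and the function names are assumptions chosen for illustration. They are not the actual naming scheme or code of export_patch, patch-scripts, or quilt.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "NNN-name" into out so that lexical order matches series order.
 * Returns 0 on success, -1 if the result did not fit in outlen bytes.
 * The three-digit width is an assumption; it sorts correctly only for
 * indices up to 999.
 */
static int add_seq_prefix(unsigned index, const char *name,
                          char *out, size_t outlen)
{
    assert(name != NULL && out != NULL);
    int n = snprintf(out, outlen, "%03u-%s", index, name);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}

/*
 * Return a pointer just past a leading "digits-" prefix, or name itself
 * when no such prefix is present.
 */
static const char *strip_seq_prefix(const char *name)
{
    const char *p = name;
    while (isdigit((unsigned char)*p))
        p++;
    return (p != name && *p == '-') ? p + 1 : name;
}
```

For example, calling add_seq_prefix() with index 7 and the made-up name fix-hostfs.patch yields 007-fix-hostfs.patch, and strip_seq_prefix() recovers the original name.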

2. Adeos Ported To ia64/SMP

28 Jun 2004 - 4 Jul 2004 (4 posts) Archive Link: "[ANNOUNCE] HYADES (ITEA) project -- Adeos/ia64"

Topics: Microkernels: Adeos, Real-Time: RTAI, SMP

People: Philippe Gerum, Francois Romieu

Philippe Gerum said:

At http://www.hyades-itea.org you will find a port of Adeos for Linux 2.6 to the ia64/SMP architecture, currently running on Bull's Novascale systems.

The main objective of the EU-funded (ITEA) HYADES project is to adapt standard technologies for applications that require real-time response, associated with heavy, parallel computations.

HYADES is a partnership led by the Thales Group, composed of various organizations interested in deterministic intensive computing on the Linux/ia64 platform, among which are Bull, MandrakeSoft and Dolphin. The contribution of the HYADES project to the community will extend beyond the port of Adeos, by adapting the RTAI/fusion experimental technology to fit their needs on the ia64/SMP architecture.

The Adeos/ia64 patch and others can also be found at the usual place: http://download.gna.org/adeos/patches/

Francois Romieu suggested splitting the patch up into small chunks, and adhering to the CodingStyle documentation.

3. Dealing With Odd Intel Behavior

28 Jun 2004 - 1 Jul 2004 (18 posts) Archive Link: "[RFC PATCH] x86 single-step (TF) vs system calls & traps"

Topics: BSD: NetBSD, Bug Tracking

People: Roland McGrath, Linus Torvalds, Andrew Morton, Davide Libenzi, Daniel Jacobowitz

Roland McGrath said:

Andrew Cagney discovered this problem while working on GDB. I suspect this bug has always been there, but I've only actually tested current 2.6 kernels.

When you single-step into a trap instruction, you actually don't get a SIGTRAP until the instruction after the trap instruction has also executed. I have demonstrated this in three cases: `into' generating a SIGSEGV that is suppressed via ptrace; an `int $0x80' system call entry; and a `sysenter' system call entry via the vsyscall entry point.

In https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=126699 you can find a working test program and full details on reproducing the problem using gdb.

Roland went on to add, "From reading the code and the x86 specs on traps, it makes sense why it happens. The trap flag causes a single-step trap after the execution of the instruction that sets the trap flag. For instructions that generate their own traps, TF is cleared on the way into the kernel, and it's the normal iret that is restoring the flag on the way back to user mode. As advertised, that executes the next instruction, i.e. whatever the restored user PC is at, and then traps. But from the userland perspective, this is highly unexpected: the user executed one instruction like 'into' or 'int' or `sysenter', and expects execution to stop after that "instruction" is done. To the user, everything the kernel does in response to the trap is part of the execution of that one instruction." Linus Torvalds replied:

This is documented Intel behaviour. It also guarantees that there is forward progress in some strange circumstances, if I remember correctly.

And I refuse to make the fast-path slower just because of this. Not only has Linux always worked like this, as far as I know all other x86 OS's also tend to just do the Intel behaviour thing.

Roland confirmed that NetBSD 1.6.1 did share Linux's behavior, and Linus remarked, "I bet that if you really search, you can probably find _some_ OS out there that considered the Intel behaviour a bug, and fixed it with something like your patch. But I bet it's not just Linux and BSD that use the Intel behaviour, just because it's such a pain _not_ to." He reiterated that he just wouldn't add any fix that affected the fast-path of the kernel.

Andrew Morton replied, "Davide [Libenzi]'s patch (which has been in -mm for 6-7 weeks) doesn't add fastpath overhead." He posted the patch, but Roland had issues. He said Davide's patch was a bit obfuscated, and didn't seem to handle user-mode setting TF properly. Davide Libenzi replied, "I don't think (pretty sure actually ;) we can handle the case where TF is set from userspace and, at the same time, the user uses PTRACE_SINGLESTEP. The ptrace infrastructure uses the hw TF flag to work. The PTRACE_SINGLESTEP gives you the SYSGOOD behaviour, if you set it. And sends a SIGTRAP notification to the ptrace'ing parent process."

Roland didn't like this at all. He said, "that is a change in the behavior. Since its inception, SYSGOOD has meant exactly and only that when you use PTRACE_SYSCALL you will get a different notification for a syscall-tracing stop than other sources of SIGTRAP that may arise during execution of user code between system calls. At no time ever before, has it been possible to get the SIGTRAP|0x80 wait result when you had not just called PTRACE_SYSCALL. After your change, calling PTRACE_SINGLESTEP can now produce that result. I don't think that change is a good thing." He suggested that Daniel Jacobowitz, who he thought had originated SYSGOOD, might have an opinion on the matter. Daniel replied that he had certainly not originated SYSGOOD, although he had done some work on it. But he did say, "I think reporting the system call using 0x80|SIGTRAP when you PTRACE_SINGLESTEP over the trap instruction makes excellent good sense." Roland apologized for the misattribution, and said:

If you are not concerned about existing users of PTRACE_O_TRACESYSGOOD calling PTRACE_SINGLESTEP and then being confused, then I have no objection. I consider you to be the authority on any such users there might be.

In that case, I'm happy to endorse Davide's original patch. I will look into extending it to cover x86-64's ia32 support as well.

I still wonder if anyone has any insight into why this issue does not arise for native x86-64's syscall/sysret. From my reading of the AMD64 manual, I would expect it to happen there as well. That is, sysret is the instruction that sets TF, and the manual says that the instruction after the one that sets TF gets executed before the trap. It would be convenient if sysret were a special case for this rule, since it makes it do what is best for the system call case. But I haven't found a mention of that in the manual.

Daniel said he also supported Davide's original patch.
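For readers unfamiliar with the SIGTRAP|0x80 wait result at the center of this thread, the sketch below shows how a tracer might tell a SYSGOOD-flagged syscall stop from an ordinary SIGTRAP when decoding a status returned by waitpid(). It uses only the standard WIFSTOPPED() and WSTOPSIG() macros. The enum and function names are hypothetical, and the code is not taken from GDB or from any of the patches discussed.

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

enum stop_kind {
    STOP_NOT_STOPPED,   /* child exited, was killed, or is running */
    STOP_SYSCALL,       /* stop reported as SIGTRAP|0x80 (SYSGOOD) */
    STOP_SIGTRAP,       /* plain SIGTRAP: single-step, breakpoint, ... */
    STOP_OTHER_SIGNAL   /* stopped by some other signal */
};

/* Classify a wait status as a tracer would after waitpid(). */
static enum stop_kind classify_stop(int status)
{
    if (!WIFSTOPPED(status))
        return STOP_NOT_STOPPED;

    int sig = WSTOPSIG(status);
    if (sig == (SIGTRAP | 0x80))
        return STOP_SYSCALL;
    if (sig == SIGTRAP)
        return STOP_SIGTRAP;
    return STOP_OTHER_SIGNAL;
}

/* Human-readable label for logging; purely illustrative. */
static const char *stop_kind_name(enum stop_kind kind)
{
    switch (kind) {
    case STOP_NOT_STOPPED:  return "not stopped";
    case STOP_SYSCALL:      return "syscall stop";
    case STOP_SIGTRAP:      return "SIGTRAP stop";
    case STOP_OTHER_SIGNAL: return "other signal";
    }
    assert(!"unknown stop_kind");
    return "?";
}
```

Roland's objection maps onto this directly: a tracer written on the assumption that STOP_SYSCALL can only follow PTRACE_SYSCALL would, after Davide's change, also see it after PTRACE_SINGLESTEP over a trap instruction.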

4. Status Of Linux Trace Toolkit (LTT) In 2.6

30 Jun 2004 - 6 Jul 2004 (11 posts) Archive Link: "[PATCH] IA64 audit support"

Topics: Ottawa Linux Symposium, Small Systems, User-Mode Linux

People: Karim Yaghmour, Andrew Morton, Robert Love

Peter Martuccelli submitted a patch to the 'audit' subsystem, which Andrew Morton accepted, leading to this discussion. Karim Yaghmour said:

I, and quite a few other folks, have been trying to get the Linux Trace Toolkit in the kernel for the past 5 years and the code being added is almost identical to what the audit patch adds, yet we've always got responses such as "this is bloated" and Linus told us that he didn't see the use of this kind of stuff.

Have we simply not figured out the secret handshake?

I'd really like to have some advice here since I believe we have tried every trick in the book: posting the patches for review, asking kernel developers for input, porting the patches to multiple architectures, modularizing the system, etc.

Andrew replied that the audit code was much less intrusive than LTT. He posted lists of files modified by both projects, showing much higher intrusiveness by LTT, and saying that LTT "adds hooks all over the place." He went on:

The security code adds hooks everywhere too, but those deliver end-user functionality rather than being purely a developer support tool.

Developer support tools are good, but are not as persuasive as end-user features. Because the audience is smaller, and developers know how to apply patches and rebuild stuff.

Regarding the 'secret handshake', Andrew said, "It's a balance between (ongoing maintenance cost multiplied by the number of impacted developers) versus (additional functionality multiplied by the number of users who benefit from it). To my mind, LTT (and kgdb and various other developer-support things) don't offer good ratios here." He suggested that if LTT "could use kprobes hooks that'd be neat. kprobes is low-impact."

Karim replied on several levels, first of all arguing that the impact of the patch was not as large as Andrew thought, and that an examination of what it actually did would show that the changes and hooks, etc., were in many cases just simple one-liners. But Karim took great exception to Andrew's characterization of LTT as a developer-targeted tool. He said:

This is probably one of the biggest misconceptions about LTT amongst kernel developers. So let me present this once more: LTT is _NOT_ for kernel developers, it has never been developed with this crowd in mind. LTT is and has _ALWAYS_ been intended for the end user.

The fact of the matter is that the events recorded by LTT are far too little in detail to help in any sort of kernel debugging. Don't take my word for it: I met Marcelo at OLS once and he recounted attempting to use LTT to track things in the kernel and how he found it NOT to be good enough for what he was doing. Ditto with Andrea.

How is this tool useful for the end user? Here's an excerpt from an e-mail I sent to Andrea and a few other SuSE folks explaining this some time ago:

What LTT is really good at, however, is to provide non-kernel gurus with an understanding of kernel dynamics. It is not reasonable to expect that every sysadmin will understand exactly how the kernel behaves and then rely on ktrace to isolate a problem (as I expect most kernel developers to be able to do). On the other hand, it is quite reasonable to expect sysadmins to be able to fire-up a tool which gives them a good idea of what's going on in a system. This may not help them find kernel bugs, but it will most certainly help them track down transient performance problems, and all other kernel-behavior-related bugs which are simply invisible to /proc, ps, and their friends.

The same goes for developers tracking synchronization problems. gdb won't help, strace won't help, etc. because they rely on ptrace() which itself modifies application behavior ... same applies to printf() etc.etc.etc. There's an entire category of problems for which current user-space tools are not adapted and kernel debugging tools (ktrace included) are simply overkill.

Generally speaking, there isn't a single tool out there that currently exists that enables any end-user to understand the complex dynamic behavior between the Linux kernel, his applications and the outside world. And as you personally noted in the foreword to Robert Love's book, the kernel is only getting more complicated. Using the trace points added by the LTT patch, the user-space utilities can provide a wealth of information to the end-user that he cannot possibly collect in any other way.

Karim offered some sample output to demonstrate this, and went on:

As you can see, the granularity of the details is not refined enough for any sort of kernel debugging, yet it is clear that an end-user or an application developer can benefit immensely from such information. Given the ever increasing complexity of the kernel, the ever increasing number of applications run on servers and workstations, and the ever increasing use of Linux in time-sensitive applications such as embedded systems, it seems to me that this type of capability is no less necessary than ptrace().

I'll concede that LTT may be of some benefit for some driver developers in some cases and that it may help consolidate the slew of tracing mechanisms already included in the kernel as part of various drivers and subsystems, but the fact of the matter is that it is of little use for kernel developers. If a kernel developer needs tracing, he should be using ktrace.

Andrew said that Karim had misunderstood his statement: he had been referring to all developers, not just kernel developers. His point, Andrew clarified, was that LTT was a tool for developers of various stripes, and that as a developer support tool it would have only a limited audience, one sufficiently skilled to apply the patches themselves if they wanted to.

Karim replied, "If features such as UML, oprofile, audit, security hooks, vserver, etc., that are targeted at the same category of users that LTT is targeted at can make it in the kernel, then I have a hard time understanding how there could be any justification for refusing LTT's inclusion simply on the basis that it doesn't benefit the least computer-literate of Linux users." But there was no further discussion.

5. Linux 2.6.7-mm5 Released

30 Jun 2004 - 5 Jul 2004 (15 posts) Archive Link: "2.6.7-mm5"

Topics: Disks: IDE, Kernel Release Announcement, Profiling, Version Control

People: Andrew Morton

Andrew Morton announced Linux 2.6.7-mm5, saying:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.7/2.6.7-mm5/

6. Rule Set Based Access Control (RSBAC) v1.2.3 Released

2 Jul 2004 (1 post) Archive Link: "Announce: RSBAC v1.2.3 released"

Topics: Access Control Lists

People: Amon Ott

Amon Ott said:

Rule Set Based Access Control (RSBAC) v1.2.3 has been released! Full information and downloads are available from http://www.rsbac.org

We are also proud to announce the relaunch of our Website and a set of worldwide mirrors.

RSBAC Key Features:

Between the first upload and this announcement, the first important security bugfixes had to be released, too, which also apply to previous versions. You can always find the latest bugfixes at http://www.rsbac.org/download/bugfixes, they are already included in some of the pre-patched kernel sources (-bfX) at http://www.rsbac.org/download/kernels/v1.2.3/

New features in RSBAC v1.2.3:

General
  • Port to 2.6 kernel series with many internal changes
  • Full log separation between system and RSBAC log
  • Improved hiding of unaccessible processes
AUTH
  • Learning mode, global and per-process
RC
  • System boot role, now separate from root's role
  • Extra process type for kernel threads for explicit access control
  • Types for user objects
DAZ
  • New 100% compatible Dazuko (http://www.dazuko.org/) module
  • On-access scanning through user space antivirus daemons
  • In-kernel scanning result cache, speeding it all up significantly
ACL
  • Global learning mode
PAX
  • New PaX support module
JAIL
  • Several security related and other bugfixes (it is strongly recommended to update)
  • Linux capability restrictions for jailed processes
MAC
  • Trusted-for-user list instead of single value

Please forward this announcement to where you think it is applicable, e.g. local or national security lists, newspapers or magazines, or your favourite Internet forum.

7. Linux 2.4.27-rc3 Released

3 Jul 2004 (1 post) Archive Link: "Linux 2.4.27-rc3"

People: Marcelo Tosatti

Marcelo Tosatti announced Linux 2.4.27-rc3, saying:

the number of changes this time is pretty small.

It includes network update from davem, PPC fcc enet driver fix, and most importantly, the missing chown() security checks which allowed users to change the group affiliation of arbitrary files on the system.

8. Linux Kernel State Tracer (LKST) Version 2.1.0 Released For Linux 2.6.6

4 Jul 2004 (1 post) Archive Link: "[ANNOUNCE] LKST 2.1.0 for linux-2.6.6 is released."

People: Masami Hiramatsu

Masami Hiramatsu said:

We are pleased to announce releasing new version of Linux Kernel State Tracer.

The Linux Kernel State Tracer (a.k.a. LKST) version 2.1.0 has been released. This version can be applied to linux-2.6.6, and it supports both the IA32 and IA64 platforms.

LKST is a tool that supports fault analysis and evaluation of the kernel. It is especially useful for analyzing unanticipated kernel faults.

The latest version of the LKST has new features:

For more changes, see Changelog-2.1.0.txt (http://prdownloads.sourceforge.net/lkst/Changelog-2.1.0.txt?download)

Remarks
For supporting IA64, we hacked KernelHooks.

LKST binaries, source code and documents are available in the following site,
https://sourceforge.net/projects/lkst/
http://sourceforge.jp/projects/lkst/

We prepared a mailing list written below in order to let users know update of LKST.

lkst-users@lists.sourceforge.net
lkst-users@lists.sourceforge.jp

To subscribe, please refer following URL,

http://lists.sourceforge.net/lists/listinfo/lkst-users
http://lists.sourceforge.jp/mailman/listinfo/lkst-users

And if you have any comments, please send to the above list, or to another mailing-list written below.

lkst-develop@lists.sourceforge.net
lkst-develop@lists.sourceforge.jp

9. dmraid (Device-Mapper RAID Tool) 1.0.0-rc1 Released

6 Jul 2004 (1 post) Archive Link: "*** Announcement: dmraid 1.0.0-rc1 available at http://people.redhat.com:~heinzm/sw/dmraid"

Topics: Disk Arrays: RAID

People: Heinz Mauelshagen

Heinz Mauelshagen said:

dmraid 1.0.0-rc1 available at http://people.redhat.com:/~heinzm/sw/dmraid/ in source and i386 rpm.

dmraid (Device-Mapper Raid tool) discovers, [de]activates and displays properties of software RAID sets (i.e. ATARAID) and contained MSDOS partitions using the device-mapper runtime of the 2.6 kernel.

The following ATARAID types are supported on Linux 2.6:

Highpoint HPT37X
Highpoint HPT45X
Promise FastTrack
Silicon Image Medley

These ATARAID types can be discovered only in this version:
Intel Software RAID
LSI Logic MegaRAID

Please provide insight to support those metadata formats completely.

Thanks.

See file README, which comes with the source tarball for prerequisites to run this software and further instructions on installing and using dmraid!

Call for testers:

I need testers with the above ATARAID types, to check that the mapping created by this tool is correct (see options "-t -ay") and access to the ATARAID data is proper.

You can activate your ATARAID sets without danger of overwriting your metadata, because dmraid accesses it read-only unless you use option -E with -r in order to erase ATARAID metadata (see 'man dmraid')!

This is a release candidate version so you want to have backups of your valuable data *and* you want to test accessing your data read-only first in order to make sure that the mapping is correct before you go for read-write access.

The author is reachable at <Mauelshagen@RedHat.com>.

For test results, mapping information, discussions, questions, patches, enhancement requests, free beer offers and the like, please subscribe and mail to <ataraid@redhat.com>.

10. Linux Test Project (LTP) Version 20040707 Released

7 Jul 2004 (1 post) Archive Link: "[ANNOUNCE] July release of LTP now available"

Topics: FS: ext3, POSIX

People: Marty Ridgeway

Marty Ridgeway of IBM announced the July release of the Linux Test Project, saying:

LTP-20040707

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.