Kernel Traffic #50 For 10 Jan 2000

By Zack Brown


Many thanks go out to Jan Evert van Grootheest, who found a serious bug in my HTML templates that made unfollowed and followed links the same color for anyone who turns stylesheets off in their browser preferences. Thanks, Jan!

Mailing List Stats For This Week

We looked at 693 posts in 3149K.

There were 287 different contributors. 122 posted more than once. 103 posted last week too.

1. Intel Clams Up On Ether Express Pro (i960) Spec

23 Dec 1999 - 27 Dec 1999 (4 posts) Archive Link: "Intel Ether Express Pro (i960) in 2.2.x??"

People: Victor Khimenko, Alan Cox

Matthew Clark had an Intel Ether Express Pro 10/100 (based on the i960 chip), and wanted to use it under 2.2.x; Victor Khimenko replied, "AFAIK such driver simple does not exist. All versions of EEpro 100 are supported except high end ones with i960. It's just really different card with [almost] same name as normal EEPro 100 :-/" and Alan Cox added, "All attempts to get info out of Intel failed, and given their obfuscated gige card driver I dont expect this to change."

2. Writing To NTFS Partitions

22 Dec 1999 - 28 Dec 1999 (4 posts) Archive Link: "NTFS Write Code ?"

Topics: BSD, FS: NTFS

People: Steve Dodd

Stephane Dudzinski asked when the code for writing to NTFS partitions would be integrated into the stable tree, and someone replied that the code had not been developed during 2.3.x, so would probably not make it into 2.4; Daniel Silverstone replied that Steve Dodd had been working on it, but had been busy with his money job. Steve replied:

Yes, the (just about) paid job is sucking all of my time away at the moment. Martin von L. is still officially the NTFS maintainer, but I've heard virtually nothing of/from him for months. If I do anything, it'll probably a complete rewrite, or perhaps a port of the Free(?)BSD driver. I'll also need to have a rummage around and find out what's changed in the VFS/mm systems since I last looked. Anyone want to pay me to hack filesystems? <g>

Whether NTFS {should,will have to} be dropped from 2.4 is a decision for somebody else. If I find time I'll poke at it and see what needs doing.

End Of Thread.

3. Kernel-Based Windowing For Embedded Systems; License Debate

23 Dec 1999 - 30 Dec 1999 (43 posts) Archive Link: "Announce: DinX windowing system 0.2.0"

Topics: Framebuffer, Small Systems

People: Ben Williamson, nofirstname nolastname, Victor Khimenko, Mike A. Harris, Alan Cox, Alex Buell, Pavel Machek, James Simmons

Ben Williamson announced version 0.2.0 of the DinX ("DinX is not X") graphical windowing kernel module. The license was "MPL with GPL option". He gave pointers to the homepage and a download site, and described DinX:

DinX is an experimental windowing system that performs clipping and drawing inside Linux kernel modules. This eliminates much context switching between clients and the server, and makes the code small, simple and fast. It is aimed at small systems like Linux handhelds.

The first public release includes draggable windows and a simple image viewer. All clipping and drawing operations are working properly. The window server program is under development, to provide a more complete set of events.

He asked for feedback, adding that since it was his first kernel module, he'd like to hear if he made any horrible design mistakes, etc.

James Simmons said this would be great for embedded systems, and Pavel Machek asked about possible performance gain, and the impact on kernel size. Alex Buell reassured him that it was very small. He added that X apps could be compiled directly for DinX without any source modifications.

Someone else also replied to Pavel, saying that the compiled module ran about 14K. He or she went on:

Performance currently resembles the old manual etch-a-sketch devices, but that is being worked on. Ultimately performance should be virtually the same as using a framebuffer directly.

It doesn't add a "window manager" to the frame buffer or anything like that. Think of it instead as a framebuffer multiplexer; if written to use dinx, multiple framebuffer apps can coexist happily. Dinx does not provide goodies like internal windows, sprites, transparency/translucency, etc. Those have to be done in userspace.

Ben replied:

Just to clarify, performance is currently horrible on PC hardware because we read (memmove) from the framebuffer a lot when dragging windows around. And it seems PC hardware does this really slowly.

DinX is designed with small systems in mind, many of which have the video buffer in main DRAM, and no 2D accel. (I'm thinking of my favourite ARM parts, like the ARM7500, CL-PS7110, EP7211 etc.) On these, reading the buffer is as fast as any other memory, so DinX should work fine the way it is now.

Regarding DinX as a framebuffer multiplexer, Ben went on:

Right. The idea is just to take clipping and blitting out of the server process and put them in the kernel, to avoid lots of context switches and big complex buffering code.

Something else I should mention: The clipping code performs no memory allocations, so drawing keeps off the heap. When an obscuring window splits a rectangular blit into two rectangles, the routine recurses for each, so the complexity of the visible area goes on the stack. Now, how much stack space does the kernel have? :)

Ben had apparently thought that the stack was large under Linux. Victor Khimenko pointed out, "Kernel stack is TINY. On iX86 it's only 7KiB or so. And when you'll overflow if you'll corrupt you system badly, BTW."

Ben hit himself on the head and started studying furiously. After some further discussion, including private email, Ben said, "fixed in the new version, the clipping algorithm now uses a stack structure allocated with kmalloc. Thanks to everyone who provided advice on this. I now understand very clearly why a dynamically growing stack area would make dealing with races awful tough. :)"
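The change Ben describes, replacing recursion with an explicitly allocated work stack, can be sketched in user space. This is only an illustration under stated assumptions, not the DinX code: the names rect, work and visible_area are hypothetical, malloc() and realloc() stand in for kmalloc(), and the routine merely sums the visible area of a rectangle instead of blitting it.

```c
#include <assert.h>
#include <stdlib.h>

/* Half-open rectangle: covers x0 <= x < x1 and y0 <= y < y1. */
struct rect { int x0, y0, x1, y1; };

/* One pending piece, plus the index of the next obscurer to test it against. */
struct work { struct rect r; size_t next; };

static int overlaps(const struct rect *a, const struct rect *b)
{
    return a->x0 < b->x1 && b->x0 < a->x1 && a->y0 < b->y1 && b->y0 < a->y1;
}

/*
 * Returns the area of `target` not covered by any of the `n` obscurers,
 * or -1 if the work stack cannot be allocated. Pending pieces live on a
 * heap-allocated stack, so the call stack stays at a fixed depth.
 */
long visible_area(struct rect target, const struct rect *obs, size_t n)
{
    size_t cap = 16, top = 0;
    long area = 0;
    struct work *stack;

    assert(target.x0 <= target.x1 && target.y0 <= target.y1);
    stack = malloc(cap * sizeof *stack);    /* stand-in for kmalloc() */
    if (!stack)
        return -1;
    stack[top++] = (struct work){ target, 0 };

    while (top > 0) {
        struct work w = stack[--top];
        struct rect r = w.r;
        size_t i = w.next;

        while (i < n && !overlaps(&r, &obs[i]))
            i++;
        if (i == n) {                       /* nothing left on top of r */
            area += (long)(r.x1 - r.x0) * (r.y1 - r.y0);
            continue;
        }
        if (top + 4 > cap) {                /* a split adds at most 4 pieces */
            struct work *bigger = realloc(stack, 2 * cap * sizeof *stack);
            if (!bigger) {
                free(stack);
                return -1;
            }
            stack = bigger;
            cap *= 2;
        }

        const struct rect *o = &obs[i];
        int ya = r.y0 > o->y0 ? r.y0 : o->y0;   /* top of the middle band */
        int yb = r.y1 < o->y1 ? r.y1 : o->y1;   /* bottom of the middle band */

        if (o->y0 > r.y0)                   /* strip above the obscurer */
            stack[top++] = (struct work){ { r.x0, r.y0, r.x1, o->y0 }, i + 1 };
        if (o->y1 < r.y1)                   /* strip below the obscurer */
            stack[top++] = (struct work){ { r.x0, o->y1, r.x1, r.y1 }, i + 1 };
        if (o->x0 > r.x0)                   /* piece to the left */
            stack[top++] = (struct work){ { r.x0, ya, o->x0, yb }, i + 1 };
        if (o->x1 < r.x1)                   /* piece to the right */
            stack[top++] = (struct work){ { o->x1, ya, r.x1, yb }, i + 1 };
    }
    free(stack);
    return area;
}
```

With a 10x10 target and one 3x3 window sitting inside it, visible_area() reports 91. However complex the visible region gets, the growth happens in the heap buffer, where an allocation failure can be reported to the caller.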

Pavel, in his original reply to Ben's announcement, also pointed out that the "MPL with GPL option" meant that the code could not be compiled into the kernel, which was what most embedded systems would want. Victor asked why this would be the case, since the user could optionally use the GPL as the license. Mike A. Harris replied that users could compile the code into the kernel if and only if they chose to use the GPL option. If they chose the MPL, they wouldn't be able to link the code into the kernel in any way. Victor replied that they'd be able to use it as a loadable module, but Mike replied, "If it modifies ANY existing kernel source, it would be in violation of GPL regardless of if it is linked monolithically or modularly." Several folks pointed out that as DinX was a module, it didn't modify existing kernel source, and could thus be used as a loadable module under non-GPL-compatible licenses.

There was a bit of debate, and at a certain point, Ben explained:

The intention of using the MPL with GPL option is that anyone making a Linux kernel distribution can take whatever files they need from DinX, replace the MPL notice with the GPL notice, and put them in the distribution, perhaps to be statically linked. This is my understanding of what the Netscape lawyers meant when they wrote:

Alternatively, the contents of this file may be used under the terms of the GNU General Public license (the "[GPL] License"), in which case the provisions of [GPL] License are applicable instead of those above. If you wish to allow use of your version of this file only under the terms of the [GPL] License and not to allow others to use your version of this file under the MPL, indicate your decision by deleting the provisions above and replace them with the notice and other provisions required by the [GPL] License. If you do not delete the provisions above, a recipient may use your version of this file under either the MPL or the [GPL] License.

I'm sorry you seem to be annoyed by my choice of the MPL. I did read it carefully and gave it plenty of thought before making that choice. I chose the MPL because it says what I want to say better than I could have said it.

If folks on the linux-kernel list are of the general opinion that I'm completely wrong and that the MPL/GPL would prohibit the DinX kernel modules from ever becoming part of a statically linked kernel distribution, please let me know asap so that we can resolve this while the contributor list is still short. Thanks.

This made sense to Alan Cox, who replied:

It seems completely sane to me. The only advice I would give is to ask people to always provide their modifications clearly under the dual license to avoid any questions/mess.

Using the MPL makes it easier for people to use the driver in non Linux OS's, and if thats what you want and intend its very cool.

4. Recoding Floating Point Emulation Routines

23 Dec 1999 - 24 Dec 1999 (4 posts) Archive Link: "Linux Kernel Floating Point Emulation and CORDIC"

People: Arthur Jerijian, Matthew Wilcox, Brian Gerst

Arthur Jerijian gave a link to CORDIC (COordinate Rotation DIgital Computer) by Ingo Cyliax, and said, "I have taken a look at the source code of the Linux Kernel floating point emulation engine for i386 (as of 2.2.12, don't know if it changed in 2.3.x). I noticed that it uses Taylor/Maclaurin polynomials to approximate the sine, cosine, tangent, and inverse tangent functions. Wouldn't CORDIC be a better algorithm for computing trigonometric and exponential functions instead? CORDIC is a method for calculating mathematical functions using only addition, shifting, and looking up entries in a table." Brian Gerst replied that he didn't think it was worth it to recode all the floating point routines, since they were only needed on 386s and some 486s, and no one was likely to be doing serious number-crunching on such old machines. Matthew Wilcox replied that there were many non-Intel machines that would benefit from better FP routines. But he added, "However, I seem to remember someone analysing the FP emulator (on ARM) and concluding that the cost of emulating FP instructions was dominated by the trap handler and instruction decode. So there's not much point in replacing the algorithms with more efficient algorithms when it's not going to make much difference."
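For readers who have not met CORDIC, here is a minimal sketch of its rotation mode computing sine and cosine with only adds, shifts and a small table, as Arthur describes. It is not taken from the kernel's FP emulator or from the linked article. The Q16.16 fixed-point format, the 16-iteration count and the name cordic_sincos are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CORDIC_ITERATIONS 16
#define FIXED_HALF_PI 102944    /* pi/2 in Q16.16 (65536 == 1.0) */
#define CORDIC_GAIN   39797     /* 0.607253 in Q16.16, cancels the rotation gain */

/* atan(2^-i) in Q16.16 radians, for i = 0..15 */
static const int32_t atan_table[CORDIC_ITERATIONS] = {
    51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
    256, 128, 64, 32, 16, 8, 4, 2
};

/*
 * Rotation-mode CORDIC: rotate the vector (CORDIC_GAIN, 0) towards `angle`
 * in ever smaller steps of atan(2^-i). Each step needs only shifts, adds
 * and one table lookup. `angle` is in Q16.16 radians within [-pi/2, pi/2].
 * Assumes >> on a negative int32_t is an arithmetic shift, which is
 * implementation-defined in C but is what common compilers do.
 */
void cordic_sincos(int32_t angle, int32_t *sin_out, int32_t *cos_out)
{
    int32_t x = CORDIC_GAIN, y = 0, z = angle;
    int i;

    assert(angle >= -FIXED_HALF_PI && angle <= FIXED_HALF_PI);

    for (i = 0; i < CORDIC_ITERATIONS; i++) {
        int32_t dx = x >> i;
        int32_t dy = y >> i;

        if (z >= 0) {           /* rotate counter-clockwise */
            x -= dy;
            y += dx;
            z -= atan_table[i];
        } else {                /* rotate clockwise */
            x += dy;
            y -= dx;
            z += atan_table[i];
        }
    }
    *cos_out = x;
    *sin_out = y;
}
```

Calling cordic_sincos(34315, &s, &c), where 34315 is roughly pi/6 in Q16.16, leaves s close to 32768 (0.5) and c close to 56756 (0.866). Each iteration adds roughly one bit of precision, so accuracy is traded directly against loop count.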

5. Strace Hole

25 Dec 1999 - 31 Dec 1999 (10 posts) Archive Link: "strace can lie"

People: Pavel Machek, Mike Coleman

Pavel Machek reported:

When you see snippet from strace, that says:

open("/etc/passwd", O_RDONLY) = 3

Do you trust it? You should not. Malicious program could open _any_ file on filesystem with this syscall.

He posted some code to do just that:

char *c = 0x94000000;
open( "/tmp/delme", O_RDWR );
mmap( c, 4096, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, 3, 0);
*c = 0;
if (fork()) {
    while(1) {
        strcpy( c, "/public" );
        strcpy( c, "/secret" );
    }
} else
    while (1)
        open( c, 0 );

He added:

Depending on races, /public or /secret is printed with strace. This can be reproduced even on UP and easiest way to do so is to make /public file readable, and then look if you get

[pid 224] open("/public", O_RDONLY) = 718
[pid 224] open("/secret", O_RDONLY) = 719
[pid 224] open("/public", O_RDONLY) = 720

snippet like this. It is impossible for kernel to open non-existent file; that means that strace printed something that did not actually happen.

Any ideas how to get rid of this problem? It is nasty. It is very nasty and makes strace unusable for anything security-sensitive.

Mike Coleman replied, "Yes, this is a problem if you're trying to be secure. Anything that allows memory contents to change while a process is stopped is trouble." He suggested some possible solutions, but added, "One subtle problem with eliminating the problem race is that it potentially provides a method for the program to discover that it is being traced ("No races? I must be being traced."). Ideally there would be no way for the program to tell."

6. Unexecutable Stack

27 Dec 1999 - 30 Dec 1999 (55 posts) Archive Link: "Unexecutable stack"

People: Victor Khimenko, Richard B. Johnson, John Alvord

Mike Karmyshev asked if Solar Designer's patch would make it into the standard kernel as a security feature, since it appeared to have no overhead. In particular he asked about the "secure stack" feature, which would prevent any code in the stack from being executed. Victor Khimenko replied, "Last time when this question was raised was more then year ago (if I recall correctly) and Linus said that his feeling about unexecutable stack is that it does not make exploits impossible but insted give you false sense of safety."

A long discussion ensued. As some folks pointed out, the subject of a secure stack has come up many times before. At one point, Richard B. Johnson said, "The notion of a secure stack implies that you get some kind of security by making the stack non-executable. This theory has, to the best on my knowledge, never been shown to have merit, much less proof."

In the course of discussion, it was pointed out that the unexecutable stack was not fully secure, though it did have security benefits against buffer-overrun attacks. However, it also either broke existing "trampoline" code or made such code more difficult. Trampoline code, according to the Free Online Dictionary of Computing, is:

An incredibly hairy technique, found in some HLL and program-overlay implementations (e.g. on the Macintosh), that involves on-the-fly generation of small executable (and, likely as not, self-modifying) code objects to do indirection between code sections. These pieces of live data are called "trampolines". Trampolines are notoriously difficult to understand in action; in fact, it is said by those who use this term that the trampoline that doesn't bend your brain is not the true trampoline.

At one point, John Alvord pointed out, "Linus was talking about methods of eliminating the need for trampolining a few weeks ago. That would make a non-executible stack trivial to implement."

7. 'strace' Anomaly

29 Dec 1999 - 30 Dec 1999 (14 posts) Archive Link: "strace security <feature>"

People: Alan Cox, Peter Benie, Richard B. Johnson

Richard B. Johnson found that any user could do an 'strace cp somefile /etc/passwd' to overwrite the system passwd file on systems that had their 'strace' binary setuid. A lot of folks were confused, because strace did not seem to be setuid on their machines, and Alan Cox added, "strace is not meant to be installed setuid root." But Peter Benie replied, "That's not entirely true - see the 'setuid installation' section of the manpage. If you've ever needed to trace a setuid program, it's obvious why this feature exists. The manpage does give a clear explanation of the security implications, so the 'bug' is still an installation error."

Richard couldn't explain how 'strace' had come to be setuid on his system, but pointed out that "Certainly `cp` never attempted to obtain root privilege so the suid-root bit set in its parent's file should have done nothing."

Amidst other replies, Peter Benie said:

That's not true. The reasons have been hashed out by others in the thread, but they are missing the point that strace is expecting that it might be installed setuid and so should demote privilege before running cp.

Failure to demote privilege means that strace has failed to provide cp with its normal environment so the behaviour of a traced cp could be different from a untraced cp. This _is_ a bug in strace, however, it is _not_ a security bug since anyone who can run a setuid-root strace is trusted anyway.
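What Peter means by demoting privilege can be sketched as follows. This is a hedged illustration of the general technique for a setuid-root helper, not strace's actual code: the function names drop_privileges and run_unprivileged are hypothetical, and the ptrace setup that a real tracer would perform in the child is left out.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Give up any privilege gained through the setuid/setgid bits by setting
 * the effective IDs back to the real ones. The group is changed first,
 * because setgid() may no longer be permitted once root has been dropped.
 * Returns 0 on success, -1 on failure.
 */
int drop_privileges(void)
{
    gid_t gid = getgid();
    uid_t uid = getuid();

    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;
    /* Verify rather than trust the return values alone. */
    if (geteuid() != uid || getegid() != gid)
        return -1;
    return 0;
}

/*
 * Run argv[0] as the invoking user. Returns the wait status of the child,
 * or -1 if fork() or waitpid() fails.
 */
int run_unprivileged(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (drop_privileges() != 0)
            _exit(126);         /* refuse to run the command privileged */
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

With something like this in place, a 'cp somefile /etc/passwd' started through a setuid helper would fail with the same permission error it gets when run directly, which is the "normal environment" Peter refers to.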

8. memcpy() Benchmarks For Winchip

30 Dec 1999 - 20 Dec 1999 (11 posts) Archive Link: "3DNow! patches on Winchip."

People: Dave Jones, Alan Cox

Dave Jones reported that with the Winchip 2A-233, 3dNow/MMX memcpy() was around 7 or 8 times faster than a normal memcpy(), according to his benchmarks. He added, "This is however running on a 66Mhz bus, and the chip is designed for running on a 100Mhz bus, so the real gain is probably much higher. (I don't have a 100Mhz board to test this)." He suggested adding an IDT Winchip option to the CPU selection menu of the kernel configuration, which would then set CONFIG_X86_USE_3DNOW. Alan Cox replied that if his benchmarks were correct, the kernel should use the faster memcpy() if available. But he admonished, "Make sure you benchmark both the cached and uncached cases," and added:

Note that there is some stuff pending (hopefully for 2.4.0) that allows you to plug in multiple memcpy routines and handle the choice per cpu. That will also allow you to do finer tuning for the winchip. Right now with the current draft of that code it has support for

Integer copies (rep movs etc)
MMX + 3Dnow! (mmx with prefetch)
MMX no 3dnow (older mmx cpus)
FPU trick (earlier preventiums)

and more can be added (eg the K6-2 seems to be fastest using integer operations unrolled, and with prefetch stuff)
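The pluggable scheme Alan outlines, with several memcpy routines and a choice made per CPU, might look roughly like the table-driven sketch below. It does not show the pending kernel code, which is not quoted in the thread. The feature flags, the variant table and select_memcpy are hypothetical, and both copy routines are plain portable C standing in for the real integer and MMX implementations.

```c
#include <assert.h>
#include <stddef.h>

typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical CPU feature bits, as a boot-time probe might report them. */
enum cpu_feature {
    CPU_HAS_MMX   = 1u << 0,
    CPU_HAS_3DNOW = 1u << 1
};

/* Stand-in for the integer copy ("rep movs" in the real thing). */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for an MMX-style routine: here just a 4x unrolled byte loop. */
static void *copy_unrolled(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 4) {
        d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
        d += 4; s += 4; n -= 4;
    }
    while (n--)
        *d++ = *s++;
    return dst;
}

struct memcpy_variant {
    const char *name;
    unsigned required;      /* feature bits this routine needs */
    memcpy_fn fn;
};

/* Ordered from most to least specialised; the last entry always matches. */
static const struct memcpy_variant variants[] = {
    { "mmx+3dnow", CPU_HAS_MMX | CPU_HAS_3DNOW, copy_unrolled },
    { "mmx",       CPU_HAS_MMX,                 copy_unrolled },
    { "integer",   0,                           copy_bytes    }
};

/* Pick the first routine whose requirements the CPU satisfies. */
const struct memcpy_variant *select_memcpy(unsigned features)
{
    size_t i;

    for (i = 0; i < sizeof variants / sizeof variants[0]; i++)
        if ((features & variants[i].required) == variants[i].required)
            return &variants[i];
    assert(0 && "the integer fallback requires no features");
    return NULL;
}
```

The selection would be made once, at boot or per CPU, and the chosen function pointer cached. Supporting another chip such as the Winchip then means adding one more row to the table, with no change to any caller.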

Dave did some more benchmarks with cached and uncached copies, and initially found a huge gain for the Winchip. But it turned out he had misunderstood what Alan had asked for. Instead of testing uncached data, he'd turned the cache off entirely, skewing the results. Finally he posted a much smaller (though still significant) win for the Winchip: a difference of seconds rather than minutes. No final word was heard from Alan on whether this smaller gain would be worth a patch.

Sharon And Joy

Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.