Table Of Contents
| # | Dates | Posts | Subject |
|---|---|---|---|
| 1. | 27 Apr 2003 - 5 May 2003 | 48 posts | Proposed System Call To Speed Process Creation |
| 2. | 28 Apr 2003 - 1 May 2003 | 21 posts | Some WLAN Chip Specs Secret To Protect Military Communications |
| 3. | 29 Apr 2003 - 2 May 2003 | 47 posts | 'Must-Fix' Bug List For 2.6 (Or 3.0) |
| 4. | 1 May 2003 - 2 May 2003 | 3 posts | OSS Support For ICH5 Sound |
| 5. | 1 May 2003 - 7 May 2003 | 12 posts | Aic7xxx And Aic79xx Driver Updates |
| 6. | 1 May 2003 - 2 May 2003 | 19 posts | Possible License Violations Within The Kernel Source |
| 7. | 1 May 2003 | 1 post | New Release Of Hotplugging Scripts |
| 8. | 1 May 2003 - 5 May 2003 | 3 posts | kdb 4.2 Released |
| 9. | 2 May 2003 - 5 May 2003 | 49 posts | New 'Exec Shield' Security Feature |
| 10. | 3 May 2003 | 5 posts | Status Of 'make xconfig' |
| 11. | 4 May 2003 - 7 May 2003 | 32 posts | Linux 2.5.69 Released; Approaching 2.6 |
| 12. | 6 May 2003 - 7 May 2003 | 14 posts | Status Of DVB In 2.5 |
| 13. | 7 May 2003 | 29 posts | TTY Updates |
Mailing List Stats For This Week
We looked at 1675 posts in 7514K.
There were 440 different contributors. 234 posted more than once. 226 posted last week too.
The top posters of the week were:
1. Proposed System Call To Speed Process Creation
27 Apr 2003 - 5 May 2003 (48 posts) Archive Link: "[RFD] Combined fork-exec syscall."
People: Mark Grosberg, Matthias Andree, Davide Libenzi, Larry McVoy
Mark Grosberg proposed:
Is there any interest in a single system call that will perform both a fork() and exec()? Could this save some extra work of doing a copy_mm(), copy_signals(), etc?
I would think on large, multi-user systems that are spawning processes all day, this might improve performance if the shells on such a system were patched.
Perhaps a system call like:
pid_t spawn(const char *p_path, const char *argv[], const char *envp[], const int *filp);
The filp array would allow file descriptors to be redirected. It could be terminated by a -1 and reference the file descriptors of the current process (this could also potentially save some dup() syscalls).
If any of these parameters (excluding p_path) are NULL, then the appropriate values are taken from the current process.
I originally was thinking of a name of fexec() for such a syscall, but since there are already "f" variant syscalls (fchmod, fstat, ...), an fexec() would make more sense as a call that executes an already open file, so the name spawn() came to mind.
I know almost all of my fork()-exec() code does almost the same thing. I guess vfork() was a potential solution, but this somehow seems cleaner (and still may be more efficient than having to issue two syscalls)... the downside is, of course, another syscall.
There was not much enthusiasm. Only Rafael Costa dos Santos showed any interest, offering to help code the thing up. Elsewhere, Larry McVoy, while not outright against the idea, felt there were significant compatibility issues. In particular, he suggested ensuring compatibility with Windows NT. Elsewhere, Matthias Andree took a somewhat dimmer view of the compatibility issue. He said adding a new system call was "a major showstopper, because it'd only be useful to non-portable, Unix-specific applications (thus it wouldn't be put to much use)."
Other folks pointed out that the small amount of time saved by avoiding an extra system call during process creation would be completely overwhelmed by the time it took to actually execute the program that would run in that process. Davide Libenzi suggested doing this in user space instead, as part of the C library: a new system call would be overkill for such small gains, but a library call might be worth adding.
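Davide's suggestion can be sketched as a small user-space wrapper with roughly the proposed signature, built from fork(), dup2() and execve(). This is a hypothetical illustration, not code from the thread. It assumes that filp[i] names the parent descriptor that should become descriptor i in the child, that a NULL envp means "inherit environ", that a NULL filp means "inherit every descriptor", and that a NULL argv means argv = { p_path }. It also uses execve()'s argv/envp types rather than the ones in the proposal.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical user-space spawn(): fork(), rearrange descriptors, execve().
 * filp is a -1-terminated list; filp[i] is the parent descriptor that
 * becomes descriptor i in the child (an assumption, not from the thread).
 * This naive loop does not handle permutations such as swapping 0 and 1.
 */
pid_t spawn(const char *p_path, char *const argv[], char *const envp[],
            const int *filp)
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;                /* parent: child's pid, or -1 if fork failed */

    /* Child from here on. */
    if (filp != NULL) {
        for (int i = 0; filp[i] != -1; i++)
            if (filp[i] != i && dup2(filp[i], i) == -1)
                _exit(127);
    }

    char *default_argv[] = { (char *)p_path, NULL };
    execve(p_path, argv != NULL ? argv : default_argv,
           envp != NULL ? envp : environ);
    _exit(127);                    /* reached only if execve() failed */
}
```

Callers still reap the child with waitpid() as usual. For comparison, POSIX specifies a library interface in the same spirit, posix_spawn(), whose "file actions" play roughly the role of the filp array.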
2. Some WLAN Chip Specs Secret To Protect Military Communications
28 Apr 2003 - 1 May 2003 (21 posts) Archive Link: "Broadcom BCM4306/BCM2050 support"
Topics: Legal Issues, Networking
People: Martin List-Petersen, David S. Miller, Alan Cox, Carl-Daniel Hailfinger, Richard B. Johnson, Bas Mevissen, Zack Brown
Bas Mevissen asked if Linux had any support for Broadcom's BCM4306 or BCM2050 WLAN chips. He saw that the BCM4401 ethernet chip had a Linux driver, and was hopeful that maybe the WLAN chips did as well. Martin List-Petersen replied, "It seems, that the specs haven't been released yet. There are quite a few Wlan cards out there based on the Broadcom chips (nearly all cards, that support 802.11g), so it's quite a shame. (Actually this fits the TrueMobile 1180, 1300 and 1400, speaking of Dell wireless lan cards)." He added, "The same problem is with the Intel Prowireless 2100 (Centrino) WLan card. No Linux support available yet, which is another choice for the Dell notebooks at the moment." But he also said there was a Petition (http://www.petitiononline.com/BCM4301/petition.html) folks could sign, regarding this very issue. Martin concluded, "I've tried to contact Broadcom directly, but they are just ignoring mails containing the word "Linux", so it seems." David S. Miller also said:
Don't expect specs or opensource drivers for any of these pieces of hardware until these vendors figure out a way to hide the frequency programming interface.
I.e. these cards can be programmed to transmit at any frequency, and various government agencies don't like it when, e.g., users can transmit on military frequencies and stuff like that.
The only halfway plausible idea I've seen is to not document the frequency programming registers, and users get a "region" key file that has opaque register values to program into the appropriate registers. The file is per-region (one for US, Germany, etc.) and the wireless kernel driver reads in this file to do the frequency programming.
So don't blame the vendors on this one; several of them would love to publish drivers for their cards, but simply cannot without upsetting federal regulators.
Alan Cox remarked that folks were already cracking the Windows interface on those cards, and that non-US governments cared about this issue as well. He said, "The fact people are already abusing the technology suggests that they will be forced to go the crypted settings route for next generation hardware anyway." And added, "I talked to one vendor about this stuff and fingers crossed we will see open drivers except for the radio module. In the longer term I suspect vendors will move to signed register sets, so you can load "US 802.11g" but you can't load "police frequency, full power""
At some point Bas suggested that if these vendors were really willing to release their specs, but were only holding back to satisfy government agencies, then maybe they could release some binary drivers in the interim. Martin replied to this, "I totally agree on this. A binary driver would be better than nothing at this point. Another thing that puzzles me is why companies like Broadcom, if they are so open to releasing the drivers at some point, where they can make the regulation agencies somewhat happy, are so ignorant then. I've heard of several people that tried to get a statement on the possibility of Linux drivers from them, and the return is nothing. I've actually tried myself. No response at all."
Elsewhere, Carl-Daniel Hailfinger's eyes lit up at the prospect of transmitting on military frequencies. He said he "wants binary only driver for these cards to build opensource driver with ability to set "interesting" frequency range." Martin said, "It's there for Windows." And at some point, Richard B. Johnson said:
Contrary to popular opinion, there is no FCC regulation prohibiting one from receiving some particular frequency. There is, however, a federal law prohibiting the disclosure of a radio message by a third party. This means that the media, or even law enforcement can't listen to a private radio (cell phone) conversation and then disclose its content. At one time, cell phones used FM at 960 MHz. This could be readily received by receivers designed for Amateur Radio use. For a time, the FCC refused to Type Approve receivers that cover these frequencies. However, most Hams know how to fix their receivers so they can receive whatever they want and Type Approval was only required for receivers that were designed to be sold. You could build anything you want for yourself. This refusal to Type Approve receivers was a trick to make the usual receiver owner think that there was some dumb regulation when, in fact, under the Communications Act of 1934 (as amended), there can't be such a regulation without creating a new public law, which hasn't happened and probably will not.
Recently, some broadcast satellite companies have tried to get the FCC to declare that their transmissions are private and unauthorized reception should be unlawful. The FCC has continually postponed any such declaration because, if once broadcast, a radio signal doesn't become public, then anybody could sue every radio transmitter operator to prevent the trespass of "their" signals onto private property. You can't have it both ways, either radio signals are public and, therefore cannot commit a trespass, or they are private and can.
But, unlike some other countries' regulators, the FCC has steadfastly refused to allow broadcasters, even satellite broadcasters, to pursue such extortion. Basically, once a signal leaves an antenna, it becomes public property.
The same is not true for cable and "guided waves". Satellite broadcasters have not been able to convince the FCC that their transmissions are "guided waves". However, some private RF link companies' signals, including some that use satellites, are considered "guided waves" and cannot be used without permission.
Various commercial interests have convinced governments of many other countries that they "own" their radio signals and therefore different regulations exist in many other countries. In the UK, for instance, one has to purchase a license to use a receiver (you know, some Sony Walkman). This is, in my opinion, extremely repressive. It would be nice for somebody to start suing the BBC (and others) to recover damages for the criminal trespass of "their" radio signals onto private property. After a few such lawsuits, the ownership of such broadcast signals would revert to the public, just like in the US.
Carl-Daniel replied, "Here in Germany, receiving some particular frequencies (e.g. those used by the police) was prohibited a few years ago (I don't know exactly if they changed the law). The argument was that some receiver types emitted a weak signal on the frequency they were listening to (and could be tuned to become a private radio station) which could interfere with the low-power police devices. However, it was simply not sensible to prohibit all radios, so they were constrained to a specific frequency range."
Close by, Alan took exception to Richard's statement that people needed a license for things like a Sony Walkman in England. Alan said, "You need a license to receive terrestrial TV but that is rather different and relates to both cultural and historical tax differences in philosophy between the US and UK. The big problem with 'soft' radios is transmit. You can hotwire your centrino. People in the UK are already trying to use US drivers in Windows XP because "they go further". If you listen to police transmissions then it's ultimately poor police security, if you transmit on their frequency then it's a lot more serious because you might interfere with emergency services." (Several months later, in August, Ed Weinberg sent me an email saying that Richard's "description of The Communications Act of 1934, as amended, is close. A few years ago (less than 10, anyway) Congress passed a law forbidding radios to receive signals in the bands used by mobile phones. Unauthorised listening to satellite transmissions seems to also be prevented by new laws. Recent reports I have read say that the satellite broadcasters are getting people under the DMCA." -- Ed: [05 Aug 2003 00:00:00 -0800])
3. 'Must-Fix' Bug List For 2.6 (Or 3.0)
29 Apr 2003 - 2 May 2003 (47 posts) Archive Link: "must-fix list for 2.6.0"
Topics: Big Memory Support, Big O Notation, Bug Tracking, Device Mapper, Disk Arrays: RAID, Disks: IDE, Disks: SCSI, Executable File Format, FS: NFS, FS: ReiserFS, FS: devfs, FS: ext3, FS: sysfs, Hot-Plugging, Ioctls, Networking, PCI, Power Management: ACPI, Scheduler, Software Suspend, Version Control, Virtual Memory
People: Andrew Morton, Andi Kleen, Benjamin Herrenschmidt, Christoph Hellwig, Chris Mason, Peter Braam, Pavel Machek, Mike Galbraith
Andrew Morton said:
Below is a first cut at tracking the major work items which should be completed for a 2.6 release.
When considering these items it would be useful to have a clear idea of what a 2.6.0 release is actually _for_. Obviously, 2.6.0 doesn't mean "it's finished, ship it".
I'd propose that 2.6.0 means that users can migrate from 2.4.x with a good expectation that everything which they were using in 2.4 will continue to work, and that the kernel doesn't crash, doesn't munch their data and doesn't run like a dog. Other definitions are welcome.
I shall be maintaining this list so we can understand where we are with respect to 2.6 readiness. And so we can look at features and say "no". And so we can look at bugs and say "not gating 2.6.0".
Things we should not track here are:
Things which we should track here are significantly-sized outstanding development activities which resolve big bugs or which address missing features & speedups.
I've organised it into three main sections:
The list is already very long, and very incomplete. Additions (and removals!!!) are sought. Thanks.
And thanks to the various contributors who helped pull this together.
TTY locking is broken (see FIXME in do_tty_hangup())
"One bug that was found is that the dropping of lock_kernel from do_exit caused races in the exit tty cleanup. There was a patch for that, but I'm not sure it was merged."
RAID0 dies on strangely aligned BIOs
- Need to hoist BIO-split code out of device mapper, use that.
1/ RAID5 should work fine. It accepts any sort of bio and always submits a 1-page bio to the underlying device, and if my understanding is correct, every device must be able to handle a single page bio, no matter what the alignment (which is why raid0 has a problem - it doesn't).
2/ RAID1 works pretty well. The only improvement needed is to define a merge_bvec_fn function which passes the question down to lower layers. This should be easy except for the small fact that it is impossible :-) There is no enforced pairing between calls to merge_bvec_fn and submit_bh, so it is possible that a hot spare with different restrictions could get swapped in between the one and the other and could confuse things. I suspect that can be worked around somehow though...
Someone sent me a patch that is sorely needed - it allows you to simply call blk_queue_stack() (or something like that), and it will get your stacked limits set appropriately.
3/ I just realised that raid0 is easier than I had previously thought. We don't need the completely functional bio splitting that dm has. We only need to be able to split a bio that has just one page as the use of merge_bvec_fn will ensure that we never get a larger bio that we cannot handle. And splitting a bio with only one page is a lot easier. I now have code in my tree that implements this quite cleanly and will probably post a patch during the week.
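Neil's point about raid0 comes down to simple sector arithmetic. The following is a simplified model, not the md driver's code: it assumes equal-sized member disks, a power-of-two chunk size measured in 512-byte sectors, and bios no larger than one chunk (which holds for single-page bios once merge_bvec_fn keeps larger ones inside a chunk). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a bio: where it starts and how long it is. */
struct simple_bio {
    uint64_t sector;        /* first 512-byte sector */
    uint32_t nr_sectors;    /* length in sectors */
};

/* Which member disk a sector lands on, for equal-sized disks striped in chunks. */
unsigned raid0_disk(uint64_t sector, uint32_t chunk_sectors, unsigned ndisks)
{
    return (unsigned)((sector / chunk_sectors) % ndisks);
}

/*
 * Split a bio at the first chunk boundary it crosses.
 * Assumes chunk_sectors is a power of two and bio->nr_sectors <= chunk_sectors,
 * so at most one split is ever needed. Returns 1 if split, 0 if it already fits.
 */
int raid0_split(const struct simple_bio *bio, uint32_t chunk_sectors,
                struct simple_bio *first, struct simple_bio *second)
{
    /* Sectors left between bio->sector and the end of its chunk. */
    uint32_t room = chunk_sectors - (uint32_t)(bio->sector & (chunk_sectors - 1));
    if (bio->nr_sectors <= room)
        return 0;
    first->sector = bio->sector;
    first->nr_sectors = room;
    second->sector = bio->sector + room;
    second->nr_sectors = bio->nr_sectors - room;
    return 1;
}
```

With 64 KiB chunks (128 sectors) and a 4 KiB page (8 sectors), a one-page bio starting at sector 124 splits into 4 sectors for one disk and 4 for the next, while one starting at sector 120 fits entirely within its chunk.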
CD burning. There are still a few quirks to solve wrt SG_IO and ide-cd.
Jens: The basic hang has been solved (double fault in ide-cd), there still seems to be some cases that don't work too well. Don't really have a handle on those :/
NFS client gets an OOM deadlock.
- Some fixes exist in -mm. Seem to mostly work.
NFS client runs very slowly consuming 100% CPU under heavy writeout.
- Unsubtle fix exists in -mm. (Looks like it's fixed anyway).
AIO/direct-IO writes can race with truncate and wreck filesystems.
- Easy fix is to only allow the feature for S_ISBLK files.
O(1) scheduler starvation, poor behaviour seems unresolved.
Jens: "I've been running 2.5.67-mm3 on my workstation for two days, and it still doesn't feel as good as 2.4. It's not a disaster like some revisions ago, but it still has occasional CPU "stalls" where it feels like a process waits for half a second or so for CPU time. That is very noticeable."
Also see Mike Galbraith's work.
Alan: 32bit uid support is *still* broken for process accounting.
Overcommit accounting gets wrong answers
- underestimates reclaimable slab, gives bogus failures when dcache&icache are large.
- gets confused by reclaimable-but-not-freed truncated ext3 pages. Lame fix exists in -mm.
UDP apps can in theory deadlock, because the ip_append_data path can end up sleeping while the socket lock is held.
It is OK to sleep with the socket lock held, normally. But in this case the sleep happens while waiting for socket memory/space to become available; if another context needs to take the socket lock to free up the space, we could hang.
I sent a rough patch on how to fix this to Alexey, and he is analyzing the situation. I expect a final fix from him next week or so.
Semantics for IPSEC during operations such as TCP connect suck currently.
When we first try to connect to a destination, we may need to ask the IPSEC key management daemon to resolve the IPSEC routes for us. For the purposes of what the kernel needs to do, you can think of it like ARP. We can't send the packet out properly until we resolve the path.
What happens now for IPSEC is basically this:
O_NONBLOCK: returns -EAGAIN over and over until route is resolved
!O_NONBLOCK: Sleeps until route is resolved
These semantics are total crap. The solution, which Alexey is working on, is to allow incomplete routes to exist. These "incomplete" routes merely put the packet onto a "resolution queue", and once the key manager does its thing we finish the output of the packet. This is precisely how ARP works.
I don't know when Alexey will be done with this.
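The ARP-like behaviour described above can be modelled with a per-route "resolution queue": the output path never sleeps and never returns -EAGAIN; it either transmits or parks the packet, and the key manager's answer flushes the queue. This is a toy model with hypothetical names (struct route, route_output(), route_resolved()) and integer packet ids, not Alexey's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define RQ_MAX 16   /* hypothetical resolution-queue depth */
#define TX_MAX 64   /* size of the stand-in transmit log */

struct route {
    bool resolved;            /* has the key manager answered yet? */
    int queued[RQ_MAX];       /* packet ids parked until resolution */
    size_t nqueued;
    int sent[TX_MAX];         /* stand-in for handing packets to the device */
    size_t nsent;
};

static int transmit(struct route *rt, int pkt)
{
    if (rt->nsent == TX_MAX)
        return -1;
    rt->sent[rt->nsent++] = pkt;
    return 0;
}

/* Output path: never sleeps and never returns -EAGAIN for an unresolved route. */
int route_output(struct route *rt, int pkt)
{
    if (rt->resolved)
        return transmit(rt, pkt);
    if (rt->nqueued == RQ_MAX)
        return -1;                      /* queue full: drop rather than block */
    rt->queued[rt->nqueued++] = pkt;
    return 0;
}

/* Called when the key manager has resolved the route: flush in arrival order. */
void route_resolved(struct route *rt)
{
    rt->resolved = true;
    for (size_t i = 0; i < rt->nqueued; i++)
        transmit(rt, rt->queued[i]);
    rt->nqueued = 0;
}
```

Both blocking and non-blocking sockets see the same behaviour here: connect() can return as soon as its first packet is queued, and the packet leaves once resolution completes.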
(Trond:) Yes: I'm still working on an atomic "open()", i.e. one where we short-circuit the usual VFS path_walk() + lookup() + permission() + create() + .... bullsh*t...
I have several reasons for wanting to do this (all of them related to NFS of course, but much of the reasoning applies to *all* networked file systems).
1) The above sequence is simply not atomic on *any* networked filesystem.
2) It introduces a sh*tload of completely unnecessary RPC calls (why do a 'permission' RPC call when the server is in *any* case going to tell you whether or not this operation is allowed. Why do a 'lookup()' when the 'create()' call can be made to tell you whether or not a file already exists).
3) It is incompatible with some operations: the current create() doesn't pass an 'EXCLUSIVE' flag down to the filesystems.
4) (NFS specific?) open() has very different cache consistency requirements when compared to most other VFS operations.
I'd very much like for something like Peter Braam's 'lookup with intent' or (better yet) for a proper dentry->open() to be integrated with path_walk()/open_namei(). I'm still working on the latter (Peter has already completed the lookup with intent stuff).
Real serious use of IPSEC is hampered by lack of MPLS support. MPLS is a switching technology that works by switching based upon fixed length labels prepended to packets. Many people use this and IPSEC to implement VPNs over public networks, it is also used for things like traffic engineering.
A good reference site is:
Anyways, an existing (crappy) implementation exists. I've almost completed a rewrite, I should have something in the tree next week.
Sometimes we generate IP fragments when it truly isn't necessary.
The way IP fragmentation is specified, the data in each fragment (except the last) must be a multiple of 8 bytes in length. So suppose the device has an MTU that is not 0 modulo 8; even Ethernet falls into this class: 1500 == (8 * 187) + 4.
Our IP fragmenting engine can fragment packets whose size falls within those last few bytes of the MTU, above the highest multiple of 8. This happens in obscure cases, but it does happen.
I've proposed a fix to Alexey, whereby very late in the output path we check the packet, if we fragmented but the data length would fit into the MTU we unfragment the packet.
This is low priority, because technically it creates suboptimal behavior rather than mis-operation.
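The arithmetic is easy to check. RFC 791 measures fragment offsets in 8-byte units, so every fragment except the last carries a multiple of 8 bytes of data, and the largest such payload is (MTU - header length) rounded down to a multiple of 8. The sketch below contrasts the correct "does it fit?" test with a hypothetical check that compares against that rounded-down value, which is one way to end up fragmenting a packet sitting in the last few bytes below the MTU. With a plain 20-byte header, Ethernet's 1480-byte payload happens to be a multiple of 8 already, so the example uses a 24-byte header (IP options). The function names are illustrative, not the kernel's.

```c
#include <stdbool.h>

/* Largest data length a non-final fragment may carry: a multiple of 8 bytes. */
unsigned frag_payload(unsigned mtu, unsigned hdr_len)
{
    return ((mtu - hdr_len) / 8) * 8;
}

/* Correct test: fragment only if header plus data exceeds the MTU. */
bool must_fragment(unsigned data_len, unsigned mtu, unsigned hdr_len)
{
    return hdr_len + data_len > mtu;
}

/* Hypothetical over-eager test: compares against the rounded-down payload. */
bool fragments_eagerly(unsigned data_len, unsigned mtu, unsigned hdr_len)
{
    return data_len > frag_payload(mtu, hdr_len);
}
```

In this model, the late check Dave proposed amounts to sending the packet whole whenever must_fragment() is false, even if an earlier stage already split it.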
IPV4 output engine changes for IPSEC need to be moved over to IPV6.
IPV6 ipsec works but gravely suboptimally in some cases. It is also for this reason that the zerocopy UDP stuff isn't functional on the ipv6 side.
The USAGI project (www.linux-ipv6.org) is working with Alexey on this work.
davem: Netfilter needs to stop linearizing packets as much as possible.
Zerocopy output packets are basically undone by netfilter because all of it assumed it was working with linear socket buffers.
Rusty is fixing this piece by piece. He is nearly done with this work.
(Pat) There is some preliminary work at bk://ldm.bkbits.net/linux-2.5-power, though I'm currently in the process of reworking it.
A better suspend-to-disk mechanism than swsusp.
There are various other details to be worked out, which are the real fun part. And of course, driver support, but that is something that can happen at any time.
We need a kernel side API for reporting error events to userspace (could be async to 2.6 itself)
(Prototype core based on netlink exists)
fixup tty-based ISDN drivers which provide TIOCM* ioctls (see my recent 3-set patch for serial stuff)
Alternatively, we could re-introduce the fallback to driver ioctl parsing for these if not enough drivers get updated.
Andi Kleen replied:
I found a new bad class of bugs (slowly working on fixing them, also present in 2.4)
Machine Check handlers use printk in an NMI like (ignoring cli) situation. This can deadlock on the console or low level character driver (serial, vga) locks. Not all MCEs are fatal (e.g. corrected ECC errors) and the kernel should be safely able to continue.
Need to buffer the printk in an atomic fashion (e.g. in a ring buffer managed with cmpxchg) and cause a self IPI that triggers an interrupt after the next sti. This is easy with x86/APIC mode, but difficult with PIC (the 8259 supports it in theory, but it's not clear that all clones in various chipsets do; also changing the programming may be risky). Fallback: pick it up with the next timer interrupt by adding a check there.
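Here is a user-space sketch of the buffering Andi describes. Producers reserve a slot with a compare-and-swap on a head counter and never take a lock, so they cannot deadlock on a console or driver lock. A single consumer (in the kernel, the self-IPI or timer tick) drains completed slots later. C11 atomics stand in for the kernel's cmpxchg, and all names and sizes (mce_log(), mce_pop(), MCE_SLOTS) are hypothetical. When the ring is full, messages are dropped rather than waited on.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MCE_SLOTS   8      /* hypothetical ring size */
#define MCE_MSG_LEN 64     /* hypothetical max message length */

struct mce_slot {
    atomic_bool ready;     /* set once the producer has finished writing */
    char msg[MCE_MSG_LEN];
};

static struct mce_slot mce_ring[MCE_SLOTS];
static atomic_uint mce_head;   /* next slot to reserve (any producer) */
static atomic_uint mce_tail;   /* next slot to drain (single consumer) */

/* Producer: lock-free, callable where taking a lock could deadlock. */
bool mce_log(const char *text)
{
    unsigned head;
    for (;;) {
        unsigned tail = atomic_load(&mce_tail);  /* load tail first so tail <= head */
        head = atomic_load(&mce_head);
        if (head - tail >= MCE_SLOTS)
            return false;                        /* ring full: drop the message */
        if (atomic_compare_exchange_weak(&mce_head, &head, head + 1))
            break;                               /* slot `head` is now ours */
    }

    struct mce_slot *s = &mce_ring[head % MCE_SLOTS];
    size_t n = strlen(text);
    if (n >= MCE_MSG_LEN)
        n = MCE_MSG_LEN - 1;                     /* truncate long messages */
    memcpy(s->msg, text, n);
    s->msg[n] = '\0';
    atomic_store(&s->ready, true);               /* publish to the consumer */
    return true;
}

/* Consumer: run later from a safe context, e.g. the self-IPI or a timer tick. */
bool mce_pop(char *buf, size_t len)
{
    unsigned tail = atomic_load(&mce_tail);
    if (tail == atomic_load(&mce_head))
        return false;                            /* nothing reserved */
    struct mce_slot *s = &mce_ring[tail % MCE_SLOTS];
    if (!atomic_load(&s->ready))
        return false;                            /* reserved but still being written */
    snprintf(buf, len, "%s", s->msg);
    atomic_store(&s->ready, false);
    atomic_store(&mce_tail, tail + 1);           /* slot may now be reused */
    return true;
}
```

The consumer is where the real printk would happen, outside the machine-check context.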
New entries for the x86-64 list (actually I'm not sure they are all x86-64 specific, just that the bug has been seen there)
need /proc/kcore access for kernel mappings that are outside vmalloc (in particular the kernel and the modules are special mappings on x86-64; other architectures have the same problem)
Best would be to put them in the vmalloc mappings list, but that requires some more fixes in other code that uses it. Also /proc/kcore seems to have some 64bit signedness bugs (patch for 2.4 exists)
To the generic item, Pavel Machek replied that his patch had been accepted. Andi replied that things were still quite broken; and Pavel said a new patch was on its way to Linus.
Elsewhere, on Andrew's item about IDE suspend/resume without races, Benjamin Herrenschmidt said, "I have something that works not too badly for PPC already but that needs some cleanup, to be tested/adapted to Pat's new work (especially tested against his swsusp, and we shall still verify if it fits x86 needs)."
Elsewhere, Christoph Hellwig added his own items to Andrew's list:
large parts of the locking are hosed or not existant
4. OSS Support For ICH5 Sound
1 May 2003 - 2 May 2003 (3 posts) Archive Link: "OSS support for ICH5 sound"
Topics: Sound: OSS
People: Jeff Garzik, Martin Schlemmer
Martin Schlemmer got sound working on his ICH5, by simply adding the ICH5 IDs to the list. That worked for his system, but Jeff Garzik replied, "Unfortunately this doesn't work on all ICH5s out there. At the very minimum, for now, it would be nice to match up ich5 and codec pairs, as codec differentiation seems to be what stops this patch from working on all ICH5." And Martin replied:
Anybody working on getting support for the 875 Chipset into 2.5? Can I send an 'lspci -vv' to help? I have an Asus P4C800 here (Intel 875p), so I can do some testing if need be.
But there was no reply.
5. Aic7xxx And Aic79xx Driver Updates
1 May 2003 - 7 May 2003 (12 posts) Archive Link: "Aic7xxx and Aic79xx Driver Updates"
Topics: BSD: FreeBSD, Version Control
People: Justin T. Gibbs
Justin T. Gibbs announced:
I've just uploaded version 1.3.8 of the aic79xx driver and version 6.2.33 of the aic7xxx driver. Both are available for 2.4.X and 2.5.X kernels in either bk send format or as a tarball from here:
RPMs and DUDs for various distributions are also available:
6. Possible License Violations Within The Kernel Source
1 May 2003 - 2 May 2003 (19 posts) Archive Link: "Did the SCO Group plant UnixWare source in the Linux kernel?"
People: Chris Friesen, Nomen Nescio, Christoph Hellwig, Jim Nance
Someone gave a link to a CNet article (http://msnbc-cnet.com.com/2100-1016_3-999371.html) which said that the SCO group claimed to have found instances of copyrighted UnixWare code in the Linux kernel sources. Chris Friesen replied:
According to an article here:
SCO-Caldera Senior Vice President Chris Sontag explicitly says that the kernel.org kernel is *not* tainted, but that other stuff that Red Hat and SuSE are including *is*.
Quote from the interview:
"Chris Sontag: We're not talking about the Linux kernel that Linus and others have helped develop. We're talking about what's on the periphery of the Linux kernel."
He doesn't specify exactly what he's talking about, but he makes an interesting claim:
"Chris Sontag: We are using objective third parties to do comparisons of our UNIX System V [SCO-owned Unix] source code and Red Hat as an example. We are coming across many instances where our proprietary software has simply been copied and pasted or changed in order to hide the origin of our System V code in Red Hat. This is the kind of thing that we will need to address with many Linux distribution companies at some point."
But Nomen Nescio said:
Hmm. SCO Group Chief Executive Darl McBride says _exactly_ the opposite according to http://msnbc-cnet.com.com/2100-1016_3-999371.html :
"We're finding ... cases where there is line-by-line code in the Linux kernel that is matching up to our UnixWare code.
We're finding code that looks like it's been obfuscated to make it look like it wasn't UnixWare code -- but it was."
Chris Sontag should get his story straight with his boss before he opens his mouth to the press.
Elsewhere, Christoph Hellwig replied to the original post as well, saying:
As someone who worked for SCO (or rather Caldera, as it was called at that time) I can tell you this is utter crap. There were very few people actually doing Linux kernel work then (and when the German office was closed down all those left the company) and we really had better things to do than trying to retrofit UnixWare code into the Linux kernel. Especially given that the kernel internals are so different that you'd need a big glue layer to actually make it work, and you can guess how that would be ripped apart in a usual lkml review :)
It might be more interesting to look for stolen Linux code in UnixWare; I'd suggest starting with the support for a very well known Linux filesystem in the Linux compat addon product for UnixWare..
Jim Nance said, "Wouldn't it be hilarious if whatever code SCO is talking about when they say there is Unix code in Linux turns out to be code some SCO employee ripped out of some GPL program and stuck into UnixWare. That is actually far more likely than what they allege."
There were a few more quips, and the thread petered out inconclusively.
7. New Release Of Hotplugging Scripts
1 May 2003 (1 post) Archive Link: "[ANNOUNCE] 2003-05-01 release of hotplug scripts"
People: Greg KH
Greg KH announced:
I've just packaged up the latest Linux hotplug scripts into a release, which can be found at:
Or from your favorite kernel.org mirror at:
or for those who like bz2 packages:
I've also packaged up some pre-built (and signed) Red Hat 7.3 based rpms:
The source rpm is available if you want to rebuild it for other distros or versions of Red Hat at:
The main web site for the linux-hotplug project can be found at:
which contains lots of documentation on the whole linux-hotplug process.
There are lots of changes in this release from the last one (which was almost 8 months ago), most of them make things work better for systems running 2.5, but some of them fix problems that 2.4 users will see.
Some of the major changes in this release are:
The full ChangeLog extract since the last release is included below for those who want to know everything that's been changed, and who to blame for them :)
8. kdb 4.2 Released
1 May 2003 - 5 May 2003 (3 posts) Archive Link: "Announce: kdb v4.2 is available for kernel 2.4.20, i386 and ia64"
Topics: Big O Notation, FS: XFS, Forward Port, Scheduler, USB
People: Keith Owens, Thomas Duffy, Steven Dake, Vamsi Krishna S.
Keith Owens announced kdb (ftp://oss.sgi.com/projects/kdb/download/v4.2/) kernel debugger 4.2 for the 2.4.20 kernel on i386 and ia64 systems. He said:
Changelog extracts since v4.1.
2003-05-02 Keith Owens <email@example.com>
2003-05-02 Keith Owens <firstname.lastname@example.org>
2003-05-02 Keith Owens <email@example.com>
And Thomas Duffy replied:
Attached is a (link to a) forward port of the sparc64 kdb patch to v4.2 of kdb. It is still rough around the edges, but it at least builds and boots and is somewhat usable.
You must apply the kdb common patch and the patch to kdb common I sent earlier to get it to build properly.
9. New 'Exec Shield' Security Feature
2 May 2003 - 5 May 2003 (49 posts) Subject: "[Announcement] "Exec Shield", new Linux security feature"
Topics: Executable File Format, FS: sysfs, Virtual Memory
People: Ingo Molnar, Johannes
Ingo Molnar announced:
We are pleased to announce the first publicly available source code release of a new kernel-based security feature called the "Exec Shield", for Linux/x86. The kernel patch (against 2.4.21-rc1, released under the GPL/OSL) can be downloaded from:
The exec-shield feature provides protection against stack, buffer or function pointer overflows, and against other types of exploits that rely on overwriting data structures and/or putting code into those structures. The patch also makes it harder to pass in and execute the so-called 'shell-code' of exploits. The patch works transparently, ie. no application recompilation is necessary.
It is commonly known that x86 pagetables do not support the so-called executable bit in the pagetable entries - PROT_EXEC and PROT_READ are merged into a single 'read or execute' flag. This means that even if an application marks a certain memory area non-executable (by not providing the PROT_EXEC flag upon mapping it) under x86, that area is still executable, if the area is PROT_READ.
Furthermore, the x86 ELF ABI marks the process stack executable, which requires that the stack is marked executable even on CPUs that support an executable bit in the pagetables.
This problem has been addressed in the past by various kernel patches, such as Solar Designer's excellent "non-exec stack patch". These patches mostly operate by using the x86 segmentation feature to set the code segment 'limit' value to a certain fixed value that points right below the stack frame. The exec-shield tries to cover as much virtual memory via the code segment limit as possible - not just the stack.
The exec-shield feature works by having the kernel transparently track the executable mappings an application specifies and maintain a 'maximum executable address' value, called the 'exec-limit'. The scheduler uses the exec-limit to update the code segment descriptor upon each context switch. Since each process (or thread) in the system can have a different exec-limit, the scheduler sets the user code segment dynamically so that the correct code-segment limit is always used.
The kernel caches the user segment descriptor value, so the overhead in the context-switch path is a very cheap, unconditional 6-byte write to the GDT, costing 2-3 cycles at most.
Furthermore, the kernel also remaps all PROT_EXEC mappings to the so-called ASCII-armor area, which on x86 is the address range 0-16MB. These addresses are special because they cannot be jumped to via ASCII-based overflows. E.g. if a buggy application can be overflowed via a long URL, then only ASCII (ie. value 1-255) characters can be used by attackers. If all executable addresses are in the ASCII-armor, then no attack URL can be used to jump into the executable code, ie. the attack cannot be successful (because no URL string can contain the \0 character). E.g. the recent sendmail remote root attack was an ASCII-based overflow as well.
With the exec-shield activated, and the 'cat' binary relinked into the ASCII-armor, the following layout is created:
$ ./cat-lowaddr /proc/self/maps
00101000-00116000 r-xp 00000000 03:01 319365 /lib/ld-2.3.2.so
00116000-00117000 rw-p 00014000 03:01 319365 /lib/ld-2.3.2.so
00117000-0024a000 r-xp 00000000 03:01 319439 /lib/libc-2.3.2.so
0024a000-0024e000 rw-p 00132000 03:01 319439 /lib/libc-2.3.2.so
0024e000-00250000 rw-p 00000000 00:00 0
01000000-01004000 r-xp 00000000 16:01 2036120 /home/mingo/cat-lowaddr
01004000-01005000 rw-p 00003000 16:01 2036120 /home/mingo/cat-lowaddr
01005000-01006000 rw-p 00000000 00:00 0
40000000-40001000 rw-p 00000000 00:00 0
40001000-40201000 r--p 00000000 03:01 464809 locale-archive
40201000-40207000 r--p 00915000 03:01 464809 locale-archive
40207000-40234000 r--p 0091f000 03:01 464809 locale-archive
40234000-40235000 r--p 00955000 03:01 464809 locale-archive
bfffe000-c0000000 rw-p fffff000 00:00 0
In the above layout, the highest executable address is 0x01003fff, ie. every executable address is in the ASCII-armor.
This means that not only is the stack non-executable, but lots of mmap()-ed data areas and the malloc() heap are non-executable as well. (Some data areas are still executable, but most of them are not.)
The first 1MB of the ASCII-armor is left unused, to provide NULL pointer dereference protection and to leave space for the 16-bit emulation mappings used by XFree86 and others.
Compare this with the memory layout without exec-shield:
08048000-0804b000 r-xp 00000000 16:01 3367 /bin/cat
0804b000-0804c000 rw-p 00003000 16:01 3367 /bin/cat
0804c000-0804e000 rwxp 00000000 00:00 0
40000000-40012000 r-xp 00000000 16:01 3759 /lib/ld-2.2.5.so
40012000-40013000 rw-p 00011000 16:01 3759 /lib/ld-2.2.5.so
40013000-40014000 rw-p 00000000 00:00 0
40018000-40129000 r-xp 00000000 16:01 4058 /lib/libc-2.2.5.so
40129000-4012f000 rw-p 00111000 16:01 4058 /lib/libc-2.2.5.so
4012f000-40133000 rw-p 00000000 00:00 0
bffff000-c0000000 rwxp 00000000 00:00 0
In this layout none of the executable areas are in the ASCII-armor, plus the exec-limit is 0xbfffffff (3GB) - ie. including all userspace mappings.
Note that the kernel will relocate every shared library into the ASCII-armor, but the binary's address is determined at link time. To ease relinking applications into the ASCII-armor, Arjan van de Ven has written a binutils patch (binutils-220.127.116.11.18-elf-small.patch), which adds a new 'ld' flag "ld -melf_i386_small" (or "gcc -Wl,-melf_i386_small") to relink applications into the ASCII-armor. (The patch can be found at the exec-shield URL as well.)
The patch was designed to be as efficient as possible. There's a very minimal (couple of cycles) tracking overhead for every PROT_MMAP system call, plus the 2-3 cycle cost per context switch.
This feature will not protect against every type of attack.
For example, it will not help if an overflow can be used to overwrite a local variable that changes the flow of control in a way that compromises the system. But we do believe that this feature will stop every attack that operates purely by overflowing the return address on the stack, or by overflowing a function pointer in the heap. Furthermore, exec-shield makes it quite hard to mount a successful attack even in the other cases, because in most cases it inhibits the execution of exploit shell-code.
Also, if the overflow is within the ASCII-armor itself (e.g. within the data section of one of the shared library objects in the ASCII-armor), then the overflow might still be exploitable.
All in all, exec-shield is one barrier against attacks, not blanket 100% protection in any way. The most effective security comes from installing as many layers as possible.
To provide the best possible protection, there's no trampoline workaround in the exec-shield code, ie. exec-limit violations in the trampoline case are never let through. Applications that rely on gcc trampolines will have to use the per-binary ELF flag to make the stack executable again. (The ELF flag is the same one used by Solar Designer's non-exec stack patch, to provide as much compatibility with existing non-exec-stack installations as possible.)
The exec-shield feature will uncover applications that incorrectly assumed that PROT_READ allows execution on x86. One such example is the XFree86 module loader. The latest XFree86 on rawhide.redhat.com fixes this problem. For those who cannot install the XFree86 bugfix at the moment, there's a workaround added by the patch, which can be activated via:
echo 1 > /proc/sys/kernel/X-workaround
This will disable the exec-shield for every application that uses iopl() (such as X). Other applications (sendmail, etc.) will still have the exec-shield enabled. This workaround is off by default. We strongly encourage solving this problem by upgrading X, or by using the 'chstk' utility to force X's stack to be executable.
Apply the exec-shield-2.4.21-rc1-B6 kernel patch to the 2.4.21-rc1 kernel, recompile & install the kernel and reboot into it, that's all.
There is a new boot-time kernel command line option called exec-shield=, which has 4 values. Each value represents a different level of security:
exec-shield=0 - always-disabled
exec-shield=1 - default disabled, except binaries that enable it
exec-shield=2 - default enabled, except binaries that disable it
exec-shield=3 - always-enabled
The current patch defaults to 'exec-shield=2'. The security level can also be changed at runtime, by writing the level into /proc:
echo 0 > /proc/sys/kernel/exec-shield
IMPORTANT: security-relevant applications that were started while the exec-shield was disabled will have an executable stack, and will thus have to be restarted if the exec-shield is enabled again.
I've also uploaded a modified version of Solar Designer's chstk.c code, which adds the options necessary to change the 'enable non-exec stack' ELF flag:
$ ./chstk
Usage: ./chstk OPTION FILE...
Manage stack area executability flag for binaries

  -e  enable execution permission
  -E  enable non-execution permission
  -d  disable execution permission
  -D  disable non-execution permission
  -v  view current flag state

That is, there are two distinct flags: one for forcing an executable stack, and one for forcing a non-executable stack. If both flags are zero, the binary follows the system default.
For example, it's possible to use exec-shield level 1 and enable the non-exec stack on a per-binary basis, by using the 'exec-shield=1' boot option and changing binaries one at a time:
./chstk -E /usr/sbin/sendmail
(People migrating production environments to an exec-shield kernel might prefer this variant.)
Anyway, comments, suggestions and test feedback are welcome.
Later, he posted an update:
A new (-C5) release of the exec-shield patch can be found at:
Changes since -B6:
Most of the new features in this patch (randomization, information filtering) have been done in other patches as well (such as PaX, grsecurity, non-exec stack patch, etc.). I tried to pick out and add the ones that matter most, that do not introduce constraints, and that are thus uncontroversial.
Various folks were very happy to see this work, and a bunch of people started discussing the implementation and various security issues.
10. Status Of 'make xconfig'
3 May 2003 (5 posts) Subject: "make xconfig & qt"
People: Daniele Pala, Balram Adlakha, Sam Ravnborg, Diego Calleja Garcia
Daniele Pala asked, "Trying to run 'make xconfig' i got into the message 'you don't have installed qt!'...so the xconfig is now dependant from qt? why? what about us poor guy who only use twm and not kde? isn't qt pretty big and fat?" Diego Calleja Garcia replied that 'make gconfig' would use the gtk library; Balram Adlakha said, "I think xconfig should be the "X" based one, qconfig should be the qt based one and gconfig should be the gtk one." Sam Ravnborg invited him to contribute a generic X-based config program.
11. Linux 2.5.69 Released; Approaching 2.6
4 May 2003 - 7 May 2003 (32 posts) Archive Link: "Linux 2.5.69"
Topics: FS: devfs
People: Linus Torvalds, Christoph Hellwig
Linus Torvalds announced 2.5.69 and said:
Ok, I finally found the reason for why some of my machines had trouble with restarting the X server, and it turns out that it's been around since very early February. I bet others must have seen it too, with random crashes on X server restart when the server used AGP (which means that it mainly hit either hw-accelerated 3D setups or the intel integrated graphics which use a UMA model with AGP as the backing store).
That's a big relief for me, as it was the major thing I personally worried about for 2.6.x.
Anyway, that's fixed here, along with a lot of other updates. Much of 2.5.69 is small one-liners to drivers to handle the new IRQ semantics, but there's a lot of other cleanups in there too (Christoph Hellwig continued on his devfs rampage, for example).
NOTE! As of this release I think I'll want to have patches either be _really_ obvious, or they should go through one or more people for approval. In particular, I'm hoping that the paperwork stuff with Andrew should be getting closer to finalized, and that we could start moving over towards a 2.6.x release schedule..
12. Status Of DVB In 2.5
6 May 2003 - 7 May 2003 (14 posts) Archive Link: "[PATCH[[2.5][3-11] update dvb subsystem core"
Topics: Digital Video Broadcasting, FS: devfs, I2C, Version Control
People: Michael Hunold, Christoph Hellwig, Gerd Knorr, Alan Cox
Michael Hunold announced:
this patch updates the dvb subsystem core.
Christoph Hellwig had some criticism of the code, and gave Michael some suggestions. He also said, "your devfs stuff is a mess. I already told one of the DVB folks (it wasn't you IIRC) that I'll publish a 2.5 devfs API on 2.4 header. But first I have to fix the devfs API on 2.5 and randomly bringing back old crap and lots of ifdefs in those changing areas won't help. What's the problem with 2.5, dvb and devfs?" Michael replied:
The main problem is that our development "dvb-kernel" CVS tree *should* compile under 2.4 as well, because most of the dvb-users don't want to participate in kernel development in general, but only in the development of the dvb subsystem. So work is done on the "dvb-kernel" tree, which should be synced with the 2.5 kernel frequently.
So, regarding devfs, I introduced #ifdefs around the functions that have changed recently. That's not nice, I know. But in my eyes it's important to keep the CVS and the kernel version more in sync.
IIRC Gerd Knorr has the same problems with his driver packages (regarding the i2c subsystem mainly), but he has written some perl scripts to remove the #ifdef stuff before submitting his patches...
Christoph felt that it would be best to delay the dvb updates, because "you don't just add ifdefs (which give me lots of rejects and you much uglier code than just using the compat header I'll send to lkml once I'm done with the API changes) but you also change the code that's ifdefed for 2.5 to reverse change I did. There is a reason why I removed every occurrence of devfs_handle_t from all drivers and the particular reason is that it will go away in the next series of patches." Michael asked how best to proceed, and after a little wrangling, it was agreed that Michael should continue to send updates to either Christoph or Alan Cox, bearing in mind that 2.5 features shouldn't be broken by Michael's updates.
13. TTY Updates
7 May 2003 (29 posts) Archive Link: "[BK PATCH] TTY changes for 2.5.69"
Topics: FS: sysfs
People: Greg KH, Hanna Linder
Greg KH announced:
Here are some TTY changesets that do two different TTY-related things:
Kernel Traffic is grateful to be developed on a computer donated by Professor Greg Benson and Professor Allan Cruse in the Department of Computer Science at the University of San Francisco. This is the same department that invented FlashMob Computing. Kernel Traffic is hosted by the generous folks at kernel.org. All pages on this site are copyright their original authors, and distributed under the terms of the GNU General Public License version 2.0.