Wine Traffic #41 For 1 May 2000
Introduction
This is the 41st release of Wine's kernel cousin
publication. Its main goal is to distribute widely what's
going on around Wine (the Un*x Windows emulator).
Wine 20000430 has been released. Main changes are:
- Wine is now distributed under the X11 license.
- DirectDraw restructuring.
- Debugger is now an external Winelib program.
- pthreads emulation for thread-safe glibc routines.
- On-demand loading of built-in dlls.
- WININET, URLMON and i18n fixes merged from Corel tree.
- Lots of bug fixes.
Mailing List Stats For This Week
We looked at 161 posts in 444K.
There were 29 different contributors.
20 posted more than once.
11 posted last week too.
The top posters of the week were:
- 28 posts in 150K by Patrik Stridvall <ps@leissner.se>
- 21 posts in 13K by Dimitrie O. Paun <dimi@cs.toronto.edu>
- 13 posts in 33K by gerard patel <g.patel@wanadoo.fr>
- 13 posts in 28K by Uwe Bonnes <bon@elektron.ikp.physik.tu-darmstadt.de>
- 13 posts in 28K by Alexandre Julliard <julliard@winehq.com>
1.
Improving wrc
Archive Link: "Resource endianness"
People:
Bertho Stultiens, Alexandre Julliard, Ulrich Weigand
Bertho Stultiens, while preparing a new version of wrc (the Wine
resource compiler), had some as yet unanswered questions:
According to what I found on the web are resources always
little-endian because MS does not support/wrote OSes for big-endian
processors. There are a couple of questions that go with this:
- Is it true that MS only has little-endian version?
- Should I support big-endian at all in wrc? Currently, wrc
generates the native endianness of the platform, but it does _not_
convert binary resources (such as bitmaps). It is actually extremely
difficult to mix endianness in resources because everything has to be
examined and _cannot_ be guaranteed to be correct (such as RCDATA).
- Should wine only use little-endian in the resources? In my
opinion, yes. Let the resources be the same all the time and let the
resource-loader take care of conversion. There is a comment in a
header about byte-swapping and wrc. I really would prefer to have
byte-swapping in wine rather than wrc. Mainly because wine already
requires to do the analysis of resource-contents, whereas wrc only
packs data (without contextual/semantical knowledge).
Bertho asked for feedback, and also for reports from anyone who had
run Wine natively on a big-endian CPU.
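The approach Bertho favors, keeping resource data little-endian and letting the loader convert, can be sketched as follows. This is only an illustration, written in Python rather than Wine's C; the helper names `read_u16_le` and `read_u16_native` and the sample bytes are assumptions made for the example, not part of wrc or Wine.

```python
import struct
import sys

def read_u16_le(data: bytes, offset: int = 0) -> int:
    """Read a 16-bit unsigned value stored little-endian, on any host."""
    # "<" forces little-endian decoding regardless of the host CPU.
    (value,) = struct.unpack_from("<H", data, offset)
    return value

def read_u16_native(data: bytes, offset: int = 0) -> int:
    """Read the same bytes using the host's native byte order."""
    # "=" means native byte order with standard sizes.
    (value,) = struct.unpack_from("=H", data, offset)
    return value

# Hypothetical resource field: the value 0x0102 stored little-endian.
raw = bytes([0x02, 0x01])

print(hex(read_u16_le(raw)))      # the same result on every host
print(hex(read_u16_native(raw)))  # depends on the host byte order
print(sys.byteorder)
```

A loader that always uses the explicit little-endian read gets the same value on Intel and on Sparc, which is the property Bertho wants from the resource loader.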
Both Alexandre Julliard and Ulrich Weigand answered that all current
NT versions run only on little-endian systems, so Microsoft does not
seem to have addressed this question (it remains open for Windows
CE). Alexandre added, with some sarcasm:
The Windows
headers contain a few #ifdef _MAC that attempt to add big-endian
support (apparently using a generic #ifdef BIG_ENDIAN was
a concept a bit too abstract for Microsoft)
Ulrich went a bit further:
I agree, resources should always be treated little-endian.
At the most, we might think about making a distinction between the
resource data itself and the 'meta data' surrounding it (resource
directory, PE header links ...); it might be easier to have the latter
in native byte ordering, especially in the case of the dummy PE headers
created for Wine modules (these structures are completely internal
between wrc and the Wine loader, so we can use whatever is easiest
here, of course).
Every 'external' format, be it .RES file or cursors/icons/etc. imported
by or included in RC files, should IMO always be little-endian. The
same applies to the raw resource data exchanged between app and Wine,
e.g. when using a Create...Indirect routine.
Ulrich also gave some feedback on his successful attempts to run
'hello3' on Solaris (32-bit, big-endian), although he never sent the
patches because he never finished cleaning them up:
I decided to have resource contents in little-endian, and meta data
(resource directory) in native big-endian format, as this seemed to be
the solution requiring the fewest Wine changes. The changes described
in the following achieved this result.
Major changes include reading and writing the meta-data in wrc
(swapping bytes where needed), as well as modifying the way Wine reads
resources (the same kind of swapping). Ulrich also pointed out some
less obvious modifications to be made:
another problem is
in the handling of Unicode strings: wide characters are also
endianness-sensitive, of course, so a simple lstrcpyWtoA doesn't do
the right thing...
and:
pe_resource.c routines
don't work, as they rely on various bit-field structures to break out
the 'resource name is string' and 'resource data is directory'
bits. This doesn't work, as on Sparc bit-fields are allocated starting
from the MSB down, not LSB up as on Intel :-/
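Both pitfalls come from relying on the host's memory layout, and both can be avoided by decoding explicitly. The sketch below (Python, not Wine's actual C code) parses an 8-byte resource directory entry as two little-endian 32-bit words and tests the high bit of each with a mask instead of a bit-field, then decodes a wide-character name with an explicit byte order. The function names and sample values are hypothetical; the flag value 0x80000000 is the one the Windows headers use for the 'name is string' and 'data is directory' bits.

```python
import struct

HIGH_BIT = 0x80000000   # flag bit in each word of a directory entry
LOW_MASK = 0x7FFFFFFF   # remaining 31 bits (offset or integer ID)

def parse_dir_entry(entry: bytes) -> dict:
    """Split an 8-byte directory entry using masks, not bit-fields."""
    name, offset = struct.unpack("<II", entry)  # always little-endian
    return {
        "name_is_string": bool(name & HIGH_BIT),
        "name_value": name & LOW_MASK,
        "data_is_directory": bool(offset & HIGH_BIT),
        "data_offset": offset & LOW_MASK,
    }

def decode_wide_le(data: bytes) -> str:
    """Decode wide characters stored little-endian, whatever the host is."""
    return data.decode("utf-16-le")

# Hypothetical entry: a string name at offset 0x10, plain data at 0x120.
entry = struct.pack("<II", 0x80000010, 0x00000120)
info = parse_dir_entry(entry)
print(info)

wide = "MENU".encode("utf-16-le")
print(decode_wide_le(wide))
# Reading the same bytes with the wrong byte order yields other characters.
print(wide.decode("utf-16-be") == "MENU")
```

Masks and shifts on a value read with a known byte order behave identically on every CPU, whereas the allocation order of C bit-fields is left to the compiler.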
Finally, Bertho announced that he would send a new wrc version later
this week.
2.
Wine's license
People:
Alexandre Julliard
After the previous episodes (see "shall we
change?" and "vote for a
change!"), Alexandre Julliard changed the Wine
license to the X11 one. Here are the terms of the new license:
Copyright (c) 1993-2000 the Wine project authors (see the file AUTHORS for a
complete list)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to
deal in the Software without restriction, including without limitation the
rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
3.
Ansi and Unicode
Archive Link: "ANSI/Unicode"
People:
Ove Kåven, Dmitry Timoshkov, Dimitrie Paun, Alexandre Julliard, Patrik Stridvall
Dimitrie Paun was rather unhappy with Wine's current string
support. As you may already know, most Win32 APIs come in two
flavors: ANSI and Unicode. APIs suffixed with 'A' are ANSI, and the
ones suffixed with 'W' are Unicode. The suffix tells how the
function must handle any string input or output parameter. So, the
same function, say CreateWindow, comes in two flavors: CreateWindowA and
CreateWindowW.
Microsoft uses the same convention (a #define UNICODE triggers the
Unicode mode at compile time).
ANSI means a one-byte-per-character coding, whereas Unicode implies
several bytes (at least two, but some are escapes to longer
sequences). Even if Unicode consumes more memory, it also makes it
possible to store strings in many languages: most languages that do
not use the Latin alphabet (Japanese, at least in Kanji, or Chinese;
most Cyrillic alphabets, such as Russian), but also some other
European languages with specific diacritics.
Ove Kåven gave an overview of the different encodings:
- ASCII: 7-bit, one byte per character
- ISO 8859 encodings, ordinary SBCS codepages: 8-bit (often extended
ASCII), one byte per character. (Note: All the ISO Latin
1,2.... follow this scheme)
- Asian languages, DBCS codepages: 8-bit; either one or two bytes
per character (if the first byte is a "lead byte", it's a two-byte
character).
- UTF-16: Unicode encoding, two bytes per character (preferably
big-endian but I doubt MS cares). May employ surrogate pairs (two
UTF16 characters in reserved ranges) to encode Unicode characters
beyond the first 64K; the surrogate pairs allow access to 1M more
characters (may be necessary for very exotic Asian languages, but no
such characters are defined yet).
- UCS2: Unicode encoding, two bytes per character, but not
surrogate pairs.
- UCS4: Unicode encoding, four bytes per character, easily and
conveniently encodes the full Unicode set. This is what GNU systems
prefer, since they don't want to deal with surrogate pairs.
- UTF-32: Same as UCS4, just defined by different organizations
(UCS4 is ISO, UTF32 is Unicode Consortium, plus the added restriction
of that no more than 64K+1M different characters may exist in UTF32).
- UTF-8 (UTF-FSS): Unicode encoding useful for compatibility with
software written for 8-bit C strings. Variable-width (between 1 and 6
bytes per character). Lower 128 characters are encoded as plain ASCII.
- UTF-7: Unicode encoding for compatibility with software written
for 7-bit characters (email, news, etc). A hybrid of Base64 and
Quoted-Printable.
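To make the size differences in this overview concrete, here is a small sketch using Python's built-in codecs. The sample characters and the helper name `encoded_sizes` are chosen for illustration only.

```python
import struct

samples = {
    "A": "A",                     # U+0041, plain ASCII
    "e-acute": "\u00e9",          # U+00E9, present in ISO 8859-1
    "euro": "\u20ac",             # U+20AC, inside the first 64K
    "beyond-64K": "\U00010400",   # U+10400, outside the first 64K
}

def encoded_sizes(ch: str) -> dict:
    """Number of bytes one character takes in each Unicode encoding."""
    return {enc: len(ch.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}

for label, ch in samples.items():
    print(label, encoded_sizes(ch))

# A single-byte code page stores U+00E9 in one byte.
print(len("\u00e9".encode("latin-1")))

# Beyond the first 64K, UTF-16 uses a surrogate pair: two 16-bit units.
high, low = struct.unpack("<HH", "\U00010400".encode("utf-16-le"))
print(hex(high), hex(low))
```

UTF-32 is the only one of the three where every character has the same width; UTF-16 is fixed-width only as long as no surrogate pairs appear.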
In the rest of this article, W will refer to UTF-16 strings or
functions, and U to UTF-8 strings or functions.
Currently, as Dimitrie points out, most of the Wine code is poorly
written with regard to Unicode: most of the W functions convert the
string into an ANSI one, and then call the A function, implying a loss
of information, and some potential bugs.
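The loss of information Dimitrie describes is easy to reproduce. The sketch below models a W function that converts its argument to ANSI before doing any work; it uses Python's cp1252 codec, and replacing unmappable characters with "?" is an assumption made here to stand in for the behaviour of a real conversion routine. The function names are hypothetical.

```python
def to_ansi(text: str, codepage: str = "cp1252") -> bytes:
    """Convert a Unicode string to a single-byte code page (W->A)."""
    # Characters missing from the code page are replaced by "?".
    return text.encode(codepage, errors="replace")

def from_ansi(data: bytes, codepage: str = "cp1252") -> str:
    """Convert code page bytes back to a Unicode string (A->W)."""
    return data.decode(codepage)

# "café" followed by CYRILLIC CAPITAL LETTER ZHE, which cp1252 lacks.
original = "caf\u00e9 \u0416"
round_trip = from_ansi(to_ansi(original))

print(round_trip)
print(round_trip == original)
```

Once the Cyrillic letter has been turned into a placeholder, no later step can recover it, which is why a W function built on top of an A function cannot really support Unicode.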
Dimitrie proposed changing Wine's coding style by providing a
single function (let's say suffixed by 'X') which would be the
workhorse for both the A and W functions.
Dmitry Timoshkov didn't like this proposal, and suggested instead:
Wine should have only one functional implementation
indeed. I think, it should be implemented like in NT: all actual work
does Unicode version, ANSI version simply converts ANSI to Unicode and
then calls Unicode work horse. But this transition will consume a lot
of time and efforts.
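The NT-style layering Dmitry describes can be sketched as a pair of entry points where only the Unicode one does real work. The function names `set_title_a` and `set_title_w`, the `calls` list and the choice of the cp1251 code page are all hypothetical, picked only to keep the example small.

```python
calls = []

def set_title_w(title: str) -> int:
    """Hypothetical Unicode workhorse: all real work happens here."""
    calls.append(title)
    return len(title)

def set_title_a(title: bytes, codepage: str = "cp1251") -> int:
    """Hypothetical ANSI entry point: convert to Unicode, then delegate."""
    return set_title_w(title.decode(codepage))

greeting = "\u041f\u0440\u0438\u0432\u0435\u0442"  # a Russian word

set_title_a(greeting.encode("cp1251"))  # ANSI caller
set_title_w(greeting)                   # Unicode caller

# Both callers end up handing the same string to the workhorse.
print(calls[0] == calls[1])
```

Because every code page maps into Unicode, the A-to-W direction does not lose characters, unlike the W-to-A direction.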
Dimitrie Paun went further with:
Somehow, I don't think working with W is the right thing to do in
Unix. We have the following situation: we receive strings as
arguments; their encoding is not explicit with every string, but
rather is implicit by the entry point. Now, we can do two things:
- [eager] convert at the entry point in one common format, and
carry on in with one internally with that format
- [lazy] remember the encoding that the strings are in, and pass
that around until we actually need a specific encoding
Anyway, I like 2 better than 1. Not committing to an encoding early in
the game is good -- sometimes we need UTF8 (file systems, X), in other
cases we need UTF16 (pure Win stuff). Moreover, the thing is scalable
-- if another encoding comes along, we could easily support it. And,
on top of it all, it should be more efficient.
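A minimal sketch of the lazy option might look like this: the string keeps its original bytes together with a tag naming their encoding, and a conversion happens only when a consumer asks for a different encoding. The `TaggedString` type, the entry-point functions and the code page choice are hypothetical, and Python stands in for Wine's C.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedString:
    """Raw bytes plus the name of the encoding they are in."""
    data: bytes
    encoding: str

    def as_encoding(self, target: str) -> bytes:
        # No work at all when the caller wants what we already have.
        if target == self.encoding:
            return self.data
        return self.data.decode(self.encoding).encode(target)

def entry_point_a(raw: bytes) -> TaggedString:
    """The A entry point only records that the bytes are ANSI."""
    return TaggedString(raw, "cp1252")

def entry_point_w(raw: bytes) -> TaggedString:
    """The W entry point only records that the bytes are UTF-16."""
    return TaggedString(raw, "utf-16-le")

s = entry_point_a(b"caf\xe9")
print(s.as_encoding("cp1252"))  # unchanged bytes, no conversion
print(s.as_encoding("utf-8"))   # converted only when needed

w = entry_point_w("caf\u00e9".encode("utf-16-le"))
print(w.as_encoding("utf-8") == s.as_encoding("utf-8"))
```

The cost of this design is that every internal function has to accept the tagged type, which is the 'non std. Win API' drawback raised in the discussion.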
With lots of discussions and contributions from many people, the
following table has been built:
| # | Description | Pros | Cons |
| 1 | W->A conversion, work internally with A | best option for debugging; fast for A (common case today); use std. Win API | we do NOT support Unicode, we just pretend we do (1); a lot of work, a lot clutter, close to no gain; inefficient for the W case |
| 2 | A->W conversion, work internally with W | full Unicode support; fast for W; use of std. Win API; part of Wine is already written this way | a lot of clutter; very inefficient in the A case (A->W->U usually) (2) |
| 3 | A,W call onto a X function which carries the encoding around | full Unicode support; as fast as 1 for A, and as 2 for W (for common code path like display); support for new encodings is trivial; not much worse than 2 for debugging; maybe a bit less clutter than in 1 or 2 (debatable); easy transition from what we have to this | use of non std. Win API: this doesn't work across DLLs (would require new APIs); it is not used in Wine currently; test coverage of all possible paths can be huge |
| 4 | Write all functions independent of the encoding and recompile to get all encodings (same .c file would generate .Ansi.o, .w.o object files) | fastest option for A, W; easy to support future encodings; use of std. API; less clutter (in theory) | huge bloat; it is not used in Wine currently; (maybe) difficult transition path |
Notes:
(1) Patrik Stridvall modified his winapi_check tool to list the
cases where W->A conversion was used. At least 172 suspect
functions have been reported.
(2) Alexandre Julliard pointed out that converting
A->W->U for file I/O may seem wasteful but it isn't really since
we need to support code pages; you can only do A->U directly for
7-bit ascii which is not enough. And supporting code pages without the
Unicode step means N^2 conversion tables instead of 2*N (where N is
the number of code pages).
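Alexandre's counting argument can be checked with a few lines. In the sketch below the function names are hypothetical; the direct count is taken as one table per ordered pair of distinct code pages, which is on the order of N^2, and the conversion itself pivots through Unicode using Python's cp1251 and koi8_r codecs.

```python
def direct_tables(n: int) -> int:
    """Tables needed to convert directly between every ordered pair."""
    return n * (n - 1)

def pivot_tables(n: int) -> int:
    """Tables needed when every conversion goes through Unicode."""
    # One to-Unicode and one from-Unicode table per code page.
    return 2 * n

def convert(data: bytes, src: str, dst: str) -> bytes:
    """Convert between two code pages using Unicode as the pivot."""
    return data.decode(src).encode(dst)

for n in (10, 50):
    print(n, direct_tables(n), pivot_tables(n))

# CYRILLIC CAPITAL LETTER ZHE, moved from Windows-1251 to KOI8-R.
zhe_1251 = "\u0416".encode("cp1251")
zhe_koi8 = convert(zhe_1251, "cp1251", "koi8_r")
print(zhe_koi8.decode("koi8_r") == "\u0416")
```

With 50 code pages the pivot needs 100 tables where the direct scheme needs 2450, so the extra Unicode step pays for itself quickly.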
Since Alexandre's preferred approach is #2, it was the one
chosen. However, lots of arguments, mainly between Dimitrie Paun and
Patrik Stridvall, flooded wine-devel to such an extent that some
readers thought they were reading the linux-kernel mailing list.
Patrik also proposed automating some of the A->W or W->A
conversions, so that stubs for some functions could be generated from
the .spec file. This didn't work out, because there are many different
cases to take care of:
- a string can be an input, an output, or an input/output parameter
- a NULL string can be an error or a normal parameter
- a string can be 0-terminated, of fixed length...
- in some cases (like resources), strings represent IDs (if
HiWord is 0)
Semantics seemed too complex to really provide a robust framework.
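The last case in the list shows why a generic stub generator struggles: the same parameter may hold either a pointer to a string or a small integer. A sketch of the test, assuming 32-bit parameter values and using hypothetical helper names and sample values, could look like this:

```python
def hiword(value: int) -> int:
    """Upper 16 bits of a 32-bit value."""
    return (value >> 16) & 0xFFFF

def is_integer_id(name_param: int) -> bool:
    """True when a 32-bit 'string' parameter really carries an ID."""
    return hiword(name_param) == 0

# Hypothetical values: a small resource ID and a pointer-like address.
print(is_integer_id(0x00000065))
print(is_integer_id(0x00402010))
```

An automatically generated A->W stub would have to make this check before touching the parameter, since converting an integer ID as if it were a string pointer would read from an invalid address.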
In conclusion, Wine's internal string encoding will (slowly) shift
from ANSI to Unicode (UTF-16).