Re: german css validation message - umlauts are not encoded

On Thu, 16 Jun 2005, Georg Gell wrote:

> when i validate a css document successfully, i receive a localized
> (german) success message.

Exactly what did you do? When I set my browser's language
preferences to prefer German over English, I got the page
http://jigsaw.w3.org/css-validator/validator-uri.html
in German. But after submitting a style sheet for validation,
I got a response page in English (whether there were errors or not).

This happened when my language preferences were Finnish, German, English
in that order (on IE 6, which assigns different q values to them according
to their order). When I moved German to first place, I got the response
page in German.

Thus, the language negotiation does not quite work. It should of course
not default to English just because the _first_ language in the
preferences isn't one that the service supports; it should select the one
with the highest q value.
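
To illustrate what I mean by selecting on q values, here is a minimal
sketch in Python. It is not the validator's code; the function names, the
set of supported languages and the sample header are all assumptions made
up for the example.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (language, q) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        lang = parts[0].lower()
        if not lang:
            continue
        q = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((lang, q))
    return prefs


def pick_language(header, supported, default="en"):
    """Return the supported language with the highest q value."""
    best_lang, best_q = default, 0.0
    for lang, q in parse_accept_language(header):
        primary = lang.split("-")[0]  # "de-at" counts as "de"
        if primary in supported and q > best_q:
            best_lang, best_q = primary, q
    return best_lang


# Hypothetical header: Finnish first, then German, then English.
print(pick_language("fi,de;q=0.7,en;q=0.3", {"en", "de", "fr"}))
```

With Finnish unsupported, this picks German rather than falling back to
English, because German has the highest q value among the supported
languages.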

> But umlauts are not displayed correctly. I
> think either the umlauts are not encoded like ü or the character
> encoding is missing. That's uncool for w3.org ;)

I see u umlauts, as well as o umlaut and sharp s (eszett), as question
marks on IE 6, which reports the encoding as West European (Windows),
i.e. as windows-1252. Doesn't look good, does it? The source of the report
page begins with
<?xml version='1.0' encoding='iso-8859-1'?>
(thereby throwing IE 6 into quirks mode, but I digress)
but does not seem to contain umlaut letters etc. in ISO-8859-1 encoding.
This looks very much like a bug. The actual HTTP headers seem to contain
just
Content-Type: text/html
(I think the W3C should set a good example and specify the encoding in a
charset parameter in the HTTP headers.)
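
As a rough sketch of what that would mean on the server side (Python, with
a hypothetical build_response helper and a made-up sample text; I do not
know how the validator actually builds its responses):

```python
def build_response(text, charset="iso-8859-1"):
    """Encode the page and declare the same encoding in Content-Type."""
    # encode() raises UnicodeEncodeError if a character cannot be
    # represented in the chosen charset, instead of silently mangling it.
    body = text.encode(charset)
    headers = [
        ("Content-Type", "text/html; charset=" + charset),
        ("Content-Length", str(len(body))),
    ]
    return headers, body


# Hypothetical German sample text.
headers, body = build_response("<p>Glückwunsch!</p>")
for name, value in headers:
    print(name + ": " + value)
```

The point is simply that the bytes sent and the charset declared come from
the same place, so they cannot disagree.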

The results are the same on Mozilla Firefox.

Setting the encoding manually to e.g. UTF-8 in a browser does _not_ help.
It seems that the question marks appearing in place of umlaut letters are
really question mark characters, not just the browser's indication of an
undisplayable character. Thus, it seems that the routine that generates
the response page tries to present data in ISO-8859-1 but fails to do
that; I would guess that the German texts come from a file or database
where they are stored as UTF-8, but the conversion destroys them.
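
One way such damage can arise, shown as a small Python sketch. This is
only my guess at the kind of mechanism involved, not a diagnosis of the
validator; the sample word is made up.

```python
text = "für"  # hypothetical sample word with an umlaut

utf8_bytes = text.encode("utf-8")  # how the text might be stored

# Correct conversion: decode as UTF-8, then encode as ISO-8859-1.
good = utf8_bytes.decode("utf-8").encode("iso-8859-1")

# Lossy conversion: the target charset cannot represent the umlaut,
# so it is replaced by a literal question mark (byte 0x3F).
lossy = utf8_bytes.decode("utf-8").encode("ascii", errors="replace")

print(good)   # b'f\xfcr'
print(lossy)  # b'f?r'

# Once the "?" is in the data, no choice of encoding in the browser
# can bring the umlaut back.
print(lossy.decode("utf-8"), lossy.decode("iso-8859-1"))
```

This would explain why switching the encoding in the browser changes
nothing: the original character is already gone before the page is sent.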

The French version works: accented letters look OK. (The French result
pages seem to be UTF-8 encoded, for no apparent reason, since the accented
letters are represented using entities, e.g. &eacute;.)
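
The entity approach makes the page immune to this kind of problem, since
the output is pure ASCII. A minimal sketch in Python, with a hypothetical
to_entities helper and made-up sample text (it does not escape "&" or "<",
so it is meant for plain message text only):

```python
from html.entities import codepoint2name


def to_entities(text):
    """Replace non-ASCII characters with HTML entity references."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append("&" + codepoint2name[cp] + ";")  # named entity
        else:
            out.append("&#%d;" % cp)  # numeric reference as a fallback
    return "".join(out)


page = to_entities("Félicitations")
print(page)

# Pure ASCII, so the bytes are the same in UTF-8 and ISO-8859-1.
print(page.encode("utf-8") == page.encode("iso-8859-1"))
```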

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Friday, 17 June 2005 11:24:26 UTC