Re: [CSS21] response to issue 115 (and 44) from Bjoern Hoehrmann on 2004-02-21 (www-style@w3.org from February 2004)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sat, 21 Feb 2004 17:01:03 +0100
To: David Woolley <david@djwhome.demon.co.uk>
Cc: www-style@w3.org
Message-ID: <40377c52.771245282@smtp.bjoern.hoehrmann.de>

* David Woolley wrote:
>> determined as ISO-8859-1 (HTTP) or US-ASCII (MIME) (and in fact, a
>> processor that chooses to adhere to CSS must violate HTTP/MIME...)
>
>In reality, an unspecified character set in HTTP, for HTML, has meant
>Windows 1252 in the USA and Big5 in Taiwan, for a very long time (and
>even iso-8859-1 in a meta element can mean this, and in the Taiwan
>case, Windows-1252 in a meta element probably also has this meaning[1]).

Well, in my reality there are browsers that behave rather differently
from what you describe. Internet Explorer for Windows, for example,
treats http://schneegans.de/bugs/ie-utf-7/utf-7-test.htm as UTF-7
if it is configured to autodetect the encoding.

>> > 3) If neither the header nor looking for U+FEFF or @charset yield an
>> >    encoding, but this style sheet was loaded because a document

>Without this rule, the vast majority of documents that don't have
>naturally UTF-8 compatible style sheets will become invalid.

Which is a very good thing.

>No browser developer interested in a non-US market can sensibly reject
>such documents.

They do not have to.

>>       .björn { color: white }
>
>This case is resolved by assuming the same character set as the
>referring document; it doesn't actually matter if that character
>set is wrong, for fixed length code ASCII compatible character sets,
>and this case probably recovers even for UTF-8.

No, this depends on the encoding of the referring document, the encoding
the browser guessed for it and the actual encoding of the style sheet.
The processor probably does not know anything about fixed lengths or
ASCII compatibility.
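
To illustrate how the outcome depends on the combination of encodings, here
is a minimal Python sketch. The helper name decode_as and the chosen encoding
pairs are my own assumptions for illustration; it does not model how any
particular browser guesses or recovers.

```python
# Hypothetical helper: serialize a style sheet in its actual encoding,
# then decode the bytes with the encoding the processor assumed.
def decode_as(text, actual, assumed):
    return text.encode(actual).decode(assumed, errors="replace")

rule = ".bj\u00f6rn { color: white }"  # the ".björn" rule from the example

# UTF-8 bytes read as ISO-8859-1: the selector silently changes.
utf8_as_latin1 = decode_as(rule, "utf-8", "iso-8859-1")

# ISO-8859-1 bytes read as UTF-8: the byte 0xF6 is not valid UTF-8
# and is replaced by U+FFFD.
latin1_as_utf8 = decode_as(rule, "iso-8859-1", "utf-8")

# ISO-8859-1 bytes read as Windows-1252: 0xF6 maps to the same character.
latin1_as_cp1252 = decode_as(rule, "iso-8859-1", "windows-1252")
```

Only the last combination leaves the selector intact; in the other two the
rule no longer matches class="björn" unless the document was misread in
exactly the same way.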

Note that this example was a single style sheet with at least two rules;
attempts to recover from the encoding error (if any) might yield a
different selector, which causes the text to be unreadable.

>>       .bj\0000f6rn { background-color: black }
>
>This case was written by an I18N aware user, and there is relatively good
>chance that they did identify the character sets explicitly, although
>legacy considerations mean that they may still not have @charset and
>HTTP metadata ones, that the HTTP header doesn't specify it.

This might as well have been written in an editor that had been told to
save the document as us-ascii for compatibility reasons, with the file later
edited in a normal text editor without such compatibility considerations.
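
As a rough sketch of why the escaped rule is insensitive to the guessed
encoding: it consists of ASCII bytes only, so any ASCII-compatible decoding
gives the same text. The function unescape_css_hex below is a hypothetical,
simplified reading of CSS hexadecimal escapes (it treats CRLF after an
escape as two characters, for instance), not a complete tokenizer.

```python
import re

# Simplified, hypothetical unescaper: a backslash, one to six hex digits,
# and an optional single whitespace character that terminates the escape.
def unescape_css_hex(text):
    return re.sub(
        r"\\([0-9a-fA-F]{1,6})[ \t\r\n\f]?",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )

escaped = r".bj\0000f6rn"
raw = escaped.encode("ascii")  # the escaped selector is pure ASCII

# Decoding with several ASCII-compatible encodings yields identical text.
decoded = {
    enc: raw.decode(enc)
    for enc in ("us-ascii", "iso-8859-1", "utf-8", "windows-1252")
}

selector = unescape_css_hex(escaped)  # ".björn" after resolving the escape
```

The unescaped rule in the same file gets no such protection, which is the
situation described above once the file has been edited elsewhere.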

Received on Saturday, 21 February 2004 11:00:56 UTC