Re: Charset tutorial: updated from Bjoern Hoehrmann on 2004-02-04 (public-i18n-geo@w3.org from February 2004)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Wed, 04 Feb 2004 05:04:07 +0100
To: "Richard Ishida" <ishida@w3.org>
Cc: "GEO" <public-i18n-geo@w3.org>
Message-ID: <402d68c2.43625420@smtp.bjoern.hoehrmann.de>

* Richard Ishida wrote:
>I added a first draft of a final section to the tutorial at
>http://www.w3.org/International/tutorials/tutorial-char-enc.html
>this afternoon.

[...]
   In the case of conflict between multiple encoding declarations,
   precedence rules apply to determine which declaration wins out. For
   XHTML and HTML, the precedence is as follows, with 1 being the
   highest:

    1. HTTP Content-Type
    2. XML declaration
    3. meta charset declaration
    4. link charset attribute
[...]

For HTML user agents the XML declaration is just a processing
instruction and thus gets ignored (if you are lucky). For XHTML
documents delivered as text/html, user agent behaviour also varies (no
surprise, since the specifications do not deal with it); e.g., the W3C
MarkUp Validator reads the <meta> information while the W3C CSS
Validator does not.

Where is the BOM? HTML 4.01 does not mention using the BOM to determine
the character encoding of the document, and neither does CSS 2.0. If
user agents are somehow expected to use the BOM for that purpose, it
should be listed here.
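
For reference, BOM sniffing as a user agent might do it could look
roughly like the following Python sketch. This is not behaviour
mandated by HTML 4.01 or CSS 2.0; the name sniff_bom and the choice of
encodings to check are my own assumptions for illustration.

```python
import codecs

# Check longer BOMs first: the UTF-32 LE BOM (FF FE 00 00) begins with
# the UTF-16 LE BOM (FF FE), so the order of this table matters.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(data):
    """Return an encoding name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

Where such a check would sit relative to the HTTP header and the <meta>
declaration is exactly the open question.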

I think this should be split into three parts: XML (XHTML, SVG, ...),
HTML/XHTML (text/html), and CSS, as they have different rules and user
agent behaviour varies.

[...]
   The escape mechanism for representing characters in CSS is a
   backslash followed by a hexadecimal number representing the scalar
   value. Note that these escapes are terminated by a space, rather
   than a semi-colon. The CSS escape for á is \E1.
[...]

Or they are not terminated at all (or only implicitly, since an escape
ends at the first non-hex character or after six hex digits), e.g.
Bj\F6rn or M\0000F6bel, as opposed to M\F6bel (which is not "Möbel"
but M U+F6BE l).
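
To make the difference concrete, here is a rough Python sketch of how a
decoder might read these escapes. It assumes the rule that an escape is
a backslash plus one to six hex digits, optionally followed by a single
whitespace character that gets swallowed. The name decode_css_escapes
and the U+FFFD fallback for out-of-range values are my own choices, and
non-hex escapes such as \" are not handled.

```python
import re

# Backslash, one to six hex digits (greedy), then optionally one
# whitespace character that terminates the escape and is dropped.
_ESCAPE = re.compile(r"\\([0-9A-Fa-f]{1,6})(?:\r\n|[ \t\r\n\f])?")


def _replace(match):
    code_point = int(match.group(1), 16)
    # Assumption: map values beyond the Unicode range to U+FFFD
    # instead of raising an error.
    return chr(code_point) if code_point <= 0x10FFFF else "\ufffd"


def decode_css_escapes(text):
    """Replace CSS hexadecimal escapes with the characters they name."""
    return _ESCAPE.sub(_replace, text)


for sample in (r"Bj\F6rn", r"M\0000F6bel", r"M\F6 bel", r"M\F6bel"):
    print(sample, "->", ascii(decode_css_escapes(sample)))
```

The first three samples decode to the intended words; in the last one
the "be" after F6 is taken as part of the hex number, which is the
problem described above.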

Received on Tuesday, 3 February 2004 23:04:27 UTC