RE: several messages about handling encodings in HTML

> Section "8.2 Parsing HTML documents" is indeed exclusively 
> for user agent implementors and conformance checker
> implementors. For authors and authoring tool implementors,
> you want section "8.1 Writing HTML documents" and section
> "3.7.5.4. Specifying the document's character encoding"
> (which is linked to from 8.1). These give the flipside of
> these requirements, the authoring side.

* Section 8.1 says that any document may start with a BOM. However, some encodings do not allow a BOM at the beginning (UTF-16BE/UTF-16LE), and many others cannot encode the BOM character (U+FEFF) at all. The statement should be changed to say that a BOM is allowed only if the document's encoding permits it.

* 3.7.5.4 (The META element) is not the correct place to define encoding requirements for authors. It is counter-intuitive to have to look in the definition of the META element to find out that you can use the BOM or the Content-Type header to specify the encoding. The encoding requirements should be in section 8, and it should be emphasized that the encoding should be given in the Content-Type header (the "transport layer") whenever possible. The fact that the encoding is determined from the Content-Type header first, then the BOM, then the XML declaration, then <META> is relevant for content authors as well as parser implementers.
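
To make the precedence concrete, here is a rough Python sketch of the order as I described it above (Content-Type, then BOM, then XML declaration, then <META>). It is not the algorithm from section 8.2. The function name guess_encoding, the regular expressions, and the 1024-byte prescan limit are all my own assumptions for illustration.

```python
import codecs
import re

# BOM byte sequences and the encoding each one signals (UTF-8 and UTF-16 only).
# Both UTF-16 BOMs map to the generic "utf-16" label, since the BE/LE labels
# themselves do not permit a leading BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16"),
    (codecs.BOM_UTF16_LE, "utf-16"),
]

# Simplified patterns (assumptions, not the spec's prescan rules).
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)
XML_DECL_RE = re.compile(rb"^<\?xml[^>]*encoding\s*=\s*[\"']([A-Za-z0-9._:-]+)[\"']")
META_RE = re.compile(rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)


def guess_encoding(content_type, body):
    """Hypothetical helper: return (encoding, source) for a response.

    content_type is the Content-Type header value (or None);
    body is the raw document as bytes.
    """
    # 1. Transport layer: charset parameter of the Content-Type header.
    if content_type:
        m = CHARSET_RE.search(content_type)
        if m:
            return m.group(1).lower(), "content-type"

    # 2. Byte order mark at the very start of the body.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name, "bom"

    # 3. XML declaration, which must be the first thing in the document.
    m = XML_DECL_RE.match(body)
    if m:
        return m.group(1).decode("ascii").lower(), "xml-declaration"

    # 4. <META> element near the start of the document.
    m = META_RE.search(body[:1024])
    if m:
        return m.group(1).decode("ascii").lower(), "meta"

    return None, "unknown"
```

With this sketch, a response sent as "text/html; charset=ISO-8859-1" is treated as ISO-8859-1 even if the body starts with a UTF-8 BOM, which is exactly the kind of interaction an author needs to know about.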

- Brian

Received on Friday, 29 February 2008 15:01:11 UTC