- From: Etan Wexler <ewexler@stickdog.com>
- Date: Sat, 6 Dec 2003 19:08:13 -0800
- To: Chris Lilley <chris@w3.org>, www-international@w3.org, w3c-css-wg@w3.org, www-style@w3.org
Chris Lilley wrote to <mailto:www-international@w3.org>, <mailto:w3c-css-wg@w3.org>, <mailto:w3c-i18n-ig@w3.org>, and <mailto:www-style@w3.org> on 6 December 2003 in "Re: UTF-8 signature / BOM in CSS" (<mid:862788409.20031206164822@w3.org>):

> EW> I assumed that the CSS engine would make use of out-of-band information
> EW> to indicate the detected encoding scheme.
>
> Please check the definition of that out of band information [in]
> particular what it says about when a BOM must be present.

Perhaps I was unclear. I did not mean that the CSS engine would propagate the "charset" parameter's value unmodified. What I had in mind is as follows.

The CSS engine retrieves a style sheet. It could be from HTTP, the local file system, FTP, SMTP + MIME, a database, or any source, really. The CSS engine detects an encoding scheme according to the prescribed or accepted best practice. Factors that could determine the detection include a "charset" parameter, a byte-order mark (U+FEFF), a database schema, a file name extension, and the native byte order of the local machine. Once the encoding scheme is detected, it is noted for further use.

The encoding scheme will never be noted as UTF-16 or UTF-32. There is no encoding scheme UTF-16. There is a "charset" value in the IANA registry called "UTF-16", but UTF-16 is an encoding form. Any serialized UTF-16 document is either big-endian or little-endian. Nevertheless, the "UTF-16" label is allowed and in use, so we resort to the BOM to disambiguate the label's meaning. The situation with UTF-32 is analogous.

We return to our CSS scenario. The encoding scheme has been detected and noted, including any significant endian-ness. No BOM is necessary for the tokenizer. The BOM is stripped from the internal representation of the style sheet. The remaining byte stream moves along to the tokenizer. The tokenizer consults the noted encoding scheme in order to properly interpret the bytes. Parsing proceeds on course.
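
To make the scenario concrete, here is a minimal sketch in Python of the detect, note, and strip steps. It is an illustration under stated assumptions, not a prescribed algorithm: the name `detect_scheme`, the UTF-8 default, the rule that an explicit label takes precedence over a BOM, and the big-endian fallback for a bare "UTF-16" or "UTF-32" label without a BOM are all choices made for this example. Other factors mentioned above (database schema, file name extension, native byte order) are left out.

```python
import codecs

# BOM byte sequences mapped to concrete encoding schemes.
# UTF-32 comes first because the UTF-32LE BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def detect_scheme(data, charset=None, default="UTF-8"):
    """Return (scheme, payload).

    scheme  -- the noted encoding scheme, never the bare labels
               "UTF-16" or "UTF-32"
    payload -- the bytes handed to the tokenizer, with any BOM removed
    """
    label = charset.upper() if charset else None

    if label in ("UTF-16", "UTF-32"):
        # Ambiguous label: the BOM disambiguates the byte order.
        for bom, scheme in _BOMS:
            if scheme.startswith(label) and data.startswith(bom):
                return scheme, data[len(bom):]
        # Assumption for this sketch: no BOM means big-endian.
        return label + "BE", data

    if label:
        # Explicit, unambiguous label (assumed to win over a BOM here).
        # A matching BOM, if present, is still stripped.
        for bom, scheme in _BOMS:
            if scheme == label and data.startswith(bom):
                return label, data[len(bom):]
        return label, data

    # No label at all: let a BOM decide, else fall back to the default.
    for bom, scheme in _BOMS:
        if data.startswith(bom):
            return scheme, data[len(bom):]
    return default, data
```

The tokenizer then never sees a BOM and never needs one; it decodes with the noted scheme, for example `payload.decode(scheme)` after `scheme, payload = detect_scheme(raw_bytes, charset="UTF-16")`.
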
What did I miss or misunderstand?

--
Etan Wexler.
Received on Saturday, 6 December 2003 22:07:29 UTC