- From: François Yergeau <francois@yergeau.com>
- Date: Sun, 7 Dec 2003 20:53:47 -0500 (EST)
- To: Etan Wexler <ewexler@stickdog.com>
- Cc: www-international@w3.org, www-style@w3.org
Etan Wexler wrote:
> That given,
> does my processing model make sense? I'll repeat for convenience:
>
> The encoding scheme has been detected and noted, including any
> significant endian-ness. No BOM is necessary for the tokenizer. The BOM
> is stripped from the internal representation of the style sheet. The
> remaining byte stream moves along to the tokenizer. The tokenizer
> consults the noted encoding scheme in order to properly interpret the
> bytes.

It does make sense, IMHO. It has the advantage of cleanliness that you point out (cleanliness from the PoV of the parser, which doesn't see a character that has become meaningless at this point). But it loses the advantage of making the BOM more explicit and therefore less mysterious.

Stripping the BOM may also cause problems for processes that depend on the "file" remaining intact, such as digital signatures or, more simply, character counting. Not an insurmountable obstacle of course, but an issue to keep in mind before giving too much weight to cleanliness for just one process.

Regards,

-- 
François
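
The processing model quoted above, and the caveat about signatures and character counts, can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `detect_and_strip_bom`, the `_BOMS` table, and the UTF-8 fallback are hypothetical and do not come from any CSS specification or implementation.

```python
import codecs
import hashlib

# BOM signatures, UTF-32 first so that its little-endian BOM
# (FF FE 00 00) is not mistaken for the UTF-16 one (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def detect_and_strip_bom(data: bytes, default: str = "utf-8"):
    """Return (encoding, remaining_bytes); the BOM, if any, is removed.

    The fallback encoding is an assumption made for this sketch only.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return default, data

# A tiny style sheet stored as big-endian UTF-16 with a BOM.
sheet = codecs.BOM_UTF16_BE + "a{}".encode("utf-16-be")

# Detect and note the encoding, strip the BOM, then hand the rest on
# (here, a plain decode stands in for the tokenizer).
encoding, body = detect_and_strip_bom(sheet)
text = body.decode(encoding)

# The caveat: a digest over the original bytes no longer matches one
# over the stripped bytes, and the character count differs by one.
digest_changed = hashlib.sha256(sheet).digest() != hashlib.sha256(body).digest()
chars_with_bom = len(sheet.decode(encoding))
chars_without_bom = len(text)
```

A process that signs or counts the style sheet would therefore need access to the original bytes, not only to the BOM-free stream the tokenizer sees.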
Received on Monday, 8 December 2003 05:15:26 UTC