- From: Edward Z. Yang <edwardzyang@thewritingpot.com>
- Date: Sun, 21 Dec 2008 11:35:53 -0500
Ian Hickson wrote:
> Yes. (At least, that's the intent; if you find anything that contradicts
> that, please let me know.)

Great. I'll be sure to ping you if I find out otherwise.

> Looking just at parsing, yes, probably...

I suppose the big pivot point is "as if". A byte-wise implementation would replace "character" globally with "byte", and any U+xxxx designation with its UTF-8 encoded byte sequence. HTML 5 dictates end behavior, not the actual algorithm implementation, no?

> But an HTML5 implementation,
> according to the spec, must at a minimum support the UTF-8 and
> Windows-1252 encodings, so the overall implementation might not depending
> on exactly how this is done.

The plan is to convert Windows-1252 into UTF-8 before processing; with a reasonably good iconv implementation, support for lots of encodings is possible. The implementation might not be fully conforming if iconv doesn't perform the proper (possibly context-sensitive; I haven't checked) substitution when it doesn't recognize a character, but it should be close.
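
To make the two ideas above concrete, here is a minimal Python sketch. It is an illustration only, not the implementation under discussion: the function names `utf8_bytes` and `to_utf8` are hypothetical, Python's built-in codecs stand in for iconv, and the substitution of U+FFFD for unrecognized bytes is Python's behavior, which may or may not match what HTML 5 requires.

```python
def utf8_bytes(code_point: int) -> bytes:
    """Return the UTF-8 byte sequence for a code point such as U+FFFD.

    A byte-wise parser could use these sequences wherever the spec
    names a character by its U+xxxx designation.
    """
    return chr(code_point).encode("utf-8")


def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Transcode input to UTF-8 before any parsing takes place.

    Assumption: bytes the source codec does not recognize are replaced
    with U+FFFD (errors="replace"); a conforming implementation may
    need a different substitution.
    """
    text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8")


if __name__ == "__main__":
    # U+FFFD REPLACEMENT CHARACTER as the bytes a byte-wise parser would emit.
    print(utf8_bytes(0xFFFD))
    # Windows-1252 "smart quotes" transcoded to UTF-8 ahead of parsing.
    print(to_utf8(b"\x93quoted\x94"))
```

Once the input has been normalized this way, the parser only ever compares UTF-8 byte sequences, which is where the "as if" latitude would come in.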
Received on Sunday, 21 December 2008 08:35:53 UTC