- From: <bugzilla@jessica.w3.org>
- Date: Fri, 10 Sep 2010 11:34:19 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=9659

Henri Sivonen <hsivonen@iki.fi> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |

--- Comment #6 from Henri Sivonen <hsivonen@iki.fi> 2010-09-10 11:34:18 ---
(In reply to comment #4)
> Rationale: Concurred with reporter's comments. However, I made a much simpler
> change; I just made U+FFFD not change the frameset-ok flag.

Doing it this way (as opposed to doing what I suggested) causes two non-browser
problems:

1) There are now parser-sensitive characters that aren't in the Basic Latin
range. This sucks for implementations that use UTF-8 internally.

2) Implementations that perform Infoset coercion and map XML-unsafe non-space
characters to the REPLACEMENT CHARACTER can no longer do so efficiently in the
tokenizer but would have to re-examine the data in the tree builder.

Since off-the-shelf encoding decoders don't map U+0000 to U+FFFD, it's
reasonable to do that mapping in the tokenizer instead of doing two passes over
the data: first input stream preprocessing and then tokenization. If you do a
single pass, it's not a problem to have a special token for U+0000 that gets
mapped to U+FFFD or discarded by the tree builder as appropriate.
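
A rough illustration of the single-pass approach described above, as a minimal Python sketch. Nothing here comes from the spec or from any real parser: the names Chars, NullChar, tokenize_text and build_text are hypothetical, and the drop_nulls flag merely stands in for whatever state the tree builder would use to decide between mapping and discarding.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Union

REPLACEMENT = "\uFFFD"


@dataclass
class Chars:
    """A run of ordinary character data (hypothetical token type)."""
    data: str


@dataclass
class NullChar:
    """Hypothetical special token marking one U+0000 in the input."""


Token = Union[Chars, NullChar]


def tokenize_text(text: str) -> Iterator[Token]:
    """Single pass over the input: U+0000 becomes its own token."""
    start = 0
    for i, ch in enumerate(text):
        if ch == "\u0000":
            if i > start:
                yield Chars(text[start:i])
            yield NullChar()
            start = i + 1
    if start < len(text):
        yield Chars(text[start:])


def build_text(tokens: Iterable[Token], drop_nulls: bool) -> str:
    """Tree-builder side: map NullChar to U+FFFD, or discard it.

    drop_nulls is an assumed stand-in for the insertion-mode logic.
    """
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, NullChar):
            if not drop_nulls:
                out.append(REPLACEMENT)
        else:
            out.append(tok.data)
    return "".join(out)
```

With this shape, a literal U+FFFD in the input travels through as ordinary character data, so the tree builder only has to special-case the NullChar token and never has to inspect non-Basic-Latin characters.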
Received on Friday, 10 September 2010 11:34:21 UTC