[Bug 9659] Initial U+0000 should not set frameset-ok to "not ok" from bugzilla@jessica.w3.org on 2010-09-10 (public-html-bugzilla@w3.org from September 2010)

From: <bugzilla@jessica.w3.org>
Date: Fri, 10 Sep 2010 11:34:19 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1Ou1s3-0003s0-3h@jessica.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=9659


Henri Sivonen <hsivonen@iki.fi> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |




--- Comment #6 from Henri Sivonen <hsivonen@iki.fi>  2010-09-10 11:34:18 ---
(In reply to comment #4)
> Rationale: Concurred with reporter's comments. However, I made a much simpler
> change — I just made U+FFFD not change the frameset-ok flag.

Doing it this way (as opposed to doing what I suggested) causes two non-browser
problems:
 1) There are now parser-sensitive characters that aren't in the Basic Latin
range. This sucks for implementations that use UTF-8 internally.
 2) Implementations that perform Infoset coercion and map XML-unsafe non-space
characters to the REPLACEMENT CHARACTER can no longer do so efficiently in the
tokenizer but would have to re-examine the data in the tree builder.
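
To illustrate point 1, here is a minimal sketch of why a parser-sensitive
U+FFFD is awkward for a UTF-8-based implementation. It is not taken from any
real parser, and the function name is made up for this example. Every Basic
Latin character is a single byte below 0x80 in UTF-8, so a scanner can test
one byte at a time, whereas U+FFFD is a three-byte sequence.

```python
# Hypothetical illustration; names are assumptions, not any parser's API.
NUL_BYTES = "\u0000".encode("utf-8")          # one byte: 00
REPLACEMENT_BYTES = "\ufffd".encode("utf-8")  # three bytes: EF BF BD


def is_basic_latin_byte(b: int) -> bool:
    """True if this byte alone encodes a complete Basic Latin character."""
    return b < 0x80


def first_replacement_char(buf: bytes) -> int:
    """Return the byte offset of the first U+FFFD in buf, or -1.

    Unlike a Basic Latin character, this cannot be detected by looking at a
    single byte; the scanner has to match a multi-byte sequence.
    """
    return buf.find(REPLACEMENT_BYTES)
```

With only Basic Latin characters being parser-sensitive, a byte-oriented
tokenizer never has to decode multi-byte sequences to decide what to do next.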

Since off-the-shelf encoding decoders don't map U+0000 to U+FFFD, it's
reasonable to do that mapping in the tokenizer instead of doing two passes over
the data: first input stream preprocessing and then tokenization. If you do a
single pass, it's not a problem to have a special token for U+0000 that gets
mapped to U+FFFD or discarded by the tree builder as appropriate.
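
The single-pass approach could be sketched roughly as follows. This is an
illustration under stated assumptions, not the spec's algorithm or any real
parser's code: the token kinds, the function names, and the `drop_nul` flag
are all hypothetical, and the flag merely stands in for whatever rule the tree
builder uses to decide between mapping and discarding.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Token:
    kind: str        # "chars" or "nul" (hypothetical token kinds)
    data: str = ""


def tokenize(text: str) -> Iterator[Token]:
    """One pass over the input: emit a dedicated token for each U+0000."""
    start = 0
    for i, ch in enumerate(text):
        if ch == "\u0000":
            if start < i:
                yield Token("chars", text[start:i])
            yield Token("nul")
            start = i + 1
    if start < len(text):
        yield Token("chars", text[start:])


def build_text(tokens: Iterable[Token], drop_nul: bool) -> str:
    """Tree-builder stand-in: discard a 'nul' token or map it to U+FFFD."""
    out = []
    for tok in tokens:
        if tok.kind == "nul":
            if not drop_nul:
                out.append("\ufffd")
        else:
            out.append(tok.data)
    return "".join(out)
```

Because U+0000 travels as its own token, a literal U+FFFD in the input stays
ordinary character data, so the tree builder can treat the two differently
without re-examining the text.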

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.

Received on Friday, 10 September 2010 11:34:21 UTC