- From: <bugzilla@jessica.w3.org>
- Date: Thu, 11 Nov 2010 11:55:58 +0000
- To: public-html@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11298

Summary: Surrogate catching doesn't belong in input stream preprocessing
Product: HTML WG
Version: unspecified
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: HTML5 spec (editor: Ian Hickson)
AssignedTo: ian@hixie.ch
ReportedBy: hsivonen@iki.fi
QAContact: public-html-bugzilla@w3.org
CC: mike@w3.org, public-html-wg-issue-tracking@w3.org, public-html@w3.org

The spec says:
"Code points in the range U+D800 to U+DFFF in the input must be replaced by U+FFFD REPLACEMENT CHARACTERs."

This doesn't really belong in the parser, since document.write()-inserted UTF-16 text should not be subject to lone surrogate replacement since it would add complexity without a backwards compatibility need.

Instead, the spec should have a note saying character decoders for UTF-8, UTF-16 and similar (GB18030 maybe?) are required to emit U+FFFD for bogus byte sequences and sequences decoding to surrogates in UTF-8 or lone surrogates in UTF-16 are bogus.

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
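To make the proposal concrete, here is a minimal Python sketch of replacement happening inside the decoders rather than in parser preprocessing. It is not taken from the bug or the spec. `decode_utf16_units` and `decode_utf8` are hypothetical helper names: the first hand-rolls a UTF-16 code-unit decoder that maps lone surrogates to U+FFFD, and the second relies on Python's built-in UTF-8 codec, which rejects byte sequences that would decode to surrogates (such as `ED A0 80`) and, with `errors="replace"`, emits U+FFFD for them. Under the proposal, a string handed to document.write() is already a sequence of UTF-16 code units and would never pass through a step like this.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_utf16_units(units: list[int]) -> str:
    """Hypothetical UTF-16 decoder: pair surrogates, replace lone ones with U+FFFD."""
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # lead (high) surrogate
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                trail = units[i + 1]
                # Combine the surrogate pair into one supplementary code point.
                cp = 0x10000 + ((u - 0xD800) << 10) + (trail - 0xDC00)
                out.append(chr(cp))
                i += 2
                continue
            out.append(REPLACEMENT)  # lead surrogate with no trail: bogus
        elif 0xDC00 <= u <= 0xDFFF:
            out.append(REPLACEMENT)  # trail surrogate with no lead: bogus
        else:
            out.append(chr(u))  # ordinary BMP code unit
        i += 1
    return "".join(out)


def decode_utf8(data: bytes) -> str:
    """Hypothetical wrapper: Python's UTF-8 codec treats encoded surrogates
    as invalid, and errors="replace" substitutes U+FFFD for them."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(repr(decode_utf16_units([0x0041, 0xD800, 0x0042])))  # lone lead surrogate
    print(repr(decode_utf16_units([0xD83D, 0xDE00])))           # valid pair
    print(repr(decode_utf8(b"a\xed\xa0\x80b")))                  # surrogate encoded in UTF-8
```

Doing the replacement in the decoder means the tokenizer never sees a surrogate coming from a byte stream, so no separate preprocessing pass is needed for that input.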
Received on Thursday, 11 November 2010 11:56:00 UTC