- From: <bugzilla@jessica.w3.org>
- Date: Thu, 11 Nov 2010 11:55:58 +0000
- To: public-html@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11298
Summary: Surrogate catching doesn't belong in input stream preprocessing
Product: HTML WG
Version: unspecified
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: HTML5 spec (editor: Ian Hickson)
AssignedTo: ian@hixie.ch
ReportedBy: hsivonen@iki.fi
QAContact: public-html-bugzilla@w3.org
CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
public-html@w3.org
The spec says:
"Code points in the range U+D800 to U+DFFF in the input must be replaced by
U+FFFD REPLACEMENT CHARACTERs."
This doesn't really belong in the parser: text inserted with document.write()
is already UTF-16, and subjecting it to lone surrogate replacement would add
complexity without a backwards compatibility need.
Instead, the spec should have a note saying that character decoders for UTF-8,
UTF-16 and similar encodings (GB18030 maybe?) are required to emit U+FFFD for
bogus byte sequences, and that byte sequences decoding to surrogates in UTF-8
and lone surrogates in UTF-16 count as bogus.
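To illustrate the decoder-level behavior proposed above, here is a minimal
Python sketch. It is not taken from the spec or from any browser; the names
decode_utf16_units, decode_utf8_lossy and REPLACEMENT are hypothetical, and the
UTF-8 half simply assumes that Python's built-in codec with errors="replace" is
an acceptable stand-in for a conforming decoder.

```python
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def decode_utf16_units(units):
    """Decode 16-bit code units; emit U+FFFD for each lone surrogate.

    Hypothetical helper: `units` is a sequence of ints in 0..0xFFFF,
    i.e. the output of a byte-order-aware UTF-16 reader.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only when a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                low = units[i + 1]
                cp = 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
                out.append(chr(cp))
                i += 2
                continue
            out.append(REPLACEMENT)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            out.append(REPLACEMENT)
        else:
            out.append(chr(u))
        i += 1
    return "".join(out)


def decode_utf8_lossy(data):
    """Decode UTF-8 bytes, replacing bogus sequences with U+FFFD.

    Python's UTF-8 codec treats byte sequences that would decode to
    surrogates (for example ED A0 80) as errors, so they are replaced.
    """
    return data.decode("utf-8", errors="replace")
```

With this sketch, decode_utf16_units([0x41, 0xD800, 0x42]) yields "A\uFFFDB",
while a well-formed pair such as [0xD83D, 0xDE00] still decodes to U+1F600.
Because the replacement happens in the decoder, a string handed to
document.write() would never pass through this code.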
Received on Thursday, 11 November 2010 11:56:00 UTC