- From: <bugzilla@jessica.w3.org>
- Date: Tue, 22 May 2012 17:23:16 +0000
- To: public-html-bugzilla@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=17151
Geoffrey Sneddon <geoffers+w3cbugs@gmail.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |geoffers+w3cbugs@gmail.com
--- Comment #1 from Geoffrey Sneddon <geoffers+w3cbugs@gmail.com> 2012-05-22 17:23:15 UTC ---
This pertains to the following:
> Bytes or sequences of bytes in the original byte stream that could not be converted to Unicode code points must be converted to U+FFFD REPLACEMENT CHARACTERs. Specifically, if the encoding is UTF-8, the bytes must be decoded with the error handling defined in this specification.
> Note: Bytes or sequences of bytes in the original byte stream that did not conform to the encoding specification (e.g. invalid UTF-8 byte sequences in a UTF-8 input byte stream) are errors that conformance checkers are expected to report.
"\xD8\x00" can obviously be decoded, as if it were UTF-16BE, to what HTML5
defines as a "Unicode code point" (a definition which includes lone
surrogates), but according to the Unicode specification it is an invalid
(ill-formed) UTF-16 code unit sequence.
It would seem preferable that lone surrogates get converted to U+FFFD as they
currently are in Opera/Firefox.
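
As a rough illustration of the two behaviours, here is a minimal Python sketch. It is not the HTML5 decoding algorithm: Python's built-in "surrogatepass" and "replace" codec error handlers merely stand in for "keep the lone surrogate as a code point" and "convert it to U+FFFD" respectively, and the function names are hypothetical.

```python
# Assumption: Python's codec error handlers only approximate the two
# behaviours discussed above; the helper names are made up for this sketch.

LONE_HIGH_SURROGATE = b"\xD8\x00"  # the UTF-16BE byte pair from this comment


def decode_keeping_surrogates(data: bytes) -> str:
    """Decode UTF-16BE, letting a lone surrogate through as a code point."""
    return data.decode("utf-16-be", errors="surrogatepass")


def decode_replacing_surrogates(data: bytes) -> str:
    """Decode UTF-16BE, turning ill-formed sequences into U+FFFD."""
    return data.decode("utf-16-be", errors="replace")


kept = decode_keeping_surrogates(LONE_HIGH_SURROGATE)
replaced = decode_replacing_surrogates(LONE_HIGH_SURROGATE)

# Print code point values rather than the characters themselves, since a
# lone surrogate cannot be written to a UTF-8 terminal.
print([hex(ord(c)) for c in kept])
print([hex(ord(c)) for c in replaced])
```

The second function corresponds to the behaviour suggested as preferable above; the first corresponds to reading the byte pair directly as a surrogate code point.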
Received on Tuesday, 22 May 2012 17:23:41 UTC