- From: <bugzilla@jessica.w3.org>
- Date: Tue, 22 May 2012 17:23:16 +0000
- To: public-html-bugzilla@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=17151

Geoffrey Sneddon <geoffers+w3cbugs@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |geoffers+w3cbugs@gmail.com

--- Comment #1 from Geoffrey Sneddon <geoffers+w3cbugs@gmail.com> 2012-05-22 17:23:15 UTC ---
This pertains to the following:

> Bytes or sequences of bytes in the original byte stream that could not be
> converted to Unicode code points must be converted to U+FFFD REPLACEMENT
> CHARACTERs. Specifically, if the encoding is UTF-8, the bytes must be decoded
> with the error handling defined in this specification.

> Note: Bytes or sequences of bytes in the original byte stream that did not
> conform to the encoding specification (e.g. invalid UTF-8 byte sequences in a
> UTF-8 input byte stream) are errors that conformance checkers are expected to
> report.

"\xD8\x00" can obviously be decoded as if it were UTF-16BE to HTML5's
definition of a "Unicode code point" (which includes lone surrogates), but
according to the Unicode specification it is an invalid UTF-16 code unit
sequence.

It would seem preferable that lone surrogates get converted to U+FFFD, as they
currently are in Opera/Firefox.
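To make the "\xD8\x00" case concrete, here is a minimal Python sketch of the two behaviours the comment contrasts. It is an illustration only, not text from the specification or code from any browser. The function name decode_utf16be and the flag replace_lone_surrogates are hypothetical, chosen for this example, and mapping a trailing odd byte to U+FFFD is an assumption of the sketch.

```python
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def decode_utf16be(data: bytes, replace_lone_surrogates: bool = True) -> str:
    """Decode UTF-16BE bytes (hypothetical helper, for illustration).

    With replace_lone_surrogates=True, a lone surrogate becomes U+FFFD
    (the behaviour the comment prefers). With False, it is passed
    through as a surrogate code point.
    """
    # Split the input into 16-bit big-endian code units.
    units = [
        int.from_bytes(data[i:i + 2], "big")
        for i in range(0, len(data) - 1, 2)
    ]
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        next_is_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and next_is_low:
            # Well-formed surrogate pair: combine into one code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append(chr(cp))
            i += 2
            continue
        if 0xD800 <= u <= 0xDFFF:
            # Lone surrogate: invalid as a UTF-16 code unit sequence.
            out.append(REPLACEMENT if replace_lone_surrogates else chr(u))
        else:
            out.append(chr(u))
        i += 1
    if len(data) % 2:
        # Assumption of this sketch: a trailing odd byte maps to U+FFFD.
        out.append(REPLACEMENT)
    return "".join(out)


if __name__ == "__main__":
    print(ascii(decode_utf16be(b"\xD8\x00")))
    print(ascii(decode_utf16be(b"\xD8\x00", replace_lone_surrogates=False)))
```

With the default flag the two bytes decode to a single U+FFFD. With the flag off, the result is the lone surrogate U+D800, which is a "code point" in the HTML5 sense but not a valid UTF-16 sequence under Unicode.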
Received on Tuesday, 22 May 2012 17:23:41 UTC