[Bug 11298] Surrogate catching doesn't belong in input stream preprocessing from bugzilla@jessica.w3.org on 2010-12-29 (public-html-bugzilla@w3.org from December 2010)

From: <bugzilla@jessica.w3.org>
Date: Wed, 29 Dec 2010 08:50:31 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1PXrjr-00070x-Cw@jessica.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=11298

Ian 'Hixie' Hickson <ian@hixie.ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ian@hixie.ch

--- Comment #1 from Ian 'Hixie' Hickson <ian@hixie.ch> 2010-12-29 08:50:31 UTC ---
The way it's specced is intentional, so as to make &#xD800; and a raw
UTF-8-encoded U+D800 be treated the same.
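
As a rough illustration of why handling this in one place treats both inputs alike, the Python sketch below reaches U+D800 by two routes and runs both through one check. The helper names (`is_surrogate`, `replace_surrogates`) are hypothetical and this is not the spec's algorithm; the lenient `surrogatepass` decode only stands in for a decoder that lets surrogates through.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_surrogate(ch: str) -> bool:
    """True if ch is a code point in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= ord(ch) <= 0xDFFF


def replace_surrogates(text: str) -> str:
    """Hypothetical preprocessing step: swap each surrogate for U+FFFD."""
    return "".join(REPLACEMENT if is_surrogate(ch) else ch for ch in text)


# Route 1: the numeric character reference &#xD800;, resolved naively.
from_reference = chr(int("D800", 16))

# Route 2: the bytes ED A0 80, decoded by a deliberately lenient decoder.
from_bytes = b"\xed\xa0\x80".decode("utf-8", errors="surrogatepass")

# Both routes yield the same code point, so one check covers both.
same_code_point = from_reference == from_bytes
cleaned_reference = replace_surrogates(from_reference)
cleaned_bytes = replace_surrogates(from_bytes)
```

Both `cleaned_reference` and `cleaned_bytes` end up as a single U+FFFD, which is the "treated the same" behaviour described above.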

As I understand it, if you use document.write(), you're using UTF-16, and thus
you can't pass in a lone surrogate that is treated as a Unicode codepoint; it
would have to be UTF-16-decoded first, and there's no way for UTF-16 to
represent lone surrogates.
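
A small Python sketch of the UTF-16 point, assuming a strict UTF-16 codec such as the one in Python's standard library. It only shows what a strict encoder or decoder does with an unpaired surrogate; it makes no claim about how document.write() processes its argument.

```python
lone = "\ud800"  # an unpaired high surrogate

# A strict UTF-16 encoder refuses to serialize a lone surrogate.
try:
    lone.encode("utf-16-le")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# The code units D800 0041: a high surrogate with no low surrogate after it.
units = b"\x00\xd8" + "A".encode("utf-16-le")

# A strict UTF-16 decoder rejects that sequence as ill-formed.
try:
    units.decode("utf-16-le")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# With replacement-style error handling, the decoder never emits a surrogate.
lenient = units.decode("utf-16-le", errors="replace")
has_surrogate = any(0xD800 <= ord(ch) <= 0xDFFF for ch in lenient)
```

Here `strict_encode_ok`, `strict_decode_ok` and `has_surrogate` all come out false: once the data has gone through UTF-16 decoding, no lone surrogate code point remains.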

I guess we could change this, though, so that instead of being handled in the
HTML parser, it's handled in the "decode a byte string as UTF-8, with error
handling" algorithm. Not sure what we'd say for UTF-16 or where we'd say it,
exactly.
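
To sketch the alternative of catching surrogates in the decoder instead, the snippet below uses Python's built-in UTF-8 codec with `errors="replace"` as a stand-in for a "decode a byte string as UTF-8, with error handling" step. The function name is hypothetical, and how many U+FFFD characters an ill-formed sequence produces depends on the decoder, so the sketch does not rely on a count.

```python
def decode_utf8_with_error_handling(data: bytes) -> str:
    """Hypothetical stand-in: decode UTF-8, replacing ill-formed input."""
    return data.decode("utf-8", errors="replace")


# "a", then the bytes that would encode U+D800, then "b".
decoded = decode_utf8_with_error_handling(b"a\xed\xa0\x80b")

# The surrogate never reaches later stages; only U+FFFD does.
surrogate_free = not any(0xD800 <= ord(ch) <= 0xDFFF for ch in decoded)
has_replacement = "\ufffd" in decoded
```

With the check in the decoder, the parser would never see a surrogate from byte input, which leaves the UTF-16 case as the open question.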


Received on Wednesday, 29 December 2010 08:50:33 UTC