- From: <bugzilla@jessica.w3.org>
- Date: Tue, 04 Jan 2011 09:09:47 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11298

--- Comment #2 from Henri Sivonen <hsivonen@iki.fi> 2011-01-04 09:09:46 UTC ---

Considering established practice, the spec makes a conceptual error when it pretends that the parser operates on Unicode characters. In the real world, the parser (in applications that support document.write) operates on UTF-16 code units and document.write writes UTF-16 code units. If document.write writes unpaired surrogates, they pass through the parser unchanged and unpaired surrogates end up in the DOM. It's not worthwhile to prevent this as long as scripted DOM manipulation can put unpaired surrogates in the DOM.

The conceptually realistic setup is thus:

1) The parser operates on UTF-16 code units.
2) The parser is responsible for munging U+0000 and carriage return.
3) The parser is *not* responsible for touching unpaired surrogates.
4) document.write writes UTF-16 code units (with potentially unpaired surrogates).
5) When the input is a byte stream, the process that converts input bytes into UTF-16 code units is responsible for replacing bogus byte sequences with U+FFFD. When the input byte stream is encoded in a flavor of UTF-16, unpaired surrogates constitute bogus byte sequences.
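The division of responsibility in points 1) to 5) can be sketched roughly as follows. This is only an illustration, not any browser's implementation: the function names `decode_utf16le` and `parser_preprocess` are hypothetical, code units are modeled as a list of integers, and the exact munging shown (U+0000 replaced with U+FFFD, CR and CRLF normalized to LF) is a simplifying assumption, since the comment does not spell it out.

```python
from typing import List

REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_high(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def decode_utf16le(data: bytes) -> List[int]:
    """Byte stream -> UTF-16 code units (point 5).

    Unpaired surrogates are bogus byte sequences here, so the decoder
    replaces them with U+FFFD before the parser ever sees them.
    """
    units = [data[i] | (data[i + 1] << 8) for i in range(0, len(data) - 1, 2)]
    out: List[int] = []
    i = 0
    while i < len(units):
        unit = units[i]
        if is_high(unit) and i + 1 < len(units) and is_low(units[i + 1]):
            out += [unit, units[i + 1]]  # well-formed pair, keep both units
            i += 2
            continue
        out.append(REPLACEMENT if is_high(unit) or is_low(unit) else unit)
        i += 1
    if len(data) % 2:
        out.append(REPLACEMENT)  # assumption: a trailing odd byte is also bogus
    return out


def parser_preprocess(units: List[int]) -> List[int]:
    """Parser-side munging (points 2 and 3), simplified.

    Only U+0000 and carriage return are touched; surrogates, paired or
    not, pass through unchanged.
    """
    out: List[int] = []
    i = 0
    while i < len(units):
        unit = units[i]
        if unit == 0x000D:
            out.append(0x000A)  # CR or CRLF becomes a single LF
            if i + 1 < len(units) and units[i + 1] == 0x000A:
                i += 1
        elif unit == 0x0000:
            out.append(REPLACEMENT)
        else:
            out.append(unit)
        i += 1
    return out
```

In this model, code units arriving via document.write would go straight to `parser_preprocess`, so a lone 0xD800 written by script survives into the output, while the same unit arriving in a UTF-16 byte stream is replaced by `decode_utf16le` first.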
Received on Tuesday, 4 January 2011 09:09:48 UTC