- From: <bugzilla@jessica.w3.org>
- Date: Fri, 28 Mar 2014 12:01:05 +0000
- To: www-international@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24104

Anne <annevk@annevk.nl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bzbarsky@mit.edu,
                   |                            |hsivonen@hsivonen.fi,
                   |                            |simon.sapin@exyr.org

--- Comment #3 from Anne <annevk@annevk.nl> ---
I analyzed too quickly. In Gecko and Chrome, lone surrogates either never reach the utf-8 encoder (they are replaced by U+FFFD beforehand) or are replaced as part of the encoder. They do not result in an error, as that would cause something in the form of &#...; to be emitted rather than a straight U+FFFD.

Boris, Henri, Simon, do you have any preference for how we arrange the encoder setup? Should all encoders replace lone surrogates in the input stream with U+FFFD, or should we make encoders take only Unicode scalar values and let a layer before them handle the lone surrogates?

It seems more pragmatic to have encoders take code points. Maybe I should introduce a special lone surrogate error that does the replacing with U+FFFD?

-- 
You are receiving this mail because:
You are on the CC list for the bug.
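
The two arrangements asked about in the comment can be illustrated with a minimal Python sketch. It is not taken from Gecko, Chrome, or the Encoding Standard, and the function names `to_scalar_values` and `encode_utf8_code_points` are hypothetical, chosen only for this illustration. It assumes the input is a Python `str`, where any code point in the range U+D800 to U+DFFF is a lone surrogate.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_surrogate(code_point: int) -> bool:
    """Return True for code points in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= code_point <= 0xDFFF


def to_scalar_values(text: str) -> str:
    """Arrangement 1 (assumed): a layer before the encoder replaces each
    lone surrogate with U+FFFD, so the encoder only sees scalar values."""
    return "".join(
        REPLACEMENT if is_surrogate(ord(ch)) else ch for ch in text
    )


def encode_utf8_code_points(text: str) -> bytes:
    """Arrangement 2 (assumed): the encoder accepts arbitrary code points
    and replaces lone surrogates with U+FFFD itself."""
    out = bytearray()
    for ch in text:
        if is_surrogate(ord(ch)):
            ch = REPLACEMENT
        out += ch.encode("utf-8")
    return bytes(out)


sample = "a\ud800b"  # a lone high surrogate between two ASCII letters

# Both arrangements emit a straight U+FFFD (bytes EF BF BD), not an error.
via_layer = to_scalar_values(sample).encode("utf-8")
via_encoder = encode_utf8_code_points(sample)
```

Under these assumptions both paths produce the same bytes, so the choice is mainly about where the replacement logic lives. For comparison, Python's own strict encoder raises `UnicodeEncodeError` on `sample`, which corresponds to the error behaviour the comment says Gecko and Chrome do not exhibit.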
Received on Friday, 28 March 2014 12:01:08 UTC