- From: <bugzilla@jessica.w3.org>
- Date: Fri, 28 Mar 2014 12:01:05 +0000
- To: www-international@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=24104
Anne <annevk@annevk.nl> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bzbarsky@mit.edu,
                   |                            |hsivonen@hsivonen.fi,
                   |                            |simon.sapin@exyr.org
--- Comment #3 from Anne <annevk@annevk.nl> ---
I analyzed too quickly. In Gecko and Chrome, lone surrogates either never
reach the utf-8 encoder (they are replaced by U+FFFD beforehand) or are
replaced as part of the encoder. They do not result in an error, as that would
cause something of the form &#...; to be emitted rather than a straight U+FFFD.
Boris, Henri, Simon, do you have any preferences for how we arrange the encoder
setup? Should all encoders replace lone surrogates in the input stream with
U+FFFD or should we make encoders only take Unicode scalar values and let a
layer before handle the lone surrogates?
It seems more pragmatic to have encoders take code points. Maybe I should
introduce a special lone surrogate error that replaces them with U+FFFD?
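
To make the two arrangements concrete, here is a rough Python sketch. It is
only an illustration, not how Gecko, Chrome, or the spec actually do it. All
function names are made up for this example, and the input is assumed to be a
list of code points in which any surrogate is by definition a lone one.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_surrogate(cp: int) -> bool:
    """True for code points in the surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def to_scalar_values(code_points):
    """Hypothetical pre-encoder layer: map lone surrogates to U+FFFD."""
    return [REPLACEMENT if is_surrogate(cp) else cp for cp in code_points]


def utf8_encode_scalars(scalars):
    """Arrangement 1: the encoder accepts Unicode scalar values only."""
    out = bytearray()
    for cp in scalars:
        if is_surrogate(cp):
            # A surrogate reaching this point means the earlier layer was skipped.
            raise ValueError(f"not a scalar value: U+{cp:04X}")
        out += chr(cp).encode("utf-8")
    return bytes(out)


def utf8_encode_code_points(code_points):
    """Arrangement 2: the encoder accepts code points and replaces
    lone surrogates with U+FFFD itself."""
    out = bytearray()
    for cp in code_points:
        if is_surrogate(cp):
            cp = REPLACEMENT
        out += chr(cp).encode("utf-8")
    return bytes(out)


sample = [0x61, 0xD800, 0x62]  # "a", a lone lead surrogate, "b"
via_layer = utf8_encode_scalars(to_scalar_values(sample))
via_encoder = utf8_encode_code_points(sample)
```

Both paths yield the same bytes for the sample; the difference is only in
which layer owns the replacement, which is the question above.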
Received on Friday, 28 March 2014 12:01:08 UTC