- From: Anne van Kesteren <annevk@annevk.nl>
- Date: Fri, 13 Sep 2013 12:22:51 +0100
- To: Jonathan Kew <jfkthame@googlemail.com>
- Cc: John Daggett <jdaggett@mozilla.com>, Addison Phillips <addison@lab126.com>, Richard Ishida <ishida@w3.org>, W3C Style <www-style@w3.org>, www International <www-international@w3.org>
On Fri, Sep 13, 2013 at 11:33 AM, Jonathan Kew <jfkthame@googlemail.com> wrote:
> This is a tricky issue, IMO. What would it mean for the rendering subsystem
> to "treat lone surrogates as errors", exactly?

Basically to treat them as if U+FFFD was passed. That's how we deal
with them in the encoding layer and in character references and such.

> We don't want the presence of
> a lone surrogate to cause the rendering system to bail and refuse to render
> the remainder of the text run, for example. Nor do we want the lone
> surrogate to be completely ignored; its presence in the data is liable to
> interfere with other processes, so it's useful for the user to be able to
> see that there's *something* there.

Agreed.

> Rendering as U+FFFD might be an option, but IMO rendering as a hexbox is
> actually better. Note that because JS can manipulate the text in terms of
> UTF-16 code units (NOT characters), it is possible for it to "reassemble"
> two separate isolated surrogates into a single valid Unicode character; so
> the fact that the isolated surrogates still retain their distinct
> identities, rather than all appearing as U+FFFD glyphs, makes it easier to
> understand what is happening in such cases. If all isolated surrogates are
> rendered indistinguishably, then the behavior whereby bringing two of them
> into contact "magically" produces a valid Unicode character - but which
> particular one is impossible to predict from what was displayed - seems far
> more mysterious.

I guess my point of view is that I'd rather not have 16-bit code units
leak through to places that could do without. It's a fair argument
though. I guess the flipside would be to embrace the 16-bit code unit
nature of the web and just define everything in terms of that.
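
To make the two behaviors above concrete, here is a minimal TypeScript
sketch. It is only an illustration, not how any browser's rendering or
encoding layer is implemented. The names replaceLoneSurrogates,
codePointOfPair, LEAD and TRAIL are hypothetical, and U+1F600 is an
arbitrary example character. The first helper maps every unpaired
surrogate to U+FFFD; the rest shows two isolated surrogates becoming one
valid character once script brings them into contact.

```typescript
// Example surrogates: together they encode U+1F600, apart each is invalid.
const LEAD = "\uD83D";  // lead (high) surrogate
const TRAIL = "\uDE00"; // trail (low) surrogate

function isLead(c: number): boolean { return c >= 0xd800 && c <= 0xdbff; }
function isTrail(c: number): boolean { return c >= 0xdc00 && c <= 0xdfff; }

// Hypothetical helper: replace each unpaired surrogate with U+FFFD,
// leaving well-formed lead+trail pairs untouched.
function replaceLoneSurrogates(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (isLead(c) && i + 1 < s.length && isTrail(s.charCodeAt(i + 1))) {
      out += s.charAt(i) + s.charAt(i + 1); // valid pair, copy both units
      i++;
    } else if (isLead(c) || isTrail(c)) {
      out += "\uFFFD"; // lone surrogate
    } else {
      out += s.charAt(i);
    }
  }
  return out;
}

// Hypothetical helper: decode a two-unit lead+trail string to its code point.
function codePointOfPair(pair: string): number {
  return 0x10000 + ((pair.charCodeAt(0) - 0xd800) << 10) + (pair.charCodeAt(1) - 0xdc00);
}

const apart = LEAD + "x" + TRAIL; // two isolated surrogates
const joined = LEAD + TRAIL;      // the same units, now adjacent

console.log(replaceLoneSurrogates(apart) === "\uFFFDx\uFFFD"); // true
console.log(replaceLoneSurrogates(joined) === joined);         // true
console.log(codePointOfPair(joined).toString(16));             // 1f600
```

Once both isolated surrogates have been turned into U+FFFD, nothing in
the output says which character they would form when joined, which is
the information a hexbox rendering would keep visible.
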

> However, all this is straying rather far from the specific issue of
> unicode-range, for which I suggest that surrogate codepoints are simply
> irrelevant, as they should not go through font-matching as individual
> codepoints at all.

Well, if you argue we want to render lone surrogates, I would argue it
makes sense to design a different font for them too. I'm not entirely
convinced we want to render them though.

-- 
http://annevankesteren.nl/
Received on Friday, 13 September 2013 11:23:19 UTC