Re: [css-fonts-3] i18n-ISSUE-296: Usable characters in unicode-range from Martin J. Dürst on 2013-09-15 (www-international@w3.org from July to September 2013)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Sun, 15 Sep 2013 20:28:23 +0900
To: Anne van Kesteren <annevk@annevk.nl>
CC: Jonathan Kew <jfkthame@googlemail.com>, John Daggett <jdaggett@mozilla.com>, Addison Phillips <addison@lab126.com>, Richard Ishida <ishida@w3.org>, W3C Style <www-style@w3.org>, www International <www-international@w3.org>
Message-ID: <523599D7.9060003@it.aoyama.ac.jp>

On 2013/09/13 20:22, Anne van Kesteren wrote:
> On Fri, Sep 13, 2013 at 11:33 AM, Jonathan Kew <jfkthame@googlemail.com> wrote:
>> This is a tricky issue, IMO. What would it mean for the rendering subsystem
>> to "treat lone surrogates as errors", exactly?
>
> Basically to treat them as if U+FFFD was passed. That's how we deal
> with them in the encoding layer and in character references and such.

So that would mean that all lone surrogates render identically, and 
identically to anything else that got replaced by U+FFFD? I think it 
would be better to show what's actually there, because that may help 
in debugging.
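To make the trade-off concrete, here is a minimal Python sketch (not taken from any spec or browser; the function and parameter names are hypothetical) that decodes a sequence of 16-bit code units into code points and either replaces lone surrogates with U+FFFD or keeps their original values:

```python
from typing import List

REPLACEMENT = 0xFFFD


def is_high(u: int) -> bool:
    return 0xD800 <= u <= 0xDBFF


def is_low(u: int) -> bool:
    return 0xDC00 <= u <= 0xDFFF


def code_points(units: List[int], replace_lone: bool = True) -> List[int]:
    """Decode 16-bit code units; lone surrogates become U+FFFD or are kept.

    `replace_lone` is a hypothetical switch between the two behaviors
    discussed in this thread.
    """
    out: List[int] = []
    i = 0
    while i < len(units):
        u = units[i]
        if is_high(u) and i + 1 < len(units) and is_low(units[i + 1]):
            # Well-formed surrogate pair -> one supplementary code point.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
            continue
        if is_high(u) or is_low(u):
            # Lone surrogate: either collapse it or keep its value visible.
            out.append(REPLACEMENT if replace_lone else u)
        else:
            out.append(u)
        i += 1
    return out


# Two different broken inputs become indistinguishable after replacement ...
print(code_points([0x0041, 0xD800]))  # [65, 65533]
print(code_points([0x0041, 0xDC00]))  # [65, 65533]
# ... but stay distinguishable (and debuggable) if the value is preserved.
print([hex(c) for c in code_points([0x0041, 0xD800], replace_lone=False)])  # ['0x41', '0xd800']
```

With replacement, U+D800 and U+DC00 both end up as U+FFFD, so whatever is rendered afterwards can no longer tell them apart.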

> I guess my point of view is that I'd rather not have 16-bit code units
> leak through to places that could do without. It's a fair argument
> though. I guess the flipside would be to embrace the 16-bit code unit
> nature of the web and just define everything in terms of that.

I think you could flip things and describe everything in terms of 16-bit 
units, but then you would have to describe what has to happen with 
certain specific 16-bit units (e.g. lone surrogates) and specific 
combinations of them (i.e. surrogate pairs). In the end, the spec may 
describe the same behavior (or not). It's probably not worth losing 
time rewriting everything; better to concentrate on specifying the 
actual behavior.
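A spec written in terms of 16-bit units would, for instance, have to spell out how a surrogate pair corresponds to a single code point. The arithmetic itself is standard UTF-16; the sketch below shows it in both directions (the function names are only illustrative). Anything that fits neither case, i.e. a lone surrogate, is exactly what such a spec would then have to address separately.

```python
from typing import List


def to_code_units(cp: int) -> List[int]:
    """Encode one Unicode scalar value as UTF-16 code units."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a scalar value: U+{cp:04X}")
    if cp < 0x10000:
        return [cp]                                 # BMP: a single 16-bit unit
    v = cp - 0x10000                                # 20 bits remain
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]  # high 10 bits, low 10 bits


def from_pair(high: int, low: int) -> int:
    """Combine a high and a low surrogate into one supplementary code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print([hex(u) for u in to_code_units(0x1F600)])  # ['0xd83d', '0xde00']
print(hex(from_pair(0xD83D, 0xDE00)))            # 0x1f600
```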

>> However, all this is straying rather far from the specific issue of
>> unicode-range, for which I suggest that surrogate codepoints are simply
>> irrelevant, as they should not go through font-matching as individual
>> codepoints at all.

Yes indeed.
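Jonathan's point can be sketched as a font-matching loop that runs over code points rather than 16-bit units: surrogate pairs are decoded first, so even a face whose `unicode-range` covers U+D800-DFFF is never selected for any text. This is a simplified illustration, not how any particular browser implements it; the parser handles only the single, range and `?` wildcard forms without full validation, and the face names are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

Range = Tuple[int, int]


def parse_unicode_range(value: str) -> List[Range]:
    """Parse a simplified unicode-range value such as 'U+0-7F, U+4??, U+1F600'."""
    ranges: List[Range] = []
    for part in value.split(","):
        token = part.strip().upper()
        if not token.startswith("U+"):
            raise ValueError(f"bad range: {part!r}")
        body = token[2:]
        if "?" in body:
            # Wildcard form: each '?' spans 0..F in its digit position.
            ranges.append((int(body.replace("?", "0"), 16),
                           int(body.replace("?", "F"), 16)))
        elif "-" in body:
            lo, hi = body.split("-", 1)
            ranges.append((int(lo, 16), int(hi, 16)))
        else:
            cp = int(body, 16)
            ranges.append((cp, cp))
    return ranges


def iter_code_points(units: List[int]) -> Iterator[Optional[int]]:
    """Yield code points; pairs are joined, lone surrogates yield None."""
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            yield 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            yield None  # lone surrogate: never offered to font matching
            i += 1
        else:
            yield u
            i += 1


def pick_faces(units: List[int],
               faces: List[Tuple[str, List[Range]]]) -> List[Optional[str]]:
    """For each character, return the first face whose unicode-range covers it."""
    result: List[Optional[str]] = []
    for cp in iter_code_points(units):
        chosen: Optional[str] = None
        if cp is not None:
            for name, ranges in faces:
                if any(lo <= cp <= hi for lo, hi in ranges):
                    chosen = name
                    break
        result.append(chosen)
    return result


faces = [
    ("LatinFace", parse_unicode_range("U+0-7F")),
    ("SurrogateFace", parse_unicode_range("U+D800-DFFF")),  # never matches
    ("EmojiFace", parse_unicode_range("U+1F600-1F64F")),
]
print(pick_faces([0x41, 0xD83D, 0xDE00, 0xD800], faces))
# ['LatinFace', 'EmojiFace', None]
```

The `None` entry stands for whatever fallback handles a lone surrogate; the point is only that surrogate values never take part in `unicode-range` matching.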

> Well, if you argue we want to render lone surrogates, I would argue it
> makes sense to design a different font for them too. I'm not entirely
> convinced we want to render them though.

Different font, yes. But there isn't necessarily much to design: looking 
ugly will be a feature in this case, and a hexbox font should already be 
available (you only need about 16 small glyphs for that, one per hex 
digit, maybe 17 if you design the '10' of the last plane as a single 
glyph).
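The glyph count can be checked with a small sketch. It assumes one possible reading of the '10' remark: a hexbox shows four hex digits for the low 16 bits of the value, plus a single plane glyph for supplementary code points, with plane 16 drawn as one combined '10' glyph. The layout and names are hypothetical, not a description of any existing hexbox font.

```python
from typing import List

# Assumed glyph inventory: the 16 hex digits plus one combined '10' glyph
# for plane 16 (hypothetical layout), giving 17 glyphs in total.
HEX_DIGITS = [f"{d:X}" for d in range(16)]
GLYPH_INVENTORY = HEX_DIGITS + ["10"]


def hexbox_glyphs(cp: int) -> List[str]:
    """Glyph sequence for a hexbox showing code point (or code unit) cp."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("out of Unicode range")
    plane, low16 = cp >> 16, cp & 0xFFFF
    glyphs: List[str] = []
    if plane == 16:
        glyphs.append("10")           # single glyph for the last plane
    elif plane > 0:
        glyphs.append(f"{plane:X}")   # planes 1-15 need one hex digit
    glyphs.extend(f"{low16:04X}")     # always four digits for the low 16 bits
    return glyphs


print(hexbox_glyphs(0xD800))    # ['D', '8', '0', '0']  (e.g. a lone surrogate)
print(hexbox_glyphs(0x10FFFF))  # ['10', 'F', 'F', 'F', 'F']
print(len(GLYPH_INVENTORY))     # 17
```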

And for lone surrogates, maybe use a particularly ugly color for the 
hexboxes, to show clearly that the problem is in the data and not in 
the rendering (i.e. it is not a missing-font problem). The 'ugly color' 
may go too far, but showing that something weird is going on, and 
showing the details of it, is very helpful. When asked for help, I'd 
much rather get a screenshot with some hexboxes than one with just 
U+FFFD glyphs. Ideally, page authors would be able to catch such 
errors before they hit the Internet at large.
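The distinction between "missing font" and "bad data" can be sketched as a text-mode debug renderer. The marker formats `[U+XXXX]` and `[!XXXX]` and the `has_glyph` callback are hypothetical stand-ins for real hexboxes (in a normal and an "ugly" style) and a real font lookup. The list of offsets it also returns is the kind of information an authoring tool could use to flag lone surrogates before a page is published.

```python
from typing import Callable, List, Tuple


def debug_render(units: List[int],
                 has_glyph: Callable[[int], bool]) -> Tuple[str, List[int]]:
    """Render 16-bit units as debug text and report lone-surrogate offsets.

    Hypothetical conventions:
    - characters with a glyph are shown as themselves
    - characters without a glyph (a font problem) are shown as [U+XXXX]
    - lone surrogates (a data problem) are shown as [!XXXX], not as U+FFFD
    """
    out: List[str] = []
    lone: List[int] = []
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            cp, step = 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00), 2
        elif 0xD800 <= u <= 0xDFFF:
            out.append(f"[!{u:04X}]")  # distinct marker: bad data, not a missing font
            lone.append(i)
            i += 1
            continue
        else:
            cp, step = u, 1
        out.append(chr(cp) if has_glyph(cp) else f"[U+{cp:04X}]")
        i += step
    return "".join(out), lone


# Pretend the available font only covers ASCII.
text, lone = debug_render([0x48, 0x69, 0xD83D, 0xDE00, 0xDC00], lambda cp: cp < 0x80)
print(text)  # Hi[U+1F600][!DC00]
print(lone)  # [4]
```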

Regards,   Martin.

Received on Sunday, 15 September 2013 11:29:24 UTC