Re: [css-fonts-3] i18n-ISSUE-296: Usable characters in unicode-range from Anne van Kesteren on 2013-09-13 (www-international@w3.org from July to September 2013)

From: Anne van Kesteren <annevk@annevk.nl>
Date: Fri, 13 Sep 2013 10:01:52 +0100
To: Jonathan Kew <jfkthame@googlemail.com>
Cc: John Daggett <jdaggett@mozilla.com>, Addison Phillips <addison@lab126.com>, Richard Ishida <ishida@w3.org>, W3C Style <www-style@w3.org>, www International <www-international@w3.org>
Message-ID: <CADnb78hnJQHAJFu2=0E-zfAQMJdTGF__g7J2prbftpQVW2LZJw@mail.gmail.com>

On Fri, Sep 13, 2013 at 9:46 AM, Jonathan Kew <jfkthame@googlemail.com> wrote:
> Given that it's possible for script to insert lone surrogates into the DOM,
> I think we have to "render" them in some way - though simply rendering a
> hexbox, a "broken character" graphic, or perhaps U+FFFD, would be
> sufficient; there's no need to even attempt font matching as though they
> were actual characters.

So Gecko doesn't render lone surrogates as U+FFFD, but I would prefer
it if we did. In particular, if the rendering subsystem operated only
on Unicode scalar values and treated lone surrogates as errors, I think
that'd be an improvement.
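
To make the "scalar values only" idea concrete, here is a rough sketch
(not Gecko's code, and not anything a spec requires) of a decoding step
that turns UTF-16 code units into scalar values and maps each lone
surrogate to U+FFFD. The function name and the choice of U+FFFD as the
error result are assumptions for illustration only.

```python
REPLACEMENT = 0xFFFD  # assumed error substitute; a hexbox would be another option


def to_scalar_values(units):
    """Decode a list of UTF-16 code units into Unicode scalar values.

    Well-formed surrogate pairs are combined; every lone surrogate is
    treated as an error and replaced with U+FFFD.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by a low surrogate: a valid pair.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired high or low surrogate: not a scalar value.
            out.append(REPLACEMENT)
            i += 1
        else:
            out.append(u)
            i += 1
    return out
```

With this in place, anything downstream (font matching included) would
never see a value in U+D800 to U+DFFF.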

> Unicode-range, OTOH, is expressing a range of Unicode scalar values for
> which the font should be considered in the font matching process. A font
> whose unicode-range is U+0-FFFF, for example, covers the lone-surrogate
> values, but as font selection operates on characters (and clusters), not on
> code units, they'd never actually be used.

It seems weird to say it expresses a range of Unicode scalar values
and then include U+D800 to U+DFFF in that range. And let's not use
"characters" as that's a confusing term. Saying that the range is in
code points but U+D800 to U+DFFF are ignored (rather than treated as
an error) could make sense.
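
As a sketch of what "in code points, but surrogates ignored" could mean
in practice: the range is parsed as plain code points, and the coverage
check simply never matches U+D800 to U+DFFF. The helper names are
hypothetical, and the parser only handles the simple "U+X" and "U+X-Y"
forms (no wildcards), so this is an illustration rather than the
css-fonts-3 grammar.

```python
def parse_range(text):
    """Parse 'U+X' or 'U+X-Y' (hex) into an inclusive (low, high) pair."""
    body = text.strip()
    if not body.upper().startswith("U+"):
        raise ValueError("expected a range starting with 'U+'")
    start, _, end = body[2:].partition("-")
    low = int(start, 16)
    high = int(end, 16) if end else low
    return low, high


def range_covers(code_point_range, code_point):
    """Return True if the range covers code_point, ignoring surrogates."""
    low, high = code_point_range
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogate code points are ignored, not treated as an error.
        return False
    return low <= code_point <= high
```

So U+0-FFFF stays a valid declaration, yet range_covers() returns False
for U+D800 and True for U+E000.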

-- 
http://annevankesteren.nl/

Received on Friday, 13 September 2013 09:02:22 UTC