
Re: [css-fonts-3] i18n-ISSUE-296: Usable characters in unicode-range

From: Anne van Kesteren <annevk@annevk.nl>
Date: Tue, 17 Sep 2013 08:20:34 -0400
Message-ID: <CADnb78gvT4zKGea0z+rDkpDwrMRqDgcoX_-mgvbx+5Wk6v9D9g@mail.gmail.com>
To: John Daggett <jdaggett@mozilla.com>
Cc: Addison Phillips <addison@lab126.com>, Richard Ishida <ishida@w3.org>, W3C Style <www-style@w3.org>, www International <www-international@w3.org>
On Mon, Sep 16, 2013 at 9:55 PM, John Daggett <jdaggett@mozilla.com> wrote:
> In particular, I think Anne's point about surrogate handling [1] is
> completely orthogonal to the behavior of unicode-range:
>
>> It seems weird to say it expresses a range of Unicode scalar values
>> and then include U+D800 to U+DFFF in that range. And let's not use
>> "characters" as that's a confusing term. Saying that the range is in
>> code points but U+D800 to U+DFFF are ignored (rather than treated as
>> an error) could make sense.
>
> Non-Unicode encoding and surrogate handling issues are dealt with in
> levels above the level where font matching occurs.  If you look
> carefully at the description of font matching, the range of codepoints
> defined by the 'unicode-range' descriptor is intersected with the
> underlying character map of the font.  *That* is what defines the
> exact set of codepoints that are matched as part of the font matching
> algorithm. Given that no font ever includes mappings for surrogate
> codepoints to glyphs and no layout engine ever treats lone surrogates
> as individual codepoints, I don't see the need to adjust the
> definition of 'unicode-range'.  Invalid codepoints like this will
> naturally be ignored given the existing definition of font matching.

My point may or may not be orthogonal, but your terminology confusion
is not helping. A surrogate code point is not an invalid code point;
it's perfectly valid. It's just not a Unicode scalar value, and so it is
not part of the value space of UTF-8 or UTF-16.
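To make the distinction concrete, here is a minimal Python sketch. It is an illustration, not spec text, and the helper names are hypothetical. A code point is any integer in U+0000..U+10FFFF. A Unicode scalar value is a code point outside the surrogate range U+D800..U+DFFF. Python strings can hold a lone surrogate, but Python's strict UTF-8 codec refuses to encode one, which matches surrogates being outside UTF-8's value space.

```python
def is_code_point(n: int) -> bool:
    # Any integer in U+0000..U+10FFFF is a code point, surrogates included.
    return 0 <= n <= 0x10FFFF


def is_surrogate(n: int) -> bool:
    # U+D800..U+DFFF are the (perfectly valid) surrogate code points.
    return 0xD800 <= n <= 0xDFFF


def is_scalar_value(n: int) -> bool:
    # Unicode scalar values are the code points minus the surrogates.
    return is_code_point(n) and not is_surrogate(n)


def utf8_encodable(n: int) -> bool:
    # chr() accepts any code point, even a lone surrogate, but Python's
    # strict UTF-8 encoder raises UnicodeEncodeError for surrogates.
    if not is_code_point(n):
        return False
    try:
        chr(n).encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for cp in (0x41, 0xD800, 0xDFFF, 0x1F600):
        print(f"U+{cp:04X}", is_code_point(cp), is_scalar_value(cp), utf8_encodable(cp))
```

For every code point, `is_scalar_value` and `utf8_encodable` agree, while `is_code_point` is also true for the surrogates.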

I'm fine with describing unicode-range in terms of code points and
matching those against a font's code points, given that nobody really
seems to know what the value space of the latter is. (We should define
that some day, though: what a font on the platform actually is at an
abstract level, how the various formats map to it, how CSS uses it,
etc.)
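Here is a rough Python sketch of that approach, following the quoted description: the unicode-range, expressed in code points, is intersected with the font's character map. It assumes a font's character map can be modeled as a plain set of integer code points, which is exactly the part noted above as undefined. `parse_unicode_range`, `in_unicode_range`, `font_covers`, and the sample cmap are all hypothetical. They are not a CSS engine or font library API. The parser only handles the basic token forms (single value, start-end, trailing `?` wildcards) and does not implement every CSS validity or clipping rule.

```python
import re

# Hypothetical model: a unicode-range is a list of inclusive (start, end) code point ranges.
_TOKEN = re.compile(r"^U\+([0-9A-F?]{1,6})(?:-([0-9A-F]{1,6}))?$", re.IGNORECASE)


def parse_unicode_range(value: str) -> list[tuple[int, int]]:
    ranges = []
    for token in value.split(","):
        m = _TOKEN.match(token.strip())
        if not m:
            raise ValueError(f"unsupported token: {token!r}")
        first, last = m.group(1), m.group(2)
        if "?" in first:
            # Wildcards must be trailing and cannot be combined with "-end".
            if last is not None or "?" in first.rstrip("?"):
                raise ValueError(f"bad wildcard token: {token!r}")
            start = int(first.replace("?", "0"), 16)  # U+4?? -> U+400
            end = int(first.replace("?", "F"), 16)    # U+4?? -> U+4FF
        else:
            start = int(first, 16)
            end = int(last, 16) if last else start
        if start > end:
            # Simplification for this sketch: reject reversed ranges outright.
            raise ValueError(f"reversed range: {token!r}")
        ranges.append((start, end))
    return ranges


def in_unicode_range(cp: int, ranges: list[tuple[int, int]]) -> bool:
    # Membership is purely numeric, so surrogate code points can be "in range".
    return any(lo <= cp <= hi for lo, hi in ranges)


def font_covers(cp: int, ranges: list[tuple[int, int]], cmap: set[int]) -> bool:
    # Only the intersection of unicode-range and the font's cmap is matched.
    return in_unicode_range(cp, ranges) and cp in cmap


if __name__ == "__main__":
    ranges = parse_unicode_range("U+0-7F, U+D7F0-E000, U+4??")
    cmap = {0x41, 0x410, 0xE000}  # hypothetical cmap; like real fonts, no surrogates
    for cp in (0x41, 0xD800, 0x410, 0x500):
        print(f"U+{cp:04X}", in_unicode_range(cp, ranges), font_covers(cp, ranges, cmap))
```

In this model, U+D800 falls inside the declared range but is never matched, because the cmap has no entry for it. That is why including or ignoring surrogates in the range makes no observable difference here.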


> [1] http://lists.w3.org/Archives/Public/www-style/2013Sep/0318.html


-- 
http://annevankesteren.nl/
Received on Tuesday, 17 September 2013 12:21:05 UTC