Re: [css-fonts-3] i18n-ISSUE-296: Usable characters in unicode-range

On Mon, Sep 16, 2013 at 9:55 PM, John Daggett wrote:
> In particular, I think Anne's point about surrogate handling [1] is
> completely orthogonal to the behavior of unicode-range:
>> It seems weird to say it expresses a range of Unicode scalar values
>> and then include U+D800 to U+DFFF in that range. And let's not use
>> "characters" as that's a confusing term. Saying that the range is in
>> code points but U+D800 to U+DFFF are ignored (rather than treated as
>> an error) could make sense.
> Non-Unicode encoding and surrogate handling issues are dealt with in
> levels above the level where font matching occurs.  If you look
> carefully at the description of font matching, the range of codepoints
> defined by the 'unicode-range' descriptor is intersected with the
> underlying character map of the font.  *That* is what defines the
> exact set of codepoints that are matched as part of the font matching
> algorithm. Given that no font ever includes mappings for surrogate
> codepoints to glyphs and no layout engine ever treats lone surrogates
> as individual codepoints, I don't see the need to adjust the
> definition of 'unicode-range'.  Invalid codepoints like this will
> naturally be ignored given the existing definition of font matching.

My point may or may not be orthogonal, but your terminology confusion
is not helping. A surrogate code point is not an invalid code point;
it is perfectly valid. It is just not a Unicode scalar value, and so it
is not part of the value space of UTF-8 or UTF-16.
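
To make the distinction concrete, here is a minimal sketch in Python. The
function names are mine, purely for illustration, and are not taken from
any spec. It treats U+D800 to U+DFFF as code points that are not scalar
values, and uses Python's own UTF-8 encoder to show that they fall
outside what the encoding can represent.

```python
MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode code space


def is_code_point(cp: int) -> bool:
    """Any integer in the Unicode code space, surrogates included."""
    return 0 <= cp <= MAX_CODE_POINT


def is_surrogate(cp: int) -> bool:
    """Surrogate code points: U+D800 to U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_scalar_value(cp: int) -> bool:
    """Scalar values are code points minus the surrogate block."""
    return is_code_point(cp) and not is_surrogate(cp)


def encodable_as_utf8(cp: int) -> bool:
    """True if Python's UTF-8 encoder accepts this code point."""
    if not is_code_point(cp):
        return False
    try:
        chr(cp).encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Python refuses to encode lone surrogates.
        return False
```

With these definitions, is_code_point(0xD800) is true while
is_scalar_value(0xD800) and encodable_as_utf8(0xD800) are both false.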

I'm fine with describing unicode-range in terms of code points and
matching against a font's code points, given that nobody really seems
to know what the value space of the latter is. (We should define that
some day, though: what a font on the platform actually is at an
abstract level, how the various formats map to it, how CSS uses it,
and so on.)
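
As a rough illustration of the matching John describes, the following
Python sketch intersects a unicode-range value with a font's character
map. It is not how any engine implements this. The parser is
deliberately simplified (no clipping or error recovery as a real CSS
parser would do), the names are hypothetical, and the "cmap" is just a
set of integers standing in for whatever a font actually maps.

```python
from typing import List, Set, Tuple


def parse_unicode_range(value: str) -> List[Tuple[int, int]]:
    """Parse 'U+26', 'U+4??' and 'U+0025-00FF' tokens into (lo, hi) pairs."""
    ranges = []
    for token in value.split(","):
        token = token.strip()
        if not token.upper().startswith("U+"):
            raise ValueError("not a unicode-range token: %r" % token)
        body = token[2:]
        if "-" in body:
            lo_text, hi_text = body.split("-", 1)
            lo, hi = int(lo_text, 16), int(hi_text, 16)
        elif "?" in body:
            # Wildcard form: each '?' spans the hex digits 0 through F.
            lo = int(body.replace("?", "0"), 16)
            hi = int(body.replace("?", "F"), 16)
        else:
            lo = hi = int(body, 16)
        if lo > hi or hi > 0x10FFFF:
            raise ValueError("range out of bounds: %r" % token)
        ranges.append((lo, hi))
    return ranges


def effective_code_points(unicode_range: str, cmap: Set[int]) -> Set[int]:
    """Code points both listed in unicode-range and mapped by the font."""
    ranges = parse_unicode_range(unicode_range)
    return {cp for cp in cmap if any(lo <= cp <= hi for lo, hi in ranges)}
```

If the character map contains no surrogate code points (John's claim,
taken here as an assumption), then a range such as "U+00-7F,
U+D800-DFFF" yields the same result as "U+00-7F" alone: the surrogate
part of the range simply matches nothing.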

> [1]


Received on Tuesday, 17 September 2013 12:21:09 UTC