Re: [css-fonts-3] i18n-ISSUE-296: Usable characters in unicode-range from John Daggett on 2013-09-13 (www-international@w3.org from July to September 2013)

From: John Daggett <jdaggett@mozilla.com>
Date: Thu, 12 Sep 2013 21:46:58 -0700 (PDT)
To: Addison Phillips <addison@lab126.com>
Cc: Richard Ishida <ishida@w3.org>, W3C Style <www-style@w3.org>, www International <www-international@w3.org>
Message-ID: <1873261001.7979575.1379047618131.JavaMail.zimbra@mozilla.com>

Addison Phillips wrote:

>>> 4.5. Character range: the unicode-range descriptor
>>> http://www.w3.org/TR/2013/WD-css-fonts-3-20130711/#unicode-range-desc
>>>
>>> "Valid Unicode codepoint values vary between 0 and 10FFFF
>>> inclusive." Do we need to say something about characters that
>>> cannot be used, such as surrogate codepoints?
>>>
>>> Perhaps what is meant is that the codepoint values cannot be
>>> higher than 10FFFF or lower than 0. In this case, perhaps the spec
>>> should say that the codepoint space (range) is between 0 and
>>> 10FFFF, rather than give the impression that all values in that
>>> space are acceptable.
>> 
>> Hmm, unicode ranges are used to indicate *possible* coverage ranges
>> for fonts. The actual range used in font matching is ultimately
>> determined by the intersection of the unicode-range descriptor
>> value with the actual character map of the font.  There's no
>> attempt to separate actual "valid" Unicode values from ones that
>> are invalid.  I don't think I see a need here to discuss the nitty
>> gritty of surrogate handling.
> 
> I don't think that's really the point though. We read this section
> in the WG call this morning. The text you have got is a little
> sloppy with the word "valid". The range of Unicode code points is,
> indeed, "valid" between 0 and 0x10FFFF, but not all of those code
> points are "valid" characters. We don't really want you to discuss
> the nitty gritty of surrogates and non-character code points. But
> the idea is that maybe you should say instead: "Unicode code points
> range between 0 and 0x10FFFF inclusive" avoiding the problematic
> word "valid"

Hmmm.  "Valid Unicode codepoint" seems fine to me; it's talking about the
codepoint itself, not whether a character is assigned to it.
But I'm not going to quibble, so I've updated the spec to remove the term.
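
For anyone following along, here is a rough Python sketch of the two ideas
in this thread: a code point can lie in the 0 to 10FFFF space without being
a character (surrogates, for example), and the range used in font matching
is the intersection of the unicode-range value with the font's character
map. This is only an illustration, not spec text or any browser's
implementation; the function names, the (start, end) tuple representation
and the sample cmap are all made up.

```python
MAX_CODE_POINT = 0x10FFFF

def in_codepoint_space(cp: int) -> bool:
    """True if cp lies in the Unicode code point space, 0 to 10FFFF inclusive."""
    return 0 <= cp <= MAX_CODE_POINT

def is_surrogate(cp: int) -> bool:
    """True for surrogate code points (D800 to DFFF), which are not characters."""
    return 0xD800 <= cp <= 0xDFFF

def effective_coverage(unicode_range, cmap):
    """Intersect declared (start, end) ranges with the code points in a font's cmap."""
    return {cp for cp in cmap
            if any(start <= cp <= end for start, end in unicode_range)}

# Hypothetical face: declares U+0-FF plus the surrogate block,
# but its (made-up) cmap only maps three code points.
declared = [(0x0000, 0x00FF), (0xD800, 0xDFFF)]
cmap = {0x41, 0x42, 0x3042}

print(in_codepoint_space(0xD800), is_surrogate(0xD800))  # True True
print(sorted(hex(cp) for cp in effective_coverage(declared, cmap)))  # ['0x41', '0x42']
```

Declaring the surrogate block does no harm in this sketch: nothing in the
cmap falls inside it, so it simply drops out of the intersection.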

Cheers,

John Daggett

Received on Friday, 13 September 2013 04:47:30 UTC