Re: Comments on CSS3 Fonts module from Henri Sivonen on 2002-09-15 (www-style@w3.org from September 2002)

From: Henri Sivonen <hsivonen@niksula.hut.fi>
Date: Sun, 15 Sep 2002 14:55:17 +0300
To: www-style@w3.org
Message-Id: <00BB178A-C8A2-11D6-9247-003065B8CF0E@niksula.hut.fi>

On Saturday, Aug 31, 2002, at 16:02 Europe/Helsinki, Ian Hickson wrote:

> On Fri, 30 Aug 2002, Peter Sheerin wrote:
>>
>> I believe that the fonts module must specify a suggested behavior
>> when faced with a document that specifies a character it can not
>> render because no glyph is available.
>
> I agree.

I agree that it would be good to discourage the use of the question
mark as a generic surrogate. However, I think that where an OS practice
exists, the implementation details should follow it rather than be
normatively specified in CSS.

  * Using the OS practice helps the user understand that a missing
    character is being represented, because the behavior is consistent
    with the behavior of other apps.
  * The OS text engine may do the fallback internally, which is likely
    to be more efficient than application-side fallback.
  * The OS practice may be better than using a single replacement
    character, but the better method may not be available on all target
    platforms of CSS.

For example, Mac OS X comes with a last resort font that contains a
generic fallback glyph for each Unicode block. With the last resort
font the user has a better idea about what is missing.

Screenshot:
http://www.niksula.cs.hut.fi/~hsivonen/typography/last-resort.png

I'd like to see something like this in the spec:

  User agents MUST NOT use U+003F QUESTION MARK as a fallback
  representation when a glyph for a given character is missing. If a
  user agent is running on a platform that has a convention
  specifically designed for representing Unicode characters for which
  glyphs are unavailable, the user agent SHOULD follow the platform
  convention. Otherwise, user agents SHOULD use U+FFFD REPLACEMENT
  CHARACTER as the fallback representation.
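
To make the order of preference in that wording concrete, here is a
minimal sketch in Python. It is only an illustration: the function
name and the two callbacks (has_glyph and platform_fallback) are
hypothetical stand-ins for whatever the user agent's font machinery
and the platform text engine actually provide.

```python
from typing import Callable, Optional

REPLACEMENT_CHARACTER = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def fallback_representation(
    ch: str,
    has_glyph: Callable[[str], bool],
    platform_fallback: Optional[Callable[[str], str]] = None,
) -> str:
    """Pick what to draw for one character.

    has_glyph: hypothetical callback, true if some available font
        covers the character.
    platform_fallback: hypothetical callback standing in for a platform
        convention (for example a last resort font); None means the
        platform has no such convention.
    """
    if has_glyph(ch):
        # A glyph exists, so no fallback is needed.
        return ch
    if platform_fallback is not None:
        # SHOULD: follow the platform convention when there is one.
        return platform_fallback(ch)
    # Otherwise SHOULD use U+FFFD. U+003F is never substituted.
    return REPLACEMENT_CHARACTER
```

With a pretend font set that covers only ASCII,
fallback_representation("\u4e2d", lambda c: ord(c) < 128) takes the
last branch and returns U+FFFD, never a question mark.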

>> Also, the set of characters specified in the current HTML DTDs is
>> not really sufficient to display many important characters, [...]
>
> HTML4 references ISO10646 which means it has every UNICODE character.
> Ditto XML. Do you want HTML to have actual _named entities_ for all
> 16000+ characters? That simply doesn't scale.

I'm inclined to consider named character entities harmful, because they 
move an input problem to the user agent that is displaying the document 
and increase parsing complexity by requiring the XML parser to process 
the external DTD subset (in the usual case) even when the document 
could otherwise be treated as a standalone document. I think dealing 
with the issue on the input method level on the author's system makes 
more sense.
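
The parsing cost can be seen with any XML parser that is not given
the DTD. The sketch below uses Python's xml.etree.ElementTree purely
as an example of such a parser; the helper name and the sample
markup are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parses_standalone(markup: str) -> bool:
    """Return True if the markup parses with no DTD available."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Named entity: only meaningful if the parser has read the DTD that
# declares it.
NAMED = "<p>caf&eacute;</p>"
# Numeric character reference: resolved by the parser itself.
NUMERIC = "<p>caf&#233;</p>"
# Literal character, as typed with a suitable input method.
LITERAL = "<p>caf\u00e9</p>"

for label, markup in (("named", NAMED), ("numeric", NUMERIC), ("literal", LITERAL)):
    print(label, parses_standalone(markup))
```

Here the named form is rejected as an undefined entity, while the
numeric reference and the literal character both parse to the same
text without any DTD processing.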

-- 
Henri Sivonen
hsivonen@niksula.hut.fi
http://www.hut.fi/u/hsivonen/

Received on Sunday, 15 September 2002 07:55:58 UTC