RE: [css3-fonts] font selection for Unicode Variation Selector from Koji Ishii on 2011-02-25 (public-i18n-cjk@w3.org from January to March 2011)

From: Koji Ishii <kojiishi@gluesoft.co.jp>
Date: Thu, 24 Feb 2011 22:09:21 -0500
To: John Daggett <jdaggett@mozilla.com>
CC: "www-style@w3.org" <www-style@w3.org>, John Hudson <tiro@tiro.com>, "CJK discussion (public-i18n-cjk@w3.org)" <public-i18n-cjk@w3.org>, "'WWW International' (www-international@w3.org)" <www-international@w3.org>
Message-ID: <A592E245B36A8949BDB0A302B375FB4E0AB201D5D4@MAILR001.mail.lan>
I'm not sure whether this qualifies as a "strong use case", but allow me to try.

Suppose you have a font that supports Adobe-Japan1 IVS and you want to use it as your primary font, but you also have a font that supports both Adobe-Japan1 IVS and Hanyo-Denshi IVS. Or you might want to use Palatino as your primary body font, while also having a font that has glyphs for the Unicode Standardized Variants[1].

You may then enter IVS/UVS through your Input Method Editor, or copy text containing IVS/UVS from your e-mail as plain text. The glyphs will not display correctly until you select the span and apply a different font.

Some day later, you may want to change the font of the paragraph that includes IVS/UVS. You select the paragraph and apply the font, only to see the glyphs change to ones you do not want.

These experiences remind me of what I went through in the '80s and early '90s, before Unicode: code points were shared among locales, so you had to apply a Latin font to Latin text and an East Asian font to East Asian text yourself. If you applied the wrong font, the glyphs would be broken. Unicode came to the rescue: the code point determines the glyph (non-formatting information) and the font determines the style.

I think that if IVS/UVS cannot display properly unless the appropriate font is applied at the top of the font list, users of IVS/UVS are sent back to those days. As I said, if the feature were intended to work that way, it would only be a font feature. It was done in Unicode because people wanted it to work without styling information.

The difficulty in showing a use case is that the technology is still new, so while the font selection scenario above is an actual use case, "enter IVS/UVS through IME" hasn't happened yet. But I think it will happen before CSS3 Fonts goes to REC. That's my guess or hope, though, not a real use case we have today.


Regards,
Koji

[1] http://unicode.org/Public/UNIDATA/StandardizedVariants.html


-----Original Message-----
From: John Daggett [mailto:jdaggett@mozilla.com] 
Sent: Tuesday, February 22, 2011 12:04 PM
To: Koji Ishii
Cc: www-style@w3.org; John Hudson; CJK discussion (public-i18n-cjk@w3.org); 'WWW International' (www-international@w3.org)
Subject: Re: [css3-fonts] font selection for Unicode Variation Selector

Koji Ishii wrote:

> > The question is not difficulty; I don't think it's the right
> > behavior to require font fallback for UVS selectors.
> > It doesn't make sense to me given the way the Unicode spec is written, it
> > doesn't make sense given the way other font variant features
> > work, and it doesn't make sense to me from a performance perspective.
> 
> First, there are two perspectives: "should" and "technically
> feasible", and I understand both are important. What you mentioned
> first is about "should"; performance etc. belong to the latter. I would
> like to focus on "should" first, because you mentioned you don't think
> it's the right behavior. If we could agree that we "should", we could
> then discuss whether it's feasible, or whether any compromises are
> needed to make it so. If we agree that we "should NOT", then the
> discussion is over.

Again, I think you need to show a strong use case for supporting UVS in font fallback.  What you're proposing effectively changes font fallback from a simple single-pass procedure ("what font supports character x?") to a two-pass procedure ("what font supports character x + selector? if not found, what font supports character x?").  It significantly complicates character-by-character handling.  We should not be adding complicated character-level handling unless there is a really strong reason to do so.
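
To make the single-pass versus two-pass distinction concrete, here is a minimal Python sketch. The Font class, the function names, and the sample font list are all hypothetical illustrations and not any browser's actual implementation; the code points are only examples.

```python
from dataclasses import dataclass, field


@dataclass
class Font:
    """Hypothetical font model: just the coverage data needed for matching."""
    name: str
    chars: set = field(default_factory=set)      # supported base code points
    sequences: set = field(default_factory=set)  # supported (base, selector) pairs


def single_pass(fonts, base):
    """Single pass: what font supports character x?"""
    for font in fonts:
        if base in font.chars:
            return font
    return None


def two_pass(fonts, base, selector):
    """Two passes: what font supports x + selector? If none, what font supports x?"""
    for font in fonts:
        if (base, selector) in font.sequences:
            return font
    return single_pass(fonts, base)


# Assumed font list: the primary font has the base character but not the sequence.
primary = Font("Primary", chars={0x8FBB})
secondary = Font("Secondary", chars={0x8FBB}, sequences={(0x8FBB, 0xE0100)})
fonts = [primary, secondary]

print(single_pass(fonts, 0x8FBB).name)        # first font with the base character
print(two_pass(fonts, 0x8FBB, 0xE0100).name)  # first font with the whole sequence
```

With this font list the single-pass lookup stops at the first font, while the two-pass lookup skips it in favor of the font that has the sequence. That switch of font within a run is the behavior under discussion.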

In general, the philosophy behind the CSS3 Fonts spec is that instead of relying on the uncertain results of font fallback, authors can now use the @font-face mechanism to supply fonts that meet the requirements of their content.  Font fallback for variation selectors inevitably means the use of a mixture of fonts across a text run, a result that's rarely ideal.  If an author requires a given variation they should supply a font containing it rather than relying on font fallback handling to magically conjure up the "right" representation for a text run.

In this case "should" and "technically feasible" are not independent issues; the question is whether a given approach produces benefits that justify the complexity and implementation cost.

> He also raised another interesting question; if font B has "a"
> and Umlaut[3] while font A has only "a", and source text is
>   U+0061 LATIN SMALL LETTER A
>   U+0308 COMBINING DIAERESIS
> Would CSS render the glyph using font A or B? If it uses font A, the 
> Umlaut will not be rendered.
> 
> He suggested that UVS is closer to combining marks than to
> OpenType font variant features.

The use of combining diacritics is completely different from the use of variation sequences. An 'a' character with an umlaut is a different character from an 'a', while a variation sequence specifies a representation of a given character.  The mechanism for handling combining diacritics is also different: in this case most implementations would normalize the text to the codepoint for a-umlaut *before* matching fonts.  Note that's not specified in CSS; that's simply the way most text engines function.  Fonts treat combining marks as separate codepoints which can be matched independently (albeit with possibly undesirable results).  For example:

  U+1F4A9 PILE OF POO
  U+0308  COMBINING DIAERESIS

Given a font that defines both these characters, this will render as a pile of poo with an umlaut, even though there is no normalized form for this. This will even render if the characters come from different fonts, although the umlaut may not be ideally placed over the poo. This is very different from the way variation sequences work.
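
The normalization point can be checked with a short Python sketch using the standard unicodedata module. It assumes NFC as the normalization form, which is a choice made for this illustration; the message above does not name a form.

```python
import unicodedata

# 'a' followed by COMBINING DIAERESIS composes to one precomposed code point.
a_umlaut = "\u0061\u0308"
composed = unicodedata.normalize("NFC", a_umlaut)
print(len(composed), hex(ord(composed)))

# PILE OF POO followed by COMBINING DIAERESIS has no precomposed form,
# so the sequence stays as two separate code points after normalization.
poo_umlaut = "\U0001F4A9\u0308"
unchanged = unicodedata.normalize("NFC", poo_umlaut)
print(len(unchanged), unchanged == poo_umlaut)
```

In the first case font matching would see a single character; in the second, the base character and the combining mark remain separate codepoints that can be matched independently.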

Regards,

John Daggett
Received on Friday, 25 February 2011 03:10:27 UTC