Re: [css3-fonts] font selection for Unicode Variation Selector from Asmus Freytag on 2011-02-25 (www-style@w3.org from February 2011)

From: Asmus Freytag <asmusf@ix.netcom.com>
Date: Thu, 24 Feb 2011 21:04:24 -0800
To: Koji Ishii <kojiishi@gluesoft.co.jp>
CC: John Daggett <jdaggett@mozilla.com>, www-style@w3.org, John Hudson <tiro@tiro.com>, public-i18n-cjk@w3.org, www-international@w3.org
Message-ID: <4D673858.1060209@ix.netcom.com>

After some reflection, I would want to second Koji here.

Variation sequences are designed to have acceptable fallbacks (their 
base characters). However, not all variation sequences are created 100% 
equal.

Mongolian variation sequences are intended to allow one to show a 
particular glyph form, overriding contextual shaping for Mongolian. It's 
a good bet that using the fallback in such an instance distorts the 
text, even though the distortion may not result in the wrong *word*. 
There's a good chance that Mongolian fonts all support these sequences, 
so luckily, there's not expected to be a font selection issue.

Mathematical variation sequences were encoded for cases where it was not 
clear that a glyph distinction made for a difference in meaning, but 
where it was equally not clear that one could leave the choice of the 
glyph entirely to the font. Or where there was a chance that users might 
insist on just some of these shapes without the need to have two 
mathematical fonts around at the same time. In these cases, it's totally 
unclear what impact the use of a fallback (base character glyph) has to 
the document.

Ideographic variation sequences were created so that Unicode could avoid 
having to encode minor variants of ideographs as their own characters. 
The goal was to create a well specified way for users to express their 
intent (specific glyph) with the added benefit that the text would 
remain somewhat legible if a font substitution were to occur. In other 
words, using the fallback is OK, if the alternative is showing a missing 
glyph symbol.

If the user has fonts that can show the variation, I would tend to agree 
with Koji that it is a very unfriendly design if the user must always be 
aware of the presence of variation sequences in the text when deciding 
on the font selection. Such a design also performs poorly where 
authorship of the text and authorship of the style sheet are unrelated 
and / or are performed at different times.

If applying a style sheet to a text "breaks" the IVS, then this will 
create pressure to encode more of them as first class Unicode 
characters. That's not the direction things should go. Therefore, 
extending the fallback font selection behavior that is currently applied 
on a character basis to variation sequences, and especially IVS, seems a 
reasonable step.
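
To make the proposal concrete, here is a minimal sketch of what extending per-character fallback to variation sequences could look like. It is an illustration only, not any browser's implementation: the Font class, its chars and sequences fields, and select_font are hypothetical names, and the code points are sample values.

```python
class Font:
    """Hypothetical font model: the base code points the font maps, plus the
    (base, selector) variation sequences it has a dedicated glyph for."""

    def __init__(self, name, chars, sequences=()):
        self.name = name
        self.chars = set(chars)
        self.sequences = set(sequences)


def select_font(fonts, base, selector=None):
    """Return the first font in the list able to render the request, or None.

    Pass 1: look for a font that supports the full variation sequence.
    Pass 2: fall back to any font that supports the base character alone.
    """
    if selector is not None:
        for font in fonts:
            if (base, selector) in font.sequences:
                return font
    for font in fonts:
        if base in font.chars:
            return font
    return None


# Sample values: an ideograph followed by VARIATION SELECTOR-17 (U+E0100).
BASE, VS17 = 0x845B, 0xE0100
primary = Font("Primary", {BASE})
secondary = Font("Secondary", {BASE}, {(BASE, VS17)})
font_list = [primary, secondary]

print(select_font(font_list, BASE, VS17).name)  # sequence found in Secondary
print(select_font(font_list, BASE).name)        # plain character: Primary
```

The second pass is what keeps the text legible when no font in the list knows the sequence; the first pass is the added cost that John's message objects to.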

A./


On 2/24/2011 7:09 PM, Koji Ishii wrote:
> I'm not sure if this suffices as a "strong use case", but allow me to try.
>
> You have a font that supports Adobe-Japan1 IVS that you want to use as your primary font, but you also have a font that supports both Adobe-Japan1 IVS and Hanyo-Denshi IVS. Or you may want to use Palatino as your primary body font, but you have a font that has glyphs for Unicode Standard Variants[1].
>
> You may then enter IVS/UVS through your Input Method Editor, or copy text with IVS/UVS from your e-mail as plain text. The glyphs will not show correctly until you select the span and apply a different font.
>
> Someday after that, you may want to change the font of the paragraph that includes IVS/UVS. You select the paragraph and apply the font, only to see that the glyphs have changed to ones you do not want.
>
> These experiences feel to me like what I had in the '80s or early '90s before Unicode; code points were shared among locales, and therefore you had to apply a Latin font to Latin text and an East Asian font to East Asian text by yourself. If you applied incorrect fonts, glyphs would be broken. Unicode came to the rescue: the code point determines the glyph (non-formatting information) and the font determines the style.
>
> I think that IVS/UVS not displaying properly unless the appropriate font is at the top of the font list takes the users of IVS/UVS back to those days. As I said, if the feature were intended to work that way, it would be only a font feature. It was done in Unicode because people wanted to work without styling information.
>
> The problem for me in showing a use case is that the technology is still new, so while the font section above is an actual use case, "enter IVS/UVS through IME" hasn't happened yet. But I think it will happen before CSS3 Fonts goes to REC. That's my guess or hope, not a real use case we have today though.
>
>
> Regards,
> Koji
>
> [1] http://unicode.org/Public/UNIDATA/StandardizedVariants.html
>
> -----Original Message-----
> From: John Daggett [mailto:jdaggett@mozilla.com]
> Sent: Tuesday, February 22, 2011 12:04 PM
> To: Koji Ishii
> Cc: www-style@w3.org; John Hudson; CJK discussion (public-i18n-cjk@w3.org); 'WWW International' (www-international@w3.org)
> Subject: Re: [css3-fonts] font selection for Unicode Variation Selector
>
> Koji Ishii wrote:
>
>>> The question is not difficulty, I don't think it's the right
>>> behavior to require font fallback for UVS selectors.
>>> It doesn't make sense to me given the way the Unicode spec is written, it
>>> doesn't make sense given the way other font variant features
>>> work and it doesn't make sense to me from a performance perspective.
>> First, there are two perspectives: "should", and "technically
>> feasible", and I understand both are important. What you mentioned
>> first is about "should", and performance etc. are the latter. I would
>> like to focus on "should" first, because you mentioned you don't think
>> it's the right behavior. If we could agree that we "should", we could
>> then discuss if it's feasible, or if there are any compromises needed to
>> make it so. If we agree that we "should NOT", then the discussion is over.
> Again, I think you need to show a strong use case for supporting UVS in font fallback.  What you're proposing effectively changes font fallback from a simple single-pass procedure ("what font supports character x?") to a two-pass procedure ("what font supports character x + selector? if not found, what font supports character x?").  It significantly complicates character-by-character handling.  We should not be adding complicated character-level handling unless there is a really strong reason to do so.
>
> In general, the philosophy behind the CSS3 Fonts spec is that instead of relying on the uncertain results of font fallback, authors can now use the @font-face mechanism to supply fonts that meet the requirements of their content.  Font fallback for variation selectors inevitably means the use of a mixture of fonts across a text run, a result that's rarely ideal.  If an author requires a given variation they should supply a font containing it rather than relying on font fallback handling to magically conjure up the "right" representation for a text run.
>
> In this case "should" and "technically feasible" are not independent issues; the question is whether the benefits of a given approach justify the complexity and implementation cost.
>
>> He also raised another interesting question; if font B has "a"
>> and Umlaut[3] while font A has only "a", and source text is
>>    U+0061 LATIN SMALL LETTER A
>>    U+0308 COMBINING DIAERESIS
>> Would CSS render the glyph using font A or B? If it uses font A, the
>> Umlaut will not be rendered.
>>
>> He suggested that UVS is closer to the combining marks rather than
>> OpenType font variant features.
> The use of combining diacritics is completely different from the use of variation sequences. An 'a' character with an umlaut is a different character than an 'a', while a variation sequence specifies a representation of a given character.  The mechanism for handling combining diacritics is also different; in this case most implementations would normalize the text to the codepoint for a-umlaut *before* matching fonts.  Note that's not specified in CSS, that's simply the way most text engines function.  Fonts treat combining marks as separate codepoints which can be matched independently (albeit with possibly undesirable results).  For example:
>
>    U+1F4A9 PILE OF POO
>    U+0308  COMBINING DIAERESIS
>
> Given a font that defines both these characters, this will render as a pile of poo with an umlaut, even though there is no normalized form for this. This will even render if the characters come from different fonts, although the umlaut may not be ideally placed over the poo. This is very different from the way variation sequences work.
>
> Regards,
>
> John Daggett
Received on Friday, 25 February 2011 05:48:45 UTC