- From: John Daggett <jdaggett@mozilla.com>
- Date: Sun, 11 Jul 2010 20:43:02 -0700 (PDT)
- To: Yuzo Fujishima <yuzo@google.com>
- Cc: www-style@w3.org, www-font <www-font@w3.org>
Yuzo Fujishima wrote:

> What unicode normalization (http://unicode.org/reports/tr15/) must be
> applied to the characters in an HTML document before matching against
> the unicode-range descriptor
> (http://dev.w3.org/csswg/css3-fonts/#unicode-range-desc)?
>
> A. No normalization at all. All the codepoints are checked against unicode-range as-is.
> B. Undefined. Whether to apply normalization is up to UA.
> C. Must be normalized to NFC
> D. Must be normalized to NFD
> E. Must be normalized to NFKC
> F. Must be normalized to NFKD
>
> In my opinion, A (or B) is the most realistic choice, seeing that
> Chrome 6, Safari 6, IE8, and Opera 10 don't normalize stylesheets.
> (Firefox 6 doesn't seem to be working in this respect.)
> http://www.w3.org/International/tests/tests-html-css/tests-normalization/generate?test=10&serveas=xml&format=xhtml5

The short answer is probably (C): strings should be NFC normalized
before the font selection process is run, with some caveats listed
below.

The underlying question here is whether normalization is applied to a
character stream before the font selection algorithm is run;
unicode-range is just a part of that process. That's actually
independent of whether stylesheet data is normalized or not. The font
selection process maps content character streams to font character
maps, so there aren't the same string equivalence issues.

The font matching algorithm in CSS has always been described in
relation to "characters"; precisely how combining characters affect
font fallback is unspecified. Fonts can support combined forms and
combining forms, or just one and not the other (example: a font can
have a glyph for 'a-ring' along with a glyph for 'a' and 'combining
ring', so there are multiple ways to select appropriate glyphs for
"Håkon").

So the answer to your question isn't quite as simple as specifying a
given normalization. If a glyph for the combined codepoint exists in
the font, using that glyph is probably best.
Otherwise, ideally the base character and combining character should
come from the same font; that assures correct placement of the
combining character.

In the case where the combined character is not included in the cmap
but both the base character and combining character are included, I
don't think it makes sense to try to do font matching on the
decomposition of base character + combining character. I think you'd
end up testing for situations that rarely exist and for which the
results would not be guaranteed to be correct anyway.

So I think we can specify common cases that should match, but there
are some cases where it might be better left to UAs to deal with
appropriately. I'm cc'ing the fonts list in case anyone there feels
otherwise.

Regards,

John Daggett
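
As a rough illustration of the approach described above (not any UA's actual implementation), the following Python sketch NFC-normalizes a base + combining sequence and then picks the first font whose character map covers every codepoint of the normalized form. The function name `pick_font` and the representation of a font as a `(name, set of codepoints)` pair are assumptions made for this example; a real cmap lookup would go through a font library. Matching on the decomposition is deliberately not attempted; that case returns `None` and is left to the UA.

```python
import unicodedata


def pick_font(cluster, fonts):
    """Pick a font for one base + combining sequence.

    `fonts` is an ordered list of (name, codepoints) pairs, where
    `codepoints` is a set of ints standing in for the font's cmap.
    This representation is hypothetical, chosen only for the sketch.
    """
    # Normalize first, so 'a' + U+030A becomes the single codepoint U+00E5.
    normalized = unicodedata.normalize("NFC", cluster)

    # Take the first font that covers the whole normalized sequence.
    # If NFC produced one composed codepoint, this selects the composed
    # glyph; if no composed form exists, the base and combining
    # characters are still required to come from the same font.
    for name, codepoints in fonts:
        if all(ord(ch) in codepoints for ch in normalized):
            return name, normalized

    # No single font covers the sequence: left to the UA.
    return None, normalized


if __name__ == "__main__":
    fonts = [
        ("OnlyPieces", {ord("a"), 0x030A}),   # 'a' and combining ring, no a-ring
        ("HasComposed", {0x00E5}),            # precomposed a-ring only
    ]
    print(pick_font("a\u030a", fonts))
```

With these two hypothetical fonts, the sequence 'a' + combining ring is matched against the precomposed a-ring, so the second font is chosen even though the first one contains both pieces separately.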
Received on Monday, 12 July 2010 03:43:36 UTC