Re: [css3-fonts] unicode-range and unicode normalization

Hi, Tab, John,

Thank you for the responses.

I believe it would help readers if the specification explicitly stated how
characters must be normalized, how glyphs are chosen, and what is left to
the UA's discretion.

Yuzo

On Mon, Jul 12, 2010 at 12:43 PM, John Daggett <jdaggett@mozilla.com> wrote:

> Yuzo Fujishima wrote:

> > What unicode normalization (http://unicode.org/reports/tr15/) must be
> > applied to the characters in an HTML document before matching against
> > the unicode-range descriptor
> > (http://dev.w3.org/csswg/css3-fonts/#unicode-range-desc)?
> >
> > A. No normalization at all. All the codepoints are checked against
> >    unicode-range as-is.
> > B. Undefined. Whether to apply normalization is up to UA.
> > C. Must be normalized to NFC
> > D. Must be normalized to NFD
> > E. Must be normalized to NFKC
> > F. Must be normalized to NFKD
> >
> > In my opinion, A (or B) is the most realistic choice, seeing that
> > Chrome 6, Safari 6, IE8, and Opera 10 don't normalize stylesheets.
> > (Firefox 6 doesn't seem to be working in this respect.)
> >
> > http://www.w3.org/International/tests/tests-html-css/tests-normalization/generate?test=10&serveas=xml&format=xhtml5
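
To make the difference between the options above concrete, here is a minimal
sketch, not taken from any UA, of how the normalization form changes which
codepoints get tested against a unicode-range value. The helper names
(parse_unicode_range, matches) are hypothetical, and the parser is a
simplification that ignores wildcard ranges such as U+4??.

```python
import unicodedata

def parse_unicode_range(descriptor):
    """Parse a simplified unicode-range value such as 'U+0041-005A, U+00E5'.

    Assumption: wildcard ranges like 'U+4??' are not handled in this sketch.
    """
    ranges = []
    for part in descriptor.split(","):
        body = part.strip().upper()[2:]  # drop the leading 'U+'
        lo, _, hi = body.partition("-")
        start = int(lo, 16)
        end = int(hi, 16) if hi else start
        ranges.append((start, end))
    return ranges

def matches(text, descriptor, form=None):
    """Return (codepoint, in_range) pairs, optionally normalizing text first.

    form=None corresponds to option A (codepoints checked as-is);
    'NFC', 'NFD', 'NFKC', 'NFKD' correspond to options C through F.
    """
    ranges = parse_unicode_range(descriptor)
    if form is not None:
        text = unicodedata.normalize(form, text)
    return [
        (f"U+{ord(ch):04X}", any(lo <= ord(ch) <= hi for lo, hi in ranges))
        for ch in text
    ]

# 'a' followed by COMBINING RING ABOVE, tested against a Latin-1-only range.
decomposed = "a\u030a"
print(matches(decomposed, "U+0000-00FF"))         # as-is: the ring is out of range
print(matches(decomposed, "U+0000-00FF", "NFC"))  # composed to U+00E5: in range
```

With the same input text, option A leaves U+030A outside the range, while
option C folds the pair into U+00E5, which is inside it.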


> The short answer is probably (C): strings should be NFC-normalized
> before the font selection process is run, with some caveats listed
> below.

> The underlying question here is whether normalization is applied to a
> character stream before the font selection algorithm is run;
> unicode-range is just one part of that process.  That's actually
> independent of whether stylesheet data is normalized or not: the font
> selection process maps content character streams to font character
> maps, so the same string equivalence issues don't arise.

> The font matching algorithm in CSS has always been described in
> relation to "characters"; precisely how combining characters affect
> font fallback is unspecified.  Fonts can support combined forms and
> combining forms, or just one and not the other.  For example, a font can
> have a glyph for 'a-ring' along with glyphs for 'a' and 'combining
> ring', so there are multiple ways to select appropriate glyphs for
> "Håkon".
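
The "multiple ways" point can be illustrated with a small sketch. The
character map below (FONT_CMAP) is a made-up set of codepoints standing in
for a font's cmap, and glyph_sequences is a hypothetical helper; neither
comes from a real font or UA.

```python
import unicodedata

# Hypothetical cmap: H, a, k, o, n, a-ring (U+00E5), combining ring (U+030A).
FONT_CMAP = {0x48, 0x61, 0x6B, 0x6F, 0x6E, 0xE5, 0x30A}

def glyph_sequences(text, cmap):
    """Return, per normalization form, the codepoint sequence the cmap fully covers."""
    covered = {}
    for form in ("NFC", "NFD"):
        cps = [ord(ch) for ch in unicodedata.normalize(form, text)]
        if all(cp in cmap for cp in cps):
            covered[form] = [f"U+{cp:04X}" for cp in cps]
    return covered

# A font with both the combined and the combining forms covers either sequence.
print(glyph_sequences("H\u00e5kon", FONT_CMAP))

# Without a glyph for U+00E5, only the decomposed sequence is covered.
print(glyph_sequences("H\u00e5kon", FONT_CMAP - {0xE5}))
```

Which of the covered sequences a UA actually uses is exactly the part that
is currently unspecified.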

> So the answer to your question isn't quite as simple as
> specifying a given normalization.  If a glyph for the combined
> codepoint exists in the font, using that glyph is probably best.
> Otherwise, the base character and combining character should ideally
> come from the same font, which ensures correct placement of the
> combining character.  In the case where the combined character is not
> included in the cmap but both the base character and combining
> character are, I don't think it makes sense to try to do font
> matching on the decomposition of base character + combining character.
> I think you'd end up testing for situations that rarely exist and
> for which the results would not be guaranteed to be correct anyway.
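
One possible reading of the preferences above, as a rough sketch only: the
text is NFC-normalized, each resulting codepoint is matched against an
ordered font list, and a combining mark prefers the font already chosen for
the preceding character. No matching on decompositions is attempted. The
function name select_fonts and the font data are hypothetical, and real
font matching involves much more than cmap lookup.

```python
import unicodedata

def select_fonts(text, fonts):
    """Pick a font per character after NFC normalization.

    fonts: ordered list of (name, set_of_codepoints) pairs, highest priority first.
    Returns a list of (character, font_name_or_None) pairs.
    """
    cmaps = dict(fonts)
    result = []
    prev_font = None
    for ch in unicodedata.normalize("NFC", text):
        cp = ord(ch)
        chosen = None
        is_mark = unicodedata.combining(ch) != 0
        if is_mark and prev_font is not None and cp in cmaps[prev_font]:
            # Keep the mark in the same font as the preceding character.
            chosen = prev_font
        else:
            # Ordinary fallback: first font in the list that has the codepoint.
            for name, cmap in fonts:
                if cp in cmap:
                    chosen = name
                    break
        result.append((ch, chosen))
        prev_font = chosen
    return result

fonts = [
    ("Marks", {0x30A}),                    # only the combining ring
    ("Latin", {0x61, 0x78, 0xE5, 0x30A}),  # a, x, a-ring, combining ring
]

# 'a' + ring composes to U+00E5 under NFC, so the combined glyph is used.
print(select_fonts("a\u030a", fonts))

# 'x' + ring has no precomposed form; the ring follows 'x' into "Latin"
# even though "Marks" comes first in the list.
print(select_fonts("x\u030a", fonts))
```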

> So I think we can specify common cases that should match, but there are
> some cases that might be better left to UAs to deal with
> appropriately. I'm cc'ing the fonts list in case anyone there feels
> otherwise.

> Regards,

> John Daggett

Received on Monday, 12 July 2010 04:34:19 UTC