Update regarding Unicode caseless matching details [I18N-ACTION-209] from Phillips, Addison on 2013-04-23 (www-international@w3.org from April to June 2013)

From: Phillips, Addison <addison@lab126.com>
Date: Tue, 23 Apr 2013 19:01:01 +0000
To: "CSS WWW Style (www-style@w3.org)" <www-style@w3.org>
CC: "www-international@w3.org" <www-international@w3.org>
Message-ID: <7C0AF84C6D560544A17DDDEB68A9DFB5096065@ex10-mbx-36006.ant.amazon.com>
Hello CSS, 

A week or so ago I wrote the email below regarding our thinking about CSS Fonts caseless matching. John Daggett participated in our most recent teleconference [1], in which we discussed this again in detail. The Internationalization WG resolved that:

1. We feel you should use Unicode C+F case fold matching for font names. We consistently recommend this as the most appropriate matching for your use case, and it is also the form that we recommend in general. By being consistent, we reduce confusion and the potential for overlapping but incompatible matching schemes.

2. We agree that requiring normalization for font names is overkill, that it is inconsistent with specific real world use cases, and that there is no need to impose normalization on implementers as a result.
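
As a rough illustration of the two points above, here is a minimal Python sketch. It assumes that Python's built-in str.casefold(), which applies Unicode full case folding, is an acceptable stand-in for C+F matching; the function name font_names_match and the font names are hypothetical and do not come from any spec or implementation.

```python
def font_names_match(a: str, b: str) -> bool:
    """Compare two font family names using Unicode full case folding only.

    Deliberately applies no Unicode normalization (no NFC/NFD step),
    mirroring point 2 above. Hypothetical helper for illustration.
    """
    return a.casefold() == b.casefold()


# Full case folding maps U+00DF (sharp s) to "ss", so these names match.
print(font_names_match("Straße Sans", "STRASSE SANS"))  # True

# Precomposed U+00E9 versus "e" + U+0301 (combining acute accent):
# without normalization these are different sequences and do not match.
print(font_names_match("Caf\u00e9 Serif", "Cafe\u0301 Serif"))  # False
```

The second comparison shows the consequence of not requiring normalization: names that differ only in composed versus decomposed form are treated as distinct.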

Hopefully this resolves the issue and allows you to proceed. Please feel free to contact the I18N WG again if further clarification is needed.

Regards (for I18N),

Addison

Addison Phillips
Globalization Architect (Amazon Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.


[1] http://lists.w3.org/Archives/Public/www-international/2013AprJun/0031.html


> -----Original Message-----
> From: Phillips, Addison
> Sent: Tuesday, April 16, 2013 9:06 AM
> To: CSS WWW Style (www-style@w3.org)
> Cc: www-international@w3.org
> Subject: RE: Unicode caseless matching details [I18N-ACTION-198]
> 
> Hello CSS,
> 
> A couple of weeks ago I was tasked by the Internationalization WG [1] with
> responding to this thread. We discussed caseless matching and normalization
> (and fantasai participated in the discussion, which is minuted at [2]).
> 
> Basically, the thinking here was that, since font systems are somewhat diverse
> and fonts themselves use different encoded sequences, capitalizations, and
> other variations, this is a case in which both Unicode normalization and
> Unicode case folding are practical and justified. We would therefore
> recommend that you require Unicode NFC normalization and Unicode C+F case
> folding when comparing font names for selection. We think this is a special
> case because it is isolated and should have no side-effects on other parts of the
> Web, such as Selectors. It merely ensures that a given style sheet has the
> greatest likelihood of matching the intended font names as represented in the
> underlying system.
> 
> Regards (for I18N),
> 
> Addison
> 
> Addison Phillips
> Globalization Architect (Lab126)
> Chair (W3C I18N WG)
> 
> Internationalization is not a feature.
> It is an architecture.
> 
> [1] http://www.w3.org/International/track/actions/198 I18N-ACTION-198
> [2] http://lists.w3.org/Archives/Public/www-international/2013JanMar/0384.html

> 
> > -----Original Message-----
> > From: Phillips, Addison
> > Sent: Friday, March 08, 2013 1:00 PM
> > To: CSS WWW Style (www-style@w3.org)
> > Cc: www-international@w3.org
> > Subject: RE: Unicode caseless matching details (was Re: [CSSWG]
> > Minutes Tucson F2F 2013-02-05 Tue PM I: Fonts)
> >
> > Some weeks ago, Jonathan Kew wrote:
> >
> > > >
> > > > Given the resistance there seems to be to implementing a -full-
> > > > solution to string-equivalence issues, I don't see why we'd
> > > > require people to implement anything more complex/expensive than a
> > > > purely 1:1 mapping in this particular case.
> > >
> > > As Tab mentioned, the Internationalization Group concluded that C+F
> > > was the "right" way, as posted last month [1].  The description
> > > given by Addison was:
> > >
> > >   Case Insensitive comparison: Where CSS cannot be
> > >   case-insensitive for legacy reasons or for implementation
> > >   choice reasons, the I18N WG recommends that comparison be done
> > >   using Unicode "common" plus "full" case fold mapping, as we
> > >   previously recommended. Suggestions that this is hard to
> > >   implement or low-performance are, in our opinion, unfounded, as
> > >   the mapping consists of a relatively small table. There is a
> > >   demonstration implementation in JavaScript and we have
> > >   confirmed with our Unicode colleagues that this is the right
> > >   approach [2].
> > >
> > > I would be fine with either C+S or C+F mappings, but I think we
> > > should take care to define only a single "Unicode caseless matching"
> > > if at all possible for use across all Web platform. I'm not
> > > especially keen on defining it in the Fonts spec but for now it's only
> > > needed there.
> > > I think it would be unfortunate to use C+F in some places and C+S in others.
> >
> > I agree that only one should be defined.
> >
> > Generally speaking, the "simple" (C+S) case is less good than the
> > "full" (C+F) case for matching in part because the comparison needs to
> > work both ways---as a casefold transform on the search term as well as
> > on the searched corpus. The difference between C+S and C+F is mainly
> > that the latter casefolds certain characters to a multicharacter
> > sequence. This sequence may actually be the one used in the searched
> > values. Using C+F therefore results in higher match fidelity.
> >
> > >
> > > I should note here that HTML5 specifies a different flavor of
> > > caseless matching for radio button name attributes but I think
> > > that's actually a mistake and have filed a bug on that, it's trying
> > > to use a particular Unicode caseless matching algorithm to mimic the
> > > matching behavior in IE, which clearly uses some flavor of
> > > platform-specific caseless matching with normalization.
> > >
> >
> > It would be best if everyone used the same specific matching scheme
> > for caseless matching. That's easier for content authors to understand.
> >
> > At the moment, because normalization is effectively not part of the
> > "rules of the road", the I18N WG is recommending that specs and
> > implementations
> > *not* include normalization in internal identifier matching (such as
> > the radio button case). However, for caseless matching we feel that
> > C+F is the way to go and should form the base for a caseless match
> > algorithm. We are in the process of revising CharMod to say this (and
> > to provide detailed and specific guidelines).
> >
> > User text searching features (such as the "find" command in most
> > browsers) are a separate topic (and one where we feel that
> > normalization is probably advisable), but this is a separate case (which
> > we'll cover in CharMod).
> >
> > So... for Font name matching, although there might be very minor
> > efficiency gains found using C+S, the I18N WG recommends that, for
> > consistency, C+F be used.
> >
> > Hope that helps.
> >
> > Regards,
> >
> > Addison
> >
> > Addison Phillips
> > Globalization Architect (Lab126)
> > Chair (W3C I18N WG)
> >
> > Internationalization is not a feature.
> > It is an architecture.
> >
> >
Received on Tuesday, 23 April 2013 19:02:18 UTC