RE: [css2.1] editorial clarification: "case-insensitive" always means "ASCII case-insensitive", y/y?

(chair hat)

I have copied the Internationalization WG mailing list and included this in our next agenda.

(The rest is a personal response)

> Does the i18n wg have any input on whether :lang()'s argument
> should be ASCII case-insensitive or Unicode case-folded?

:lang()'s argument is a BCP 47 language range. Language tags and ranges are limited to a subset of ASCII and are case-insensitive. See Section 2 of RFC 4647, which says:

   Language tags and thus language ranges are to be treated as case-
   insensitive: there exist conventions for the capitalization of some
   of the subtags, but these MUST NOT be taken to carry meaning.
   Matching of language tags to language ranges MUST be done in a case-
   insensitive manner.

Thus, I believe that you should require case-insensitive comparison. Note: although users may (incorrectly) enter non-ASCII values into :lang(), I believe such values should be considered invalid, and that case-folding (or case-insensitive comparison of) those characters is unnecessary and potentially even harmful.
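
To illustrate the distinction, here is a minimal sketch in Python, not a definitive implementation. The function names are my own for illustration; the syntax check follows the basic language-range ABNF in RFC 4647 Section 2.1. It compares by mapping only A-Z to a-z, and shows how Unicode case folding would let a non-ASCII character (KELVIN SIGN, U+212A) pass for an ASCII "k".

```python
import re
import string

# Map only A-Z to a-z; every other character is left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Basic language range from RFC 4647, Section 2.1:
#   language-range = (1*8ALPHA *("-" 1*8alphanum)) / "*"
_BASIC_RANGE = re.compile(r"(?:[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*|\*)")


def is_basic_language_range(value: str) -> bool:
    """Return True if value is a syntactically valid (ASCII-only) basic range."""
    return _BASIC_RANGE.fullmatch(value) is not None


def ascii_case_insensitive_equal(a: str, b: str) -> bool:
    """Compare two strings, treating only [a-z] and [A-Z] as equivalent."""
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


if __name__ == "__main__":
    # Ordinary case differences in ASCII are ignored.
    print(ascii_case_insensitive_equal("EN-us", "en-US"))

    # U+212A KELVIN SIGN is not ASCII, so a value containing it is invalid
    # as a language range and is not equal to "ko" under ASCII comparison.
    kelvin_o = "\u212Ao"
    print(is_basic_language_range(kelvin_o))
    print(ascii_case_insensitive_equal(kelvin_o, "ko"))

    # Unicode case folding, by contrast, maps KELVIN SIGN to "k".
    print(kelvin_o.casefold() == "ko")
```

Under this sketch a value containing non-ASCII characters simply fails the syntax check, so no Unicode-aware comparison is ever needed.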

Note that Section 5.8.1 should be updated to reference BCP 47 and in particular RFC 4647 Basic Filtering.
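
For reference, Basic Filtering (RFC 4647 Section 3.3.1) can be sketched as follows. This is an illustrative Python sketch under my own naming (basic_filter_matches and ascii_lower are hypothetical helpers, not taken from any specification or library): a range matches a tag if, compared case-insensitively, it equals the tag or equals a prefix of the tag that is immediately followed by "-"; the range "*" matches any tag.

```python
import string

# Lowercase only A-Z, since language tags and ranges are ASCII.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(value: str) -> str:
    """Lowercase the ASCII letters in value and leave everything else alone."""
    return value.translate(_ASCII_LOWER)


def basic_filter_matches(language_range: str, tag: str) -> bool:
    """Return True if language_range matches tag under basic filtering."""
    r = ascii_lower(language_range)
    t = ascii_lower(tag)
    if r == "*":
        # The wildcard range matches every language tag.
        return True
    # Exact match, or a prefix match that ends on a subtag boundary.
    return t == r or t.startswith(r + "-")


if __name__ == "__main__":
    for rng, tag in [
        ("de-de", "de-DE-1996"),
        ("de-de", "de-Deva"),
        ("de-de", "de-Latn-DE"),
        ("en", "EN-gb"),
        ("*", "fr"),
    ]:
        print(rng, tag, basic_filter_matches(rng, tag))
```

The subtag-boundary check is what keeps "de-de" from matching "de-Deva", and what keeps "en" from matching an unrelated tag such as "eng".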

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N, IETF IRI WGs)

Internationalization is not a feature.
It is an architecture.


> -----Original Message-----
> From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of fantasai
> Sent: Thursday, July 15, 2010 2:02 AM
> To: Zack Weinberg
> Cc: W3C Emailing list for WWW Style; 'WWW International'
> Subject: Re: [css2.1] editorial clarification: "case-insensitive" always means "ASCII case-insensitive", y/y?
> 
> On 07/14/2010 11:29 AM, Zack Weinberg wrote:
> > Section 4.1.3 says
> >
> >       * All CSS syntax is case-insensitive within the ASCII range (i.e.,
> >         [a-z] and [A-Z] are equivalent), except for parts that are not
> >         under the control of CSS.
> >
> > There are four other normative uses of the term "case-insensitive"
> > within the standard:
> >
> > 5.10 ... Pseudo-element and pseudo-class names are case-insensitive.
> > 5.11.4 ... The matching of C against the element's language value is
> >             performed case-insensitively.
> > 7.3 ... Media type names are case-insensitive.
> > 18.2 ... these [additional names for color properties] are
> >           case-insensitive ...
> 
> 5.10, 7.3, and 18.2 are ASCII case-insensitive per 4.1.3.
> 5.11.4, because it deals with user input, and not CSS-defined syntax,
> could be considered Unicode case-insensitive; this should be clarified.
> 
> Does the i18n wg have any input on whether :lang()'s argument should
> be ASCII case-insensitive or Unicode case-folded?
> 
> ~fantasai

Received on Thursday, 15 July 2010 14:05:57 UTC