- From: Peter Patel-Schneider <pfpschneider@gmail.com>
- Date: Thu, 28 Feb 2013 10:07:09 -0800
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: RDF-WG <public-rdf-wg@w3.org>
- Message-ID: <CAMpDgVx5Vj1d6B9wUe9aTZBQkg=4BQ2eoRKMVPVL6AoQc8xHBw@mail.gmail.com>
I'm not an expert in BCP47, and going through the grammar is painful (what *is* ALPHA?). However, it sure seems to me that language tags are made up of US-ASCII characters, and BCP47 itself talks about upper and lower case (boy is that ever an old notion!). It thus seems to me that what is meant is perfectly clear in terms of BCP47, which even has a similar warning about how to change case in language tags.

If the WG wanted to be more pedantic then the document could say something like, "does not contain any uppercase US-ASCII letters - any uppercase US-ASCII letters in surface syntaxes MUST be normalized into their US-ASCII lowercase equivalents".

I think that just saying to treat the language tag (case?) insensitively ends up with the same question as transforming to lower case.

peter

On Thu, Feb 28, 2013 at 9:26 AM, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:

> Section 3.3: (of the editors draft):
>
> """
> a non-empty language tag as defined by [BCP47]. The language tag must be
> well-formed according to section 2.2.9 of [BCP47], and must be normalized
> to lowercase.
> """
>
> but "lowercase" is locale sensitive.
>
> What is lower case "I"? It's not always "i".
>
> It isn't in Turkish where there are different dotted and dotless I-like
> letters.
>
> Upper case "I" (U+0049); lower case "ı" (U+0131)
> !=
> Upper case "İ" (U+0130); lower case "i" (U+0069)
>
> http://www.i18nguy.com/unicode/turkish.png
>
> The ideal solution is to say that the language tag is to be treated
> case insensitively.
>
> Andy
>
> (this email is in UTF-8)
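
For illustration only, the US-ASCII-only lowercasing suggested above could be sketched in Python roughly as follows. The function names normalize_language_tag and same_language_tag are hypothetical and come from neither BCP47 nor the RDF drafts. The sketch assumes the tag has already been checked for well-formedness, and it maps only A-Z to a-z, so no locale rules (Turkish or otherwise) come into play.

```python
import string

# Translation table covering only the 26 US-ASCII uppercase letters.
# Every other character, including U+0130 and U+0131, is left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def normalize_language_tag(tag: str) -> str:
    """Hypothetical helper: lowercase only the US-ASCII letters of a tag."""
    return tag.translate(_ASCII_LOWER)


def same_language_tag(a: str, b: str) -> bool:
    """Hypothetical helper: compare two tags ignoring US-ASCII case."""
    return normalize_language_tag(a) == normalize_language_tag(b)


if __name__ == "__main__":
    print(normalize_language_tag("en-US"))
    print(same_language_tag("tr-TR", "TR-tr"))
```

Comparing two tags after this normalization gives the same answer as an ASCII case-insensitive comparison, which is why the two formulations end up raising the same question.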
Received on Thursday, 28 February 2013 18:07:39 UTC