- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Thu, 28 Feb 2013 19:34:33 +0000
- To: Peter Patel-Schneider <pfpschneider@gmail.com>
- CC: RDF-WG <public-rdf-wg@w3.org>
Peter,

If the rule for lower casing is qualified by US-ASCII it would be OK, as it meets:

"""
Implementers SHOULD specify a locale-neutral casing operation to ensure that case folding of subtags does not produce this value, which is illegal in language tags.
"""
[*]

"this value" is the upper case situation.

And the current text is a bit better than 2004 Concepts, where the case-changing was separate from the RFC 3066 mention. The problem only arises if the transformation to lower case is separate from the RFC 3066 reference.

There is a canonicalization algorithm in 2.1.1:

"""
An implementation can reproduce this format without accessing the registry as follows. ....
"""

(Didn't know about the Lithuanian and Azeri issues)

On 28/02/13 18:07, Peter Patel-Schneider wrote:
> I'm not an expert in BCP47, and going through the grammar is painful
> (what *is* ALPHA?).
>
> However, it sure seems to me that language tags are US-ASCII characters,
> and BCP47 itself talks about upper and lower case (boy is that ever an
> old notion!). It thus seems to me that what is meant is perfectly clear
> in terms of BCP47, which even has a similar warning about how to change
> case in language tags. If the WG wanted to be more pedantic then the
> document could say something like, "does not contain any uppercase
> US-ASCII letters - any uppercase US-ASCII letters in surface syntaxes
> MUST be normalized into their US-ASCII lowercase equivalents".
>
> I think that just saying to treat the language tag (case?) insensitively
> ends up with the same question as transforming to lower case.

You would not be lower casing and exporting changed data if you retain the original and do a local sensitive comparison of strings.

The world will not fall apart because of this ... but it has happened in the real world:

https://issues.apache.org/jira/browse/JENA-407

	Andy

>
> peter
>
> On Thu, Feb 28, 2013 at 9:26 AM, Andy Seaborne
> <andy.seaborne@epimorphics.com> wrote:
>
>     Section 3.3: (of the editors draft):
>
>     """
>     a non-empty language tag as defined by [BCP47]. The language tag
>     must be well-formed according to section 2.2.9 of [BCP47], and must
>     be normalized to lowercase.
>     """
>
>     but "lowercase" is locale sensitive.
>
>     What is lower case "I"? It's not always "i".
>
>     It isn't in Turkish where there are different dotted and dotless
>     I-like letters.
>
>     Upper case "I" (U+0049); lower case "ı" (U+0131)
>     !=
>     Upper case "İ" (U+0130); lower case "i" (U+0069)
>
>     http://www.i18nguy.com/unicode/turkish.png
>
>     The ideal solution is to say that the language tag is to be treated
>     as case insensitively.
>
>        Andy
>
>     (this email is in UTF-8)
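
For illustration only, here is a minimal Python sketch of the two options discussed in this thread: lower casing restricted to US-ASCII letters, and case-insensitive comparison that leaves the stored tag unchanged. The names ascii_lower and same_lang_tag are hypothetical and do not come from BCP47 or any RDF document. The sketch assumes the tag has already been checked for well-formedness elsewhere.

```python
import string

# Translation table covering only the 26 US-ASCII uppercase letters.
# Nothing outside A-Z is touched, so no locale rule can influence the result.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(tag: str) -> str:
    """Lowercase A-Z only; every other character is returned unchanged."""
    return tag.translate(_ASCII_LOWER)


def same_lang_tag(a: str, b: str) -> bool:
    """Compare two tags case-insensitively without rewriting either one."""
    return ascii_lower(a) == ascii_lower(b)


if __name__ == "__main__":
    print(ascii_lower("EN-GB"))             # en-gb
    print(same_lang_tag("tr-TR", "TR-tr"))  # True
    # U+0130 (dotted capital I) is not a US-ASCII letter, so it is left alone.
    print(ascii_lower("\u0130") == "\u0130")  # True
```

With the comparison approach, the original spelling of the tag is what gets stored and exported; the folded form exists only for the duration of the comparison.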
Received on Thursday, 28 February 2013 19:35:05 UTC