- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Thu, 28 Feb 2013 19:34:33 +0000
- To: Peter Patel-Schneider <pfpschneider@gmail.com>
- CC: RDF-WG <public-rdf-wg@w3.org>
Peter,
If the rule for lower casing is qualified by US-ASCII it would be OK
as it meets:
"""
Implementers SHOULD specify a locale-neutral
casing operation to ensure that case folding of subtags does not
produce this value, which is illegal in language tags.
"""
[*] "this value" is the upper case situation.
and the current text is a bit better than 2004 concepts where the
case-changing was separate from the RFC 3066 mention. The problem only
arises if the transformation to lower case is separate from the RFC 3066
reference.
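A minimal sketch of what "lower casing qualified by US-ASCII" could look like, in Python. The helper name `ascii_lower` is made up for illustration; the point is only that the mapping touches A-Z and nothing else, so no locale or Unicode special-casing rule can affect the result.

```python
import string

# Translation table covering only the 26 US-ASCII uppercase letters.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(tag: str) -> str:
    """Lowercase A-Z only; every other character is returned unchanged."""
    return tag.translate(_ASCII_LOWER)


print(ascii_lower("EN-GB"))       # en-gb
print(ascii_lower("TR-Latn-TR"))  # tr-latn-tr

# A Unicode-aware lower() applies special casing to U+0130 (dotted capital I):
# in Python it becomes "i" followed by U+0307 (combining dot above).
print([hex(ord(c)) for c in "\u0130".lower()])       # ['0x69', '0x307']
# The ASCII-only mapping leaves the same character untouched.
print([hex(ord(c)) for c in ascii_lower("\u0130")])  # ['0x130']
```

Python's `str.lower()` does not consult the process locale, so the Turkish "I" to "ı" mapping cannot be reproduced here; in environments whose default lower-casing is locale-dependent, an explicit ASCII-only mapping like this avoids the issue.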
There is a canonicalization algorithm in 2.1.1
"""
An implementation can reproduce this format without accessing the
registry as follows. ....
"""
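The quoted passage is elided above, so the following is only a sketch of one reading of that section of BCP47 (RFC 5646, section 2.1.1): all subtags are lowercase, except that two-letter subtags are uppercase and four-letter subtags are titlecase when they are neither first nor after a singleton. Treating "after a singleton" as "anywhere after the first one-character subtag" is an assumption, and `format_language_tag` is a hypothetical name.

```python
import string

# ASCII-only case tables, so the result never depends on locale.
_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def format_language_tag(tag: str) -> str:
    """Apply the registry-free formatting convention to a language tag."""
    formatted = []
    seen_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        lowered = subtag.translate(_LOWER)
        # The exceptions do not apply to the first subtag or after a singleton.
        plain = index == 0 or seen_singleton
        if not plain and len(lowered) == 2:
            formatted.append(lowered.translate(_UPPER))
        elif not plain and len(lowered) == 4:
            formatted.append(lowered[0].translate(_UPPER) + lowered[1:])
        else:
            formatted.append(lowered)
        if len(lowered) == 1:
            seen_singleton = True
    return "-".join(formatted)


print(format_language_tag("EN-ca-X-CA"))      # en-CA-x-ca
print(format_language_tag("sgn-be-fr"))       # sgn-BE-FR
print(format_language_tag("AZ-LATN-x-LATN"))  # az-Latn-x-latn
```

This formatting is a display convention; it does not change which tags are considered equal, since tags are compared without regard to case.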
(Didn't know about the Lithuanian and Azeri issues)
On 28/02/13 18:07, Peter Patel-Schneider wrote:
> I'm not an expert in BCP47, and going through the grammar is painful
> (what *is* ALPHA?).
>
> However, it sure seems to me that language tags are US-ASCII characters,
> and BCP47 itself talks about upper and lower case (boy is that ever an
> old notion!). It thus seems to me that what is meant is perfectly clear
> in terms of BCP47, which even has a similar warning about how to change
> case in language tags. If the WG wanted to be more pedantic then the
> document could say something like, "does not contain any uppercase
> US-ASCII letters - any uppercase US-ASCII letters in surface syntaxes
> MUST be normalized into their US-ASCII lowercase equivalents".
>
> I think that just saying to treat the language tag (case?) insensitively
> ends up with the same question as transforming to lower case.
You would not be lower casing and exporting changed data if you retain
the original and do a locale sensitive comparison of strings.
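A small sketch of the "retain the original, compare insensitively" approach. The names `same_language_tag` and `literals` are hypothetical, and the comparison here folds only US-ASCII letters, which is an assumption about what the comparison should do rather than something stated in this thread.

```python
import string

# Fold only A-Z so the comparison cannot depend on locale.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def same_language_tag(a: str, b: str) -> bool:
    """Compare two language tags, ignoring US-ASCII case differences."""
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


# Literals keep the language tag exactly as it appeared in the input.
literals = [("colour", "en-GB"), ("color", "en-US"), ("couleur", "FR")]

# Matching is case-insensitive, but what is exported is the original tag.
matches = [(text, lang) for text, lang in literals
           if same_language_tag(lang, "EN-gb")]
print(matches)  # [('colour', 'en-GB')]
```

Nothing is rewritten on the way in, so data exported later carries the same tag it arrived with.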
The world will not fall apart because of this ... but it has happened in
the real world:
https://issues.apache.org/jira/browse/JENA-407
Andy
>
> peter
>
> On Thu, Feb 28, 2013 at 9:26 AM, Andy Seaborne
> <andy.seaborne@epimorphics.com>
> wrote:
>
>
> Section 3.3: (of the editors draft):
>
> """
> a non-empty language tag as defined by [BCP47]. The language tag
> must be well-formed according to section 2.2.9 of [BCP47], and must
> be normalized to lowercase.
> """
>
> but "lowercase" is locale sensitive.
>
> What is lower case "I"? It's not always "i".
>
> It isn't in Turkish where there are different dotted and dotless
> I-like letters.
>
> Upper case "I" (U+0049); lower case "ı" (U+0131)
> !=
> Upper case "İ" (U+0130); lower case "i" (U+0069)
>
> http://www.i18nguy.com/unicode/turkish.png
>
> The ideal solution is to say that the language tag is to be treated
> case-insensitively.
>
> Andy
>
> (this email is in UTF-8)
>
>
Received on Thursday, 28 February 2013 19:35:05 UTC