Re: Case of language tags from Peter Patel-Schneider on 2013-02-28 (public-rdf-wg@w3.org from February 2013)

From: Peter Patel-Schneider <pfpschneider@gmail.com>
Date: Thu, 28 Feb 2013 10:07:09 -0800
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: RDF-WG <public-rdf-wg@w3.org>
Message-ID: <CAMpDgVx5Vj1d6B9wUe9aTZBQkg=4BQ2eoRKMVPVL6AoQc8xHBw@mail.gmail.com>

I'm not an expert in BCP47, and going through the grammar is painful (what
*is* ALPHA?).

However, it sure seems to me that language tags are US-ASCII characters,
and BCP47 itself talks about upper and lower case (boy is that ever an old
notion!).  It thus seems to me that what is meant is perfectly clear in
terms of BCP47, which even has a similar warning about how to change case
in language tags.  If the WG wanted to be more pedantic then the document
could say something like, "does not contain any uppercase US-ASCII letters
- any uppercase US-ASCII letters in surface syntaxes MUST be normalized
into their US-ASCII lowercase equivalents".

I think that just saying to treat the language tag (case?) insensitively
ends up with the same question as transforming to lower case.
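
To make the point concrete, here is a minimal sketch, assuming that
well-formed tags contain only US-ASCII letters, digits and hyphens.  The
names ascii_lower and same_language_tag are made up for illustration and
do not come from any RDF or BCP47 library.  The folding touches only the
26 letters A-Z, so no locale rule (Turkish or otherwise) can affect it.

```python
# Hypothetical helpers for illustration only.
# Fold just the 26 US-ASCII uppercase letters (U+0041..U+005A) to their
# lowercase equivalents (U+0061..U+007A); every other character is kept.
_ASCII_FOLD = {cp: cp + 0x20 for cp in range(0x41, 0x5B)}


def ascii_lower(tag: str) -> str:
    """Return the tag with US-ASCII uppercase letters lowercased."""
    return tag.translate(_ASCII_FOLD)


def same_language_tag(a: str, b: str) -> bool:
    """Compare two tags case-insensitively, using the same ASCII folding."""
    return ascii_lower(a) == ascii_lower(b)


print(ascii_lower("en-US"))
print(same_language_tag("zh-Hant", "ZH-HANT"))
```

Note that the case-insensitive comparison is defined by the same folding
as the normalization, which is why the two wordings raise the same
question.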

peter

On Thu, Feb 28, 2013 at 9:26 AM, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:

>
> Section 3.3: (of the editors draft):
>
> """
> a non-empty language tag as defined by [BCP47]. The language tag must be
> well-formed according to section 2.2.9 of [BCP47], and must be normalized
> to lowercase.
> """
>
> but "lowercase" is locale sensitive.
>
> What is lower case "I"?  It's not always "i".
>
> It isn't in Turkish where there are different dotted and dotless I-like
> letters.
>
> Upper case "I" (U+0049); lower case "ı" (U+0131)
> !=
> Upper case "İ" (U+0130); lower case "i" (U+0069)
>
> http://www.i18nguy.com/unicode/turkish.png
>
> The ideal solution is to say that the language tag is to be treated
> case-insensitively.
>
>         Andy
>
> (this email is in UTF-8)
>
>

Received on Thursday, 28 February 2013 18:07:39 UTC