RE: Language Tag Case Conflict (between RDF1.1 and BCP47) from Markus Lanthaler on 2014-01-21 (public-rdf-wg@w3.org from January 2014)

From: Markus Lanthaler <markus.lanthaler@gmx.net>
Date: Tue, 21 Jan 2014 10:00:44 +0100
To: <public-rdf-wg@w3.org>
Cc: "'Andy Seaborne'" <andy@apache.org>
Message-ID: <007d01cf1687$4592ac30$d0b80490$@lanthaler@gmx.net>

On Monday, January 20, 2014 11:08 PM, Andy Seaborne wrote:
> BCP47 has quite a lot to say about canonicalization including:
> 
> """
> 4.5. Canonicalization of Language Tags
> 
>     Since a particular language tag can be used by many processes,
>     language tags SHOULD always be created or generated in canonical
>     form.
> """
> 
> i.e. it is data creator responsibility.  An RDF parser is not creating
> or generating.

Did I say that? RDF Concepts does not talk about parsers or serializers; it
just says:

  A literal is a language-tagged string if the third element is
  present. Lexical representations of language tags MAY be converted
  to lower case. The value space of language tags is always in lower
  case.

That lowercasing of the *lexical* representation might happen anywhere,
including in tools that "create or generate" RDF.
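
A minimal sketch of that distinction, assuming plain ASCII tags; the
function names are hypothetical and not taken from RDF Concepts or BCP47:

```python
def to_value_space(lexical_tag: str) -> str:
    """Map a lexical language tag to its lower-case value-space form."""
    # Language tags are ASCII, so str.lower() is sufficient here.
    return lexical_tag.lower()


def same_language_tag(a: str, b: str) -> bool:
    """Compare two lexical tags case-insensitively via the value space."""
    return to_value_space(a) == to_value_space(b)


if __name__ == "__main__":
    print(to_value_space("en-US"))
    print(same_language_tag("en-US", "EN-us"))
```

Whichever tool applies to_value_space, the lexical form changes but the
value, and therefore the comparison result, does not.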

> """
> All comparisons MUST be performed in a case-insensitive manner.
> """
> 
> We're not ignoring BCP47 - indeed by saying "The value space of language
> tags is always in lower case" we're following it about comparisons.

Neither did I say that. I just said that language tags are typically
formatted like "en-US" and not like "en-us". All I proposed is to mention
somewhere that such transformations of the *lexical* representation are
allowed. I expect that a lot of people will ask themselves this question.
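
For illustration, here is a rough sketch of such a transformation in the
other direction, from lower case to the conventional "en-US" style. It
assumes the registry-free casing rule described in BCP47 (RFC 5646,
section 2.1.1) and treats "after a singleton" as "immediately after a
one-character subtag"; the function name is hypothetical:

```python
def format_conventional_case(tag: str) -> str:
    """Re-case a language tag in the conventional BCP47 style.

    Everything is lower case, except that two-letter subtags become
    upper case and four-letter subtags become title case, unless they
    start the tag or directly follow a singleton such as "x".
    """
    subtags = tag.lower().split("-")
    result = []
    for i, subtag in enumerate(subtags):
        follows_singleton = i > 0 and len(subtags[i - 1]) == 1
        if i == 0 or follows_singleton:
            result.append(subtag)               # keep lower case
        elif len(subtag) == 2:
            result.append(subtag.upper())       # region, e.g. "US"
        elif len(subtag) == 4:
            result.append(subtag.capitalize())  # script, e.g. "Latn"
        else:
            result.append(subtag)
    return "-".join(result)


if __name__ == "__main__":
    print(format_conventional_case("en-us"))
```

Only the lexical form changes; lowercasing the result gives back the same
value-space tag.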

Anyway, it's a tiny change and we don't seem to agree. So let's just leave
it out.

--
Markus Lanthaler
@markuslanthaler

Received on Tuesday, 21 January 2014 09:01:18 UTC