Re: Language Tag Case Conflict (between RDF1.1 and BCP47)

On 20/01/14 17:08, Markus Lanthaler wrote:
> On Monday, January 20, 2014 3:41 PM, Richard Cyganiak wrote:
>> This is clearly not an editorial change.
>> A change to a "MAY" statement *is* a change to RDF.
> What would it change? In this case it would just make implicit knowledge
> explicit (the fact that implementations might format language tags as
> described in BCP47). I can see how a lot of people might wonder why we
> "ignored" BCP47's recommendation given that we cite it. That being said, I
> really don't care much about this as it doesn't change anything in practice.
>> Adding informative text (e.g., in a Note) would be considered a
>> clarification, and hence not a change to the language itself. It would
>> still be more than an editorial fix.
>> The AC members are currently reviewing the RDF 1.1 specs and are
>> encouraged to indicate their support (or lack thereof) for sending the
>> documents to REC. Does this process, in theory, still provide Ontotext
>> with an opportunity to object to the current design, request a change,
>> or whatever? Not that we want to encourage such behaviour; it's just
>> that we should mention all the options provided by the process before
>> we reply that it's too late to change anything now.
> I don't think so but the director will certainly take the comment into
> consideration when deciding whether to approve the PR-REC transition.
> --
> Markus Lanthaler
> @markuslanthaler

BCP47 has quite a lot to say about canonicalization including:

4.5. Canonicalization of Language Tags

    Since a particular language tag can be used by many processes,
    language tags SHOULD always be created or generated in canonical
    form.

i.e. it is the data creator's responsibility.  An RDF parser is not
creating or generating language tags.

All comparisons MUST be performed in a case-insensitive manner.

We're not ignoring BCP47 - indeed by saying "The value space of language 
tags is always in lower case" we're following it about comparisons.
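
As a rough sketch of what that means for an implementation (Python, with
hypothetical helper names `lang_value` and `same_language_tag` that come
from no spec or library): mapping each tag to lower case before comparing
gives the case-insensitive comparison BCP47 asks for.

```python
def lang_value(tag: str) -> str:
    """Map a lexical language tag to its lower-case value.

    Assumption: tags are ASCII, as BCP47 requires, so str.lower() suffices.
    """
    return tag.lower()


def same_language_tag(a: str, b: str) -> bool:
    """Compare two language tags case-insensitively via their lower-case values."""
    return lang_value(a) == lang_value(b)
```

With this, "en-GB" and "EN-gb" compare equal, while "en-GB" and "en-US"
do not.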

What is more, the quoted algorithmic canonicalization is an approximation
and the full version requires access to the registry (and hence can
potentially change in the future).

"en-BU" is canonically "en-MM" as BU is superseded by MM
"hak-CN" is "zh-hak-CN"
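
To illustrate the difference, here is a minimal Python sketch. The names
`format_case`, `canonicalize` and `DEPRECATED_REGIONS` are hypothetical.
The case rule follows my reading of the registry-free formatting
described in BCP47 (lower case everywhere, except that two-letter
subtags are upper-cased and four-letter subtags title-cased when they
are not first and do not come after a singleton). The one-entry table is
hand-copied from the "en-BU" example above and merely stands in for the
real registry.

```python
# Stand-in for the IANA registry: one hand-copied entry, for illustration only.
DEPRECATED_REGIONS = {"BU": "MM"}


def format_case(tag: str) -> str:
    """Apply the registry-free case formatting (an approximation only)."""
    out = []
    seen_singleton = False
    for i, sub in enumerate(tag.split("-")):
        sub = sub.lower()
        if i > 0 and not seen_singleton:
            if len(sub) == 2:
                sub = sub.upper()        # region-like subtag, e.g. "CA"
            elif len(sub) == 4:
                sub = sub.capitalize()   # script-like subtag, e.g. "Latn"
        if len(sub) == 1:
            seen_singleton = True        # later subtags stay lower case
        out.append(sub)
    return "-".join(out)


def canonicalize(tag: str, deprecated_regions=DEPRECATED_REGIONS) -> str:
    """Case-format the tag, then replace superseded subtags using a lookup table.

    Simplification: any subtag matching an upper-case key is treated as a region.
    """
    subtags = format_case(tag).split("-")
    return "-".join(deprecated_regions.get(s, s) for s in subtags)
```

`format_case("en-bu")` can only give "en-BU"; getting to "en-MM" needs the
table, and the table's contents depend on the registry at the time it is
consulted.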


Received on Monday, 20 January 2014 22:08:06 UTC