Case of language identifiers from Jeremy Carroll on 2002-10-23 (w3c-rdfcore-wg@w3.org from October 2002)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Wed, 23 Oct 2002 11:27:56 +0200
To: w3c-rdfcore-wg@w3.org
Message-Id: <200210231127.56628.jjc@hpl.hp.com>

Now that we have the notion of the value of a literal reasonably clear, a 
little issuette becomes clearer.

We have previously agreed that

"foo"-en

and

"foo"-EN

are the same.

(I think in Cannes).

We can rephrase that in model theoretic terms as:

<rdf:Description xml:lang="en">
  <rdf:value>foo</rdf:value>
</rdf:Description>


<rdf:Description xml:lang="EN">
  <rdf:value>foo</rdf:value>
</rdf:Description>

entail one another.

The question that comes to mind is: when do we do the case normalization on 
the language tag?
Just to be inconvenient, the convention for language tags is that the first 
component is lower case, the second upper case: e.g. en-US
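
To make the inconvenience concrete, here is a minimal Python sketch. The function names are hypothetical, and the "first component lower case, second upper case" rule is taken directly from the sentence above rather than from a full reading of the language tag RFC. It shows that conventional casing and lower-case normalization give different strings for the same tag.

```python
def conventional_case(tag: str) -> str:
    """Format a tag as described above: first component lower, second upper.

    Hypothetical helper; any further components are left as written.
    """
    parts = tag.split("-")
    out = [parts[0].lower()]
    if len(parts) > 1:
        out.append(parts[1].upper())
    out.extend(parts[2:])
    return "-".join(out)


def lower_case(tag: str) -> str:
    """Normalize a tag by folding every component to lower case."""
    return tag.lower()


# The two forms of the same tag differ as strings.
print(conventional_case("EN-us"))
print(lower_case("en-US"))
```

Whichever form is chosen as canonical, tags written in the other form have to be folded somewhere before they are compared.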

Possible answers are:

1: ASAP, during parsing; the abstract syntax is then in terms of lower case 
identifiers.

2: In the equality function in the abstract syntax, before datatyping and the 
model theory.
This is the current position. It has the defect that datatyping and the model 
theory should then be expressed as operations over equivalence classes, in 
some way or other.

3: During the datatype mapping for String and XML Literals.
The abstract syntax is then defined in terms of identifiers of any case,
but the case is normalized before we get to a value.
This is subtly different in that for unknown datatypes we don't know that they 
are insensitive to the case of the language identifier.
i.e. <a:datatype>"foo"-en and <a:datatype>"foo"-EN
might be different; it is just that they are the same for all the ones we talk 
about.
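
The three answers can be contrasted in a rough Python sketch. Everything here is an assumption made for illustration: the Literal class, the function names, and the datatype names in KNOWN_CASE_INSENSITIVE are hypothetical and are not taken from any RDF Core draft.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Literal:
    """Hypothetical abstract-syntax literal: lexical form, language tag, datatype."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


# Answer 1: normalize while parsing, so the abstract syntax only ever
# contains lower case language identifiers.
def parse_literal(lexical: str, lang: Optional[str] = None,
                  datatype: Optional[str] = None) -> Literal:
    return Literal(lexical, lang.lower() if lang else None, datatype)


# Answer 2: keep the tag as written and fold case in the equality
# function of the abstract syntax.
def syntactic_equal(a: Literal, b: Literal) -> bool:
    def key(lit: Literal) -> Tuple:
        return (lit.lexical, lit.lang.lower() if lit.lang else None, lit.datatype)
    return key(a) == key(b)


# Answer 3: keep the tag as written and fold case only in the datatype
# mapping, and only for datatypes known to ignore tag case.
KNOWN_CASE_INSENSITIVE = {None, "xsd:string", "rdf:XMLLiteral"}  # assumed names


def value(lit: Literal) -> Tuple:
    if lit.datatype in KNOWN_CASE_INSENSITIVE:
        lang = lit.lang.lower() if lit.lang else None
    else:
        lang = lit.lang  # unknown datatype: tag case might matter
    return (lit.lexical, lang, lit.datatype)
```

Under answer 1, plain structural equality on parsed literals is enough. Under answers 2 and 3, Literal("foo", "en") and Literal("foo", "EN") remain distinct objects, and the sameness has to be supplied by syntactic_equal or by value. Only the answer 3 sketch leaves "foo"-en and "foo"-EN distinguishable for an unknown datatype such as a:datatype.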


My preference is 1, which would be a change from what we have previously 
agreed.

Jeremy

Received on Wednesday, 23 October 2002 05:30:11 UTC