Re: Case of language identifiers

I also prefer (and have implemented) case normalization of the language tag
during parsing.

-- ,
Jos De Roo, AGFA http://www.agfa.com/w3c/jdroo/


                                                                                                                        
Jeremy Carroll <jjc@hpl.hp.com>
Sent by: w3c-rdfcore-wg-request@w3.org
2002-10-23 11:27 AM

To:      w3c-rdfcore-wg@w3.org
cc:
Subject: Case of language identifiers

Now that we have the notion of the value of a literal reasonably clear, a
little issuette becomes clearer.

We have previously agreed that

"foo"-en

and

"foo"-EN

are the same.

(I think in Cannes).

We can rephrase that in model theoretic terms as:

<rdf:Description xml:lang="en">
  <rdf:value>foo</rdf:value>
</rdf:Description>


<rdf:Description xml:lang="EN">
  <rdf:value>foo</rdf:value>
</rdf:Description>

entail one another.

The question that comes to mind is when do we do the case normalization on
the language tag.
Just to be inconvenient, the convention for language tags is that the first
component is lower case, the second upper case: e.g. en-US

Possible answers are:

1: ASAP, during parsing; the abstract syntax is in terms of lower case
identifiers.

2: In the equality function in the abstract syntax, before datatyping and
the model theory.
This is the current position. It has the defect that datatyping and the
model theory should then be expressed as operations over equivalence
classes, in some way or other.

3: During the datatype mapping for String and XML Literals.
The abstract syntax is then defined in terms of any case identifiers.
But the case is normalized before we get to a value.
This is subtly different in that for unknown datatypes we don't know that
they are insensitive to the case of the language identifier.
i.e. <a:datatype>"foo"-en and <a:datatype>"foo"-EN
might be different; it is just that they are the same for all the ones we
talk about.
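
To make the difference between answers 1 and 2 concrete, here is a minimal
sketch in Python. The Literal type and the helper names parse_literal and
literals_equal are hypothetical, invented for illustration only; they do not
come from any RDF draft or implementation. It assumes language tags are plain
ASCII, so folding to lower case is enough.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical plain literal: a lexical form plus an optional language tag."""
    lexical: str
    lang: Optional[str] = None


def parse_literal(lexical: str, lang: Optional[str] = None) -> Literal:
    """Answer 1: fold the language tag to lower case as soon as it is parsed.

    After this, ordinary structural equality on Literal is sufficient.
    """
    return Literal(lexical, lang.lower() if lang is not None else None)


def literals_equal(a: Literal, b: Literal) -> bool:
    """Answer 2: keep the tag as written and ignore its case only when comparing."""
    a_lang = a.lang.lower() if a.lang is not None else None
    b_lang = b.lang.lower() if b.lang is not None else None
    return a.lexical == b.lexical and a_lang == b_lang


# Answer 1: the two literals are already identical after parsing.
same_after_parsing = parse_literal("foo", "en") == parse_literal("foo", "EN")

# Answer 2: the stored literals differ, and only the special
# equality function treats them as the same.
differ_as_stored = Literal("foo", "en") != Literal("foo", "EN")
same_by_function = literals_equal(Literal("foo", "en"), Literal("foo", "EN"))
```

With answer 1, everything downstream (datatyping, the model theory) can use
plain equality. With answer 2, every later step has to remember to go through
the equality function, which is the equivalence class defect noted above.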


My preference is 1, which would be a change from what we have previously
agreed.

Jeremy

Received on Wednesday, 23 October 2002 06:39:32 UTC