Re: Are literal language tags compared in lowercase?

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 11 Jan 2017 19:00:47 +0000
Cc: public-rdf-comments@w3.org
Message-Id: <3E7DA14B-6276-423B-AD37-716EB9E10EB7@cyganiak.de>
To: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>

Hi Stian,

An answer cannot be determined with 100% certainty from the text.

What is clear:

- "Hello"@en and "Hello"@EN have the same value
- One MAY normalise "Hello"@EN to "Hello"@en
- In RDF 2004, "Hello"@en and "Hello"@EN were clearly equal

RDF 2004 forced the language tag to be lower-cased in the abstract syntax. Implementations of RDF 2004 often did not do that, but retained the case when storing or transforming RDF, while still treating @en and @EN as equal. My recollection is that we wanted to change the language of the spec to make this behaviour legal. Unfortunately it seems the language came out less clear than it should be. I do not think that there was any intention to make @en and @EN not equal.
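
To make that behaviour concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not taken from the spec or from any particular implementation: the `Literal` class, its field names, and the `term_equal_strict` helper are hypothetical. The literal keeps the language tag exactly as written, but equality and hashing use the lower-cased tag, so @en and @EN compare equal. The helper shows the strict character-by-character reading for contrast.

```python
from dataclasses import dataclass
from typing import Optional

# Datatype IRI that RDF 1.1 assigns to language-tagged strings.
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True, eq=False)
class Literal:
    """Hypothetical literal that retains the original case of its language tag."""

    lexical_form: str
    datatype: str
    language: Optional[str] = None  # stored exactly as written, e.g. "EN"

    def _key(self):
        # Compare on the lower-cased tag; lexical form and datatype stay exact.
        lang = self.language.lower() if self.language is not None else None
        return (self.lexical_form, self.datatype, lang)

    def __eq__(self, other):
        if not isinstance(other, Literal):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__, so hash the same lower-cased key.
        return hash(self._key())


def term_equal_strict(a: Literal, b: Literal) -> bool:
    """Strict reading: every component compared character by character."""
    return (a.lexical_form, a.datatype, a.language) == (
        b.lexical_form,
        b.datatype,
        b.language,
    )


lower = Literal("Hello", RDF_LANGSTRING, "en")
upper = Literal("Hello", RDF_LANGSTRING, "EN")

print(lower == upper)                   # equal under lower-case comparison
print(upper.language)                   # original case is still retained
print(term_equal_strict(lower, upper))  # the strict reading disagrees
```

Hashing on the same lower-cased key matters if such literals are used in sets or as dictionary keys; otherwise two equal literals could end up in different buckets.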

Best,
Richard



> On 11 Jan 2017, at 17:47, Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk> wrote:
> 
> This is a comment for RDF 1.1 Concepts
> http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/
> 
> From https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-Graph-Literal :
> 
>> A literal is a language-tagged string if the third element is present.
>> Lexical representations of language tags may be converted to lower case. The
>> value space of language tags is always in lower case.
> 
> Followed by:
> 
>> Literal term equality: Two literals are term-equal (the same RDF literal) 
>> if and only if the two lexical forms, the two datatype IRIs, and the 
>> two language tags (if any) compare equal, character by character. Thus, 
>> two literals can have the same value without being the same RDF term. 
>> For example:
>> 
>>      "1"^^xs:integer
>>     "01"^^xs:integer
>> 
>> denote the same value, but are not the same literal RDF terms and are not
>> term-equal because their lexical form differs.
> 
> 
> Could you help me clarify how language tags should be compared for determining
> literal term equality? This came up in the Commons RDF discussion in
> https://issues.apache.org/jira/browse/COMMONSRDF-51
> 
> 
> There are two interpretations as far as I can see:
> 
> a) (Unicode) Character by character   
>   "Hello"@en-us  !=  "Hello"@EN-US  !=  "Hello"@en-US
> 
> b) (Lower case Unicode) Character by character
>   "Hello"@en-us  ==  "Hello"@EN-US  ==  "Hello"@en-US
> 
> 
> The general interpretation seems to be that because the lexical representations
> MAY be converted to lower case, plus the value space is lower case, language
> tags should be compared in lower case as in b).   
> 
> However the text does say literally "character by character" as in a) 
> 
> So I would suggest - if you agree on b) - an amendment like:
> 
>  Literal term equality: Two literals are term-equal (the same RDF literal) 
>  if and only if the two lexical forms, the two datatype IRIs, and the 
>  two language tags (if any) compare equal, character by character
>  (but language tags must be compared in lower case).
>  Thus, two literals ...
> 
> 
> 
> -- 
> Stian Soiland-Reyes
> http://orcid.org/0000-0001-9842-9718
> 
> 
> 
Received on Wednesday, 11 January 2017 19:01:18 UTC
