Re: Are literal language tags case sensitive? from Jeen Broekstra on 2017-01-13 (semantic-web@w3.org from January 2017)

From: Jeen Broekstra <jeen.broekstra@gmail.com>
Date: Fri, 13 Jan 2017 11:34:35 +1100
To: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>
Cc: semantic-web@w3.org
Message-Id: <0E2EFE47-2F29-46F5-B585-E418C24BDBB0@gmail.com>

I agree with Peter’s reading of the spec, and this is what RDF4J in fact implements (we had a fairly involved design discussion about this a few months back, see [0]). The case of a language tag is preserved, but tags are always compared case-insensitively when determining term equality.

As a further data point, the definition for literal comparison in the RDF 1.0 spec[1] actually specifically states that language tags are to be normalised to lower case for comparisons. Obviously that is superseded by the definitions in the RDF 1.1 spec, but it gives a further hint as to the intent, I reckon.
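
To make the behaviour concrete, here is a minimal sketch in Python. It is for illustration only and is not RDF4J or Commons RDF code: the Literal class, its field names and the RDF_LANG_STRING constant are assumptions made up for this example. The tag is stored exactly as supplied and is lower-cased only when building the key used for equality and hashing.

```python
from dataclasses import dataclass
from typing import Optional

# Datatype IRI of language-tagged strings in RDF 1.1.
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True, eq=False)
class Literal:
    """Hypothetical literal type, not the API of any real library."""

    lexical_form: str
    datatype: str
    language: Optional[str] = None  # kept exactly as supplied

    def _key(self):
        # Lower-case the tag for comparison only; the stored tag is untouched.
        lang = self.language.lower() if self.language is not None else None
        return (self.lexical_form, self.datatype, lang)

    def __eq__(self, other):
        if not isinstance(other, Literal):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__, so hash the same normalised key.
        return hash(self._key())


a = Literal("Hello", RDF_LANG_STRING, "en-gb")
b = Literal("Hello", RDF_LANG_STRING, "en-GB")
print(a == b)       # True
print(b.language)   # en-GB
```

Hashing the same normalised key keeps set membership and contains()-style lookups consistent with equality, so both spellings land in the same bucket.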

Jeen 

[0] https://github.com/eclipse/rdf4j/issues/667
[1] https://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-language-identifier

> On 13 Jan 2017, at 03:14, Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk> wrote:
> 
> January.. just the right time for some semantic questions, right?
> 
> I just asked on public-rdf-comments@ about "Are literal language tags compared
> in lowercase?" [1] and I think the conclusion was that RDF 1.1 is slightly
> ambiguous about this - depending on the reader.
> 
> Could RDF practitioners (in particular implementers) help me clarify
> whether the language tags of RDF literals are case sensitive?
> 
> This came up as a potential bug in Commons RDF [2]
> but I guess it is a more general question.
> 
> 
> Example:
> 
>    "Hello"@en-gb
>    "Hello"@en-GB
> 
> Are they equal?
> 
> We can agree they are _value equal_, as the *value space* of language tags is
> lower case [3] and BCP47 says casing MUST NOT be taken to carry meaning [4].
> 
> But are these literals _term equal_? Well, they won't compare directly
> "character by character" according to RDF 1.1 [5]:
> 
>> Literal term equality: Two literals are term-equal (the same RDF literal) if
>> and only if the two lexical forms, the two datatype IRIs, and the two
>> language tags (if any) compare equal, character by character. 
> 
> 
> However, they COULD in some implementations still be _term equal_, because [3]:
> 
>> Lexical representations of language tags may be converted to lower case.
> 
> And thus I think it is ambiguous how to compare language tags when determining
> if two RDF literals are term equal or not in RDF 1.1 - or at least there might
> not be consistent behaviour across implementations.
> 
> So which one is it? What's the actual practice for comparing such language tags, for
> instance in SPARQL queries or graph.contains() kind of operations?
> 
> Note that BCP47 [4] does recommend showing/preserving language tag casing
> according to the recommended casing style (e.g. "en-US") - so I think it's right
> if an RDF 1.1 implementation preserves the language tag -- (however, it seems
> it is not currently permitted to magically transform tags from lowercase to the
> recommended style!)
> 
> 
> Your views..? :-)
> 
> -- 
> [1] http://lists.w3.org/Archives/Public/public-rdf-comments/2017Jan/
> [2] https://issues.apache.org/jira/browse/COMMONSRDF-51
> [3] https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-Graph-Literal
> [4] https://tools.ietf.org/html/bcp47#section-2.1.1
> [5] https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#dfn-literal-term-equality
> 
> 
>

Received on Friday, 13 January 2017 00:35:13 UTC