
Are literal language tags case sensitive?

From: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>
Date: Thu, 12 Jan 2017 16:14:14 +0000
Message-ID: <20170112161414.GE16257@biggiebuntu.localdomain>
To: semantic-web@w3.org

January... just the right time for some semantic questions, right?

I just asked on public-rdf-comments@ "Are literal language tags compared
in lowercase?" [1], and I think the conclusion was that RDF 1.1 is slightly
ambiguous about this, depending on the reader.

Could RDF practitioners (in particular implementers) help me clarify
whether the language tags of RDF literals are case sensitive?

This came up as a potential bug in Commons RDF [2]
but I guess it is a more general question.


Example:

    "Hello"@en-gb
    "Hello"@en-GB

Are they equal?

We can agree they are _value equal_, as the *value space* of language tags is
lower case [3] and BCP47 says casing MUST NOT be taken to carry meaning [4].
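
A minimal sketch of what _value equal_ means here, assuming the value of a
language-tagged string is taken to be the pair (lexical form, lowercased tag).
`LangLiteral` and `value_equal` are hypothetical names for illustration only,
not part of Commons RDF or any other library.

```python
from typing import NamedTuple


class LangLiteral(NamedTuple):
    """Hypothetical language-tagged literal: lexical form plus tag as written."""
    lexical: str
    lang: str


def value_equal(a: LangLiteral, b: LangLiteral) -> bool:
    # Assumption: the value space of language tags is lowercase, so the tags
    # are folded to lowercase before comparing the values they denote.
    return a.lexical == b.lexical and a.lang.lower() == b.lang.lower()


gb_lower = LangLiteral("Hello", "en-gb")
gb_upper = LangLiteral("Hello", "en-GB")
print(value_equal(gb_lower, gb_upper))  # True
```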

But are these literals _term equal_? Well, they do not compare equal
"character by character", which is what RDF 1.1 requires [5]:

> Literal term equality: Two literals are term-equal (the same RDF literal) if
> and only if the two lexical forms, the two datatype IRIs, and the two
> language tags (if any) compare equal, character by character. 
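
Read literally, that definition is a plain component-wise string comparison.
A minimal sketch, assuming a hypothetical `Literal` tuple (again not the
Commons RDF API); the only fact taken from RDF 1.1 is that language-tagged
strings have the datatype IRI rdf:langString.

```python
from typing import NamedTuple, Optional

# Datatype IRI that RDF 1.1 assigns to language-tagged strings.
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


class Literal(NamedTuple):
    """Hypothetical minimal literal type, for illustration only."""
    lexical: str
    datatype: str
    lang: Optional[str] = None


def term_equal(a: Literal, b: Literal) -> bool:
    # Character-by-character comparison of all three parts: no case folding.
    return (a.lexical == b.lexical
            and a.datatype == b.datatype
            and a.lang == b.lang)


gb_lower = Literal("Hello", RDF_LANG_STRING, "en-gb")
gb_upper = Literal("Hello", RDF_LANG_STRING, "en-GB")
print(term_equal(gb_lower, gb_upper))  # False
```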


However, in some implementations they COULD still be _term equal_, because [3] says:

> Lexical representations of language tags may be converted to lower case.

So I think RDF 1.1 is ambiguous about how to compare language tags when
determining whether two literals are term equal, or at least that behaviour
may not be consistent across implementations.
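
To make the possible inconsistency concrete, here is a sketch of two
hypothetical implementations: one keeps the tag exactly as written, the other
uses the permission in [3] to lowercase it when the literal is created. A
Python set stands in for a graph and `in` for a graph.contains()-style lookup;
both are assumptions for illustration, not any real store's behaviour.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical literal; tuple equality plays the role of term equality."""
    lexical: str
    lang: str


def make_preserving(lexical: str, lang: str) -> Literal:
    # Hypothetical implementation A: keeps the tag as written.
    return Literal(lexical, lang)


def make_lowercasing(lexical: str, lang: str) -> Literal:
    # Hypothetical implementation B: lowercases the tag on creation.
    return Literal(lexical, lang.lower())


for name, make in [("preserving", make_preserving),
                   ("lowercasing", make_lowercasing)]:
    graph = {make("Hello", "en-gb")}          # stand-in for a graph
    found = make("Hello", "en-GB") in graph   # contains()-style lookup
    print(name, found)
# preserving False
# lowercasing True
```

The same data and the same lookup give different answers, even though each
implementation compares "character by character" internally.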

So which one is it? What is the actual practice for comparing such language
tags, for instance in SPARQL queries or graph.contains()-style operations?

Note that my reading of BCP47 [4] is that it recommends showing and preserving
language tag casing according to the recommended casing style (e.g. "en-US"),
so I think it is right for an RDF 1.1 implementation to preserve the language
tag as written. (However, it does not currently seem to be permitted to
magically transform a lowercase tag into the recommended style!)
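
For reference, the casing convention in BCP47 section 2.1.1 is mechanical:
everything lowercase, except two-letter subtags (uppercase) and four-letter
subtags (titlecase) that are neither the first subtag nor after a singleton.
A minimal sketch of that rule only; `recommended_case` is a hypothetical
helper, it does not validate tags against the IANA registry, and as noted
above RDF 1.1 does not seem to permit applying it to stored literals.

```python
def recommended_case(tag: str) -> str:
    """Apply the BCP47 (RFC 5646, section 2.1.1) casing conventions."""
    out = []
    after_singleton = False
    for i, sub in enumerate(tag.split("-")):
        if len(sub) == 1:
            # A singleton (e.g. "x" or "u"): it and everything after it
            # stay lowercase.
            after_singleton = True
        if i == 0 or after_singleton:
            out.append(sub.lower())
        elif len(sub) == 2:
            out.append(sub.upper())                    # region, e.g. "GB"
        elif len(sub) == 4:
            out.append(sub[0].upper() + sub[1:].lower())  # script, e.g. "Hant"
        else:
            out.append(sub.lower())
    return "-".join(out)


print(recommended_case("en-gb"))       # en-GB
print(recommended_case("zh-hant-tw"))  # zh-Hant-TW
```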


Your views..? :-)

-- 
[1] http://lists.w3.org/Archives/Public/public-rdf-comments/2017Jan/
[2] https://issues.apache.org/jira/browse/COMMONSRDF-51
[3] https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-Graph-Literal
[4] https://tools.ietf.org/html/bcp47#section-2.1.1
[5] https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#dfn-literal-term-equality
Received on Thursday, 12 January 2017 16:16:47 UTC

This archive was generated by hypermail 2.3.1 : Thursday, 12 January 2017 16:16:51 UTC