- From: Jeen Broekstra <jeen.broekstra@gmail.com>
- Date: Fri, 13 Jan 2017 11:34:35 +1100
- To: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>
- Cc: semantic-web@w3.org
I agree with Peter’s reading of the spec, and this is what RDF4J in fact implements (we had a fairly involved design discussion about this a few months back, see [0]). Language tag case is preserved but is always compared case-insensitively when determining term-equality.

As a further data point, the definition for literal comparison in the RDF 1.0 spec [1] specifically states that language tags are to be normalised to lower case for comparisons. Obviously that is superseded by the definitions in the RDF 1.1 spec, but it gives a further hint as to the intent, I reckon.

Jeen

[0] https://github.com/eclipse/rdf4j/issues/667
[1] https://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-language-identifier

> On 13 Jan 2017, at 03:14, Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk> wrote:
>
> January.. just the right time for some semantic questions, right?
>
> I just asked on public-rdf-comments@ about "Are literal language tags compared
> in lowercase?" [1] and I think the conclusion was that RDF 1.1 is slightly
> ambiguous about this - depending on the reader.
>
> Could RDF practitioners (in particular implementers) help me clarify
> if RDF Literal's language tags are case sensitive?
>
> This came up as a potential bug in Commons RDF [2]
> but I guess it is a more general question.
>
>
> Example:
>
> "Hello"@en-gb
> "Hello"@en-GB
>
> Are they equal?
>
> We can agree they are _value equal_, as the *value space* of language tags is
> lower case [3] and BCP47 says casing MUST NOT be taken to carry meaning [4].
>
> But are these literals _term equal_? Well, they won't compare directly
> "character by character" according to RDF 1.1 [5]:
>
>> Literal term equality: Two literals are term-equal (the same RDF literal) if
>> and only if the two lexical forms, the two datatype IRIs, and the two
>> language tags (if any) compare equal, character by character.
>
>
> However they COULD in some implementations still be _term equal_, because [3]:
>
>> Lexical representations of language tags may be converted to lower case.
>
> And thus I think it is ambiguous how to compare language tags when determining
> if two RDF literals are term equal or not in RDF 1.1 - or at least there might
> not be consistent behaviour across implementations.
>
> So which one is it? What's the actual practice for comparing such language tags, for
> instance in SPARQL queries or graph.contains() kind of operations?
>
> Note that my reading of BCP47 [4] is that it does recommend showing/preserving
> language tag casing according to the recommended casing style (e.g. "en-US") - so
> I think it's right if an RDF 1.1 implementation preserves the language tag --
> (however it seems not currently permitted to magically transform them to the
> recommended style from lowercase!)
>
>
> Your views..? :-)
>
> --
> [1] http://lists.w3.org/Archives/Public/public-rdf-comments/2017Jan/
> [2] https://issues.apache.org/jira/browse/COMMONSRDF-51
> [3] https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-Graph-Literal
> [4] https://tools.ietf.org/html/bcp47#section-2.1.1
> [5] https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#dfn-literal-term-equality
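
For illustration, the behaviour described in the reply above (language tag case is preserved, but tags are compared case-insensitively for term equality) could be sketched roughly as follows. This is a minimal Python sketch and not RDF4J or Commons RDF code: the `Literal` class, its field names and the `_key` helper are hypothetical, introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Datatype IRI of language-tagged strings in RDF 1.1.
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True, eq=False)
class Literal:
    """Hypothetical literal type, for illustration only."""

    lexical_form: str
    datatype: str
    language: Optional[str] = None  # stored exactly as supplied

    def _key(self):
        # Assumption: only the language tag is case-folded for comparison;
        # the lexical form and datatype IRI are compared character by character.
        lang = self.language.lower() if self.language is not None else None
        return (self.lexical_form, self.datatype, lang)

    def __eq__(self, other):
        if not isinstance(other, Literal):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Hash on the same key so equal literals collapse in sets and dicts.
        return hash(self._key())


a = Literal("Hello", RDF_LANGSTRING, "en-gb")
b = Literal("Hello", RDF_LANGSTRING, "en-GB")
print(a == b)       # compares equal despite different tag casing
print(b.language)   # original casing is still available
```

Hashing on the case-folded key matters for graph.contains()-style lookups: if equality ignored case but the hash did not, two term-equal literals could land in different hash buckets.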
Received on Friday, 13 January 2017 00:35:13 UTC