Re: Are literal language tags case sensitive?

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Thu, 12 Jan 2017 09:45:14 -0800
To: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>, semantic-web@w3.org
Message-ID: <8ec0c770-884a-fc12-e24c-3a860da06305@gmail.com>
Summary:  "Hello"@en-gb and "Hello"@en-GB are term equal.  If they are not term
equal then they have different literal values.


The answers to your questions should be easily available from
https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
but of course the situation is somewhat murky.

From that section, a language-tagged string consists of three elements:
1/ a lexical form, which is a Unicode string
2/ the datatype IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#langString
3/ a language tag which is well-formed according to section 2.2.9 of [BCP47].

So, is (the Turtle literal) "Hello"@en-GB a language-tagged string and, if so,
what is its language tag?  As en-GB meets all the requirements to be a
language tag in BCP47 "Hello"@en-GB is indeed a language-tagged string.  Its
lexical form is the Unicode string Hello.  BCP47 states that language tags are
to be treated as case insensitive, so en-GB is not distinct from en-gb.  The
language tag of "Hello"@en-GB is thus a case-insensitive string, i.e., one
where the Unicode character G is considered the same as the Unicode character
g.  The language tags en-gb and en-GB then compare equal character by character.

So there is a fairly strong argument to be made that "Hello"@en-gb and
"Hello"@en-GB are indeed term equal.  This is also a fairly strong argument
that RDF systems SHOULD (not just MAY) convert language tags to lower case.
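
To make the argument above concrete, here is a minimal Python sketch of term
equality under the case-insensitive reading.  The names LangString and
term_equal are hypothetical and do not come from any RDF library; lower-casing
the tag at construction time is an assumption that follows the SHOULD
suggested above, not something RDF 1.1 mandates.

```python
from dataclasses import dataclass

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class LangString:
    """Hypothetical model of an RDF 1.1 language-tagged string."""
    lexical_form: str
    language_tag: str
    datatype_iri: str = RDF_LANGSTRING

    def __post_init__(self):
        # Assumption: normalise the tag to lower case once, so that a plain
        # character-by-character comparison treats G and g as the same.
        # BCP47 tags are ASCII, so str.lower() is sufficient here.
        object.__setattr__(self, "language_tag", self.language_tag.lower())


def term_equal(a: LangString, b: LangString) -> bool:
    """Compare lexical form, datatype IRI and language tag character by character."""
    return (
        a.lexical_form == b.lexical_form
        and a.datatype_iri == b.datatype_iri
        and a.language_tag == b.language_tag
    )


if __name__ == "__main__":
    print(term_equal(LangString("Hello", "en-gb"), LangString("Hello", "en-GB")))
```

Under this reading only the tag is folded; the lexical form stays case
sensitive, so "Hello"@en-gb and "hello"@en-gb remain distinct terms.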

I haven't said anything about the value space of language tags.  Indeed the
value space of language tags doesn't actually affect anything in RDF.   The
literal value for a language-tagged string is just "a pair consisting of its
lexical form and its language tag".  There is no conversion to value spaces
going on at all here.  So if "Hello"@en-gb and "Hello"@en-GB are not
term-equal then they have different literal values.  This is yet another
argument for their term equality.
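
The point about literal values can be sketched the same way.  The two
functions below are hypothetical illustrations of the two readings, not part
of any specification or library: the literal value is just the pair of
lexical form and language tag, so whatever is decided for the tag carries
straight over to the value.

```python
def literal_value_case_sensitive(lexical_form: str, language_tag: str) -> tuple:
    # Reading 1 (assumption): the tag is an ordinary, case-sensitive string.
    return (lexical_form, language_tag)


def literal_value_case_insensitive(lexical_form: str, language_tag: str) -> tuple:
    # Reading 2 (assumption): the tag is case insensitive, modelled here by
    # folding it to lower case before building the pair.
    return (lexical_form, language_tag.lower())


if __name__ == "__main__":
    a = ("Hello", "en-gb")
    b = ("Hello", "en-GB")
    # Reading 1: the pairs differ, so the two literals denote different values.
    print(literal_value_case_sensitive(*a) == literal_value_case_sensitive(*b))
    # Reading 2: the pairs coincide, matching term equality.
    print(literal_value_case_insensitive(*a) == literal_value_case_insensitive(*b))
```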

peter

PS:  It shouldn't have been so difficult to tease this all out.  There should
have been tests in the RDF 1.1 test suite to cover this, but I can't find one.



On 01/12/2017 08:14 AM, Stian Soiland-Reyes wrote:
> January.. just the right time for some semantic questions, right?
> 
> I just asked on public-rdf-comments@ about "Are literal language tags compared
> in lowercase?" [1] and I think the conclusion was that RDF 1.1 is slightly
> ambiguous about this - depending on the reader.
> 
> Could RDF practitioners (in particular implementers) help me clarify
> if RDF Literal's language tags are case sensitive?
> 
> This came up as a potential bug in Commons RDF [2]
> but I guess it is a more general question.
> 
> 
> Example:
> 
>     "Hello"@en-gb
>     "Hello"@en-GB
> 
> Are they equal?
> 
> We can agree they are _value equal_, as the *value space* of language tags is
> lower case [3] and BCP47 says casing MUST NOT be taken to carry meaning [4].
> 
> But are these literals _term equal_? Well, they won't compare directly
> "character by character" according to RDF 1.1 [5]:
> 
>> Literal term equality: Two literals are term-equal (the same RDF literal) if
>> and only if the two lexical forms, the two datatype IRIs, and the two
>> language tags (if any) compare equal, character by character. 
> 
> 
> However they COULD in some implementations still be _term equal_, because [3]:
> 
>> Lexical representations of language tags may be converted to lower case.
> 
> And thus I think it is ambiguous how to compare language tags when determining
> if two RDF literals are term equal or not in RDF 1.1 - or at least there might
> not be consistent behaviour across implementations.
> 
> So which one is it? What's the actual practice for comparing such language tags, for
> instance in SPARQL queries or graph.contains() kind of operations?
> 
> Note that BCP47 [4] does recommend showing/preserving language tag casing
> according to its recommended casing style (e.g. "en-US") - so I think it's right if
> an RDF 1.1 implementation preserves the language tag (however, it seems not to be
> currently permitted to magically transform tags to the recommended style from
> lowercase!)
> 
> 
> Your views..? :-)
> 
Received on Thursday, 12 January 2017 17:45:51 UTC
