Re: Are literal language tags case sensitive?

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Thu, 12 Jan 2017 09:45:14 -0800
To: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>, semantic-web@w3.org
Message-ID: <8ec0c770-884a-fc12-e24c-3a860da06305@gmail.com>
Summary:  "Hello"@en-gb and "Hello"@en-GB are term equal.  If they are not term
equal then they have different literal values.

The answers to your questions should be easily available from
but of course the situation is somewhat murky.

From that section, a language-tagged string consists of three elements:
1/ a lexical form, which is a Unicode string
2/ the datatype IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#langString
3/ a language tag which is well-formed according to section 2.2.9 of [BCP47].

So, is (the Turtle literal) "Hello"@en-GB a language-tagged string and, if so,
what is its language tag?  As en-GB meets all the requirements to be a
language tag in BCP47 "Hello"@en-GB is indeed a language-tagged string.  Its
lexical form is the Unicode string Hello.  BCP47 states that language tags are
to be treated as case insensitive, so en-GB is not distinct from en-gb.  The
language tag of "Hello"@en-GB is thus a case-insensitive string, i.e., one
where the Unicode character G is considered the same as the Unicode character
g.  The language tags en-gb and en-GB then compare equal character by character.

So there is a fairly strong argument to be made that "Hello"@en-gb and
"Hello"@en-GB are indeed term equal.  This is also a fairly strong argument
that RDF systems SHOULD (not just MAY) convert language tags to lower case.
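
As a rough illustration of that argument, here is a minimal Python sketch of a
literal type that folds the language tag to lower case on construction, so that
the "character by character" comparison then treats en-gb and en-GB alike.  The
names Literal and lang_literal are hypothetical, invented for this example; they
are not taken from any RDF specification or library.

```python
from dataclasses import dataclass
from typing import Optional

# Datatype IRI of every language-tagged string (from RDF 1.1 Concepts).
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal term: lexical form, datatype IRI, optional language tag."""
    lexical_form: str
    datatype: str
    language: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption of this sketch: the tag is folded to lower case once,
        # at construction. Language tags are ASCII, so str.lower() suffices.
        if self.language is not None:
            object.__setattr__(self, "language", self.language.lower())


def lang_literal(lexical_form: str, tag: str) -> Literal:
    """Build a language-tagged string (hypothetical helper)."""
    return Literal(lexical_form, RDF_LANGSTRING, tag)


a = lang_literal("Hello", "en-gb")
b = lang_literal("Hello", "en-GB")

# The generated __eq__ compares the three components field by field, which
# after folding amounts to a plain character-by-character comparison.
same_term = (a == b)
same_hash = (hash(a) == hash(b))
```

Folding at construction rather than at comparison time keeps equality and
hashing consistent, which matters for graph.contains() style lookups; the cost
is that the original casing of the tag is not preserved.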

I haven't said anything about the value space of language tags.  Indeed the
value space of language tags doesn't actually affect anything in RDF.   The
literal value for a language-tagged string is just "a pair consisting of its
lexical form and its language tag".  There is no conversion to value spaces
going on at all here.  So if "Hello"@en-gb and "Hello"@en-GB are not
term-equal then they have different literal values.  This is yet another
argument for their term equality.
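
The same point can be sketched in a few lines of Python.  The function
literal_value below is a hypothetical helper that simply builds the pair
described above; because nothing maps the tag into a value space, a
case-sensitive reading of the tag gives two different values, while folding the
tag first gives one.

```python
from typing import Tuple


def literal_value(lexical_form: str, language_tag: str) -> Tuple[str, str]:
    """Value of a language-tagged string: the pair (lexical form, language tag).

    Hypothetical helper for illustration; no lexical-to-value mapping is applied.
    """
    return (lexical_form, language_tag)


# Case-sensitive reading: the tags differ, so the pairs (the values) differ.
values_differ_if_case_sensitive = (
    literal_value("Hello", "en-gb") != literal_value("Hello", "en-GB")
)

# Case-insensitive reading, modelled here by folding both tags to lower case:
# the pairs are identical, so the two literals denote the same value.
values_equal_if_folded = (
    literal_value("Hello", "en-gb".lower()) == literal_value("Hello", "en-GB".lower())
)
```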


PS:  It shouldn't have been so difficult to tease this all out.  There should
have been tests in the RDF 1.1 test suite to cover this, but I can't find one.

On 01/12/2017 08:14 AM, Stian Soiland-Reyes wrote:
> January.. just the right time for some semantic questions, right?
> I just asked on public-rdf-comments@ about "Are literal language tags compared
> in lowercase?" [1] and I think the conclusion was that RDF 1.1 is slightly
> ambiguous about this - depending on the reader.
> Could RDF practitioners (in particular implementers) help me clarify
> if RDF Literal's language tags are case sensitive?
> This came up as a potential bug in Commons RDF [2]
> but I guess it is a more general question.
> Example:
>     "Hello"@en-gb
>     "Hello"@en-GB
> Are they equal?
> We can agree they are _value equal_, as the *value space* of language tags is
> lower case [3] and BCP47 says casing MUST NOT be taken to carry meaning [4].
> But are these literals _term equal_? Well, they won't compare directly
> "character by character" according to RDF 1.1 [5]:
>> Literal term equality: Two literals are term-equal (the same RDF literal) if
>> and only if the two lexical forms, the two datatype IRIs, and the two
>> language tags (if any) compare equal, character by character. 
> However they COULD in some implementations still be _term equal_, because [3]:
>> Lexical representations of language tags may be converted to lower case.
> And thus I think it is ambiguous how to compare language tags when determining
> if two RDF literals are term equal or not in RDF 1.1 - or at least there might
> not be consistent behaviour across implementations.
> So which one is it? What's the actual practice for comparing such language tags, for
> instance in SPARQL queries or graph.contains() kind of operations?
> Note that BCP47 [4] does recommend showing/preserving language tag casing
> according to the recommended casing style (e.g. "en-US") - so I think it's right if
> an RDF 1.1 implementation preserves the language tag -- (however it seems not
> currently permitted to magically transform them to the recommended style from
> lowercase!)
> Your views..? :-)
Received on Thursday, 12 January 2017 17:45:51 UTC
