- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Thu, 12 Jan 2017 09:45:14 -0800
- To: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>, semantic-web@w3.org
Summary: "Hello"@en-gb and "Hello"@en-GB are term equal. If they are not term equal then they have different literal values.

The answers to your questions should be easily available from https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal but of course the situation is somewhat murky.

From that section a language-tagged string consists of three elements:
1/ a lexical form, which is a Unicode string
2/ the datatype IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#langString
3/ a language tag which is well-formed according to section 2.2.9 of [BCP47].

So, is (the Turtle literal) "Hello"@en-GB a language-tagged string and, if so, what is its language tag?

As en-GB meets all the requirements to be a language tag in BCP47, "Hello"@en-GB is indeed a language-tagged string. Its lexical form is the Unicode string Hello. BCP47 states that language tags are to be treated as case insensitive, so en-GB is not distinct from en-gb. The language tag of "Hello"@en-GB is thus a case-insensitive string, i.e., one where the Unicode character G is considered the same as the Unicode character g. The language tags en-gb and en-GB then compare equal character by character.

So there is a fairly strong argument to be made that "Hello"@en-gb and "Hello"@en-GB are indeed term equal. This is also a fairly strong argument that RDF systems SHOULD (not just MAY) convert language tags to lower case.

I haven't said anything about the value space of language tags. Indeed the value space of language tags doesn't actually affect anything in RDF. The literal value for a language-tagged string is just "a pair consisting of its lexical form and its language tag". There is no conversion to value spaces going on at all here. So if "Hello"@en-gb and "Hello"@en-GB are not term equal then they have different literal values. This is yet another argument for their term equality.

peter

PS: It shouldn't have been so difficult to tease this all out.
There should have been tests in the RDF 1.1 test suite to cover this, but I can't find one.

On 01/12/2017 08:14 AM, Stian Soiland-Reyes wrote:
> January.. just the right time for some semantic questions, right?
>
> I just asked on public-rdf-comments@ about "Are literal language tags compared
> in lowercase?" [1] and I think the conclusion was that RDF 1.1 is slightly
> ambiguous about this - depending on the reader.
>
> Could RDF practitioners (in particular implementers) help me clarify
> if RDF Literal's language tags are case sensitive?
>
> This came up as a potential bug in Commons RDF [2]
> but I guess it is a more general question.
>
>
> Example:
>
> "Hello"@en-gb
> "Hello"@en-GB
>
> Are they equal?
>
> We can agree they are _value equal_, as the *value space* of language tags is
> lower case [3] and BCP47 says casing MUST NOT be taken to carry meaning [4].
>
> But are these literals _term equal_? Well, they won't compare directly
> "character by character" according to RDF 1.1 [5]:
>
>> Literal term equality: Two literals are term-equal (the same RDF literal) if
>> and only if the two lexical forms, the two datatype IRIs, and the two
>> language tags (if any) compare equal, character by character.
>
>
> However they COULD in some implementations still be _term equal_, because [3]:
>
>> Lexical representations of language tags may be converted to lower case.
>
> And thus I think it is ambiguous how to compare language tags when determining
> if two RDF literals are term equal or not in RDF 1.1 - or at least there might
> not be consistent behaviour across implementations.
>
> So which one is it? What's the actual practice for comparing such language tags, for
> instance in SPARQL queries or graph.contains() kind of operations?
>
> Note that BCP47 [4], as I read it, does recommend showing/preserving language tag
> casing according to the recommended casing style (e.g. "en-US") - so I think it's
> right if an RDF 1.1 implementation preserves the language tag -- (however it seems
> not currently permitted to magically transform them to the recommended style from
> lowercase!)
>
>
> Your views..? :-)
>
Received on Thursday, 12 January 2017 17:45:51 UTC
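
For illustration only, here is a minimal Python sketch of the term-equality reading argued for in the reply above: a literal is modelled as its lexical form, datatype IRI and optional language tag, and the language tag is lowercased before the character-by-character comparison. The names `Literal`, `term_key` and `term_equal` are hypothetical and do not come from RDF 1.1, Commons RDF, or any other library; whether an implementation should normalize this way is exactly the question the thread discusses.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Datatype IRI of language-tagged strings, as given in RDF 1.1 Concepts.
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal model: lexical form, datatype IRI, language tag."""
    lexical_form: str
    datatype: str
    language: Optional[str] = None

    def term_key(self) -> Tuple[str, str, Optional[str]]:
        # Assumption: language tags are treated as case insensitive, so the
        # tag is lowercased before comparison. BCP47 tags are ASCII, so
        # str.lower() is sufficient here. The lexical form and datatype IRI
        # are compared exactly as written.
        lang = self.language.lower() if self.language is not None else None
        return (self.lexical_form, self.datatype, lang)


def term_equal(a: Literal, b: Literal) -> bool:
    """Compare two literals under the case-insensitive language tag reading."""
    return a.term_key() == b.term_key()


if __name__ == "__main__":
    lower = Literal("Hello", RDF_LANG_STRING, "en-gb")
    upper = Literal("Hello", RDF_LANG_STRING, "en-GB")
    print(term_equal(lower, upper))
```

Under this sketch the two example literals compare as term equal, while the original casing of each tag is still kept on the object, so an implementation could preserve it for display.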