Re: Proposal for ISSUE-12, string literals

On 13 May 2011, at 16:52, Alex Hall wrote:
>> I think the sensible way would be:
>> 1) every literal has *both* a datatype and a (possibly empty) language tag;
>> 2) of the built-in datatypes, only xsd:string can have non-empty language tags;
>> 3) plain literals and rdf:PlainLiterals don't exist;
>> 4) "foo" in concrete syntaxes is syntactic sugar for "foo"^^xsd:string.
>> 5) "foo"@en in concrete syntaxes is syntactic sugar for "foo"^^xsd:string@en.
>> 
...
> The main roadblock that I can see is that a datatype maps a single lexical string to a value; you'd have to define a special notion of datatyping for xsd:string which is essentially an identity mapping of <lexical, lang> pairs.  Otherwise you'd have "chat"^^xsd:string@en and "chat"^^xsd:string@fr with the same value, which won't fly.

Yes, that's right, RDF Semantics would have to be adapted to ensure that "foo"@en and "foo"@fr (which are now syntactic sugar for "foo"^^xsd:string@en and "foo"^^xsd:string@fr) are still different. But I think that's doable:

Let's write "xxx"^^yyy for a typed literal with *empty* language tag. Its interpretation is L2V("xxx"), where L2V is the lexical-to-value mapping of datatype yyy.

Let's write "xxx"^^yyy@zzz for a typed literal with *non-empty* language tag. Its interpretation is <L2V("xxx"), zzz>.
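
To make the two cases concrete, here is a minimal Python sketch of that interpretation rule. The names `interpret`, `L2V` (as a dictionary) and the `xsd:` constants are hypothetical and only for illustration; the check that rejects language tags on non-string datatypes follows point 2 of the quoted proposal. Nothing here is taken from RDF Semantics itself.

```python
XSD_STRING = "xsd:string"
XSD_INTEGER = "xsd:integer"

# Hypothetical registry: datatype -> lexical-to-value mapping (L2V).
L2V = {
    XSD_STRING: str,   # identity mapping on the lexical form
    XSD_INTEGER: int,
}

def interpret(lexical, datatype, lang=""):
    """Interpret a literal that always has a datatype and a
    (possibly empty) language tag."""
    if lang and datatype != XSD_STRING:
        # Point 2 of the proposal: only xsd:string may carry a tag.
        raise ValueError("language tag not allowed on " + datatype)
    value = L2V[datatype](lexical)
    if lang == "":
        return value          # "xxx"^^yyy     -> L2V("xxx")
    return (value, lang)      # "xxx"^^yyy@zzz -> <L2V("xxx"), zzz>
```

Under this sketch, `interpret("chat", XSD_STRING, "en")` and `interpret("chat", XSD_STRING, "fr")` are different pairs, so the two literals keep distinct values.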

How exactly to distribute that logic between Simple Entailment and D-Entailment requires some thought. You can't remove plain literals from RDF without changing a couple of lines of RDF Semantics ...

This entire proposal breaks backwards compatibility in two ways:

1. The following Turtle file would now contain only one triple instead of two:

   <a> <b> "foo", "foo"^^xsd:string .

This obviously has some serious knock-on effects. For example, SPARQL stores that have already loaded this file would now need to drop a triple, which would change the results of many queries.

2. In SPARQL, datatype("foo"@en) would now report xsd:string instead of ø. That seems like a good thing to me (it's explainable by saying that the language tag is “attached” to the “outside” of the typed literal). I believe this is *fairly* unlikely to cause interoperability issues with existing queries.
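
Both compatibility effects can be illustrated with a small Python sketch that models literals as plain tuples. Everything in it is an assumption made for illustration: `parse_old`, `parse_new` and `datatype` are hypothetical helpers, not part of any Turtle parser or SPARQL engine, and `None` stands in for the ø result mentioned above.

```python
XSD_STRING = "xsd:string"

def parse_old(lexical, datatype=None, lang=None):
    # Current model: a plain literal has no datatype at all.
    return (lexical, datatype, lang)

def parse_new(lexical, datatype=None, lang=""):
    # Proposed model: "foo" is sugar for "foo"^^xsd:string,
    # and the language tag defaults to the empty string.
    return (lexical, datatype or XSD_STRING, lang)

def datatype(literal):
    # Toy stand-in for SPARQL's datatype() function.
    return literal[1]

def load(parse):
    # <a> <b> "foo", "foo"^^xsd:string .
    return {
        ("a", "b", parse("foo")),
        ("a", "b", parse("foo", XSD_STRING)),
    }

old_graph = load(parse_old)   # two distinct triples
new_graph = load(parse_new)   # both literals collapse into one triple
```

In this model `len(old_graph)` is 2 and `len(new_graph)` is 1, while `datatype(parse_old("foo", lang="en"))` is `None` and `datatype(parse_new("foo", lang="en"))` is `xsd:string`.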

Best,
Richard

Received on Friday, 13 May 2011 17:42:54 UTC