Re: ISSUE-12: xs:string VS plain literals: proposed resolution from Pat Hayes on 2011-05-06 (public-rdf-wg@w3.org from May 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 6 May 2011 18:26:52 -0500
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-rdf-wg@w3.org
Message-Id: <791BCFA4-D238-4FF0-95E0-15651ECF7FBC@ihmc.us>
On May 6, 2011, at 12:11 PM, Andy Seaborne wrote:

> It was Sandro who introduced SPARQL into the thread.  I don't agree that it's a "grave mistake" in SPARQL.  Treatment should be uniform whether using SPARQL or some other way of accessing the data (SPARQL engines are often written over a base API anyway).
> 
> The proposed text is:
> """
> Recommend that data publishers use plain literals instead of xs:string typed literals and tell systems to silently convert xs:string literals to plain literals without language tag
> """
> 
> This is an RDF-as-data view; this is not D-entailment.

But my understanding is that the main (only?) reason for this suggestion is to make RDF data more accessible to SPARQL querying, because at present a query has to be couched in both forms in order to find both kinds of literal. If there is any other reason for this suggestion (which runs directly counter to all the thinking and discussion and advice that has so far been published on this topic since 2004) then I would like to see it spelled out in detail. And we should actively request input from OWL 2 and RIF representatives before making this recommendation. 

If this is the primary reason for this suggestion, then my point is that this effect - of having one query find both kinds of literal as answers - can be achieved by SPARQL using {xsd:string}-entailment rather than simple entailment. And, further, if this is the only reason for this suggestion, then this is business for the SPARQL WG to consider rather than us. I do not believe that it is appropriate for us to recommend that people write their RDF graphs in a certain way, unless we have very strong reasons for this and can articulate them clearly (and then also explain why we did not alter RDF to make this suggestion mandatory, if the reasons are so strong).
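
To make the contrast concrete, here is a minimal Python sketch of the two matching behaviours. It is only an illustration: the Literal type and the functions syntactic_match, string_value and value_match are hypothetical names, not part of any SPARQL engine or RDF toolkit, and the value-based match merely approximates what {xsd:string}-entailment would give for this one case.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal term."""
    lexical: str
    datatype: Optional[str] = None  # None means a plain literal
    lang: Optional[str] = None      # language tag, if any


def syntactic_match(pattern: Literal, term: Literal) -> bool:
    # Simple entailment on ground literals behaves like term identity.
    return pattern == term


def string_value(term: Literal) -> Optional[str]:
    # A plain literal without a language tag and an xsd:string typed
    # literal both denote the character string itself.
    if term.lang is None and term.datatype in (None, XSD_STRING):
        return term.lexical
    return None


def value_match(pattern: Literal, term: Literal) -> bool:
    # Assumption: compare by string value where both sides have one,
    # otherwise fall back to term identity.
    pv, tv = string_value(pattern), string_value(term)
    if pv is not None and tv is not None:
        return pv == tv
    return pattern == term


objects = [
    Literal("foo"),
    Literal("foo", datatype=XSD_STRING),
    Literal("foo", lang="en"),
]
query = Literal("foo")

syntactic_hits = [o for o in objects if syntactic_match(query, o)]
value_hits = [o for o in objects if value_match(query, o)]

print(len(syntactic_hits))  # one hit: only the plain literal
print(len(value_hits))      # two hits: plain and xsd:string forms
```

Under the syntactic match, a query has to be written in both forms to find both literals. Under the value-based match, one query finds both, and the graph itself is left untouched.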

>  It is not necessarily a change to SPARQL query, which has to work with old and new data.
> 
> :x :p "foo" .
> :x :p "foo"^^xsd:string .
> 
> One triple or two? The proposal says (ideally) one.

Actually, the proposal as written does not say this. This is definitely two literals. The proposal would rewrite this graph to one with a single literal, but it would not be the same graph. 
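
A small Python sketch of that distinction, under stated assumptions: literals are modelled as (lexical form, datatype) tuples, language tags are left out for brevity, and normalise is a hypothetical helper standing in for the "silently convert" step in the proposed text.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def normalise(term):
    """Rewrite an xsd:string typed literal to a plain literal.

    Literals are (lexical, datatype) tuples; a datatype of None marks a
    plain literal. Any other term is returned unchanged.
    """
    if isinstance(term, tuple) and term[1] == XSD_STRING:
        return (term[0], None)
    return term


# The two-triple example from the thread.
graph = {
    (":x", ":p", ("foo", None)),
    (":x", ":p", ("foo", XSD_STRING)),
}

normalised = {(s, p, normalise(o)) for s, p, o in graph}

print(len(graph))           # the original graph has two triples
print(len(normalised))      # the rewritten graph has one
print(normalised == graph)  # False: the rewrite yields a different graph
```

The rewritten graph is a set of triples that differs from the original, which is why the conversion is a change to the data and not just a different reading of it.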

Pat

> 
> [[
> The strongest I can find in the RDF docs is in Concepts: sec 6.5.2 as a note.
> ]]
> 
> 	Andy
> 
> On 06/05/11 15:35, Pat Hayes wrote:
>> 
>> On May 6, 2011, at 9:09 AM, Andy Seaborne wrote:
>> 
>>> See
>>> 
>>> http://www.w3.org/TR/sparql11-entailment/#id35808654
>> 
>> OK, so how many SPARQL engines support D-entailment? How do they indicate to the world which form of D-entailment they use (i.e. what D is, exactly)?
>> 
>> Why not include xsd:string into the basic SPARQL entailment regime? It wouldn't be difficult to make this change in the spec's wording, though the test cases would need some revision.
>> 
>> BTW, if the answer is, it would screw up existing implementations, then this is also an argument against RDF making any changes.
>> 
>> Pat
>> 
>>> 
>>> which depends on RDFS entailment
>>> 
>>> 	Andy
>>> 
>>> On 06/05/11 14:57, Pat Hayes wrote:
>>>> This discussion illustrates in a nutshell the essential tension at the core of SPARQL. Should a query be 'semantic', entirely about meanings, or should it be basically a process of syntactic matching? If one believes the semantic position, then it is natural to express the basic process in terms of entailment (the graph entails the query instance) and natural to treat semantically equivalent things as indistinguishable. However, it is also natural to not have such things as answer counts, no-match filters, and most of the actual apparatus of SPARQL, since none of this is *entailed* by the query graph, indeed by any graph at all. All of this is essentially syntactic information *about* the graph. Which is why I slowly came to the realization that to even talk about entailment in the context of querying is wrong. Querying is not a semantic operation, it is about the syntactic form of the graph.
>>>> 
>>>> OK, we can always talk about simple entailment to make us feel warm and fuzzy, but simple entailment is so simple that it amounts to a syntactic match anyway. But consider the following resolution of this meaningless issue: SPARQL should use {xsd:string}-entailment rather than simple entailment. (That is, D-entailment where D is {xsd:string}.) This will give exactly the behavior Sandro wants, and the required ideas and definitions have been in the RDF spec since 2004. So why are we, the RDF WG, even discussing this at all? We have already given SPARQL enough room in the RDF specs to do it properly.
>>>> 
>>>> Now, this resolution will not fly, I predict, because SPARQL does not want to get into any richer kind of entailment than simple entailment, but wants RDF to make things work out nicely even while it is doing simple syntactic matching. Because simple *syntactic* matching is the only kind of matching that is fine-grained enough to satisfy people who want to write filters on query results.
>>>> 
>>>> Pat
>>>> 
>>>> 
>>>> 
>>>> On May 6, 2011, at 7:32 AM, Andy Seaborne wrote:
>>>> 
>>>>> 
>>>>> 
>>>>> On 06/05/11 13:16, David Wood wrote:
>>>>>> On May 6, 2011, at 7:44, Sandro Hawke<sandro@w3.org>    wrote:
>>>>>> 
>>>>>>> On Fri, 2011-05-06 at 09:33 +0100, Andy Seaborne wrote:
>>>>>>>> 
>>>>>>>> I wonder if most people would be happy if we emphasised that it's the
>>>>>>>> value that matters.  xsd:string and simple literal have the same value,
>>>>>>>> as do 00123 and +123.
>>>>>>> 
>>>>>>> I guess it depends what you mean by 'emphasise'...
>>>>>>> 
>>>>>>> I was shocked to discover SPARQL cared about the difference, and thought
>>>>>>> it was a grave mistake at the time (but I didn't notice until it was too
>>>>>>> late).  I had assumed everyone already knew you should just care about
>>>>>>> the value, and that every API should convert for you, hiding the
>>>>>>> difference.  But I was wrong, and I don't really know how to get people
>>>>>>> to use the "Semantic Web" technologies at a "semantic" level.
>>>>>> 
>>>>>> +1. Of course, it would help if we standardized it that way :)
>>>>> 
>>>>> And better
>>>>> "if we *had* standardized it that way"  :-)
>>>>> 
>>>>>> Regards,
>>>>>> Dave
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>>    -- Sandro
>>>>> 
>>>>> There are a couple of factors that matter here:
>>>>> 
>>>>> 1/ Users expect what goes in to be the same as what comes out.
>>>>> (tools do as well sometimes)
>>>>> 
>>>>> If they read in
>>>>> 
>>>>> :x :p "foo"^^xsd:string .
>>>>> 
>>>>> and get back:
>>>>> 
>>>>> :x :p "foo" .
>>>>> 
>>>>> enough of them are surprised (=>   they send email to support lists asking about it).
>>>>> 
>>>>> 2/ SPARQL FILTERs don't care - it's graph matching that does because graph matching is simple entailment.  And that's what most toolkits provide - the direct manipulation of the RDF terms, lexical form, datatype and all.
>>>>> 
>>>>> :x :p "foo" .
>>>>> :x :p "foo"^^xsd:string .
>>>>> 
>>>>> One triple or two?
>>>>> 
>>>>> 	Andy
>>>>> 
>>>>> (For the record : "foo"^^xsd:string matches "foo" in a Jena memory model -- there would be two triples.)
>>>>> 
>>>>> 
>>>> 
>>>> ------------------------------------------------------------
>>>> IHMC                                     (850)434 8903 or (650)494 3973
>>>> 40 South Alcaniz St.           (850)202 4416   office
>>>> Pensacola                            (850)202 4440   fax
>>>> FL 32502                              (850)291 0667   mobile
>>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
> 
> 

Received on Friday, 6 May 2011 23:27:38 UTC