Re: Every CONSTRUCT is DISTINCT?

Bijan Parsia wrote:
> 
> On 15 Oct 2007, at 19:46, Lee Feigenbaum wrote:
> 
>> Bijan Parsia wrote:
>>> On 15 Oct 2007, at 15:49, Lee Feigenbaum wrote:
> [snip]
>>> We discussed this on IRC and this is a clever bit of spec reading. It 
>>> does then highlight the need for a CONSTRUCT DISTINCT.
>>
>> Hmm, I don't see why... The spec. defines CONSTRUCT and SELECT in 
>> terms of the mathematical (for lack of a better word) results - in 
>> CONSTRUCT's case it's a set of triples and in SELECT's result it's a 
>> solution sequence. The only time the query language spec. refers to 
>> serialization is in an informative example of RDF/XML results and in 
>> references to the SPARQL Query Results XML Format.
> 
> That doesn't mean that it couldn't. Frankly, I'm nowhere near as 
> blasé as you about treating this as merely a serialization issue. I 
> concede it can be treated that way. However, it's not like 
> implementations produce a graph in some internal representation, then 
> say, "oh what the heck, let's insert some dups". I believe they are 
> streaming out the answers and it's exactly analogous to streaming out 
> xml results. The dups stem from dups in the results, not from artifacts 
> of the serialization.
> 
>>> Be that as it may, I as an implementor and a user would find it 
>>> helpful if there were a note pointing out this aspect. I confess that 
>>> I would never in this lifetime have come up with that reading. So, if 
>>> it would be possible to add a bit of text somewhere that clarified 
>>> this point, I think that'd be swell.
>>
>> What would it say?
> 
> "Please note that due to serialization freedom, the serialized results 
> may contain, syntactically, duplicate triples. There is no way in SPARQL 
> to force the endpoint to return a syntactically duplicate free 
> CONSTRUCTed graph."
> 
>> As far as I can see, any confusion about whether to expect duplicates 
>> or not is really a product of the serialization rather than of the 
>> query language.
> 
> I don't see why we can't informatively mention this from the query 
> language spec. The consequence is that, as implementor, I don't have to 
> distinct my results before constructing anything. That seems perfectly 
> relevant in the query document.
> 
>> Even the protocol doesn't mandate any particular serialization of an 
>> RDF graph. If there existed a serialization that prohibited listing 
>> the same triple twice (is there one?), then I'd imagine that it would 
>> work fine with the protocol as-is.
> 
> So we can serialize to Turtle? Isn't this a pretty big interoperability 
> hole?
> 
>> I'm not saying I object to a bit of (informative) text giving a 
>> heads-up somewhere... I'm just not sure where it would go and what it 
>> would say.
> 
> I would put it right after the passage I quoted. I would put some 
> wordsmithed version of what I wrote above.
> 
> Cheers,
> Bijan.
> 

I was confused by that exchange on IRC:

http://www.w3.org/TR/rdf-concepts/#section-data-model says:

"A set of such triples is called an RDF graph"

The result of CONSTRUCT is an RDF graph.

The serializations of RDF allow multiple occurrences of a triple. That is 
sometimes convenient; it can even be very hard for, say, a GRDDL transform to 
emit each triple exactly once.

This isn't spec weaselling.  Duplicates happen in RDF serializations anyway. 
The system in question is merely making use of that feature (which has nothing 
to do with SPARQL) for specific performance goals.  If some system streams 
with duplicates and users don't like that, they should discuss it with the 
system developers.  They have their reasons for their implementation; that is 
a discussion to be had between user and developer.

If you ask the SPARQL query "{ :s :p :o }" on the CONSTRUCT results, there are 
zero matches or one match. Two or more would be wrong. If your RDF system 
reveals duplicates, you need to file a bug report with the developers.
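
To make the point concrete, here is a minimal sketch in Python. It is not any 
particular RDF toolkit: parse_ntriples is a hypothetical helper that handles 
only a simplified N-Triples subset (IRI terms, one triple per line), and the 
example IRIs are made up. It shows a serialization that repeats a triple 
still loading into a graph where that triple occurs once.

```python
def parse_ntriples(text):
    """Load simplified N-Triples text into a set of (s, p, o) tuples.

    Assumption: every term is an IRI with no embedded whitespace and each
    line holds one triple terminated by " .". Real N-Triples also has
    literals and blank nodes, which this sketch does not handle.
    """
    graph = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        s, p, o = line.rstrip(".").split()
        graph.add((s, p, o))  # a set keeps one copy of each triple
    return graph


# A streamed serialization in which the same triple appears twice.
serialized = """
<http://example/s> <http://example/p> <http://example/o> .
<http://example/s> <http://example/p> <http://example/o> .
<http://example/s> <http://example/p> <http://example/o2> .
"""

graph = parse_ntriples(serialized)

# The equivalent of asking "{ :s :p :o }" against the loaded graph.
target = ("<http://example/s>", "<http://example/p>", "<http://example/o>")
matches = [t for t in graph if t == target]

print(len(serialized.strip().splitlines()), "lines serialized")
print(len(graph), "triples in the graph")
print(len(matches), "match for the pattern")
```

The duplicate exists only in the text on the wire. Once the text is read as a 
set of triples, the pattern matches once.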

	Andy

-- 
  Hewlett-Packard Limited
  Registered Office: Cain Road, Bracknell, Berks RG12 1HN
  Registered No: 690597 England

Received on Monday, 15 October 2007 19:30:08 UTC