Re: Every CONSTRUCT is DISTINCT? from Bijan Parsia on 2007-10-15 (public-rdf-dawg@w3.org from October to December 2007)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Mon, 15 Oct 2007 20:04:59 +0100
To: Lee Feigenbaum <lee@thefigtrees.net>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <909C2C13-44BF-4B97-978A-4C91D172752C@cs.man.ac.uk>

On 15 Oct 2007, at 19:46, Lee Feigenbaum wrote:

> Bijan Parsia wrote:
>> On 15 Oct 2007, at 15:49, Lee Feigenbaum wrote:
[snip]
>> We discussed this on IRC and this is a clever bit of spec reading.  
>> It does then highlight the need for a CONSTRUCT DISTINCT.
>
> Hmm, I don't see why... The spec. defines CONSTRUCT and SELECT in  
> terms of the mathematical (for lack of a better word) results - in  
> CONSTRUCT's case it's a set of triples and in SELECT's result it's  
> a solution sequence. The only time the query language spec. refers  
> to serialization is in an informative example of RDF/XML results  
> and in references to the SPARQL Query Results XML Format.

That doesn't mean that it couldn't. Frankly, I'm nowhere near as  
blasé as you about treating this as merely a serialization issue. I  
concede it can be treated that way. However, it's not like  
implementations produce a graph in some internal representation, then  
say, "oh what the heck, let's insert some dups". I believe they are  
streaming out the answers and it's exactly analogous to streaming out  
xml results. The dups stem from dups in the results, not from  
artifacts of the serialization.
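
To illustrate the point, here is a minimal sketch in Python, not taken from any actual SPARQL engine. The names `instantiate` and `stream_construct` and the sample data are hypothetical. It assumes an implementation that instantiates the CONSTRUCT template once per solution and writes each triple out immediately, without first building a graph:

```python
from typing import Dict, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]
Solution = Dict[str, str]  # variable name (without '?') -> bound term


def instantiate(template: Triple, solution: Solution) -> Triple:
    """Substitute bound values for the ?variables in one template triple.

    Simplification: every variable is assumed to be bound; a real engine
    would skip triples whose variables are unbound.
    """
    s, p, o = (solution[t[1:]] if t.startswith("?") else t for t in template)
    return (s, p, o)


def stream_construct(template: Triple, solutions: Iterable[Solution]) -> Iterator[Triple]:
    """Emit one triple per solution, as a streaming serializer would."""
    for solution in solutions:
        yield instantiate(template, solution)


# Two solutions that differ only in ?name, which the template does not use.
solutions = [
    {"s": ":a", "name": "Alice"},
    {"s": ":a", "name": "Alicia"},
]
template = ("?s", "rdf:type", "foaf:Person")

streamed = list(stream_construct(template, solutions))
as_graph = set(streamed)  # the "mathematical" result: a set of triples
```

Here `streamed` holds the same triple twice while `as_graph` holds it once: the repetition comes from the solution sequence, and it disappears only when something downstream treats the output as a set.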

>> Be that as it may, I as an implementor and a user would find it  
>> helpful if there were a note pointing out this aspect. I confess  
>> that I would never in this lifetime have come up with that  
>> reading. So, if it would be possible to add a bit of text  
>> somewhere that clarified this point, I think that'd be swell.
>
> What would it say?

"Please note that due to serialization freedom, the serialized  
results may contain, syntactically, duplicate triples. There is no  
way in SPARQL to force the endpoint to return a syntactically  
duplicate free CONSTRUCTed graph."
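
For a consumer, the practical upshot would be to drop repeats while reading the response. A rough sketch, assuming the endpoint happens to return one statement per line (as in N-Triples); the function name `dedupe_statements` is hypothetical, and the comparison is purely textual, so the same triple written two different ways would not be caught:

```python
from typing import Iterable, Iterator, Set


def dedupe_statements(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct statement line once, preserving first-seen order."""
    seen: Set[str] = set()
    for line in lines:
        statement = line.strip()
        # Skip blank lines and comment lines.
        if not statement or statement.startswith("#"):
            continue
        if statement not in seen:
            seen.add(statement)
            yield statement


# Hypothetical response body containing a syntactic duplicate.
response = [
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .",
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .",
    "<http://example.org/a> <http://example.org/p> <http://example.org/c> .",
]
unique = list(dedupe_statements(response))
```

Loading the response into any RDF store has the same effect, since a graph is a set of triples.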

> As far as I can see, any confusion about whether to expect  
> duplicates or not is really a product of the serialization rather  
> than of the query language.

I don't see why we can't informatively mention this from the query  
language spec. The consequence is that, as an implementor, I don't  
have to apply DISTINCT to my results before constructing anything.  
That seems perfectly relevant in the query document.

> Even the protocol doesn't mandate any particular serialization of  
> an RDF graph. If there existed a serialization that prohibited  
> listing the same triple twice (are there?), then I'd imagine that  
> it would work fine with the protocol as-is.

So we can serialize to Turtle? Isn't this a pretty big  
interoperability hole?

> I'm not saying I object to a bit of (informative) text giving a  
> heads-up somewhere... I'm just not sure where it would go and what  
> it would say.

I would put it right after the passage I quoted. I would put some  
wordsmithed version of what I wrote above.

Cheers,
Bijan.

Received on Monday, 15 October 2007 19:03:47 UTC