Re: RDF Graph from Frank Manola on 2004-10-11 (www-rdf-interest@w3.org from October 2004)

From: Frank Manola <fmanola@acm.org>
Date: Mon, 11 Oct 2004 11:59:27 -0400
To: Graham Klyne <GK@ninebynine.org>
CC: "www-rdf-interest@w3.org" <www-rdf-interest@w3.org>
Message-ID: <416AADDF.8020507@acm.org>

Graham Klyne wrote:
> 
[snip]
> 
> Thus, according to the RDF specification, a graph with duplicate 
> statements is not distinguishable in any way from the same graph with 
> duplicate statements removed.  To this extent, the choice to remove or 
> not remove duplicate statements is an implementation decision, which the 
> RDF specifications (rightly IMO) do not constrain.
> 
> However, this is a point that the DAWG work may need to consider, since 
> the difference may become visible in a graph query.  Certainly, in my 
> own work, if I stored a graph with duplicate statements such duplication 
> would become visible in the results of a graph query in a way that I 
> think is not really desirable, particularly when an RDF graph is 
> described as a *set* of statements.
> 

IMO, "not really desirable" in the last sentence above is a considerable 
understatement.  Either duplicate statements are supposed to be 
individually significant, or they're not (and the specs say they're 
not).  If they're individually significant, then there needs to be a way 
to distinguish among them (and there isn't really a good one).  If 
they're not individually significant, then they should be ignored 
(physically deleting duplicates may be the most straightforward way to 
ignore them, but I suppose you could implement the operators to ignore 
them without physically deleting them).  There is a vast literature on 
the complexity that has resulted from allowing duplicate rows into 
relational databases and query languages (think about defining the 
results of a COUNT function, for example).  Please let's not go there in 
RDF!
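
To make the COUNT problem concrete, here is a minimal sketch in Python. 
It is not taken from any RDF toolkit: the triples, the prefixes, and the 
helpers match() and match_distinct() are all hypothetical, made up for 
illustration.  It assumes a store that physically keeps a duplicate 
statement, and compares a naive pattern match with one that ignores 
duplicates at query time without deleting them.

```python
# Hypothetical store that physically keeps a duplicate statement.
# All names and prefixes here are illustrative assumptions.
stored = [
    ("ex:doc", "dc:creator", "ex:alice"),
    ("ex:doc", "dc:creator", "ex:alice"),  # duplicate of the statement above
    ("ex:doc", "dc:creator", "ex:bob"),
]


def match(triples, s=None, p=None, o=None):
    """Return every stored triple that fits the pattern (None is a wildcard).

    Duplicates in storage show up as duplicates in the result.
    """
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def match_distinct(triples, s=None, p=None, o=None):
    """Same pattern match, but ignore duplicates without deleting them."""
    seen = set()
    result = []
    for t in match(triples, s, p, o):
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


# COUNT over the raw storage versus COUNT over the graph as a set.
bag_count = len(match(stored, p="dc:creator"))
set_count = len(match_distinct(stored, p="dc:creator"))

print(bag_count, set_count)
```

The two counts differ only because of how the statements happen to be 
stored, even though the RDF specifications treat both stores as the 
same graph.  Building set(stored) once at load time would be the 
"physically delete" alternative to filtering inside the operator.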

--Frank

Received on Monday, 11 October 2004 15:55:31 UTC