- From: Frank Manola <fmanola@acm.org>
- Date: Mon, 11 Oct 2004 11:59:27 -0400
- To: Graham Klyne <GK@ninebynine.org>
- CC: "www-rdf-interest@w3.org" <www-rdf-interest@w3.org>
Graham Klyne wrote:
> snip
>
> Thus, according to the RDF specification, a graph with duplicate
> statements is not distinguishable in any way from the same graph with
> duplicate statements removed. To this extent, the choice to remove or
> not remove duplicate statements is an implementation decision, which the
> RDF specifications (rightly IMO) do not constrain.
>
> However, this is a point that the DAWG work may need to consider, since
> the difference may become visible in a graph query. Certainly, in my
> own work, if I stored a graph with duplicate statements such duplication
> would become visible in the results of a graph query in a way that I
> think is not really desirable, particularly when an RDF graph is
> described as a *set* of statements.
>

IMO, "not really desirable" in the last sentence above is a considerable understatement. Either duplicate statements are supposed to be individually significant, or they're not (and the specs say they're not). If they're individually significant, then there needs to be a way to distinguish among them (and there isn't really a good one). If they're not individually significant, then they should be ignored (physically deleting duplicates may be the most straightforward way to ignore them, but I suppose you could implement the operators to ignore them without physically deleting them).

There is a vast literature on the complexity that has resulted from allowing duplicate rows into relational databases and query languages (think about defining the results of a COUNT function, for example). Please let's not go there in RDF!

--Frank
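
As a rough illustration of the COUNT problem mentioned above, the Python sketch below holds the same statements once as a list (duplicates kept) and once as a set (duplicates ignored). The triples and the `count_matches` helper are hypothetical, made up for this example, and are not taken from any RDF library or from the DAWG work.

```python
# Hypothetical triples; the identifiers are illustrative only.
statements = [
    ("ex:doc1", "dc:creator", "ex:alice"),
    ("ex:doc1", "dc:creator", "ex:alice"),  # duplicate statement
    ("ex:doc2", "dc:creator", "ex:bob"),
]


def count_matches(store, predicate):
    """Count the statements in `store` that use `predicate`."""
    return sum(1 for (_, p, _) in store if p == predicate)


# A store that keeps duplicates lets them leak into the query result.
bag_count = count_matches(statements, "dc:creator")

# Treating the graph as a set of statements ignores the duplicate.
set_count = count_matches(set(statements), "dc:creator")

print(bag_count, set_count)
```

The two counts differ even though, by the RDF specification, the two graphs are indistinguishable, which is the kind of visible difference a query language would have to rule on.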
Received on Monday, 11 October 2004 15:55:31 UTC