- From: Bob MacGregor <bmacgregor@siderean.com>
- Date: Tue, 29 May 2007 17:03:10 -0700
- To: Pat Hayes <phayes@ihmc.us>
- Cc: public-rdf-dawg-comments@w3.org, "Eric Prud'hommeaux" <eric@w3.org>, "Richard Newman" <rnewman@franz.com>
- Message-Id: <47D358C3-730D-4FCF-B17F-BF1FE01EC8EE@siderean.com>
Hi Pat,

On May 29, 2007, at 14:12, Pat Hayes wrote:

>
>> Hi Richard,
>>
>> On May 28, 2007, at 14:35, Richard Newman wrote:
>>
>>> Hi Bob,
>>>
>>> <snip>
>>>
>>> Regarding point 2: yes, AllegroGraph allows you to store
>>> whatever you like in the graph field of a triple. Other stores
>>> might not. I'm not sure that I agree with you about naming -- why
>>> not mint URIs, or use UUID URNs? You can cram almost anything
>>> into a URI! -- but you can certainly use variables in your queries.
>>>
>>
>> The phrase "mint URIs" raises a red flag, since it is frequently
>> contrary to the whole point of a URI. That is definitely true in
>> this case.
>> Suppose I have two graphs with identical triples, and identical
>> provenance attached to their "graph names". I claim that these
>> two graphs should be considered equivalent. If the graphs are
>> identified with blank nodes, then that is indeed the case. Otherwise,
>> it's not. The presence of a URI overdefines the semantics of the
>> provenance. Does this matter? Indeed it does. Our quad store
>> does union and collapsing operations on provenance to increase
>> performance (sometimes by orders of magnitude). The operations
>> it performs are not valid if URIs are present. I would not be
>> surprised if AllegroGraph does not yet incorporate these
>> optimizations.
>> However, once you start to use sufficiently aggressive provenance,
>> it's likely you will want to do the same.
> ?? Bob, what are you talking about? Let's agree for the moment with
> your claim that the two graphs should be equivalent (though I'm
> having trouble understanding how they can have *identical*
> provenance information if one is a copy of another; perhaps we mean
> something different by 'provenance'). You say that if they have
> different names, they cannot be equivalent. Why not? The entire RDF/
> URI model allows a single entity to have more than one name. The
> point of URIs is to identify, but not to identify uniquely.
> So in fact the two graphs can be identical, if you like, like two
> imprints of the same edition of a novel.
>

I guess I need to be a bit more explicit about the term 'equivalent';
since we deal with quads in our own system instead of triples, our
notion of equivalence has evolved. So I will be more careful here:

I didn't say that one graph was a copy of the other. I said that they
had identical triples, i.e., an equivalence test that ignored
provenance would return true. If the graph names are N1 and N2, I also
asserted that the provenance assertions/triples about N1 and N2 are
the same (same dc:source, same dc:date, etc.).

I'm not asking if the two graphs can be identical; I'm asking if they
ARE identical. If names matter (and they do), then absent an
owl:sameAs assertion between N1 and N2, the graphs cannot be assumed
to be the same. If names don't matter, e.g., if blank nodes are
substituted for the names, then logically the graphs, including their
provenance, are indistinguishable.

In general, the kind of merging we want to do to preserve scalability
in the presence of large-scale provenance includes the ability to
merge two graphs into one when their provenance triples are the same.
Specifically, we don't usually care about equivalence between the
contents of two graphs, but we do care about equivalence between the
provenance statements attached to graphs.

> Why are your optimizing collapsings not valid if URIs are present?
> You can simply declare that your identity criteria on graphs allow
> a graph (not a named graph, but an RDF graph) to have more than one
> name without being a different graph. You are free to impose extra
> semantics on the basic RDF model if you find it useful.

I could also declare that for us, URIs don't matter within a graph,
and we can collapse arbitrary triples if the literals are the same.
But that would be absurd.
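
To make the merging criterion above concrete, here is a minimal Python sketch. It is an illustration only, not a description of how any actual quad store implements this. It assumes each graph is a set of triples keyed by some graph node, and that provenance is a set of (predicate, value) pairs per graph node. The function name, the data layout, and the sample values (`feed-A`, `feed-B`, the `ex:` terms) are all hypothetical.

```python
from collections import defaultdict


def merge_by_provenance(graphs, provenance):
    """Union the triples of all graphs whose provenance statements match.

    graphs:     dict mapping a graph node (any hashable) to a set of triples
    provenance: dict mapping the same graph nodes to a set of
                (predicate, value) pairs describing that graph
    Returns a dict keyed by the provenance description itself
    (a frozenset of pairs); the original graph nodes are discarded.
    """
    merged = defaultdict(set)
    for node, triples in graphs.items():
        # The key depends only on what is said about the graph,
        # never on the identity of the node that refers to it.
        key = frozenset(provenance.get(node, ()))
        merged[key] |= triples
    return dict(merged)


# Hypothetical sample data: g1 and g2 carry identical provenance.
graphs = {
    "g1": {("ex:a", "ex:p", "ex:b")},
    "g2": {("ex:c", "ex:p", "ex:d")},
    "g3": {("ex:e", "ex:p", "ex:f")},
}
provenance = {
    "g1": {("dc:source", "feed-A"), ("dc:date", "2007-05-29")},
    "g2": {("dc:source", "feed-A"), ("dc:date", "2007-05-29")},
    "g3": {("dc:source", "feed-B"), ("dc:date", "2007-05-29")},
}

merged = merge_by_provenance(graphs, provenance)
print(len(merged))  # 2: g1 and g2 collapse into one graph, g3 stays apart
```

In this sketch the graph nodes act only as local handles, which is the role a blank node would play. If each node were instead a URI whose identity had to be preserved, the collapse of g1 and g2 would need an explicit statement that the two names denote the same graph.
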
I am assuming that if URIs are used to name graphs, then there is some
reason why they are used in preference to blank nodes, which are
currently illegal, as far as I understand. Of course, I'm not using a
blank node to name a graph; I'm using it to refer to a graph.

> Nothing in RDF or SPARQL suggests that different names cannot
> denote the same thing.

I never said or implied that they can't.

>
> A further puzzle is that you are happy if the name is a blank
> node... do I have that right? That simply does not make sense to
> me. Blank nodes cannot be used as names or identifiers. The meaning
> of a blank node is to express an existential assertion. Using a
> blank node as an identifier is meaningless.
>

My claim is that I should be able to manipulate graphs and assign them
provenance without the need for naming the graphs. We are dealing with
applications where we may have 150,000 graphs (give or take an order
of magnitude). There is no benefit to be derived by naming them.

I've observed that people's thinking is frequently circumscribed by
the nomenclature they use. This is likely the case for "named graphs".
The SPARQL spec says that we can have only one unnamed graph; all of
the others must have names. In our applications, we have very large
numbers of unnamed graphs. To follow what I'm driving at, you will
have to adjust your thinking, and toss out the notion of naming a
graph.

Cheers, Bob

> Pat
>
> --
> ---------------------------------------------------------------------
> IHMC                    (850)434 8903 or (650)494 3973   home
> 40 South Alcaniz St.    (850)202 4416   office
> Pensacola               (850)202 4440   fax
> FL 32502                (850)291 0667   cell
> phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
>
>

Bob MacGregor
Chief Scientist
Siderean Software, Inc.
310.647.5690
bmacgregor@siderean.com
Received on Wednesday, 30 May 2007 00:03:31 UTC