Re: RDF-ISSUE-120 (set-of-triples-are-graphs): Is any set of RDF triples an RDF graph? [RDF Concepts] from Pat Hayes on 2013-03-14 (public-rdf-wg@w3.org from March 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 14 Mar 2013 10:58:14 -0500
To: RDF Working Group <public-rdf-wg@w3.org>
Message-Id: <3C207649-F69B-426B-A31F-222AD6394D7C@ihmc.us>

On Mar 14, 2013, at 4:39 AM, RDF Working Group Issue Tracker wrote:

> RDF-ISSUE-120 (set-of-triples-are-graphs): Is any set of RDF triples an RDF graph? [RDF Concepts]
> 
> http://www.w3.org/2011/rdf-wg/track/issues/120
> 
> Raised by: Antoine Zimmermann
> On product: RDF Concepts
> 
> All work on RDF until early 2013 has been done under the assumption that a set of RDF triples is an RDF Graph (and vice versa).

No, that is not true. The 2004 specs and the current Concepts both say that a graph is a set of triples. It does not follow, and it has never been accepted in practice, that *every* set of triples is a graph.  Treating a graph as a set is a convenience of abstract syntax, but it does not imply that all triple sets are graphs. We do not have a comprehension principle for RDF graphs.

While it is true that the 2004 specs did not give any criteria for distinguishing graphs from arbitrary triple sets, actual RDF practice has always maintained such a distinction. For example, consider a set of triples comprising one triple from every RDF document ever published. Unless this set is actually assembled somewhere (which I am confident has never been done and never will be done), it is not an actual RDF graph. (If it ever were assembled, then the document or triple store containing it would define its scope graph.) Graphs are sets of triples that are defined by RDF documents or data structures or sources in one way or another. Unless they are so defined (in some way, and the variety of such ways may be open-ended and extensible) they are not RDF graphs.

Now, it is also true that for many purposes, such as defining the semantic rules for RDF, this distinction is unimportant and we can simply talk of RDF graphs as sets of triples and vice versa (rather in the way that we can treat all interpretations as applying to all IRIs when we all know that we really only need to consider the IRIs actually in the graphs we are considering at any given time). And this is useful, because then we make sure that the semantics applies to any RDF graphs that haven't been written yet, which is as it should be. But when talking about blank nodes, whatever they are, it is necessary (and closer to actual practice) to be slightly more careful.

The point about blank nodes is that they have no actual identity beyond being a 'place' in a graph. So to say that two graphs share a blank node makes no sense, unless these two graphs are part of a larger structure - a larger graph, or an enclosing dataset - which can be the location of this 'place' in a larger graph. The set-theoretic abstract syntax treats bnodes as real things (as it must, because set theory treats all things alike as being real things) just as it treats IRIs and literals, and THIS IS A PROBLEM with the 2004 abstract syntax. It is actually a bug in the 2004 abstract syntax, because no deployed RDF system has ever treated blank nodes in this way, nor ever will. To emphasise: we are not talking about blank node IDs here, but the "actual" blank nodes they identify. Suppose you have a graph described in N-Triples using the bnodeID _:x, and another graph described in another document, unrelated to the first, using the bnodeID _:y. These are different bnodeIDs in different graph-describing documents from unrelated sources. But do they identify the same "actual" bnode, or not? As far as the 2004 account goes, they might. If blank nodes are a kind of smooth pebble, then it is quite possible for these two unrelated graphs to have picked up the same pebble and both be using it. This is obviously nonsensical, but nothing in the pure-set-theoretic 2004 abstract syntax prohibits it.
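
To make the "place in an enclosing structure" idea concrete, here is a minimal Python sketch. It is only an illustration, not anything taken from the specs: the `Scope` class, `fresh_bnode`, `shared_bnodes` and the example.org IRIs are all hypothetical names, and the sketch assumes that scope names are unique. A blank node is modelled as nothing more than a position inside the scope that issued it, so two graphs can share one only when they live in the same scope.

```python
import itertools


class Scope:
    """Hypothetical enclosing structure (a dataset or a larger graph) that owns blank nodes."""

    def __init__(self, name):
        self.name = name  # assumed to be unique across scopes
        self._counter = itertools.count()

    def fresh_bnode(self):
        # A blank node is only a 'place' within this scope: (scope name, position).
        return (self.name, next(self._counter))


def shared_bnodes(g1, g2):
    """Return the blank nodes (modelled as tuples) that occur in both graphs."""
    def bnodes(graph):
        return {term for triple in graph for term in triple if isinstance(term, tuple)}
    return bnodes(g1) & bnodes(g2)


# Two graphs inside one enclosing dataset can meaningfully share a blank node.
dataset = Scope("dataset-1")
b = dataset.fresh_bnode()
g1 = {(b, "http://example.org/name", '"Alice"')}
g2 = {(b, "http://example.org/age", '"30"')}

# A graph from an unrelated scope cannot share one, whatever position it uses.
other = Scope("dataset-2")
g3 = {(other.fresh_bnode(), "http://example.org/name", '"Alice"')}

print(shared_bnodes(g1, g2))  # the single shared blank node
print(shared_bnodes(g1, g3))  # empty: no common scope, no common blank node
```

Under these assumptions the pebble question cannot even be posed for g1 and g3: there is no scope-free blank node for them to have picked up in common.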

How could you possibly tell? The question isn't even meaningful: you can't open up the "actual" blank node and look at it, or compare this one to that one to "see" if they are the same. These blank node IDs don't identify real pebbly things that have an identity: they only identify 'nodes' in an abstract notion of the structure of the graph being described. Of course, we (and all RDF processors) simply assume that if these IDs are inside distinct, unrelated, documents - if they are not obviously being used in the same local scope, as local variables, or were somehow extracted from a common source where they were so used as local variables - then they must mean distinct blank nodes. (You should be able to smell scopes at this point.) Which is obviously the only coherent way to interpret RDF syntax. But it is not what you should do, if blank nodes really do have an identity of their own, as the 2004 specs seem to imply that they do.
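
The assumption that processors make can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of any particular RDF library: `BNode`, `load`, the document names a.nt and b.nt, and the example.org IRIs are all hypothetical. Each bnodeID is treated as a local variable of the document it appears in, so the same label in two unrelated documents yields two distinct blank nodes, while re-reading the same document yields the same ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    scope: str  # hypothetical: name of the document the bnodeID appeared in
    label: str  # the local bnodeID, e.g. "x" for _:x


def load(doc_name, raw_triples):
    """Build a graph (a set of triples), scoping every _:label to doc_name."""
    def term(t):
        return BNode(doc_name, t[2:]) if t.startswith("_:") else t
    return {tuple(term(t) for t in triple) for triple in raw_triples}


# Two unrelated documents that happen to use the same bnodeID.
doc_a = [("_:x", "http://example.org/knows", "http://example.org/bob")]
doc_b = [("_:x", "http://example.org/knows", "http://example.org/carol")]

g1 = load("a.nt", doc_a)
g2 = load("b.nt", doc_b)

# Combining the graphs keeps the two blank nodes apart.
merged = g1 | g2
subjects = {s for (s, p, o) in merged}
print(len(subjects))  # 2

# Reading the same document again stays within the same scope.
print(load("a.nt", doc_a) == g1)  # True
```

Nothing here ever compares "actual" blank nodes; the only question asked is whether two IDs were used in the same scope.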

Adding scopes (either of bnodeIDs or of bnodes themselves, as I outlined in the previous email) fixes this completely artificial problem in the 2004 specs by bringing the description of blank nodes into line with actual computational practice. It is a minimal change to make the specs coherent.

> Recent discussions on bnode scope suggest that there are combinations of RDF triples that do not form a graph. Precisely, the idea is that only the triples that belong to the same "scope" (whatever that means) can be in the same RDF graph.

But that is not a restriction, as one can simply define "scope" so that all graphs are in scopes. That is, if you want to keep a graph=set comprehension principle, you are free to define your scopes so that every graph is in its own scope. Good luck interoperating with other RDF users, of course. 

> This also impacts the definition of an RDF triple, as there can be two blank nodes in the same triple.

Um... yes, there can. So what? 

Pat


------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 14 March 2013 15:58:46 UTC