Re: [All] Proposal: RDF Graph Identification (definition of "named graph") from Pat Hayes on 2012-08-16 (public-rdf-wg@w3.org from August 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 16 Aug 2012 16:13:30 -0500
To: Sandro Hawke <sandro@w3.org>
Cc: public-rdf-wg@w3.org
Message-Id: <3795B742-67EA-4C0D-9B25-CF342DA84116@ihmc.us>
On Aug 16, 2012, at 2:52 PM, Sandro Hawke wrote:

> On 08/16/2012 01:45 PM, Pat Hayes wrote:
>> 4. "Note that “named graph” is a relation, not a class: we say that something is a named graph of a dataset, not simply that it is a named graph."  What does this mean? Is it intended to convey the idea that the naming is local to the dataset? If not, what is it supposed to convey? Put another way, what is wrong with saying that something just is a named graph? 
> 
> That text is something I wrote a while ago, trying to put forward a definition of the term "named graph" that was consistent with popular usage and also made sense to me.   Clearly I didn't express it well enough.   Let's see if I can do it better now, not writing in a spec.
> 
> As you may remember, the term "named graph" used to bother me a lot.   It bothered me because:
> 
>  * when I hear "graph" I really do think of RDF Graphs (g-snaps).  I think most people comfortable with the term "named graph" are actually thinking of a g-box when they hear "graph" in this context.

True, but that applies just as much to the usage of the term "graph" by itself. It has nothing to do with naming.

>  * when I think of "naming" in RDF, I think of picking an IRI for some entity and encouraging everyone in the world to use that same IRI for that same entity, like http://www.w3.org/People/Berners-Lee/card#i for TimBL.   

Right, and this is exactly the sense envisioned in the original paper. Graphs published on the Web can be given names which can then be used to refer to those graphs in other RDF graphs. 

> Often people working with "named graphs" are really just associating a string with the graph, in some local name-binding relationship.

Some are, but not all of them. IMO it is a pity that this kind of local noodling is called anything to do with naming at all. 

> Given those meanings, making a "named graph" would be kind of silly.   Richard captured this beautifully by pointing out http://en.wikipedia.org/wiki/Gallery_of_named_graphs .  Those are named graphs, given my understanding of the words "named" and "graph" (although using non-RDF variants of both those terms).

Yes, but what is your point? That only non-RDF graphs can have real names? Surely not. Consider for just one example the RDF graph www.w3.org/TR/owl-guide/wine.rdf . 

>   I don't know how common my understanding here is; I do know it's shared by TimBL, so it's not just me.

If I understand your argument here, it seems to be similar to the argument that because some children play at being postmen, houses shouldn't have real addresses. 

> There's also the problem that the formal definition provided by Carroll et al [*] and used by SPARQL is that the "named graph" is not a graph at all, but is rather a pair of a name and a graph.    As I hear people talk, using the term "named graph", they don't seem to be referring to a pair.   I've managed to find a few cases in English that use this kind of construct: the weight of a clothed person is the sum of the weight of the clothing and the clothed person (that is, the person who happens to be clothed).   So, yes, a named graph can be both the pair of a name and a graph *and* the graph that is paired with the name.  But that seems really awkward.

I think it is what mathematicians call a "harmless abuse of terminology". Formally, and exactly, a named graph is a <name, graph> pair, but we often, when no confusion would arise, speak of just the graph as a 'named graph'. Strictly speaking, G is a named graph if <n, G> is in some asserted dataset for some n.
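
A minimal sketch of that definition in Python may make the two readings concrete. It assumes a graph is modelled as a frozenset of triples and a dataset as a set of <name, graph> pairs (the default graph is ignored); the type names, the helper is_named_graph_of, and the example.org IRI are illustrative assumptions, not taken from any spec.

```python
from typing import FrozenSet, Set, Tuple

# Illustrative model only: triples as tuples of strings, graphs as frozensets.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
NamedGraph = Tuple[str, Graph]   # the formal <name, graph> pair
Dataset = Set[NamedGraph]        # default graph deliberately left out


def is_named_graph_of(g: Graph, dataset: Dataset) -> bool:
    """True if <n, g> is in the dataset for some name n."""
    return any(graph == g for _name, graph in dataset)


g: Graph = frozenset({("<a>", "<b>", "1")})
other: Graph = frozenset({("<a>", "<b>", "2")})
dataset: Dataset = {("http://example.org/g1", g)}

# The question always takes a dataset as its second argument.
in_dataset = is_named_graph_of(g, dataset)
not_in_dataset = is_named_graph_of(other, dataset)
```

In this sketch the pair is the formal object, and the looser usage corresponds to asking about the graph component relative to a particular dataset.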

> 
> The way I hear the term "Named Graph" used, in diverse settings including the SPARQL WG and intro-to-semweb classes, is as a place within a collection of RDF triples where some triples are set aside, kept somehow separate.   Someone has a bunch of triples and to help manage them better they subdivide the collection into "Named Graphs".   Often, but not always, this is a mutable collection -- certain triples are added to certain named graphs from time to time, as circumstances change.   
> 
> I'm pretty sure the term is always used in the context of a larger dataset/graphstore.   People don't refer to a single file or web page of RDF triples as a Named Graph. It's my understanding that when people said they really wanted Named Graphs [1] [2], this is what they were talking about -- the ability to segment or subdivide a triplestore, to help with various kinds of data management, including managing changes and provenance.   In a sense, it might better be called a "subdivision", or a "named subgraph".

Well, OK. But then several things. 

(1) This is most definitely not what was meant by the original 'named graph' proposal in Carroll et al., so when the term was re-used in the SPARQL specs, it was apparently mis-re-used. I am not at all convinced that everyone who uses the term uses it in this sense and not in the original sense. For the Provenance group, for example, I will have to report to them (if we adopt this understanding) that 'named graphs' are not adequate to support their notion of a bundle, and they are on their own. They are assuming it has something like its original meaning.

(2) This is not captured by the semantics in the document.  I do not know how to even start trying to formalize this "local"  idea, and suspect it would be better left unformalized. 

(3) Your examples of timed data using named graphs as 'snapshots', and similar uses of named graphs to assert some RDF in a local "context", do not seem to me to square with this understanding. But then it does not square with the current semantics, either, so maybe this is not so important.

> So, back to spec text.
> 
> SPARQL formally defines a named graph, to be any of the (name, graph) pairs in a dataset.
> 
> True.  And I wish we could propose a transition path to a less confusing terminology.    I think those things should be called "name-graph pairs".  Hard to change now, I know.

And in this respect, at least, SPARQL does follow the original paper, where we included the name in the named graph. The motivation there, as I recall, was to allow two 'copies' of the same graph to be distinguished.
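
To illustrate that motivation, here is a small Python sketch under the assumption that a graph is a set of triples (so two graphs with the same triples are the same graph) and a named graph is a (name, graph) tuple. The example.org names are made up for the illustration.

```python
# One graph, modelled as a frozenset of triples (an assumption of this sketch).
g = frozenset({("<a>", "<b>", "1")})

# Two 'copies' of the same graph, told apart only by the name in the pair.
copy_one = ("http://example.org/copy1", g)
copy_two = ("http://example.org/copy2", g)

dataset = {copy_one, copy_two}

# Dropping the names collapses the two copies into a single graph.
graphs_only = {graph for _name, graph in dataset}

pair_count = len(dataset)        # pairs stay distinct
graph_count = len(graphs_only)   # bare graphs do not
```

With the name kept inside the pair the dataset holds two distinct entries; with only the graph part there is nothing left to distinguish them.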

> In practice, the term is often used to refer to the graph part of those pairs. This is the usage we follow in this document, saying that a graph is a named graph in some dataset if and only if it appears as the graph part of a (name, graph) pair in that dataset.
> I'm still happy with that definition, and comfortable using the term "Named Graph" when defined this way.  The "graph" part is an RDF Graph.  The name denotes some object in the normal (global) RDF way, and that denoted object is associated with the graph in a dataset-local name-graph binding pair.  

OK, but....

> Nearly all of the value of a particular dataset is its name-graph pairs, so of course they're local.

... What? What do you mean by "local"? I take this to mean that it cannot be seen from 'outside'. (As in 'local variable' or 'local binding'.) Is this what you mean? How does this follow "of course"?

> 
> Note that “named graph” is a relation, not a class: we say that something is a named graph of a dataset, not simply that it is a named graph.
> 
> It seems to me that it's nonsense to ask whether the graph { <a> <b> 1 } is a named graph.    There is no class of named graphs.  Instead we'd have to ask whether the graph { <a> <b> 1 } is a named graph of some particular dataset.
> 
> Linguistically, the term "named graph" seems like the name of a class of things, like "red car".  But it's more like "descendant",  "friend", and "neighbor".    If I say Joe is a descendant, ... well, that doesn't make much sense.  Instead, a complete sentence would have to be more like: Joe is a descendant of Irish Immigrants.

But you put yourself in this hole by insisting that the named graph was not the pair :-) Stick to established usage, and this grammatical quibble goes away.

> 
> It's like saying "7 is a prime factor" instead of "7 is a prime factor of 8638".
> 
> You asked:
> 
>> What does this mean? Is it intended to convey the idea that the naming is local to the dataset? If not, what is it supposed to convey? Put another way, what is wrong with saying that something just is a named graph? 
> 
> Have I answered that, now?    

Yes, but see my response above. 

Pat



> 
>      -- Sandro
> 
> 
> 
> 
> 
> 
> 
> [*] I think I've heard both you and Jeremy express that you don't think we should stick to that any more.
> [1] http://www.w3.org/2010/06/rdf-work-items/table
> [2] http://www.w3.org/2002/09/wbs/1/rdf-2010/results
> 
> 
> 
> And assigning IRIs to g-snaps in a global name mapping -- the way we name  is kind of a silly thing, in general.  
> 
>  but I eventually found a way of thinking about it that I was comfortable with.  I was trying to capture that, but I clearly failed to capture it clearly.
> 
> 
> 
> 
> One of the problems with the term is that linguistically it looks like a named graph should be a kind of graph,  

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 16 August 2012 21:14:02 UTC