Re: What we mean by "graph" / Named Graphs in SD from Axel Polleres on 2010-07-20 (public-rdf-dawg@w3.org from July to September 2010)

From: Axel Polleres <axel.polleres@deri.org>
Date: Tue, 20 Jul 2010 22:26:03 +0100
To: "Sandro Hawke" <sandro@w3.org>
Cc: "SPARQL Working Group" <public-rdf-dawg@w3.org>
Message-Id: <969A9225-591F-44DE-A1D7-F7034F5ED8C3@deri.org>
Frankly, I am still confused. Anyone else finding named graphs so misleading?

Named graphs have been in SPARQL, via the concept of a Dataset, for quite a long time, 
and I haven't heard heavy complaints about that before (again, those firm in "DAWG history" 
might have more on that, but not since I started to care about SPARQL). Also, at the recent
RDF workshop, I had the impression that a lot of people actually wanted named graphs to become 
part of RDF in one way or another, cf. the discussion at the RDF workshop [1]. It seems 
to me that possible ambiguities between named graphs and other possible forms of graph identification 
are not something that we can solve or tackle now. A discussion about this is well placed 
in a next RDF working group, and thus I am frankly hesitant to open up fundamental discussions 
in this group now that would shake concepts from SPARQL 1.0 [2]. 

more comments inline below...

On 20 Jul 2010, at 19:22, Sandro Hawke wrote:

> One of the downsides of trying to do a lot is that we make mistakes.
> Or, at least, I do...
> 
> On the Service Description issue, I made the mistake of thinking a
> NamedGraph was a Graph.  
> 
> Andy made a comment that led me to actually look in Carroll et al 2005
> and see that, no, formally speaking "Named Graphs" would be better
> called "graph namings".  They are disjoint from RDF graphs; they consist
> of at least a "name" and an "rdfgraph".   So when I say it's crazy that
> named graphs are named with themselves -- the craziness really rests in
> Carroll et al calling something a named graph when it is not a graph.
> The NamedGraph may well have a name, but that's distinct from its
> "name", which is what it associates with its "rdfgraph".   (Arg!)
> 
> I wonder if there's a way out of this mess....    I guess we can at
> least clarify it in the SD document, with some warnings.   Anyone
> interested in us getting away from the misleading term "NamedGraph"?
> It may seem like it's entrenched in SPARQL 1.0, but this is just
> editorial, as long as it keeps the GRAPH and FROM NAMED keywords, which
> actually seem fine to me.   (I expect we'll have a new RDF Core WG
> working on this issue in a few months; that's when this will really get
> entrenched.  I expect to strongly oppose this misleading use of the term
> "named graph" during that work, if necessary.)
> 
> Aside from that, it would help a little to change sd:name to
> sd:graphName, but it should still be a string if it's like that.
> (Maybe we should add RIF's pred:iri-string as a builtin, to address
> Andy's use case.

That could be an option; conversion between strings and URIs in both directions 
would, I think, be useful in the function library.
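
To make the two-way conversion concrete, here is a minimal Python sketch. The `IRI` class and both helper functions are hypothetical names introduced only for illustration; they are not taken from SPARQL, RIF, or any library. The intent is roughly what SPARQL's STR() and IRI() do, under the assumption that an IRI term and a plain string are kept as distinct kinds of value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    """Hypothetical IRI term, kept distinct from a plain string."""
    value: str


def iri_from_string(s: str) -> IRI:
    # String -> IRI direction (no validation in this sketch).
    return IRI(s)


def string_from_iri(i: IRI) -> str:
    # IRI -> string direction.
    return i.value


name = "http://example.org/graph1"   # made-up graph name
as_iri = iri_from_string(name)
round_trip = string_from_iri(as_iri)

# The IRI term is not the same value as the string it was built from,
# but converting back recovers the original string.
is_distinct_term = as_iri != name
recovers_string = round_trip == name
```

With such a pair of functions, a name stored as a string (as suggested for sd:graphName) could still be turned into an IRI where one is needed.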

>  Remind me why RIF and SPARQL have their own builtins
> and you can't just borrow from the other?)

Because, for one, some of the SPARQL built-ins (datatype(), blank(), isIRI(), IRI()) 
just can't be expressed in RIF's generic model for built-ins; that is, purely "syntactic" 
built-ins don't "fit" with the model-theoretic semantics of RIF.

>   Alternatively just use the
> name as the URI label on the graph node; that seems fine...
> 
> Hmmm.
> 
> Meanwhile, I've been meaning to send a question about our use of the
> term "Graph", which is connected here.
> 
> It seems to me there are two different common meanings for the term
> "RDF Graph".  To use the AI terms for each of them:
> 
>         1. A Knowledge Base (KB); a specific repository or store of RDF
>         triples.  As in, "Please update your graph to remove the triple
>         <a> <b> <c>."

So far I have always understood SPARQL to take this point of view, which is 
why I don't understand what the issue is...

>         2. A Formula; a mathematical set of RDF triples.   As in, "Graph
>         G1 entails infinite other graphs".
>        
> The most crisp distinction may be around identity. Two formulas are
> identical if and only if they contain the same triples.  Meanwhile, KBs
> can have the same triples while remaining distinct.

... just like the named graphs in a dataset.
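
The identity distinction can be sketched in a few lines of Python. This is only an illustration under assumptions: triples are modelled as plain tuples, the `GraphStore` class and the example.org graph names are made up, and the dictionary merely stands in for the named-graph part of a dataset.

```python
# Sense 2 ("formula"): an immutable set of triples; equality is by content.
g1 = frozenset({("a", "b", "c")})
g2 = frozenset({("a", "b", "c")})


class GraphStore:
    """Sense 1 ("KB"): a mutable container with its own identity (hypothetical)."""

    def __init__(self, triples=()):
        self.triples = set(triples)

    def snapshot(self):
        # Current contents viewed as a formula.
        return frozenset(self.triples)


# Stand-in for named graphs in a dataset: name -> store (names are made up).
dataset = {
    "http://example.org/g1": GraphStore(g1),
    "http://example.org/g2": GraphStore(g2),
}
s1 = dataset["http://example.org/g1"]
s2 = dataset["http://example.org/g2"]

same_formula = g1 == g2                           # same triples, so equal
same_store = s1 is s2                             # two distinct stores
same_contents = s1.snapshot() == s2.snapshot()    # contents coincide for now

# Only the store can change state; its earlier snapshot stays what it was.
before = s1.snapshot()
s1.triples.discard(("a", "b", "c"))
contents_after_update = s1.snapshot() == s2.snapshot()
```

Here the two formulas are indistinguishable, while the two stores remain distinct even though they hold the same triples until one of them is updated.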

> It also makes
> sense to talk about the state of a KB, and a KB changing over time.

... which we need to, e.g. when defining the semantics of update.

> It
> makes no sense to say such things about a formula; it's just a pure
> mathematical set.
> 
> I think we can agree that formally, technically, only definition 2
> (formulas) is correct.

in what sense do you mean "correct"?

>  But I think meaning two is in common use;

You mean meaning 1?

> I
> expect most of us use it often.  When I say "graph" in the sense of
> definition 1, I mean it as shorthand for "graph storage location",
> "graph data structure", or "graph store".
> In spoken language, the
> context usually makes it clear whether people mean KB or formula.
> 
> The distinction also didn't matter much in SPARQL 1, I think, because
> it was agnostic on mutability, identity, etc.  I guess this will come
> up in the update semantics for 1.1.

Agreed, but I don't think it is necessarily a problem from the viewpoint of what you call definition 1, is it? 

>  And it may come up in the SD
> issue, above.
> 
> I wonder if we could agree on a standard name for sense 1,

But well... "Named Graphs" is just sense 1, isn't it?

> and try to
> use it.   (Or maybe we already did, and I missed it.)   As long as it's
> a term like "graph storage location", then using the keyword "GRAPH" as
> we do in the query language seems fine.
> 
>     -- Sandro

I am still wondering what the exact issue is... I have a vague conjecture (tell me if I am wrong):
is your concern related to, e.g., the fact that inferred triples can't be updated or, respectively, that the semantics of 
updates of a "formula" is not clear? If so, yes, this is an issue; we discussed it and decided not to tackle it 
in this round of SPARQL, cf. [3]. The entailment regimes document is quite explicit about this:

"SPARQL 1.1 Query [SPARQL 1.1 Query] defines basic graph pattern matching only for simple entailment, but it defines a set of conditions that have to be met when defining what correct results are for SPARQL queries under different entailment regimes. The goal of this document is to specify conditions such that SPARQL can be used with some other entailment regimes beyond simple entailment."

I.e., this document defines some BGP matching extensions by means of the extension mechanism outlined in SPARQL 1.0.
Also, we explicitly said we will not tackle whether or how entailments interfere with updates, cf. [3].
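
As a toy illustration of why updates and entailment interact awkwardly, consider the Python sketch below. It is an assumption-laden example, not the semantics of any entailment regime: the only rule is transitivity of rdfs:subClassOf, triples are plain tuples, and the `closure` function and class names are made up.

```python
SUBCLASS = "rdfs:subClassOf"


def closure(triples):
    """Toy entailment: apply subClassOf transitivity until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p, b) in list(inferred):
            for (c, q, d) in list(inferred):
                new = (a, SUBCLASS, d)
                if p == q == SUBCLASS and b == c and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


# Explicitly stored triples of a (hypothetical) graph store.
explicit = {("A", SUBCLASS, "B"), ("B", SUBCLASS, "C")}
target = ("A", SUBCLASS, "C")        # only inferred, never stored

visible_before = target in closure(explicit)

# "Deleting" the inferred triple does nothing: it is not in the stored set.
explicit.discard(target)
visible_after = target in closure(explicit)
```

The triple is still visible under entailment after the attempted delete, which is exactly the kind of question (what should such an update mean?) that was left out of scope.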

best,
Axel

1. Quoting, http://www.w3.org/2009/12/rdf-ws/Report.html 
"the term "named graph", though widely used by the community, is ambiguous, and often refers to what could rather be referred to as quoted graphs, graph literals, etc. It was therefore decided to use the term “graph identification” for the purposes of reporting the workshop’s results, though this term is by no means definitive."

2. *Personally*, I think that graph identification for future RDF could well be based on the named graph concept as we use it in SPARQL's dataset definition, cf.
http://www.w3.org/2001/sw/wiki/RDF/NextStepWorkshop/AxelWishlist (note that I mean rdf:subGraphOf as a sheer syntactic construct similar to "imports").

3. http://www.w3.org/TR/sparql11-entailment/#id35812045
Received on Tuesday, 20 July 2010 21:26:36 UTC