Re: SPARQL, named graphs and default graph from Chimezie Ogbuji on 2006-09-13 (public-sparql-dev@w3.org from July to September 2006)

From: Chimezie Ogbuji <ogbujic@bio.ri.ccf.org>
Date: Wed, 13 Sep 2006 09:45:42 -0400 (EDT)
To: Richard Cyganiak <richard@cyganiak.de>
cc: Nuutti Kotivuori <naked@iki.fi>, public-sparql-dev@w3.org
Message-ID: <Pine.GSO.4.60.0609130920150.3877@joplin.bio.ri.ccf.org>
On Wed, 13 Sep 2006, Richard Cyganiak wrote:

> Hi Nuutti,
>
> Without having thought through all the consequences ...
>
> Some of your options are not really possible with named graphs because graphs 
> need to be *named*, that is, the name *must* be a URI and not a blank node.

I don't agree.  What's the source of this assertion? I think the core 
issue here is that there is *no* consensus formalism for named graphs
WRT RDF, yet SPARQL is dependent on an RDF model that supports named
graphs.  If there is one, please 
point me to it, because I ran across the same problem when constructing 
programming APIs for named graphs.  The only formalism I know of is
Graham Klyne's and John McCarthy's work [1].

> Blank nodes are always scoped to a single graph, and using blank nodes as 
> graph labels would make it impossible to refer to a named graph from the 
> outside world. This excludes #3 and #4.

Well, blank nodes used within a graph can't be referred to 
directly but they can still be matched by SPARQL, which doesn't make them
any less useful.  The problem isn't the use of blank nodes for graph names
but the lack of a mechanism [2] to match the graph name(s) associated with a 
node.  Given how closely coupled SPARQL is with (admittedly informal) 
named graph semantics, I would expect to be able to answer questions such as:

"What are the graph names in which all the statements about <someIRI> are 
asserted?"

Assuming I could answer this question, then graph labels that are blank 
nodes become as accessible as blank nodes asserted *within* a graph and it 
becomes a question of what is the appropriate use for a bnode as a graph 
label?
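
To make the question concrete, here is a minimal Python sketch (not any
real store's API) of a quad store in which a graph label may be either a
URI or a blank node.  The BNode class, the graphs_about helper and the
example IRIs are all assumptions made up for illustration.  It only shows
that a blank-node label can come back as a binding, just as a blank node
inside a graph would, even though it cannot be written down in the query.

```python
from dataclasses import dataclass
from itertools import count

_ids = count()


@dataclass(frozen=True)
class BNode:
    """Hypothetical blank node: its identity is local to this store."""
    id: int


def fresh_bnode():
    """Create a blank node that is distinct from all earlier ones."""
    return BNode(next(_ids))


# Each quad is (graph_label, subject, predicate, object).
unknown = fresh_bnode()  # label for statements of unknown origin
quads = {
    ("http://example.org/g1", "http://example.org/someIRI", "ex:name", "Some"),
    (unknown, "http://example.org/someIRI", "ex:age", "42"),
    ("http://example.org/g2", "http://example.org/other", "ex:name", "Other"),
}


def graphs_about(quads, subject):
    """Return the label of every graph asserting a statement about subject."""
    return {g for (g, s, p, o) in quads if s == subject}


labels = graphs_about(quads, "http://example.org/someIRI")
```

Here labels holds one URI and one blank node; the caller can use the
blank node in further lookups against the same store without ever
needing a global name for it.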

If BNodes are used for existential assertions about nodes, why wouldn't 
they be used as existential assertions about graphs? And if there is 
some semantic consequence, it furthers the argument that the formalisms 
for named graphs should be well articulated before they are tightly
integrated into a query language.

> I would suggest that Alice and Bob each mint a new URI for the graph 
> containing the statements of unknown origin *in their own store*. Or mint a 
> new URI to hold each individual statement, or anything in between. Since the 
> owner of a URI gets to say what the meaning of the URI is, they can declare 
> that this chunk of URI space is reserved for this purpose (assuming Alice and 
> Bob each own a chunk of URI space).
>
> I wonder why you discounted this solution?

I don't think it's an elegant solution when we already have the means 
(within 'vanilla' RDF Model Theory) to express 
existential assertions - which is exactly the scenario here.

If a graph label is nothing but a name associated with a set of statements, 
why should it not behave the same as the name associated with a node 
within a graph?
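
As a hedged illustration of that analogy, the sketch below merges two
stores whose unknown-origin graphs are labelled with blank nodes.  The
labels are renamed apart on merge, the same way blank nodes inside graphs
are in an RDF merge, so neither Alice nor Bob has to mint a URI.  The
merge_stores function and the tuple encoding of blank nodes are
hypothetical, not part of any RDF library.

```python
from itertools import count


def is_bnode(term):
    # Assumption: blank nodes are modelled as ("bnode", local_id) tuples.
    return isinstance(term, tuple) and term[0] == "bnode"


def merge_stores(*stores):
    """Union quad sets, giving each store's blank nodes fresh identities."""
    fresh = count()
    merged = set()
    for store in stores:
        renamed = {}  # per-store mapping: old blank node -> fresh blank node

        def rename(term):
            if is_bnode(term):
                if term not in renamed:
                    renamed[term] = ("bnode", next(fresh))
                return renamed[term]
            return term

        for quad in store:
            merged.add(tuple(rename(t) for t in quad))
    return merged


# Alice and Bob both happen to use local id 0 for their unknown-origin graph.
alice = {(("bnode", 0), "ex:a", "ex:knows", "ex:b")}
bob = {(("bnode", 0), "ex:c", "ex:knows", "ex:d")}

merged = merge_stores(alice, bob)
graph_labels = {g for (g, s, p, o) in merged}
```

After the merge the two graphs keep distinct blank-node labels, so the
statements are not accidentally lumped into one origin.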

> I also question the existence of "statements without a known origin". They 
> surely didn't just pop up magically inside your triple store, eh? I guess 
> it's more like "statements whose origin I don't want to model".

How different is this from "nodes whose names I don't care to maintain / 
model?"

[1] http://ninebynine.org/RDFNotes/UsingContextsWithRDF.html#xtocid-6303976
[2] http://copia.ogbuji.net/blog/2006-07-14/querying-named-rdf-graph-aggregate

Chimezie Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org


>
>
> On 11 Sep 2006, at 19:51, Nuutti Kotivuori wrote:
>
>> 
>> This isn't exactly a SPARQL question, but it is very closely
>> related. I will first outline the question context.
>> 
>> Assume an RDF statement store, which has a mechanism for tracking
>> statement origin (scope, context, graph, source whatever). Many of the
>> statements have a distinct origin, or source graph, they were imported
>> from. But there are also those which either seemingly have no origin,
>> or the origin is not known. The origin of these statements has to be
>> handled somehow. We'll come to the specific choices later on.
>> 
>> This statement store offers a SPARQL query interface into it. The
>> facilities for querying named graphs in SPARQL would obviously be used
>> to query the different origins in the store. But there are two things
>> to decide. First, how should statements without an origin be accessed
>> in SPARQL? There are several choices on this, which I will outline
>> below. And related to the first one, second, what should the default
>> graph be for the queries if none is given explicitly.
>> 
>> I will list a few possibilities and mention the problems and benefits
>> that seem to result from them as a basis for discussion.
>> 
>>  1. Unknown origin is a distinct node, but separate from all uris,
>>     blank nodes or literals. The default graph for the query is the
>>     graph of the unknown origin nodes.
>> 
>>     - Separation of identifier spaces, no fear of any overlap. The
>>       graph of statements with unknown origin is separate from any
>>       named graph.
>> 
>>     - Since there is no way to represent the unknown origin in SPARQL
>>       syntax, the default graph is the only way to access the nodes in
>>       that graph.
>> 
>>     - The nodes in the unknown origin graph are not matched by any
>>       graph query, since the name of the graph could not be returned
>>       reasonably. That is:
>> 
>>       SELECT ?g ?s ?o ?p
>>       WHERE { GRAPH ?g { ?s ?p ?o } }
>> 
>>       cannot return ?g for the unknown origin graph.
>> 
>>  2. Unknown origin is a distinct node, as above. The default graph is
>>     the RDF merge of all graphs in the store, including the statements
>>     with an unknown origin.
>> 
>>     - The problems above.
>> 
>>     - In addition, there is no way to select nodes that explicitly
>>       have an unknown origin. (Or is there? Could one match all the
>>       statements for which there is no graph with the same statement?
>>       In any case, this would be quite contorted.)
>> 
>>  3. Unknown origin is represented by a distinct blank node; that is,
>>     every statement has its own blank node as the graph name, which
>>     is not shared with any of the other statements. The default graph
>>     is the RDF merge of all graphs in the store, including the
>>     statements with an unknown origin.
>> 
>>     - This is probably closest to accurate modelling of the
>>       situation. We know every statement has an origin, we just don't
>>       know what it is - a situation commonly modelled with a blank
>>       node. Also, we don't know which statements might share an
>>       origin, so until we know better, we make them all distinct.
>> 
>>     - The origin of the statements is nicely queryable with SPARQL
>>       queries and every statement has an origin, even if unknown.
>> 
>>     - Queries which specify several statements from a single graph
>>       will not match the statements with unknown origins as it cannot
>>       be confirmed that they would be from the same graph.
>> 
>>     - There is no way to match the origin of a single statement as
>>       there is no way to match a certain blank node explicitly. The
>>       current SPARQL treats it as an open variable(?).
>> 
>>     - There is no way to explicitly match statements that have an
>>       unknown origin, since the origins are just distinct blank nodes.
>> 
>>     - Possibly hard to implement, because of the number of distinct
>>       blank nodes.
>> 
>>  4. Unknown origin is represented by a singleton blank node; that is,
>>     every statement with an unknown origin shares one single blank
>>     node as the graph name. The default graph is the RDF merge of all
>>     graphs in the store.
>> 
>>     - Lumps all statements with an unknown origin under a single named
>>       graph. Queries which match several statements from a single
>>       graph will match statement sets from unknown origin as well.
>> 
>>     - The origin of the statements is nicely queryable with SPARQL
>>       queries and every statement has an origin, even if unknown.
>> 
>>     - There is no way to explicitly match statements that have an
>>       unknown origin, since the origin is a single blank node. If the
>>       application provided a magic type for this blank node (_:x a
>>       rdfx:UnknownOrigin), this could be matched with:
>> 
>>       SELECT ?s ?p ?o
>>       WHERE { ?g a rdfx:UnknownOrigin .
>>               GRAPH ?g { ?s ?p ?o } }
>> 
>>       But this again is quite contorted. (The same could be applied to
>>       the third case as well, but the implementation of that would be
>>       really tricky to be efficient.)
>> 
>>  5. Unknown origin is represented by a singleton blank node as
>>     above. The default graph is the singleton blank node of unknown
>>     origin.
>> 
>>     - Mostly as above, but in the common case, explicitly matching
>>       statements that have an unknown origin would be easy in just
>>       matching the statements from the default graph.
>> 
>>  6. Unknown origin is represented by a well known URI that is shared
>>     universally. The default graph is the RDF merge of all graphs in
>>     the store.
>> 
>>     - Somewhat incorrectly asserts that the statements have a certain
>>       origin, even though we don't know the origin.
>> 
>>     - The origin of the statements is nicely queryable with SPARQL.
>> 
>>     - Statements with an unknown origin can be easily explicitly
>>       matched by comparing them against the well known URI.
>> 
>>     - Assigns a special meaning to a URI.
>> 
>>     - Hard to coordinate with a number of people implementing similar
>>       solutions if not standardized.
>> 
>> Some other variants of the above were omitted, since their problems
>> and benefits are easily reasoned.
>> 
>> On irc, 'chimezie' outlined the problem as such:
>> 
>> 17:35 chimezie:#swig => Hmm.. well, seems like what is missing is a good
>>       definition of a 'name for nodes that don't have an explicit context'
>> 17:36 chimezie:#swig => or rather 'a name for the context of nodes that
>>       aren't assigned to a context explicitly'
>> 
>> So, I'm out for some input on what might be the sanest route
>> through this.
>> 
>> TIA,
>> -- Naked
>> 
>> 
>> 
>
>
Received on Wednesday, 13 September 2006 13:46:12 UTC