Re: SPARQL, named graphs and default graph from Dan Connolly on 2006-09-13 (public-sparql-dev@w3.org from July to September 2006)

From: Dan Connolly <connolly@w3.org>
Date: Wed, 13 Sep 2006 09:33:18 -0500
To: Chimezie Ogbuji <ogbujic@bio.ri.ccf.org>
Cc: Richard Cyganiak <richard@cyganiak.de>, Nuutti Kotivuori <naked@iki.fi>, public-sparql-dev@w3.org
Message-Id: <da264e37b3358c53b04e0331f17d2b2b@w3.org>
On Sep 13, 2006, at 8:45 AM, Chimezie Ogbuji wrote:
> On Wed, 13 Sep 2006, Richard Cyganiak wrote:
>
>> Hi Nuutti,
>>
>> Without having thought through all the consequences ...
>>
>> Some of your options are not really possible with named graphs 
>> because graphs need to be *named*, that is, the name *must* be a URI 
>> and not a blank node.
>
> I don't agree.  What's the source of this assertion?

Richard is probably just appealing to the definition of dataset in the 
SPARQL spec:

An RDF dataset is a set:
      { G, (<u1>, G1), (<u2>, G2), . . . (<un>, Gn) }
where G and each Gi are graphs, and each <ui> is an IRI.
  -- http://www.w3.org/TR/rdf-sparql-query/#defn_RDFDataset
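
A minimal sketch of that definition, to make the shape concrete: one
default graph plus a map from IRI to graph. This is an illustration only,
not anything from the spec. The names `Dataset` and `add_named_graph` are
hypothetical, and the "name must be an IRI" rule is approximated by
rejecting strings written as blank node labels ("_:x"), which is an
assumption about how terms are spelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """{ G, (<u1>, G1), ..., (<un>, Gn) }: a default graph and named graphs."""

    # G: the default graph, modelled as a set of (s, p, o) string triples.
    default: frozenset = frozenset()
    # (<ui>, Gi): each name maps to exactly one graph.
    named: dict = field(default_factory=dict)

    def add_named_graph(self, name, graph):
        # Assumption for this sketch: blank node labels are written "_:label",
        # and anything else is treated as an IRI.
        if name.startswith("_:"):
            raise ValueError("graph name must be an IRI, not a blank node")
        self.named[name] = frozenset(graph)


ds = Dataset(default=frozenset({("ex:a", "ex:p", "ex:b")}))
ds.add_named_graph("http://example.org/g1", {("ex:c", "ex:p", "ex:d")})
```

Under this model a graph "named" by a blank node simply has no place to
go, which is the point Richard was making.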


>  I think the core issue here is that there is *no* consensus formalism 
> for named graphs WRT RDF, yet SPARQL is dependent on an RDF model that 
> supports named graphs.  If there is one, please point me to it, 
> because I ran across the same problem when constructing programming 
> APIs for named graphs.  The only formalism I know of is Graham Kyle, 
> John McCarthy's work [1].

One could say that there is a growing acceptance of the SPARQL dataset 
design itself.

I argued against doing multiple-graph queries in this first version of 
SPARQL...
   "DanC argued against taking us out of the scope of positive 
conjunctive queries against RDF graphs."
   -- http://www.w3.org/2001/sw/DataAccess/ftf4.html

... but I wasn't able to convince the WG that SPARQL would be 
worthwhile without it.

And by the W3C definition of consensus, there is indeed no consensus 
around this part of the design. There are outstanding formal objections 
in our request for CR (which was granted):

[[
4. Objective 4.2 Data Integration and Aggregation was accepted 
2004-09-16 over the objection of Network Inference/Rob Shearer:

     The only technology that I think we all really agree on is RDF and 
the RDF data model. It strikes me as blatantly wrong to attempt a query 
standard based on some other data model, and "RDF+some meta 
information" is some other data model. If the meta information can be 
exposed in RDF, then our query language should support it by default. 
If it can't be exposed in RDF, then why are we considering native 
support in an RDF query language?

A comment from outside the WG also says:

     I think these should be removed from the basic SPARQL core, since I 
feel they add a fair deal of implementation complexity and an 
application can achieve the same result by submitting multiple queries, 
possibly to different query processors.

     I also feel it would be premature to standardize an approach to 
multi-graph querying ahead of there being a consensus/standard for 
something like RDF named graphs.
     Klyne 08 Apr 2005

The FROM NAMED and GRAPH features seem to be specified to the 
satisfaction of a critical mass of the community, supported in several 
implementations, and required by a number of use cases and applications.

5. The fromUnionQuery  issue was resolved in our 2005-06-07 meeting 
over the objection of Steve Harris. This was a design issue where the 
group had a lot of difficulty finding consensus, and the chair chose to 
act in the interest of schedule concerns:

     DanC summarized by observing 3 designs that seemed to be coherent
     and had been developed and advocated sufficiently that we might
     be able to finish them in a timely manner:

     OPTIONS:
       (a) without FROM/FROM_NAMED, dataset is unconstrained; with
        FROM/FROM_NAMED, dataset is bounded from below by given
        references.
       (b) like (a) but FROM/FROM_NAMED completely specify the dataset
       (c) datasets have "aggregate graph" rather than background/default
        graph, and it always contains the merge of the named graphs

     By "bounded from below," DanC clarified that he meant D1 >= D2 iff
         D1's background/aggregate graph has everything that D2's has,
             i.e. D1's bg graph rdf-simply-entails D2's
         and D1 has all the named graphs that D2 has; i.e.
         for every named graph (U, G) in D2, (U, G) is also in D1's named
         graphs.

     KC observed that this is basically a web-social question of
     constraining what publishers do.

     DC observed that constraining publishers might be responsive
     to comments on this part of our spec, in the interest of
     interoperability at the expense of flexibility.

     Polling showed significant opposition to (b); after that option
     was removed, the WG was split nearly 50-50 between (a) and (c).
     In the interest of time, the chair chose one of the proposals
     and we

     RESOLVED: to go option (a) without FROM/FROM_NAMED, dataset is
     unconstrained; with FROM/FROM_NAMED, dataset is bounded from below
     by given references.
     SH objects. abstaining: EricP, DaveB

The feature seems to be specified to the satisfaction of a critical 
mass of the community, and it seems unlikely that further deliberation 
of this issue would result in substantially more consensus.
]]
  -- http://www.w3.org/2001/sw/DataAccess/crq349
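
The "bounded from below" ordering quoted above can be sketched as a small
check. This is a hedged illustration, not the WG's formal definition: the
function name `at_least` and the example IRIs are hypothetical, and it
assumes every graph is ground (no blank nodes), so that simple entailment
reduces to a superset test and "(U, G) is also in D1" reduces to set
equality of the triples.

```python
def at_least(d1, d2):
    """Return True if D1 >= D2 in the sense of option (a).

    Each dataset is a pair (background_graph, named_graphs), where
    background_graph is a set of (s, p, o) triples and named_graphs
    maps an IRI string to a set of triples.
    """
    bg1, named1 = d1
    bg2, named2 = d2
    # Assumption: ground graphs only, so "D1's bg graph simply entails
    # D2's" is checked as "D1's bg graph contains all of D2's triples".
    if not set(bg1) >= set(bg2):
        return False
    # Every named graph (U, G) in D2 must also appear, unchanged, in D1.
    return all(
        u in named1 and set(named1[u]) == set(g)
        for u, g in named2.items()
    )


# The dataset requested via FROM/FROM_NAMED (hypothetical example data).
requested = (
    {("ex:a", "ex:p", "ex:b")},
    {"http://example.org/g1": {("ex:c", "ex:p", "ex:d")}},
)
# What a service might actually query: the requested data plus more.
served = (
    {("ex:a", "ex:p", "ex:b"), ("ex:x", "ex:p", "ex:y")},
    {
        "http://example.org/g1": {("ex:c", "ex:p", "ex:d")},
        "http://example.org/g2": set(),
    },
)
ok = at_least(served, requested)
```

Here `served` is allowed under option (a) because it only adds to what was
asked for; under option (b) it would have to match `requested` exactly.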


-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Wednesday, 13 September 2006 14:33:22 UTC