Re: “Default” vs “unnamed” graph from Robert Scanlon on 2011-03-15 (public-rdb2rdf-wg@w3.org from March 2011)

From: Robert Scanlon <rscanlon@revelytix.com>
Date: Tue, 15 Mar 2011 17:58:32 -0500
To: Souripriya Das <souripriya.das@oracle.com>
Cc: Richard Cyganiak <richard@cyganiak.de>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <AANLkTimBD1VBiuX1NQVv0DyJ6ta36weKQJ7Fa2P2SEt=@mail.gmail.com>

Souri, et al,

I'm not sure it's necessarily helpful/useful to think of datasets in the
context of the *query service* (which is basically what will be executing
R2RML mappings and exposing the generated triples).  As I mentioned in my
last email to Richard, datasets are normally within the purview of the
SPARQL query (aside from the minor incursion into the service realm by the
upcoming Service Description standard).

Per the SPARQL spec, a query's dataset defines the scope of the graphs for
matching graph patterns; GRAPH graph patterns get matched against named
graphs, and other graph patterns against the default graph.  The graphs can
be defined explicitly in the query (named graphs in FROM NAMED, default
graph in one or more FROM), or the protocol message, per the SPARQL spec; or
if undefined then the query service determines what default and named graphs
are used by queries.  But the last situation is more of a fall-back
(although, unhelpfully, most examples use this mode); it is not in general
'correct' to think of the query service as serving up 'datasets' (imo) -- it
serves up triples in the context of graphs, which may be 'carved up' in the
context of an individual query's dataset, referencing the exposed graphs, to
control graph pattern matching.
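
To make the fall-back rule above concrete, here is a minimal Python sketch of
how a query's dataset might be resolved. It is an illustration under stated
assumptions, not how any particular query service works: the names Dataset,
resolve_dataset, exposed and service_default are hypothetical, graphs are
modeled as plain sets of string triples, and merging the FROM graphs is
simplified to a set union (blank-node handling in a real RDF merge is ignored).

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


@dataclass
class Dataset:
    """An RDF dataset: one default graph plus zero or more named graphs."""
    default: Graph = frozenset()
    named: Dict[str, Graph] = field(default_factory=dict)


def resolve_dataset(
    exposed: Dict[str, Graph],          # graphs the service exposes, keyed by IRI (assumed)
    service_default: Dataset,           # what the service uses when the query says nothing
    from_iris: Iterable[str] = (),      # IRIs from FROM clauses (or the protocol message)
    from_named_iris: Iterable[str] = (),  # IRIs from FROM NAMED clauses
) -> Dataset:
    """Pick the dataset that a query's graph patterns are matched against."""
    from_iris = list(from_iris)
    from_named_iris = list(from_named_iris)

    # No dataset description at all: fall back to whatever the service chooses.
    if not from_iris and not from_named_iris:
        return service_default

    # Otherwise the description replaces the service's dataset entirely.
    # The default graph is the merge of all FROM graphs (empty if there are none).
    merged: Graph = frozenset().union(*(exposed[iri] for iri in from_iris))
    named = {iri: exposed[iri] for iri in from_named_iris}
    return Dataset(default=merged, named=named)
```

In this sketch the service only ever hands out graphs; the dataset is put
together per query, and the service-chosen dataset is used only when the query
gives no description of its own.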

I don't think that the R2RML spec should be getting into the whole SPARQL
graph/dataset morass aside from allowing a modeler (defining the R2RML
mappings) to specify which triples should go in which named graphs, and
which should go in the 'default' graph exposed by R2RML.  As I mentioned in
my last email, the query service itself does not have to honor the
'suggestions' of the modeler defining the R2RML mappings (although it
certainly could, and generally would by default).

Bob Scanlon
Revelytix


On Tue, Mar 15, 2011 at 5:33 PM, Souripriya Das
<souripriya.das@oracle.com> wrote:

>  Richard,
>
> I had a long chat with Eric after the telecon today. Seema and another
> colleague of mine, Matt Perry, joined too. Following the discussion, now we
> are okay with the use of the term "default graph" to refer to the unnamed
> graph in an R2RML-based RDF store.
>
> So, please go ahead and make the minor changes needed in the current draft
> to replace unnamed graph with default graph.
>
> If interested, here is how I managed to convince myself in an informal way,
> by considering triples vs. quads:
>
>    - In general, a default graph (DG, for short) can be thought of as a
>    container of *triples* in a dataset whereas the named graphs contain
>    *quads*.
>    - R2RML mapping causes triples and quads to (virtually) come into
>    existence. Among these, the *triples* (by birth) make up the DG of an
>    R2RML-based RDF store.
>    - A DG in the context of a SPARQL query on the other hand could
>    consist of triples-by-birth (from an unnamed graph) OR
>    triples-generated-via-UNION-of-SPO-projections-from-quads in an RDF store.
>    - So, it is quite possible to have (DG of a SPARQL query against an
>    R2RML-based RDF store) != (DG of the target R2RML-based RDF store). But the
>    two DGs always share the characteristic that both of them consist only of
>    triples -- triples-by-birth only in R2RML and triples-by-birth or
>    triples-by-transformation in SPARQL -- but neither has any quads.
>
> Thanks,
> - Souri.
>
>
> On 3/15/2011 5:18 PM, Richard Cyganiak wrote:
>
> I'd like to re-iterate my position from this call that we should define the output of an R2RML mapping as an RDF Dataset in the SPARQL sense, as it already says in the introduction, and consistently use SPARQL's terminology.
>
> This would imply using the terms “named graph” and “default graph”. The term “unnamed graph” would be removed from the spec.
>
> The objection raised in the call was that the default graph used in a SPARQL query can actually be constructed on the fly, on a query-by-query basis, by using the FROM keyword or SPARQL protocol parameters.
>
> This is a valid observation. But I argue that this doesn't conflict at all with the use of the RDF Dataset concept and the term “default graph”.
>
> To quote from the SPARQL spec [1]:
>
>
>  A SPARQL query may specify the dataset to be used for matching by using the FROM clause and the FROM NAMED clause to describe the RDF dataset. If a query provides such a dataset description, then it is used in place of any dataset that the query service would use if no dataset description is provided in a query.
>
>  This makes clear that if FROM/FROM NAMED are used, then one queries a *different* dataset from the one that the query service offers *by default* if FROM/FROM NAMED were not used.
>
> I'm proposing that we think of the R2RML-generated dataset as the dataset which a query service would use by default in absence of a specific dataset description. This doesn't preclude the possibility of overriding the default graph or any other graph with FROM/FROM NAMED and the SPARQL protocol.
>
> This would be a simple change in terms of spec text (s/unnamed graph/default graph/ and check the early sections for anyplace that should say “RDF dataset” instead of “RDF graph”). So I propose that we do this before the WD release.
>
> If there are no objections (on- or off-list), I'll go ahead and do this.
>
> Best,
> Richard
>
> [1] http://www.w3.org/TR/rdf-sparql-query/#unnamedGraph
>
>
>
Received on Tuesday, 15 March 2011 22:59:04 UTC