Re: “Default” vs “unnamed” graph from Souripriya Das on 2011-03-16 (public-rdb2rdf-wg@w3.org from March 2011)

From: Souripriya Das <souripriya.das@oracle.com>
Date: Wed, 16 Mar 2011 10:03:32 -0400
To: Lee Feigenbaum <lee@thefigtrees.net>
CC: Robert Scanlon <rscanlon@revelytix.com>, Richard Cyganiak <richard@cyganiak.de>, RDB2RDF WG <public-rdb2rdf-wg@w3.org>
Message-ID: <4D80C334.80809@oracle.com>
I agree with all of the viewpoints expressed so far.

Just for my own understanding, I am distinguishing between two types of 
triples:

    * "native" triples: they exist as triples in the (virtual) store
    * "converted" triples: they exist as quads in the (virtual) store,
      but were converted to triples by projecting out the graph information

Using these two terms, I distinguish the content of the default graph of a 
store from that of the default graph in the context of a SPARQL query's 
execution as follows:

    * The default graph of a store consists only of "native" triples.
    * The default graph for a SPARQL query execution consists of "native"
      triples and/or "converted" triples.
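Purely as an illustration of the distinction above, and not something taken from the R2RML or SPARQL specs, here is a minimal Python sketch. `VirtualStore`, `store_default_graph`, and `query_default_graph` are hypothetical names, and the `graphs` parameter is an assumed stand-in for whatever mechanism picks the named graphs whose quads get projected into a query's default graph.

```python
from typing import FrozenSet, Iterable, NamedTuple, Optional, Set


class Triple(NamedTuple):
    s: str
    p: str
    o: str


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str  # graph name (IRI)


class VirtualStore:
    """Hypothetical model of a (virtual) store holding native triples and quads."""

    def __init__(self, native: Iterable[Triple], quads: Iterable[Quad]) -> None:
        self.native: FrozenSet[Triple] = frozenset(native)
        self.quads: FrozenSet[Quad] = frozenset(quads)

    def store_default_graph(self) -> FrozenSet[Triple]:
        # Default graph of the store: "native" triples only.
        return self.native

    def query_default_graph(
        self, include_native: bool = True, graphs: Optional[Set[str]] = None
    ) -> FrozenSet[Triple]:
        # "Converted" triples: project out the graph name (SPO projection)
        # from the quads whose graph is selected; graphs=None selects no quads.
        selected = graphs or set()
        converted = {Triple(q.s, q.p, q.o) for q in self.quads if q.g in selected}
        # Union of native and/or converted triples; the result never holds quads.
        return frozenset(converted | (self.native if include_native else frozenset()))
```

Either way the result is a set of triples only, which is the property both kinds of default graph share.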

This seems to work for me at the moment, until I get a better 
understanding of what comes out of the RDF WG.

Thanks,
- Souri.

Lee Feigenbaum wrote:
> I agree with everything Bob says here.
>
> Furthermore, I'd suggest that while it makes some logistical sense to 
> reuse the concepts of default and named graphs from SPARQL, in the 
> long-run it probably makes more sense to specify R2RML in terms of the 
> work being done by the new RDF working group on quads/named graphs. 
> *Hopefully*, by defining the meaning of R2RML in terms of that, the 
> ability to (dynamically) compose a (SPARQL) RDF dataset and query 
> against the results of an R2RML mapping will then come naturally.
>
> Lee
>
> On 3/15/2011 6:58 PM, Robert Scanlon wrote:
>> Souri, et al,
>>
>> I'm not sure it's necessarily helpful/useful to think of datasets in the
>> context of the _query service_ (which is basically what will be
>> executing R2RML mappings and exposing the generated triples).  As I
>> mentioned in my last email to Richard, datasets are normally within the
>> purview of the SPARQL query (aside from the minor incursion into the
>> service realm by the upcoming Service Description standard).
>>
>> Per the SPARQL spec, a query's dataset defines the scope of the graphs
>> for matching graph patterns; GRAPH graph patterns get matched against
>> named graphs, and other graph patterns against the default graph.  The
>> graphs can be defined explicitly in the query (named graphs in FROM
>> NAMED, default graph in one or more FROM), or the protocol message, per
>> the SPARQL spec; or if undefined then the query service determines what
>> default and named graphs are used by queries.  But the last situation is
>> more of a fall-back (although, unhelpfully, most examples use this
>> mode); it is not in general 'correct' to think of the query service as
>> serving up 'datasets' (imo) -- it serves up triples in the context of
>> graphs, which may be 'carved up' in the context of an individual query's
>> dataset, referencing the exposed graphs, to control graph pattern 
>> matching.
>>
>> I don't think that the R2RML spec should be getting into the whole SPARQL
>> graph/dataset morass aside from allowing a modeler (defining the R2RML
>> mappings) to specify which triples should go in which named graphs, and
>> which should go in the 'default' graph exposed by R2RML.  As I mentioned
>> in my last email, the query service itself does not have to honor the
>> 'suggestions' of the modeler defining the R2RML mappings (although it
>> certainly could, and generally would by default).
>>
>> Bob Scanlon
>> Revelytix
>>
>>
>> On Tue, Mar 15, 2011 at 5:33 PM, Souripriya Das
>> <souripriya.das@oracle.com> wrote:
>>
>>     Richard,
>>
>>     I had a long chat with Eric after the telecon today. Seema and
>>     another colleague of mine, Matt Perry, also joined. Following the
>>     discussion, now we are okay with the use of the term "default graph"
>>     to refer to the unnamed graph in an R2RML-based RDF store.
>>
>>     So, please go ahead and make the minor changes needed in the current
>>     draft to replace unnamed graph with default graph.
>>
>>     If interested, here is how I managed to convince myself in an
>>     informal way, by considering triples vs. quads:
>>
>>         * In general, a default graph (DG, for short) can be thought of
>>           as a container of *triples* in a dataset whereas the named
>>           graphs contain *quads*.
>>         * An R2RML mapping causes triples and quads to (virtually) come
>>           into existence. Among these, the *triples* (by birth) make up
>>           the DG of an R2RML-based RDF store.
>>         * A DG in the context of a SPARQL query on the other hand could
>>           consist of triples-by-birth (from an unnamed graph) OR
>>           triples-generated-via-UNION-of-SPO-projections-from-quads in
>>           an RDF store.
>>         * So, it is quite possible to have (DG of a SPARQL query against
>>           an R2RML-based RDF store) != (DG of the target R2RML-based RDF
>>           store). But the two DGs always share the characteristic that
>>           both of them consist only of triples -- triples-by-birth only
>>           in R2RML and triples-by-birth or triples-by-transformation in
>>           SPARQL -- but neither has any quads.
>>
>>     Thanks,
>>     - Souri.
>>
>>
>>     On 3/15/2011 5:18 PM, Richard Cyganiak wrote:
>>>     I'd like to reiterate my position from this call that we should 
>>> define the output of an R2RML mapping as an RDF Dataset in the 
>>> SPARQL sense, as the introduction already says, and 
>>> consistently use SPARQL's terminology.
>>>
>>>     This would imply using the terms “named graph” and “default 
>>> graph”. The term “unnamed graph” would be removed from the spec.
>>>
>>>     The objection raised in the call was that the default graph used 
>>> in a SPARQL query can actually be constructed on the fly, on a 
>>> query-by-query basis, by using the FROM keyword or SPARQL protocol 
>>> parameters.
>>>
>>>     This is a valid observation. But I argue that this doesn't 
>>> conflict at all with the use of the RDF Dataset concept and the term 
>>> “default graph”.
>>>
>>>     To quote from the SPARQL spec [1]:
>>>
>>>>     A SPARQL query may specify the dataset to be used for matching 
>>>> by using the FROM clause and the FROM NAMED clause to describe the 
>>>> RDF dataset. If a query provides such a dataset description, then 
>>>> it is used in place of any dataset that the query service would use 
>>>> if no dataset description is provided in a query.
>>>     This makes clear that if FROM/FROM NAMED are used, then one 
>>> queries a *different* dataset from the one that the query service 
>>> offers *by default* if FROM/FROM NAMED were not used.
>>>
>>>     I'm proposing that we think of the R2RML-generated dataset as 
>>> the dataset which a query service would use by default in absence of 
>>> a specific dataset description. This doesn't preclude the 
>>> possibility of overriding the default graph or any other graph with 
>>> FROM/FROM NAMED and the SPARQL protocol.
>>>
>>>     This would be a simple change in terms of spec text (s/unnamed 
>>> graph/default graph/ and check the early sections for any place that 
>>> should say “RDF dataset” instead of “RDF graph”). So I propose that 
>>> we do this before the WD release.
>>>
>>>     If there are no objections (on- or off-list), I'll go ahead and 
>>> do this.
>>>
>>>     Best,
>>>     Richard
>>>
>>>     [1] http://www.w3.org/TR/rdf-sparql-query/#unnamedGraph
>>
>>
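The dataset-selection rule Richard quotes from the SPARQL spec above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `Dataset`, `select_dataset`, and the `available` lookup table are hypothetical names; graphs are plain sets of ground (s, p, o) tuples, so the RDF merge of the FROM graphs is approximated by set union (a real merge renames blank nodes apart). Following SPARQL 1.0, a query with FROM NAMED but no FROM gets an empty default graph.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), ground terms only
Graph = FrozenSet[Triple]


@dataclass(frozen=True)
class Dataset:
    """Hypothetical RDF dataset: one default graph plus named graphs."""
    default_graph: Graph
    named_graphs: Mapping[str, Graph]


def select_dataset(
    service_default: Dataset,
    available: Mapping[str, Graph],
    from_iris: Sequence[str] = (),
    from_named_iris: Sequence[str] = (),
) -> Dataset:
    # No FROM / FROM NAMED: use the dataset the query service offers by
    # default (under Richard's proposal, the R2RML-generated dataset).
    if not from_iris and not from_named_iris:
        return service_default
    # Otherwise the query's dataset description replaces the service default.
    # Default graph = merge of the FROM graphs (approximated by set union);
    # with only FROM NAMED clauses it is empty.
    default: Graph = frozenset().union(*(available[iri] for iri in from_iris))
    named = {iri: available[iri] for iri in from_named_iris}
    return Dataset(default, named)
```

In this reading the R2RML-generated dataset is just `service_default`, and FROM/FROM NAMED can still override it query by query.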
Received on Wednesday, 16 March 2011 14:06:01 UTC