Re: [TF-ENT] Querying datasets with default plus named graphs from Birte Glimm on 2009-10-07 (public-rdf-dawg@w3.org from October to December 2009)

From: Birte Glimm <birte.glimm@comlab.ox.ac.uk>
Date: Wed, 7 Oct 2009 14:42:00 +0100
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <492f2b0b0910070642o2ecc1729i89c2aefb7282fa1@mail.gmail.com>
2009/10/7 Seaborne, Andy <andy.seaborne@hp.com>:
>
>
>> -----Original Message-----
>> From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
>> On Behalf Of Birte Glimm
>> Sent: 07 October 2009 12:53
>> To: SPARQL Working Group
>> Subject: [TF-ENT] Querying datasets with default plus named graphs
>>
>> Hi all,
>> I skimmed the minutes of yesterday's telecon and I updated the
>> entailment doc to include the newly generated issues. I would like to
>> start collecting opinions for the issue of querying data sets that
>> have more than the default graph and whether inferences work on all
>> graphs in the datasets or are local to their particular graph. Here is
>> an example that Steve originally created:
>> We have a data set with the two named graphs http://example.org/a.rdf
>> and http://example.org/b.rdf (empty default graph).
>> http://example.org/a.rdf:
>>   :p rdfs:domain :A .
>> http://example.org/b.rdf:
>>   :x :p :y .
>
> Is anyone advocating this should be covered?

Well, Steve asked how entailment regimes would/should behave in that
case, so it was on my list of things to clarify (it wasn't entirely
clear to me) and in the minutes of yesterday's telecon we have:
ISSUE: should entailment-regimes be declared over the whole dataset or
individual graphs?

From that I take it that this is not clear, at least not without
spending some time on a careful reading of the spec. Following your
comment, I looked into 12.6 again and I agree that one can conclude
that entailment should indeed be local to the graph. There we have:
A SPARQL extension to E-entailment must satisfy the following conditions.
1 -- The scoping graph, SG, corresponding to any consistent active
graph AG is uniquely specified and is E-equivalent to AG.
2 -- ...
3 -- For any scoping graph SG and answer set {P1 ... Pn} for a basic
graph pattern BGP, and where {BGP1 .... BGPn} is a set of basic graph
patterns all equivalent to BGP, none of which share any blank nodes
with any other or with SG
    SG E-entails (SG union P1(BGP1) union ... union Pn(BGPn))
4 -- ...

Now, since SG is AG (or more precisely E-equivalent to AG) by 1 and it
must entail the answers (Pi(BGP)) by 3, we could only return answers
that depend on more than one graph (a.rdf and b.rdf in the example) if
these graphs were part of the active graph, but they are not. In Sec 8
it says:
The graph that is used for matching a basic graph pattern is the active graph.
and neither a.rdf nor b.rdf corresponds to the graph that is used.

Since this seems to be clear now, I suggest we point that out in the
entailment doc and give the references to the relevant sections in the
SPARQL 1.0 spec.
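
The argument above can be illustrated with a small sketch. This is a toy model only, not taken from the spec or from any implementation: triples are plain Python tuples, only the rdfs:domain rule is applied, and the names domain_closure and graphs_typing_x are hypothetical helpers introduced for this illustration.

```python
# Toy model of Steve's example. Triples are (subject, predicate, object)
# tuples; only the rdfs:domain rule is implemented, nothing else from RDFS.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

A_RDF = "http://example.org/a.rdf"
B_RDF = "http://example.org/b.rdf"

named_graphs = {
    A_RDF: {(":p", RDFS_DOMAIN, ":A")},
    B_RDF: {(":x", ":p", ":y")},
}


def domain_closure(graph):
    """Add (s rdf:type C) for every (s p o) and (p rdfs:domain C) in graph."""
    domains = {(s, o) for (s, p, o) in graph if p == RDFS_DOMAIN}
    inferred = {
        (s, RDF_TYPE, cls)
        for (s, p, _o) in graph
        for (prop, cls) in domains
        if p == prop
    }
    return set(graph) | inferred


def graphs_typing_x(graphs, local=True):
    """Bindings for ?g in: SELECT ?g WHERE { GRAPH ?g { :x a ?type } }."""
    merged = set().union(*graphs.values())
    answers = []
    for name, graph in sorted(graphs.items()):
        # local=True: only the active graph feeds the scoping graph (12.6).
        # local=False: the hypothetical dataset-wide reading.
        scoping = domain_closure(graph if local else merged)
        if any(s == ":x" and p == RDF_TYPE for (s, p, _o) in scoping):
            answers.append(name)
    return answers
```

With local=True neither graph on its own entails a type for :x, so ?g gets no bindings; with local=False both a.rdf and b.rdf would be returned, which is the awkward outcome discussed in the original mail.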

>> The question is, what bindings ?g should take if we query:
>>   SELECT ?g WHERE { GRAPH ?g { :x a ?type .  } }
>>
>> If we assume that entailments always work over all graphs in the DS,
>> then ?type can be mapped to :A, but this entailment depends on both
>> graphs. Taking any one out, means the entailment no longer holds, so
>> ?g must be both a.rdf and b.rdf and possibly the default graph since
>> there is no from clause in the query and we in fact query the default
>> graph.
>>
>> Just to check that I get this right: If we take the same data set and
>> issue the query
>>   SELECT ?o WHERE { :x :p ?o . }
>> I would get no answer under simple entailment because the default
>> graph is empty.
>
> Not quite - there is no dataset description so it will be whatever the processor provides as the dataset (i.e. it's set externally - common case).

Just got your second email while writing and yes, I meant the
processor-provided dataset (sorry for not being clear), so I would get
no answers because the processor provides an empty default graph.

>> If I ask
>>   SELECT ?o FROM NAMED <http://example.org/b.rdf> WHERE { :x :p ?o . }
>> I would get { (o, y) }, right?
>
> There is a dataset description, it does not mention the default graph, so it is empty. So { :x :p ?o . } is on the empty graph and does not match.
>
> { GRAPH <http://example.org/b.rdf> {:x :p ?o . } }
>
> returns { (?o, y) }

Ah, so I was confused there, but I think now I get it :-)
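
To make the difference between FROM and FROM NAMED concrete, here is a minimal sketch of how a dataset description could be turned into a default graph plus named graphs. It is an illustration under assumptions, not any processor's actual behaviour: STORE, build_dataset and objects_of_x_p are hypothetical names, and blank-node handling in the RDF merge is ignored.

```python
# Hypothetical store: IRI -> set of (subject, predicate, object) triples.
STORE = {
    "http://example.org/a.rdf": {(":p", "rdfs:domain", ":A")},
    "http://example.org/b.rdf": {(":x", ":p", ":y")},
}
B_RDF = "http://example.org/b.rdf"


def build_dataset(from_iris=(), from_named_iris=()):
    """Return (default_graph, named_graphs) for a dataset description."""
    default_graph = set()
    for iri in from_iris:
        # FROM: triples are merged into the default graph.
        default_graph |= STORE[iri]
    # FROM NAMED: graphs stay separate and are reachable only via GRAPH.
    named = {iri: STORE[iri] for iri in from_named_iris}
    return default_graph, named


def objects_of_x_p(graph):
    """Simple-entailment match of the pattern { :x :p ?o } on one graph."""
    return sorted(o for (s, p, o) in graph if s == ":x" and p == ":p")


# FROM NAMED <b.rdf>: the default graph is empty, so the bare pattern fails,
# while the same pattern inside GRAPH <b.rdf> { ... } matches.
default, named = build_dataset(from_named_iris=[B_RDF])
no_match = objects_of_x_p(default)
graph_match = objects_of_x_p(named[B_RDF])

# FROM <b.rdf>: the default graph now holds the triples of b.rdf.
default_from, _ = build_dataset(from_iris=[B_RDF])
from_match = objects_of_x_p(default_from)
```

Here no_match is empty, whereas graph_match and from_match both contain :y, matching Andy's corrections above.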

>> If I ask
>>   SELECT ?o FROM <http://example.org/b.rdf> WHERE { :x :p ?o . }
>> I would get { (o, y) } again, but this time I implicitly created a
>> default graph that contains all triples from b.rdf, right?
>
> Yes - although I'd say 'explicit' because you used FROM.
>
>> I guess
>> this default graph would be temporary, right? And if I query again
>> without the FROM clause, I would again get no results, right?
>>
>> Ok, assuming I understand that right, I would much prefer to keep
>> entailments local to the graph.
>
> +1
>
> And I believe this follows from "12.6 Extending SPARQL Basic Graph Matching" which does not mention datasets.

see above, I agree but suggest clarification in the entailment doc +
references to the relevant SPARQL 1.0 sections.

Birte

> ----
>
> Mixed entailment regimes in one query do happen already.  I don't see any sensible way to specify entailment across graphs and have a mix.
>
> This is not to say that matching a BGP under entailment can't take into account information not in the graph (presumably, rule entailment does this anyway - the rules are not in the graph).  We don't necessarily need to make the T-Box visible, do we?  Then "GRAPH <b.rdf> { :x a ?type .  }" works if <b.rdf> is set up in some way (not part of the spec) to use the vocabulary in <a.rdf>.  The fact that the information used for matching <b.rdf> happens to also be accessible via <a.rdf> is neither here nor there.
>
>> I think this goes well with SPARQL 1.0
>> because it says in Sec 8.1
>> (http://www.w3.org/TR/rdf-sparql-query/#exampleDatasets) below Example
>> 1: In this example, the default graph contains the names of the
>> publishers of two named graphs. The triples in the named graphs are
>> not visible in the default graph in this example.
>>
>> Let me also argue from an OWL viewpoint (because I am an OWL person):
>> I would see the IRIs in a FROM (NAMED) clause as ontology IRIs. An
>> ontology contains everything it needs and might use imports to include
>> resources that it does not physically contain. I have to load those
>> imported resources anyway as part of the graph. As I understand it, an
>> implementor can now choose to have several ontologies loaded more or
>> less permanently as (named) graphs/ontologies (which means one can do
>> all preprocessing to them, check them for consistency, and possibly
>> classify them (build the sub-/superclass hierarchy), so that most
>> queries can be answered quickly). If I decide to have the pizza
>> ontology (often used for Protege tutorials) and Snomed (large medical
>> ontology) loaded as named graphs, then I do not want that pizzas have
>> any effect on my medical ontology and I do want entailments to be
>> local to the ontology. If users want to merge two ontologies on the
>> fly for querying, they can ask
>> SELECT ?x FROM IRI_1 FROM IRI_2 WHERE { some_BGP }
>> which would (according to Sec 8.2 of the SPARQL spec) result in the
>> query being evaluated over a default graph that contains the RDF merge
>> of the triples from IRI_1 and IRI_2.
>>
>> This would also allow for removing (named) graphs without having to do
>> something like belief revision to find out what inferences are no
>> longer valid after the delete or having to reload and redo all
>> inferences for the remaining graphs.
>>
>> What would that mean for Steve's example? It has an empty answer, but
>> we no longer have to assign a.rdf, b.rdf, and the default graph all
>> at the same time to ?g.
>>
>> If there are no major objections, I can go and add a section about
>> data sets to the entailment doc similar to Sec 8 in the SPARQL doc,
>> which outlines how one can query a merge of resources and that
>> normally entailments are local to the graph. If you have objections, I
>> would be happy about suggestions for different ways of doing it.
>
> If it helps for clarity, then fine but it seems redundant to me once 12.6 is referenced.
>
>        Andy
>
>>
>> Cheers,
>> Birte
>



-- 
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529
Received on Wednesday, 7 October 2009 13:42:36 UTC