[TF-ENT] Querying datasets with default plus named graphs

Hi all,
I skimmed the minutes of yesterday's telecon and updated the
entailment doc to include the newly generated issues. I would like to
start collecting opinions on the issue of querying datasets that
have more than the default graph, and on whether inferences work across
all graphs in the dataset or are local to their particular graph. Here
is an example that Steve originally created:
We have a dataset with the two named graphs http://example.org/a.rdf
and http://example.org/b.rdf, and an empty default graph.
http://example.org/a.rdf:
  :p rdfs:domain :A .
http://example.org/b.rdf:
  :x :p :y .

The question is what bindings ?g should take if we query:
  SELECT ?g WHERE { GRAPH ?g { :x a ?type .  } }

If we assume that entailments always work over all graphs in the
dataset, then ?type can be mapped to :A, but this entailment depends on
both graphs. Taking either one out means the entailment no longer
holds, so ?g must be both a.rdf and b.rdf, and possibly the default
graph, since there is no FROM clause in the query and we in fact query
the default graph.
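
The dependency on both graphs can be checked with a small sketch. This
is only an illustration in Python, not how any SPARQL engine is
implemented: triples are plain tuples of strings, the helper names
(rdfs2_closure, types_of) are made up for this example, and the only
inference rule modelled is the RDFS domain rule (rdfs2).

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def rdfs2_closure(triples):
    """Apply the RDFS domain rule until nothing new is derived:
    (p rdfs:domain C) and (s p o)  =>  (s rdf:type C)."""
    closure = set(triples)
    while True:
        domains = {(s, o) for (s, p, o) in closure if p == RDFS_DOMAIN}
        inferred = {
            (s, RDF_TYPE, cls)
            for (s, p, _o) in closure
            for (prop, cls) in domains
            if p == prop
        }
        if inferred <= closure:
            return closure
        closure |= inferred


def types_of(subject, graphs):
    """Types entailed for `subject` when entailment works over the
    union of all the given graphs (the dataset-wide reading)."""
    merged = set().union(*graphs)
    return {
        o for (s, p, o) in rdfs2_closure(merged)
        if s == subject and p == RDF_TYPE
    }


# Steve's example: a.rdf holds the domain axiom, b.rdf holds the data.
graph_a = {(":p", RDFS_DOMAIN, ":A")}
graph_b = {(":x", ":p", ":y")}

print(types_of(":x", [graph_a, graph_b]))  # entailment needs both graphs
print(types_of(":x", [graph_a]))           # nothing to apply the axiom to
print(types_of(":x", [graph_b]))           # no axiom available
```

Only the first call yields :A, which is why no single graph name is an
obviously correct binding for ?g under the dataset-wide reading.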

Just to check that I get this right: If we take the same dataset and
issue the query
  SELECT ?o WHERE { :x :p ?o . }
I would get no answer under simple entailment because the default
graph is empty. If I ask
  SELECT ?o FROM NAMED <http://example.org/b.rdf> WHERE { :x :p ?o . }
I would get { (o, y) }, right?
If I ask
  SELECT ?o FROM <http://example.org/b.rdf> WHERE { :x :p ?o . }
I would get { (o, y) } again, but this time I implicitly created a
default graph that contains all triples from b.rdf, right? I guess
this default graph would be temporary, and if I query again without
the FROM clause, I would again get no results, right?

OK, assuming I understand that correctly, I would much prefer to keep
entailments local to the graph. I think this fits well with SPARQL 1.0,
which says in Sec 8.1
(http://www.w3.org/TR/rdf-sparql-query/#exampleDatasets), below Example
1: "In this example, the default graph contains the names of the
publishers of two named graphs. The triples in the named graphs are
not visible in the default graph in this example."

Let me also argue from an OWL viewpoint (because I am an OWL person):
I would see the IRIs in a FROM (NAMED) clause as ontology IRIs. An
ontology contains everything it needs and might use imports to include
resources that it does not physically contain. I have to load those
imported resources anyway as part of the graph. As I understand it, an
implementor can now choose to have several ontologies loaded more or
less permanently as (named) graphs/ontologies, which means one can do
all the preprocessing on them, check them for consistency, and possibly
classify them (build the sub-/superclass hierarchy), so that most
queries can be answered quickly. If I decide to have the pizza
ontology (often used for Protege tutorials) and SNOMED (a large medical
ontology) loaded as named graphs, then I do not want pizzas to have
any effect on my medical ontology, and I do want entailments to be
local to the ontology. If users want to merge two ontologies on the
fly for querying, they can ask
  SELECT ?x FROM <IRI_1> FROM <IRI_2> WHERE { some_BGP }
which would (according to Sec 8.2 of the SPARQL spec) result in the
query being evaluated over a default graph that contains the RDF merge
of the triples from IRI_1 and IRI_2.

This would also allow for removing (named) graphs without having to do
something like belief revision to find out which inferences are no
longer valid after the delete, or having to reload and redo all
inferences for the remaining graphs.

What would that mean for Steve's example? It has an empty answer, but
we no longer have to assign a.rdf, b.rdf, and the default graph all
at the same time to ?g.
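
To make the graph-local reading concrete, here is a rough Python
sketch. It is an illustration under simplifying assumptions, not a
proposal for the spec text: triples are tuples of strings, only the
RDFS domain rule (rdfs2) is modelled, and the function names
(local_closures, graph_bindings_for_type) are invented for this
example.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def rdfs2_closure(triples):
    """(p rdfs:domain C) and (s p o)  =>  (s rdf:type C), to a fixpoint."""
    closure = set(triples)
    while True:
        domains = {(s, o) for (s, p, o) in closure if p == RDFS_DOMAIN}
        inferred = {
            (s, RDF_TYPE, cls)
            for (s, p, _o) in closure
            for (prop, cls) in domains
            if p == prop
        }
        if inferred <= closure:
            return closure
        closure |= inferred


def local_closures(named_graphs):
    """Compute entailments separately for each named graph."""
    return {name: rdfs2_closure(g) for name, g in named_graphs.items()}


def graph_bindings_for_type(subject, closures):
    """Bindings for ?g in
         SELECT ?g WHERE { GRAPH ?g { <subject> a ?type } }
    when each graph is matched against its own closure only."""
    return {
        name
        for name, triples in closures.items()
        if any(s == subject and p == RDF_TYPE for (s, p, _o) in triples)
    }


named = {
    "http://example.org/a.rdf": {(":p", RDFS_DOMAIN, ":A")},
    "http://example.org/b.rdf": {(":x", ":p", ":y")},
}
closures = local_closures(named)

# Neither graph entails ":x a :A" on its own, so there are no bindings.
print(graph_bindings_for_type(":x", closures))

# Removing a graph discards only that graph's closure; the closure of
# b.rdf does not have to be recomputed or revised.
del closures["http://example.org/a.rdf"]
```

A graph that contained both the domain axiom and the data triple would
still produce a binding for ?g, because the entailment would then hold
inside that one graph.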

If there are no major objections, I can go and add a section about
datasets to the entailment doc, similar to Sec 8 in the SPARQL doc,
which outlines how one can query a merge of resources and states that
entailments are normally local to the graph. If you have objections, I
would be happy to hear suggestions for different ways of doing it.

Cheers,
Birte




Some relevant parts from the SPARQL 1.0 spec:

8.2 Specifying RDF Datasets

A SPARQL query may specify the dataset to be used for matching by
using the FROM clause and the FROM NAMED clause to describe the RDF
dataset. If a query provides such a dataset description, then it is
used in place of any dataset that the query service would use if no
dataset description is provided in a query. The RDF dataset may also
be specified in a SPARQL protocol request, in which case the protocol
description overrides any description in the query itself. A query
service may refuse a query request if the dataset description is not
acceptable to the service.

The FROM and FROM NAMED keywords allow a query to specify an RDF
dataset by reference; they indicate that the dataset should include
graphs that are obtained from representations of the resources
identified by the given IRIs (i.e. the absolute form of the given IRI
references). The dataset resulting from a number of FROM and FROM
NAMED clauses is:

    * a default graph consisting of the RDF merge of the graphs
referred to in the FROM clauses, and
    * a set of (IRI, graph) pairs, one from each FROM NAMED clause.

If there is no FROM clause, but there is one or more FROM NAMED
clauses, then the dataset includes an empty graph for the default
graph.
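
(Not part of the spec text.) The dataset construction described above
can be sketched in a few lines of Python. This is a simplified model
under stated assumptions: `store` is a hypothetical mapping from IRIs
to sets of triples standing in for whatever the service can retrieve,
and the graphs contain no blank nodes, so the RDF merge reduces to a
plain set union.

```python
def build_dataset(store, from_iris=(), from_named_iris=()):
    """Return (default_graph, named_graphs) for a dataset description.

    The default graph is the merge of all FROM graphs; each FROM NAMED
    IRI contributes one (IRI, graph) pair. Assumes no blank nodes, so
    the RDF merge is modelled as set union.
    """
    default_graph = set()
    for iri in from_iris:
        default_graph |= store[iri]
    named_graphs = {iri: set(store[iri]) for iri in from_named_iris}
    return default_graph, named_graphs


# Hypothetical store holding the two graphs from the example above.
A = "http://example.org/a.rdf"
B = "http://example.org/b.rdf"
store = {
    A: {(":p", "rdfs:domain", ":A")},
    B: {(":x", ":p", ":y")},
}

# FROM NAMED only: the default graph is empty.
default_graph, named_graphs = build_dataset(store, from_named_iris=[B])
print(default_graph, named_graphs)

# Two FROM clauses: the default graph is the merge of both graphs.
merged_default, no_named = build_dataset(store, from_iris=[A, B])
print(merged_default, no_named)
```

In this model, a dataset described only with FROM NAMED has an empty
default graph, so a pattern outside a GRAPH clause has nothing to match
there.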


8.1 -> below Example 1: In this example, the default graph contains
the names of the publishers of two named graphs. The triples in the
named graphs are not visible in the default graph in this example.


8.2.1 Specifying the Default Graph

Each FROM clause contains an IRI that indicates a graph to be used to
form the default graph. This does not put the graph in as a named
graph.

In this example, the RDF Dataset contains a single default graph and
no named graphs:

# Default graph (stored at http://example.org/foaf/aliceFoaf)
@prefix  foaf:  <http://xmlns.com/foaf/0.1/> .

_:a  foaf:name     "Alice" .
_:a  foaf:mbox     <mailto:alice@work.example> .

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT  ?name
FROM    <http://example.org/foaf/aliceFoaf>
WHERE   { ?x foaf:name ?name }

name
"Alice"

If a query provides more than one FROM clause, providing more than one
IRI to indicate the default graph, then the default graph is based on
the RDF merge of the graphs obtained from representations of the
resources identified by the given IRIs.

-- 
Dr. Birte Glimm, Room 306
Computing Laboratory
Parks Road
Oxford
OX1 3QD
United Kingdom
+44 (0)1865 283529

Received on Wednesday, 7 October 2009 11:53:15 UTC