Re: [TF-ENT] Querying datasets with default plus named graphs from Ivan Herman on 2009-10-12 (public-rdf-dawg@w3.org from October to December 2009)

From: Ivan Herman <ivan@w3.org>
Date: Mon, 12 Oct 2009 12:04:22 +0200
To: "Seaborne, Andy" <andy.seaborne@hp.com>
CC: Birte Glimm <birte.glimm@comlab.ox.ac.uk>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-ID: <4AD2FF26.4090502@w3.org>

Hi Andy,

I must admit that I do not have a very detailed view of how the algebra
works; I always have to look it up again...

With this caveat:

Seaborne, Andy wrote:
>> -----Original Message-----
>> From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
>> On Behalf Of Ivan Herman
>> Sent: 12 October 2009 09:52
>> To: Birte Glimm
>> Cc: SPARQL Working Group
>> Subject: Re: [TF-ENT] Querying datasets with default plus named graphs
>>
>> Hi Birte,
>>
>> no, I think we were talking about different things, but the extra text
>> you put into the document actually answers my questions. So there is no
>> reason to go into the details of your comments below...
>>
>> Just for my understanding, based on your latest text (thanks for having
>> added it, b.t.w.!)... if I have
>>
>> <A> standing for the graph
>> :p rdfs:range :AA
>>
>> <B> standing for the graph
>> :p rdfs:domain :BB
>>
>> <C> standing for the graph
>> :x :p :y
>>
>> then the query:
>>
>> SELECT ?g
>> FROM NAMED <A>
>> FROM NAMED <B>
>> FROM <C>
>> WHERE {
>>    GRAPH ?g { :y a ?type }
>> }
>>
>> will return ?g-><A>, right? (b.t.w., I think the example on the wiki is
>> wrong, the first (negative) example should say :y a ?type, shouldn't it?)
> 
> Ivan - could you quote the text you think justifies this. I don't see it.
> 

My layman's reading is that for GRAPH ?g, <A> is substituted for ?g,
i.e., when evaluating the pattern in the corresponding {}, the graph on
which the entailment should be performed is the merge of <A> and <C>.
RDFS entailment on that merge will provide a ?type. If <B> is used
instead of <A>, then the entailment does not hold.
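
To make that reading concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions: triples are plain tuples,
only the RDFS domain and range rules are applied (not full RDFS, so the
rdfs:Resource typing that full RDFS would add is ignored), and the helper
names are made up for this mail. None of it is taken from the SPARQL text.

```python
# Toy triples as (subject, predicate, object) tuples, mirroring the example.
RDF_TYPE, RDFS_DOMAIN, RDFS_RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

A = {(":p", RDFS_RANGE, ":AA")}
B = {(":p", RDFS_DOMAIN, ":BB")}
C = {(":x", ":p", ":y")}


def rdfs_domain_range_closure(triples):
    """Apply only the RDFS domain and range rules until nothing new appears."""
    closure = set(triples)
    while True:
        inferred = set()
        for s, p, o in closure:
            for prop, kind, cls in closure:
                if prop != p:
                    continue
                if kind == RDFS_DOMAIN:
                    inferred.add((s, RDF_TYPE, cls))
                elif kind == RDFS_RANGE:
                    inferred.add((o, RDF_TYPE, cls))
        if inferred <= closure:
            return closure
        closure |= inferred


def types_of(node, triples):
    """All ?type values such that `node a ?type` holds in the closure."""
    closed = rdfs_domain_range_closure(triples)
    return {o for s, p, o in closed if s == node and p == RDF_TYPE}


def graphs_binding_g(named, default):
    """Hypothetical 'merge' reading: match { :y a ?type } against each
    named graph merged with the default graph, and keep the names that match."""
    return sorted(name for name, g in named.items() if types_of(":y", g | default))
```

Under this (assumed) merge reading, graphs_binding_g({"<A>": A, "<B>": B}, C)
yields ["<A>"]: the range triple in <A> types :y, while the domain triple
in <B> only types :x.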

I am not sure whether the current SPARQL text says that (or whether we
should say that in the Entailment part). But it looks, sort of, logical
to me, i.e., that is what I would expect as a user, I guess...

> Based on
> [[
> Under an entailment regime E other than simple entailment, we do not only consider the triples that are in the graph, but also triples that are E-entailed by the graph.
> ]]
> (I acknowledge that this text is below "Editor's Note: Please ignore what is said below for now:")
> 
> I read it as not returning that unless there is hidden information, such as <A> using information that just happens to be exposed in <B> and <C>.  FROM NAMED does not cause that.  To put it another way, the spec does not, as far as I can see, imply any additional entailment based on the structure of the dataset, but please correct me here as it is important.
> 
> There is deployed experience here.  Systems exist where entailment, based on BGP matching, is an aspect of the graph, not the dataset.  Different graphs in the same dataset may have different entailment regimes (often, entailment and no entailment).
> 
> By spec, I expect it to return no results, as no graph on its own entails that without such additional information.
> 
>> A practical example may then be another type of request which is
>>
>> SELECT ?type
>> FROM NAMED <A>
>> FROM NAMED <B>
>> FROM <C>
>> WHERE {
>>    GRAPH <A> { :y a ?type }
>> }
>>
>> which, essentially, specifies a specific vocabulary for a portion of the
>> query, right? (It may be worth adding this example to the text, too, to
>> make the situation clearer.)
> 
> In SPARQL/Query 1.0, this says match BGP { :y a ?type } against graph <A> (GRAPH is part of the algebra). Entailment is allowed.  It makes no connection to <B> and <C>.  You get the same answers as:
> 
> SELECT ?type
> FROM NAMED <A>
> WHERE {
>     GRAPH <A> { :y a ?type }
> }

Do you mean that FROM <C> (i.e., the default graph) is ignored for the
entailment? That would seem fairly unnatural to me...:-(
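
Just to pin down the difference between the two readings, a small Python
sketch follows. It is an illustration only: the tuple encoding, the
restriction to the RDFS domain/range rules, and the function names are my
assumptions, not anything defined by SPARQL/Query.

```python
RDF_TYPE = "rdf:type"
# Which position of the data triple receives the type: subject (0) or object (2).
RULES = {"rdfs:domain": 0, "rdfs:range": 2}


def closure(triples):
    """Fixpoint of the RDFS domain/range rules only (not full RDFS)."""
    result = set(triples)
    while True:
        new = {
            (data[RULES[kind]], RDF_TYPE, cls)
            for data in result
            for prop, kind, cls in result
            if kind in RULES and prop == data[1]
        }
        if new <= result:
            return result
        result |= new


def eval_graph_a(named, default, use_default_for_entailment):
    """Bindings of ?type for GRAPH <A> { :y a ?type } under either reading."""
    active = named["<A>"] | (default if use_default_for_entailment else set())
    return sorted(o for s, p, o in closure(active) if s == ":y" and p == RDF_TYPE)


A = {(":p", "rdfs:range", ":AA")}
B = {(":p", "rdfs:domain", ":BB")}
C = {(":x", ":p", ":y")}

# Per-graph reading: <B> and <C> play no role, so both datasets agree.
full = eval_graph_a({"<A>": A, "<B>": B}, C, use_default_for_entailment=False)
only_a = eval_graph_a({"<A>": A}, set(), use_default_for_entailment=False)

# Merge reading: the default graph <C> contributes to the entailment.
merged = eval_graph_a({"<A>": A, "<B>": B}, C, use_default_for_entailment=True)
```

In this sketch the per-graph reading gives no bindings for either dataset,
whereas the merge reading binds ?type to :AA.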

Ivan



> 
>  Andy
> 
>> B.t.w. a practical consequence (if all this is true) is that the user
>> will have to specify explicitly all the vocabularies they use in terms of
>> FROM or FROM NAMED clauses to get the right entailments. Which is an
>> unfortunate duplication of the PREFIX clauses. I.e., one will have to write
>>
>> PREFIX dc: <URI-FOR-DC>
>> SELECT *
>> FROM <URI-FOR-DC>
>> WHERE {
>>    ... something that involves an RDFS entailment with dc:
>> }
>>
>> My reference to owl:imports in my earlier mails is that this may
>> become easier when using OWL, because one can prepare one RDF file that
>> says
>>
>> <> owl:imports <URI-FOR-DC> .
>> <> owl:imports <URI-FOR-SOMETHING-ELSE> .
>>
>> and make a single FROM in the query on that file; OWL entailment may
>> process the owl:imports clauses before making the entailment. Somewhat
>> simpler for the users.
>>
>> (Note that OWL 2 RL does not define owl:imports as one of its accepted
>> terms, although OWL 2 Full does. I wonder whether this is not a simple
>> extension of OWL 2 RL that we should allow... Not sure...)
>>
>> Cheers
>>
>> Ivan
>>
>>
>>
>> Birte Glimm wrote:
>>> [snip]
>>>>> As I understand it, FROM NAMED can be used to access graphs in the
>>>>> data set of the query processor. You can do merges into a fresh
>>>>> default graph. Even though this might not be the nicest thing, in
>>>>> particular for some entailment regimes, this is something that needs
>>>>> to be addressed in the SPARQL query document. The requirement might
>>>>> come from entailment regimes, but entailment regimes are based on
>>>>> SPARQL and if SPARQL does not define it, then we cannot use it. I
>>>>> personally do not want to raise an issue and a request for that, but
>>>>> if others feel like doing it...
>>>> I must say I am a little bit mixed up here, maybe you can help... We discussed the
>>>> issues of restricting entailments specific graphs when those graphs are defined through
>>>> the named graph mechanism of SPARQL. But I am now confused about how the FROM NAMED and
>>>> the GRAPH statements would exactly influence entailment, i.e., when is anything
>>>> restricted. Could you try to summarize this for a better understanding? Maybe this is
>>>> where my confusion comes from... but I am lost a bit:-(
>>> I added a section on this into the entailment regimes doc:
>>>
>>> http://www.w3.org/2009/sparql/wiki/Design:EntailmentRegimes#Entailment_Regimes_and_Data_Sets
>>> but I have the impression that it will not answer your question.
>>> Basically, triples in one graph of the data set do not have any
>>> influence on any other graph in the data set. For a system supporting
>>> RDFS entailment, for example, you could take the triples from one RDF
>>> document, load them into graph A, build a partial RDFS closure (using
>>> the ter Horst algorithm) and answer queries by using simple entailment
>>> on the partial RDFS closure. Now if you additionally load the triples
>>> from another RDF document into graph B, then this has no influence on
>>> graph A, so even if graph A contains
>>> :a rdf:type :B . (inferred or stated in the originally loaded document)
>>> and the document loaded into graph B contains
>>> :B rdfs:subClassOf :C
>>> you cannot use this to get
>>> :a rdf:type :C .
>>> as a query answer from graph A. The triples in one graph are not
>>> visible in another graph.
>>> I am not quite sure what you mean by "restricting entailments
>>> specific graphs". Do you have in mind that a query processor provides
>>> a certain data set description, say with some default graph, graph A,
>>> and graph B, and one of the named graphs, say A, is for queries with
>>> RDFS entailment, while the other one (B) is for queries with simple
>>> entailment?
>>> At the moment that would not be possible in my understanding and, in
>>> general, the ways of choosing what entailment regime you want seem
>>> not very flexible (but I might overlook something). Let us assume you
>>> have a query processor that can do simple, RDF, and RDFS entailment
>>> (not too unreasonable I think). As I understand it, that would mean
>>> that you can have three endpoints, one for each entailment regime and
>>> depending on which endpoint I choose when I query, I get one of the
>>> three entailment regimes and I can ask that endpoint via service
>>> descriptions what data sets it has etc. What we cannot do at the
>>> moment (if I understand it correctly) is to mix entailment regimes in
>>> one endpoint, so you cannot say that your query should contain results
>>> from graph A under RDFS entailment unioned/joined with results from
>>> graph B under simple entailment. There is
>>> no way to specify that in the query and there is no way for an
>>> endpoint to communicate that it will use simple entailment for some
>>> data set and RDF(S) for another.
>>> Provided I get that right, I am not sure how much of an issue that is.
>>> I can live with it, but that is my personal opinion.
>>>
>>> For OWL I can see just what you mention above as something that needs
>>> to be addressed, i.e., how can users query for things that are not
>>> entailed, but are stated in the ontology and that are important to
>>> users (annotations most notably, but imports also fall into this
>>> category). If we allow some way of specifying in a query that some
>>> part of the query has to be evaluated under one entailment regime and
>>> other parts of the query under other regimes, that is fine. Then you
>>> can use simple entailment for annotations and OWL or whatever for the
>>> rest. If we do not want to go that way, we could also define OWL
>>> entailment in a way that does not employ OWL semantics to annotation
>>> queries. That is not as nice in my opinion, but it would be a
>>> workaround that does not require changes in other specs.
>>>
>>> Birte
>>>
>>>> [snip]
>>>>>> And what you say is perfectly o.k. in view of the RIF specification.
>>>>>> However: in SPARQL, FROM and FROM NAMED are defined to specify RDF
>>>>>> datasets. OWL and RDFS are (or can be expressed in) RDF. RIF rules cannot.
>>>>>> That actually may create problems for OWL, too. There is no problem if the
>>>>>> OWL ontology in the FROM clause is in RDF. But would the spec allow referring
>>>>>> to OWL ontologies in functional and/or Manchester syntax via the FROM or
>>>>>> FROM NAMED clauses?
>>>>> Question to the SPARQL implementors/experts. Can I specify my RDF data
>>>>> in turtle and query that in accordance with the spec? If not in
>>>>> accordance with the spec, do systems support turtle input?
>>>>> If yes, then I cannot see, why not functional or manchester syntax.
>>>>> This is obviously not normative. Any system might reject non-RDF/XML
>>>>> input, but many systems might happily take it.
>>>>> If not even Turtle is allowed, are there any plans for doing that as
>>>>> an optional syntax? If not, I guess we have to live with RDF/XML. That
>>>>> would probably be the end for RIF though; for OWL, RDF/XML is normative
>>>>> and any conformant system must support it anyway, so it is not as bad
>>>>> for OWL.
>>>>>
>>>> Hm (again:-). Yes, you are actually right, I am not sure the spec says anything. My
>>>> impression is that the spec is silent at that point and a URI to a graph may refer to
>>>> any format that the processor understands. If that is so, we may not have a problem with
>>>> OWL if the processor understands non-RDF/XML formats. Maybe it is worth adding this to a
>>>> possible service description, though.
>>>>
>>>> But it is certainly a problem with RIF. Indeed, Turtle may not be a standard format but
>>>> it is an RDF serialization syntax. In this sense, both the OWL 2 functional syntax and
>>>> the Manchester syntax can be considered RDF serialization syntaxes, because they can be
>>>> converted, in a standard way, to RDF. But a RIF rule set _cannot_:-(
>>>>
>>>> Thanks
>>>>
>>>> Ivan
>>>>
>>>>>> I would expect we should be able to do that, but that might affect the query
>>>>>> language specification.
>>>>> Again, that is up to the general SPARQL/Query spec and whoever wants to
>>>>> raise an issue for that can do so.
>>>>>
>>>>> Birte
>>>>>
>>>>>> I remember Axel and I had some corridor chat at some point about allowing
>>>>>> a media type to be added to the FROM (NAMED) clause...
>>>>>>
>>>>>> Ivan
>>>>>>
>>>>>>> Birte
>>>>>>>
>>>>>>>> Ivan
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>>> mobile: +31-641044153
>>>>>>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>>>>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Dr. Birte Glimm, Room 306
>>>>> Computing Laboratory
>>>>> Parks Road
>>>>> Oxford
>>>>> OX1 3QD
>>>>> United Kingdom
>>>>> +44 (0)1865 283529
>>>>>
>>>>
>>>
>>>

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Monday, 12 October 2009 10:04:55 UTC