Issue #rdfSemantics / #owlDisjunction from Seaborne, Andy on 2005-10-07 (public-rdf-dawg@w3.org from October to December 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Fri, 07 Oct 2005 13:38:27 +0100
To: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <43466C43.4010006@hp.com>
This text hasn't appeared in email but I have reproduced it here. I wasn't 
sure of the origins of
http://www.mindswap.org/~bparsia/rdfssemsparql.html
but I hope that's OK.  Also, I don't know the original author so "I" is unbound.

> I think that Pat's summary in [1] is a fair starting point for considering the 
> changes to the SPARQL draft. I see three main points requiring major 
> (non-editorial) changes:
> 
> 1) Simple patterns and complex queries.
> 
> The description of the semantics of a query expression should be understood in 
> terms of simple query patterns and subsequent algebraic operations on what we 
> may call solution sets (it might be just tables, but bnodes complicate the 
> picture). The current draft doesn't emphasise these two levels; however I'm 
> fairly convinced that it's going to be just a rewording of what's already 
> there. I gather that this first point isn't really controversial, since it's 
> basically how they already understand SPARQL.

Agreed - this is the best way forward I see - change the definition of basic 
pattern matching and leave the composition of basic patterns in the algebra.

I see basic pattern matching to work in one of several modes:

+ abstract syntax
+ entailment : simple, RDF for the charter requirements.

I would expect the entailment mode to allow entailment semantics that are not 
enumerated in rq23 itself - it's an open set.  The algebra is not defined by entailment.

The current test suite could either all be described as "abstract syntax" 
tests or we can check through to see which are which.

It would be helpful to have tests that are explicitly designed to cover the 
entailment cases.

A given query request would only work in one mode for all patterns in the 
query.  It is a characteristic of the service as to the matching mode.  This 
may be a parameter; this may be by endpoint.

Changes: Pattern Solution defn + need for explanatory text.
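
To make the two levels concrete, here is a minimal sketch, assuming triples are 
plain tuples of strings and variables start with "?".  The names MODES, 
match_abstract_syntax, join and evaluate are hypothetical, not taken from rq23.  
Only the "abstract syntax" mode is sketched; an entailment mode would be another 
entry in the same open table.

```python
# Illustrative sketch only: every name here is hypothetical, not from rq23.

def is_var(term):
    return term.startswith("?")

def match_abstract_syntax(graph, patterns):
    """Basic pattern matching as plain subgraph matching (no inference)."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for sol in solutions:
            for triple in graph:
                cand = dict(sol)
                ok = True
                for p, t in zip(pattern, triple):
                    if is_var(p):
                        # Bind the variable, or check an existing binding.
                        if cand.setdefault(p, t) != t:
                            ok = False
                    elif p != t:
                        ok = False
                if ok:
                    extended.append(cand)
        solutions = extended
    return solutions

# Open set of matching modes; an entailment-based matcher could be added here.
MODES = {"abstract-syntax": match_abstract_syntax}

def join(left, right):
    """Algebra level: combine compatible solutions, with no entailment involved."""
    out = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.append({**a, **b})
    return out

def evaluate(graph, basic_patterns, mode="abstract-syntax"):
    matcher = MODES[mode]  # one mode for every basic pattern in the query
    solutions = [{}]
    for bp in basic_patterns:
        solutions = join(solutions, matcher(graph, bp))
    return solutions

GRAPH = [(":a", ":p1", ":b"), (":a", ":p2", ":b"), (":c", ":p1", ":d")]
QUERY = [[("?s", ":p1", "?o")], [("?s", ":p2", "?o")]]
print(evaluate(GRAPH, QUERY))
```

The mode is chosen once per request, so swapping the matcher does not touch the 
join.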

> 
> 2) Bnodes
> 
> 2a) Role of bnodes in answers.
> 
> I start from the assumption that they want them in a pattern solution. 
> Currently, the document is ambiguous on whether the bnode name returned in a 
> pattern solution is significant (i.e. the same as that of the corresponding bnode in 
> the graph) or arbitrary.

BNode labels are not significant in the final answer - see example in 2.7

http://www.w3.org/2001/sw/DataAccess/rq23/#BlankNodes

[But we have had a comment requesting at least leaving the possibility of 
direct handling of blank node labels to enable remote traversal of the 
abstract syntax.]

Could you suggest text to make this clearer?

> They kind of state that it's not significant, but
> they all expect that they should be exactly the names in the graph. I don't 
> think that the alternative way proposed by Pat:
> 
>      huddle: KB simply entails (KB union B(Q))
> 
> is a solution, since substituting variables with bnodes not appearing in the 
> graph (nothing in the definitions prevents this) would yield the same 
> behaviour as
> 
>      remote: KB simply entails B(Q)
> 
> and minimality is going to be pretty hairy (minimal answers would be those 
> with bnodes from the graph itself... aargh). Still skolemisation seems to me 
> the best option, since treating bnodes as URIs just in some cases looks to me 
> more confusing.
>
> 2b) Role of bnodes in queries.
> 
> If names of bnodes are significant, they want to use them to formulate new 
> queries. But they're not sure about it (in fact, the document is contradictory 
> as Peter pointed out in [2]). I think we should prevent this. In general we 
> want bnodes as existential variables in simple query patterns, because without 
> them we lose expressiveness for querying languages able to "force" the 
> existence of objects without any bnode (e.g. OWL-lite). Moreover, not knowing 
> which are the names of bnodes in a graph makes the use of bnodes in queries 
> rather confusing, since they'd behave differently when the name appears in the 
> graph (constant vs exist. variable).

The current text says:
"""
Queries can include blank nodes; the blank nodes in a query are disjoint from 
all blank nodes in the RDF graphs being matched and members of the set of 
variables.
"""

We have a syntax that involves bNodes so we need an account of that, and we 
have a comment requesting leaving room for them as constants in queries (this 
seems to be resolved by noting that <_:abc> is possible at the lowest syntax 
level).  This would be the way to get a query string with a bNode of the KB 
in it.  It is not licensed explicitly in rq23.

Local usage may also imply bNodes from the programming language environment 
but I don't think we need to worry about that as it does not apply across the web.

This leaves the algebra, which I think is where some of the confusion arises.
A bNode in a solution from one basic pattern match will need to behave as that 
same bNode in another pattern within the same query.

BNodes are significant within a query (with the algebra) - otherwise two basic 
patterns, using the same variable, in different parts of the same query don't 
match.

     { ?s :p1 ?o . ?s :p2 ?o }
might not be the same as
     { { ?s :p1 ?o } . {?s :p2 ?o } }
which is bizarre.
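
A small sketch of why this matters, assuming a toy matcher over tuples of 
strings.  The relabel flag is a hypothetical device: it simulates an 
implementation that hands out fresh bNode labels on every basic pattern match, 
i.e. one where labels are not stable within a query.

```python
import itertools

# Toy data: one bNode subject with two properties (illustrative only).
GRAPH = [("_:x", ":p1", ":v"), ("_:x", ":p2", ":v")]
_calls = itertools.count()

def match_bgp(graph, patterns, relabel):
    """Match one basic pattern; optionally rename graph bNodes per call."""
    if relabel:
        n = next(_calls)
        graph = [tuple(f"{t}_{n}" if t.startswith("_:") else t for t in triple)
                 for triple in graph]
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for sol in solutions:
            for triple in graph:
                cand = dict(sol)
                ok = True
                for p, t in zip(pattern, triple):
                    if p.startswith("?"):
                        if cand.setdefault(p, t) != t:
                            ok = False
                    elif p != t:
                        ok = False
                if ok:
                    extended.append(cand)
        solutions = extended
    return solutions

def join(left, right):
    """Combine solutions that agree on their shared variables."""
    return [{**a, **b} for a in left for b in right
            if all(a[k] == b[k] for k in a.keys() & b.keys())]

P1, P2 = ("?s", ":p1", "?o"), ("?s", ":p2", "?o")

def single_group(relabel):   # { ?s :p1 ?o . ?s :p2 ?o }
    return match_bgp(GRAPH, [P1, P2], relabel)

def nested_groups(relabel):  # { { ?s :p1 ?o } . { ?s :p2 ?o } }
    return join(match_bgp(GRAPH, [P1], relabel),
                match_bgp(GRAPH, [P2], relabel))

print(len(single_group(True)), len(nested_groups(True)))
print(len(single_group(False)), len(nested_groups(False)))
```

With per-match relabelling the nested form loses its only solution, while with 
stable labels the two forms agree.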

Could you suggest text for structuring this?

> 
> 3) Minimality of answer sets.
> 
> Pattern solutions can be redundant or not according to the semantics of bnodes 
> in answers. Note that our definition of redundancy is based on answer sets 
> only, and *doesn't* depend on the graph. Pat's example in [1] seems to imply 
> that the graph itself plays a role (although, I think that his example is more 
> related to 2b above).
> 
> [1] http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0471
> 
> [2] http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0477.html
> 
> Specific items affected by the changes in the editors working draft. 
> http://www.w3.org/2001/sw/DataAccess/rq23/
> 
> * Pattern Solution
> http://www.w3.org/2001/sw/DataAccess/rq23/#PatternSolutions
> 
> With entailment we don't need bindings for bnodes.

We do for the algebra over the pattern matching to work.  Making the syntactic 
bNodes in queries explicitly be variables, not bNodes at all, might be one way. 
I would like input on this from existing systems.

The alternative might be to have:

BQ = set of bNodes in queries disjoint from the KB.

then "Let W = V union BQ"
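
One possible reading of "W = V union BQ", as a hedged sketch: query bNodes are 
bound during matching exactly like variables, so the algebra can join on them, 
and are dropped only when producing the final answer.  The function names and 
the toy data are assumptions made for illustration.

```python
def in_V(term):
    """Named query variables, written ?x."""
    return term.startswith("?")

def in_BQ(term):
    """bNodes written in the query; only pattern positions are tested,
    so these labels never collide with bNode labels in the KB."""
    return term.startswith("_:")

def in_W(term):
    return in_V(term) or in_BQ(term)

def match(graph, patterns):
    """Basic pattern matching where every member of W is bindable."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for sol in solutions:
            for triple in graph:
                cand = dict(sol)
                ok = True
                for p, t in zip(pattern, triple):
                    if in_W(p):
                        if cand.setdefault(p, t) != t:
                            ok = False
                    elif p != t:
                        ok = False
                if ok:
                    extended.append(cand)
        solutions = extended
    return solutions

def project_to_V(solutions):
    """Final answer: keep bindings for V only, dropping those for BQ."""
    return [{k: v for k, v in sol.items() if in_V(k)} for sol in solutions]

KB = [("_:g1", ":name", "Alice"), ("_:g1", ":mbox", "mailto:alice@example.org")]
QUERY = [("_:b", ":name", "?n"), ("_:b", ":mbox", "?m")]

full = match(KB, QUERY)     # includes a binding for _:b, usable by the algebra
print(project_to_V(full))   # bindings for ?n and ?m only
```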

> 
> * Basic Graph Pattern
> http://www.w3.org/2001/sw/DataAccess/rq23/#BasicGraphPatternMatching
> 
> Definition of 'matching' in terms of (some) entailment.
> 
> * Optional Pattern Matching
> http://www.w3.org/2001/sw/DataAccess/rq23/#OptionalMatching
> 
>  From the algebraic point of view it seems to be a left outer join; although 
> this doesn't account for {{} OPTIONAL Q } = Q unless you do some tricks with 
> the meaning of the empty pattern.

{} matches with one solution that binds no variables.

Would it help to explicitly say this?
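
A minimal sketch of the left outer join reading, assuming solutions are plain 
dicts.  The names left_outer_join and EMPTY_PATTERN are illustrative.  Under 
this definition the identity holds when Q has at least one solution; when Q has 
none, the result is the single empty solution rather than nothing.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on shared variables."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def left_outer_join(left, right):
    """OPTIONAL as a left outer join over lists of solutions."""
    out = []
    for a in left:
        merged = [{**a, **b} for b in right if compatible(a, b)]
        # Keep the left solution unextended when nothing on the right fits.
        out.extend(merged if merged else [a])
    return out

EMPTY_PATTERN = [{}]  # {} : exactly one solution, binding nothing
Q = [{"?x": ":a"}, {"?x": ":b"}]

print(left_outer_join(EMPTY_PATTERN, Q) == Q)   # {{} OPTIONAL Q} against Q
print(left_outer_join([], Q))                   # if {} had no solutions at all
print(left_outer_join(EMPTY_PATTERN, []))       # Q with no solutions
```

If {} instead matched with zero solutions, the second call shows that OPTIONAL 
could never contribute anything.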

> 
> * Multiple Optional Graph Patterns
> http://www.w3.org/2001/sw/DataAccess/rq23/#MultipleOptionals
> 
> With the outer join perspective, I'm pretty sure it should be:
> 
>      { Q OPTIONAL Q1 OPTIONAL Q2 } = {{ Q OPTIONAL Q1 } OPTIONAL Q2 }
> 
> * Ditch the whole GRAPH/NAMED thing (or push it in some appendix).

In writing a recommendation that we hope people will implement and use, there 
is a trade-off to be made between attempting some loose standardization of 
approaches and doing nothing and leaving alternative approaches to be 
incompatible.  This is tricky when new work is appearing.  The restriction to 
URIs to name graphs is already less than what existing systems do.

This part of SPARQL is independent of the rest (you can have the QL and 
Protocol without the GRAPH/NAMED thing) so the risk here is reduced.

The paper:
http://www.wiwiss.fu-berlin.de/suhl/bizer/SWTSGuide/carroll-ISWC2004.pdf
gives possible semantics.

> 
> * CONSTRUCT
> http://www.w3.org/2001/sw/DataAccess/rq23/#construct
> 
> The definition is plain wrong, since you don't want to "RDF merge" bnodes 
> coming from the graph, just the ones in the template:
> 
>      CONSTRUCT { ?x :looks ?y } with answer set
>      [?x/_:a,?y/:here], [?x/_:a,?y/:there]
> 
> results in the graph
> 
>      _:a :looks :here . _:a1 :looks :there .
> 
> where the coreference on _:a is lost.
> 

What you outline looks right - I would find it helpful if you could suggest 
some replacement text for the definition of CONSTRUCT in rq23.
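
As a starting point for discussion, a hedged sketch of the behaviour outlined: 
bNodes bound from the graph keep their identity across solutions, and only 
bNodes written in the template are made fresh per solution.  The function name 
and the "_:t" label scheme are assumptions; a real implementation would also 
have to keep fresh labels from colliding with labels already in use.

```python
import itertools

def construct(template, solutions):
    """Instantiate a template once per solution and collect the triples."""
    fresh = itertools.count()
    graph = set()
    for sol in solutions:
        template_bnodes = {}  # template bNodes are renamed per solution
        for triple in template:
            out = []
            for term in triple:
                if term.startswith("?"):
                    # Value from the solution: a bNode from the graph is
                    # copied unchanged, so coreference is preserved.
                    out.append(sol[term])
                elif term.startswith("_:"):
                    if term not in template_bnodes:
                        template_bnodes[term] = f"_:t{next(fresh)}"
                    out.append(template_bnodes[term])
                else:
                    out.append(term)
            graph.add(tuple(out))
    return graph

ANSWERS = [{"?x": "_:a", "?y": ":here"}, {"?x": "_:a", "?y": ":there"}]

# Both triples keep _:a as subject.
print(sorted(construct([("?x", ":looks", "?y")], ANSWERS)))

# A bNode in the template yields a different node for each solution.
print(sorted(construct([("_:n", ":saw", "?y")], ANSWERS)))
```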

 Andy
Received on Friday, 7 October 2005 12:38:40 UTC