Re: Subgraph results counter-example(s) from Eric Prud'hommeaux on 2004-06-11 (public-rdf-dawg@w3.org from April to June 2004)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Fri, 11 Jun 2004 15:29:03 +0900
To: "Seaborne, Andy" <andy.seaborne@hp.com>
Cc: Rob Shearer <Rob.Shearer@networkinference.com>, public-rdf-dawg@w3.org, connolly@w3.org
Message-ID: <20040611062903.GA15831@w3.org>
comments about KB's with non-RDF data 112 lines down:

On Wed, Jun 09, 2004 at 10:29:26AM +0100, Seaborne, Andy wrote:
> 
> Rob,
> 
> I now understand the matter better - the "they are connected but by unknown
> means" is a strong argument.  I agree with removing the text "that the query
> matches" that was in the document.
> 
> I think we should avoid "explanation" but I didn't think that was the
> intention anyway.  Things I think are important are:
> 
> 1/ that queries can be considered to execute by RDF graph pattern matching.
> 
> This is a building block requirement: it helps with the charter item "2.2
> RDF Rules".
> 
> 2/ one return format where part of the larger queried graph is returned.
> 
> This is two things: the need for a "tell me about <thing described by
> pattern or just URI>" query and the ability to get results as RDF graphs
> which are part of the information in the large, remote knowledge base.
> 
> The current text is acceptable to me.  The variant is more explicit about
> the derived information.
> 
> """
> 3.4 Subgraph Results
> It must be possible for query results to be returned as a subgraph of the
> original queried graph.
> 
> Status: Pending.
> 
> 3.4a Subgraph Results (variant)
> It must be possible to select an entailed subgraph of a queried graph, in
> which case the query results are an RDF graph.
> """
> 
> (it removes the "that the query matched" part which I agree is unhelpful).
> 
> Discussion in:
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2004AprJun/0484.html
> 
> 	Andy
> 
> 
> More inline:
> 
> -------- Original Message --------
> > From: Rob Shearer <>
> > Date: 8 June 2004 15:48
> > 
> > The current requirement says to return the subgraph "that the query
> > matches", which suggests that any information used to determine whether
> > solutions should be returned is part of the result.
> > 
> > The most trivial problems occur when the query contains disjunction:
> > "find everyone who is friends with Eddie or friends with Jane". If
> > someone is friends with both of these people, what results should be
> > returned? Either a query processor is required to follow all disjunction
> > even though it's redundant (and return both the triples if they both
> > exist), or the results are nondeterministic.
> 
> ??
> 
> As it asked for "find everyone" it is a variable bindings results with
> projection to one variable:
> 
> SELECT ?x WHERE (?x friend (:Eddie OR :Jane))
> 
> (or other flavour of query with explicit union)
> 
> I don't view this requirement as having anything to do with explanation so I
> wouldn't approach it as a subgraph query.
> 
> > 
> > The more serious issue is that such a requirement can completely kill
> > the use of "supplementary" information about RDF graphs. OWL, SWRL, or
> > other languages can encode information which allows you to derive the
> > fact that Eddie is friends with Jane. This may be the result of
> > interaction between dozens or hundreds of axioms. If some of these
> > axioms are encoded in RDF, are we supposed to return them?
> 
> But if the query is (Eddie friend ?x) then "Eddie friend Jane" is a
> returnable triple.  It's not explanation but derived facts.
> 
> > What about
> > the supplementary information which is not in RDF? What about such
> > information which does have an RDF representation but also has other
> > formats, and one of those other formats was used?
> > 
> > Worse, such other languages can encode disjunction in the language
> > itself: one can assert that someone is friends with either Eddie or Jane
> > without specifying which one. Now there isn't even a final "result"
> > triple that can be returned in the result; *now* do we have to go back
> > and try to figure out what bits of supplementary information need to be
> > returned?
> 
> This is the argument that, for me, hits the spot.  This is problematic
> without bArcs and there may well be worse cases as well where bArcs aren't
> enough.
> 
> Example query:
> 	<x> ?p <y>
> 
> and we can know they are related but not how (we can't instantiate ?p to an RDF graph
> arc let alone a URIref).
> 
> If the query is "are <x> and <y> related?" the answer is "yes"
> If the query is "how are <x> and <y> related?" the answer is "unknown"

I don't think we're supposed to be addressing this case, and thusly,
it shouldn't constrain our requirements.

It's been a couple weeks since I quoted this so here goes:

http://www.w3.org/2003/12/swa/dawg-charter#scope
[[
The principal task of the RDF Data Access Working Group is to gather
requirements and to define an HTTP and/or SOAP-based protocol for
selecting instances of subgraphs from an RDF graph.
]]

and http://www.w3.org/2003/12/swa/dawg-charter#derivedGraphs
[[
The working group must recognize that RDF graphs are often constructed
by aggregation from multiple sources and through logical inference,
and that sometimes the graphs are never materialized. Such graphs may
be arbitrarily large or infinite.
]]

If the data in the KB can be written in RDF and we ask

Q: who is friends with either Eddie or Jane?
KB: Bob is friends with Eddie.
    Sue is friends with Eddie or Jane.

then the KB can say, in whatever ontology is appropriate:

A: Bob		Bob friendsWith Eddie.
   Sue		Sue friendsWith disjunctionA.
		disjunctionA leftSide Eddie.
		disjunctionA rightSide Jane.

If the data can't be expressed in RDF, isn't that out of scope? chair?

> > This also clashes with a few other objectives, including the querying
> > for non-existent results. (Does *every* triple affect this?)
> > 
> > The problem is that this requirement as written introduces the notion of
> > "explanation", which is actually a *very* hard problem in general, and
> > is almost always more information than the user really wanted.

-- 
-eric

office: +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Friday, 11 June 2004 02:28:54 UTC