- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Wed, 24 Mar 2004 07:45:48 -0500
- To: Patrick Stickler <patrick.stickler@nokia.com>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Patrick asserts that use case federatedAnnotFoaf is out of scope (look
for XXX below). I intended it to be a fairly detailed example of a
federated query; one we could use to track what information (and
expressivity) is needed where in order to do such federation. I would
like to hear from others in the WG on this.
On Wed, Mar 24, 2004 at 11:47:33AM +0200, Patrick Stickler wrote:
>
>
> On Mar 23, 2004, at 13:00, ext Eric Prud'hommeaux wrote:
>
> >
> >On Mon, Mar 22, 2004 at 11:38:33AM +0200, Patrick Stickler wrote:
> >>(a) expressing queries and query results in RDF
> >
> >Expressing queries in RDF:
> >
> >In LiteralDB+OWL, (a new name for PS-6 in deference to DanC's naming
> >comments), you described a scenario where a server could use
> >owl:sameAs, rdfs:subPropertyOf, owl:<class constraints> to manipulate
> >the graph of an RDF query *because* it was expressed in RDF. To my
> >eye, this treads on dangerous territory -- expressing the query as a
> >simple graph results in it being an answer to the question,
>
> It wouldn't be an answer to any question if it isn't syndicated into
> a knowledge store against which the question is asked (i.e. poor
> organization/management/engineering can result in lots of "dangers")
>
> If the query isn't within the focus of interest, then don't put it into
> the knowledge base that *is* the focus of interest.
>
> >...
>
> >which would give back the rather unsatisfactory bindings:
> > ?who: <http:...query5#who>
> > ?first: <http:...query5#first>
> > ?last: <http:...query5#last>
> >
> >By what mechanism could this happen? Maybe queries are stored in a
> >queue. A triple store compulsively scoops up stuff in the queue and
> >writes it down. A query engine pops items off the queue that look like
> >queries. It finds this query, asks a lot of resources, and ends up
> >getting back an answer from the compulsive scooper.
>
> Like I said, poor organization/management/engineering...
>
> If a harvester is gobbling up stuff with little to no discrimination,
> and/or if the same queue is being used by a query engine *and* a
> harvester gathering knowledge that may fall within the focus of
> executed queries, then that's just poor engineering plain and simple.
>
> >
> >The assertions in that RDF form of the query are not actually
> >assertions. But they look like assertions so we'd have to keep them
> >insulated from the RDF world all their life.
>
> Sure. And we can have a nice non-normative section of the spec that
> covers all sorts of "don't do this" scenarios.
>
> >(For those nostalgic
> >about historical spam, "Poor little graph31825 can't leave his
> >bubble. Please send postcards...") While it may be handy to use OWL
> >and RDFS inferencing tools, to manipulate RDF-like forms of this
> >query, I think the risk of graph "assertions" like this is very high.
>
> I don't. As long as folks realize that graphs containing queries should
> not (usually) be mixed with graphs containing general assertions, all
> will be well.
>
> If some folks are careless, sloppy, or ignorant and merge such graphs
> then strangeness could result (though I'm not convinced
> that anything bad would actually happen, just that query results
> could be of degraded utility).
>
> A query graph is essentially a claim. There exist some target resources
> which have certain characteristics, etc. and execution of the query
> is figuring out how to make the claims true, and providing all the
> evidence.
>
> So if you merge a query graph with your main knowledge base, the claims
> are still valid -- you're saying "some" resource exists that has... --
> yet since all you have are bnodes, you don't know exactly which one it
> is, and any query executed against those query-based claims would be
> *true*, just not very informative.
>
> >
> >An alternative would be to reify the query,
>
> Ugh. Please no.
>
> And there's no need. A query expressed in RDF is making certain claims.
> And those claims are true no matter what other graphs they get
> syndicated into. Whether you *should* syndicate those claims into other
> graphs is the real issue here, not the fact that the query is expressed
> in RDF.
>
> Indiscriminate syndication will always lead to headaches. Be careful
> what you eat!
>
> >
> >Expressing results in RDF:
> >
> >It is not necessary that RDF query results be expressed as statements
> >for query federation.
>
> Never said it was necessary, only that it was very useful, because from
> start to finish an agent is able to work with RDF graphs rather than
> multiple serializations.
>
> >Let's look at a fairly fleshed out federation
> >scenario.
> >
> >federatedAnnotFoaf:
> >Client query: the names and email addresses of everyone who has created
> >Annotea annotations:
> >
> > ?annot dc:created ?when
> > ?annot dc:creator ?who
> > ?who a:Email ?email
> > ?who foaf:givenName ?first
> > ?who foaf:surname ?last
> >
> >We send this query to http://www.w3.org/?DAWG and it breaks the query
> >up into pieces, each of which it knows some agent can handle.
>
> We've now dipped below the specifics of what the DAWG spec would
> define, so everything up to XXX below is now out of scope...
>
> > It sends
> >
> >ask(?annot dc:created ?when
> > ?annot dc:creator ?who
> > ?who a:Email ?email)
> >collect (?email ?when)
> >
> >to the Annotea server. The server gives back a list of email addresses
> >and the dates on which those accounts created annotations. (Annotea
> >account names are email addresses.) Let's assume the first entry in this
> >list is mailto:joe@example.com .
> >
> >The query federator knows that a:Email and foaf:mbox have ranges of
> >the same data type (maybe because one is a subPropertyOf the other) and
> >knows (maybe some heuristic based on a service advertisement) that a
> >foaf server is more likely to know foaf:mbox.
> >
> >For each of the email addresses that came back from the Annotea
> >server, the unifier composes a new query that it sends to a foaf
> >server:
> >
> > ?who a:Email <mailto:joe@example.com>
> > ?who foaf:givenName ?first
> > ?who foaf:surname ?last
> >
> >The server gives back all the combinations of first and last name for
> >joe@example.com (probably 1, modulo some problems spelling Joe
> >Lambda's name).
>
> >The federator of the query drops these results into
> >the bindings table, eliminating or duplicating rows when the number of
> >results is not 1:
> >
> > date email
> > 20040311 mailto:joe@example.com
> > 20040309 mailto:bob@example.com
> > ... ...
> >
> >becomes
> >
> > date email first last
> > 20040311 mailto:joe@example.com Joe Lamda
> > 20040311 mailto:joe@example.com Joe Lambda
> > 20040309 mailto:bob@example.com Bob Robertson
> > ... ...
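
The row-merging step described above can be sketched in Python. This is only an illustration under stated assumptions, not anything the thread specifies: `merge_bindings`, `foaf_lookup`, and the in-memory `FOAF_NAMES` table are hypothetical names standing in for the federator and the foaf server, and the `nobody@example.com` row is invented to show a row being eliminated.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def merge_bindings(rows: List[Row], lookup: Callable[[str], List[Row]]) -> List[Row]:
    """Extend each bindings row with every result the second server returns.

    A row with no results is dropped; a row with several results is
    duplicated once per result.
    """
    merged: List[Row] = []
    for row in rows:
        for extra in lookup(row["email"]):
            merged.append({**row, **extra})
    return merged


# Hypothetical stand-in for the foaf server's answers, keyed by mailbox.
FOAF_NAMES: Dict[str, List[Row]] = {
    "mailto:joe@example.com": [
        {"first": "Joe", "last": "Lamda"},
        {"first": "Joe", "last": "Lambda"},
    ],
    "mailto:bob@example.com": [{"first": "Bob", "last": "Robertson"}],
}


def foaf_lookup(email: str) -> List[Row]:
    """Return all (first, last) bindings known for a mailbox, possibly none."""
    return FOAF_NAMES.get(email, [])


annotea_rows: List[Row] = [
    {"date": "20040311", "email": "mailto:joe@example.com"},
    {"date": "20040309", "email": "mailto:bob@example.com"},
    # Invented row with no foaf match, to show elimination.
    {"date": "20040301", "email": "mailto:nobody@example.com"},
]

result = merge_bindings(annotea_rows, foaf_lookup)
for r in result:
    print(r["date"], r["email"], r["first"], r["last"])
```

Run against the two rows from the tables above, the joe row is duplicated (one copy per spelling) and the bob row is kept once; the invented third row disappears because the lookup returns nothing for it.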
>
> XXX
>
> At which point, the original federator receiving the original query
> returns the final set of bindings -- which could just as well be
> expressed in RDF using the Result Set Vocabulary, so that the
> requesting agent need not have to parse yet another serialization.
>
> Thus, for DAWG to specify that variable bindings (if such are requested)
> be communicated in query results as RDF in no way prevents or even
> complicates any of the scenario you present above.
>
> >
> >This can continue down through as many levels of federator/proxy as
> >were involved in delivering the query. Every agent involved, including
> >the client's, has the capacity to *extract* a graph given the query
> >that it originally saw and a set of bindings. This can provide the RDF
> >analog of relational closure.
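
As a rough sketch of what "extracting a graph" from a query and a set of bindings might look like, the Python below substitutes each bindings row into the triple patterns. Everything here is an assumption made for illustration: `extract_graph` is a hypothetical helper, triples are plain string tuples, and a variable that has no column in the bindings (such as ?who) is replaced by a blank-node label scoped to its row.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def extract_graph(pattern: List[Triple], rows: List[Dict[str, str]]) -> Set[Triple]:
    """Instantiate the query pattern once per bindings row.

    Terms starting with "?" are variables. A bound variable is replaced by
    its value; an unbound one becomes a row-scoped blank node (assumption).
    """
    graph: Set[Triple] = set()
    for i, row in enumerate(rows):

        def ground(term: str) -> str:
            if not term.startswith("?"):
                return term  # constant: keep as-is
            name = term[1:]
            return row.get(name, f"_:{name}{i}")

        for s, p, o in pattern:
            graph.add((ground(s), ground(p), ground(o)))
    return graph


pattern: List[Triple] = [
    ("?who", "a:Email", "?email"),
    ("?who", "foaf:givenName", "?first"),
    ("?who", "foaf:surname", "?last"),
]

bindings = [
    {"email": "mailto:joe@example.com", "first": "Joe", "last": "Lambda"},
    {"email": "mailto:bob@example.com", "first": "Bob", "last": "Robertson"},
]

graph = extract_graph(pattern, bindings)
for triple in sorted(graph):
    print(*triple)
```

Because the output is again a set of triples, it could in principle be queried by the next agent in the chain, which is the sense of closure suggested above.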
>
> And at each level, the same could be achieved if those bindings were
> expressed in RDF rather than some other encoding.
>
> Sorry, I fail to see any issues with expressing bindings in RDF in
> the scenario you are presenting.
>
> Specifically, how does returning the equivalent of
>
> > date email first last
> > 20040311 mailto:joe@example.com Joe Lamda
> > 20040311 mailto:joe@example.com Joe Lambda
> > 20040309 mailto:bob@example.com Bob Robertson
> > ... ...
>
> expressed in RDF using something akin to the Result Set Vocabulary
> cause you problems or in any way prevent you from doing what you
> have described above?
>
> Patrick
>
> --
>
> Patrick Stickler
> Nokia, Finland
> patrick.stickler@nokia.com
--
-eric
office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
Shonan Fujisawa Campus, Keio University,
5322 Endo, Fujisawa, Kanagawa 252-8520
JAPAN
+1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell: +1.857.222.5741 (does not work in Asia)
(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Wednesday, 24 March 2004 07:50:26 UTC