- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Wed, 24 Mar 2004 07:45:48 -0500
- To: Patrick Stickler <patrick.stickler@nokia.com>
- Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Patrick asserts that use case federatedAnnotFoaf is out of scope (look
for XXX below). I intended it to be a fairly detailed example of a
federated query; one we could use to track what information (and
expressivity) is needed where in order to do such federation. I would
like to hear from others in the WG on this.

On Wed, Mar 24, 2004 at 11:47:33AM +0200, Patrick Stickler wrote:
>
>
> On Mar 23, 2004, at 13:00, ext Eric Prud'hommeaux wrote:
>
> >
> >On Mon, Mar 22, 2004 at 11:38:33AM +0200, Patrick Stickler wrote:
> >>(a) expressing queries and query results in RDF
> >
> >Expressing queries in RDF:
> >
> >In LiteralDB+OWL, (a new name for PS-6 in deference to DanC's naming
> >comments), you described a scenario where a server could use
> >owl:sameAs, rdfs:subPropertyOf, owl:<class constraints> to manipulate
> >the graph of an RDF query *because* it was expressed in RDF. To my
> >eye, this treads on dangerous territory -- expressing the query as a
> >simple graph results in it being an answer to the question,
>
> It wouldn't be an answer to any question if it isn't syndicated into
> a knowledge store against which the question is asked (i.e. poor
> organization/management/engineering can result in lots of "dangers")
>
> If the query isn't within the focus of interest, then don't put it into
> the knowledge base that *is* the focus of interest.
>
> >...
>
> >which would give back the rather unsatisfactory bindings:
> >  ?who: <http:...query5#who>
> >  ?first: <http:...query5#first>
> >  ?last: <http:...query5#last>
> >
> >By what mechanism could this happen? Maybe queries are stored in a
> >queue. A triple store compulsively scoops up stuff in the queue and
> >writes it down. A query engine pops things off the queue that look like
> >queries. It finds this query, asks a lot of resources, ends up getting
> >back an answer from the compulsive scooper.
>
> Like I said, poor organization/management/engineering...
>
> If a harvester is gobbling up stuff with little to no discrimination,
> and/or if the same queue is being used by a query engine *and* a
> harvester gathering knowledge that may fall within the focus of
> executed queries, then that's just poor engineering plain and simple.
>
> >
> >The assertions in that RDF form of the query are not actually
> >assertions. But they look like assertions so we'd have to keep them
> >insulated from the RDF world all their life.
>
> Sure. And we can have a nice non-normative section of the spec that
> covers all sorts of "don't do this" scenarios.
>
> >(For those nostalgic
> >about historical spam, "Poor little graph31825 can't leave his
> >bubble. Please send postcards...") While it may be handy to use OWL
> >and RDFS inferencing tools to manipulate RDF-like forms of this
> >query, I think the risk of graph "assertions" like this is very high.
>
> I don't. As long as folks realize that graphs containing queries should
> not (usually) be mixed with graphs containing general assertions, all
> will be well.
>
> If some folks are careless, sloppy, or ignorant and merge such graphs
> then strangeness could result (though I'm not convinced
> that anything bad would actually happen, just that query results
> could be of degraded utility).
>
> A query graph is essentially a claim. There exist some target resources
> which have certain characteristics, etc. and execution of the query
> is figuring out how to make the claims true, and providing all the
> evidence.
>
> So if you merge a query graph with your main knowledge base, the claims
> are still valid -- you're saying "some" resource exists that has... --
> yet since all you have are bnodes, you don't know exactly which one it
> is, and any query executed against those query-based claims would be
> *true*, just not very informative.
>
> >
> >An alternative would be to reify the query,
>
> Ugh. Please no.
>
> And there's no need. A query expressed in RDF is making certain claims.
> And those claims are true no matter what other graphs they get
> syndicated into. Whether you *should* syndicate those claims into other
> graphs is the real issue here, not the fact that the query is expressed
> in RDF.
>
> Indiscriminate syndication will always lead to headaches. Be careful
> what you eat!
>
> >
> >Expressing results in RDF:
> >
> >It is not necessary that RDF query results be expressed as statements
> >for query federation.
>
> Never said it was necessary, only that it was very useful, because from
> start to finish an agent is able to work with RDF graphs rather than
> multiple serializations.
>
> >Let's look at a fairly fleshed out federation
> >scenario.
> >
> >federatedAnnotFoaf:
> >Client query: the name and email addrs of everyone who has created
> >Annotea annotations:
> >
> >  ?annot dc:created ?when
> >  ?annot dc:creator ?who
> >  ?who a:Email ?email
> >  ?who foaf:givenName ?first
> >  ?who foaf:surname ?last
> >
> >We send this query to http://www.w3.org/?DAWG and it breaks the query
> >up into the pieces that it knows there is an agent to handle.
>
> We've now dipped below the specifics of what the DAWG spec would
> define, so everything up to XXX below is now out of scope...
>
> > It sends
> >
> >ask(?annot dc:created ?when
> >    ?annot dc:creator ?who
> >    ?who a:Email ?email)
> >collect (?email ?when)
> >
> >to the Annotea server. The server gives back a list of email addresses
> >and the dates those accounts created annotations. (Annotea account
> >names are email addresses.) Let's assume the first entry in this list is
> >mailto:joe@example.com .
> >
> >The query federator knows that a:Email and foaf:mbox have ranges of
> >the same data type (maybe 'cause one is a subPropertyOf the other) and
> >knows (maybe some heuristic based on a service advertisement) that a
> >foaf server is more likely to know foaf:mbox.
> >
> >For each of the email addresses that came back from the Annotea
> >server, the unifier composes a new query that it sends to a foaf
> >server:
> >
> >  ?who a:Email <mailto:joe@example.com>
> >  ?who foaf:givenName ?first
> >  ?who foaf:surname ?last
> >
> >The server gives back all the combinations of first and last name for
> >joe@example.com (probably 1, modulo some problems spelling Joe
> >Lambda's name).
>
> >The federator of the query drops these results into
> >the bindings table, eliminating or duplicating rows when the number of
> >results is not 1:
> >
> >  date     email
> >  20040311 mailto:joe@example.com
> >  20040309 mailto:bob@example.com
> >  ...      ...
> >
> >becomes
> >
> >  date     email                  first last
> >  20040311 mailto:joe@example.com Joe   Lamda
> >  20040311 mailto:joe@example.com Joe   Lambda
> >  20040309 mailto:bob@example.com Bob   Robertson
> >  ...      ...
>
> XXX
>
> At which point, the original federator receiving the original query
> returns the final set of bindings -- which could just as well be
> expressed in RDF using the Result Set Vocabulary, so that the
> requesting agent need not have to parse yet another serialization.
>
> Thus, for DAWG to specify that variable bindings (if such are requested)
> be communicated in query results as RDF does in no way prevent or even
> complicate any of the scenario you present above.
>
> >
> >This can continue down through as many levels of federator/proxy as
> >were involved in delivering the query. Every agent involved, including
> >the client's, has the capacity to *extract* a graph given the query
> >that it originally saw and a set of bindings. This can provide the RDF
> >analog of relational closure.
>
> And at each level, the same could be achieved if those bindings were
> expressed in RDF rather than some other encoding.
>
> Sorry, I fail to see any issues with expressing bindings in RDF in
> the scenario you are presenting.
>
> Specifically, how does returning the equivalent of
>
> >  date     email                  first last
> >  20040311 mailto:joe@example.com Joe   Lamda
> >  20040311 mailto:joe@example.com Joe   Lambda
> >  20040309 mailto:bob@example.com Bob   Robertson
> >  ...      ...
>
> expressed in RDF using something akin to the Result Set Vocabulary
> cause you problems or in any way prevent you from doing what you
> have described above?
>
> Patrick
>
> --
>
> Patrick Stickler
> Nokia, Finland
> patrick.stickler@nokia.com

--
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741 (does not work in Asia)

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
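
A rough sketch of the bindings-table step in the federatedAnnotFoaf
scenario, for anyone who wants to poke at it. It only illustrates the
"eliminating or duplicating rows" behaviour; it is not a proposed
design. The names extend_bindings, foaf_lookup and FOAF are
hypothetical, the foaf server is replaced by an in-memory dict, and the
nobody@example.com row is made up to show a row being eliminated.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def extend_bindings(rows: List[Row],
                    lookup: Callable[[str], List[Row]]) -> List[Row]:
    """Join each (date, email) row with the name bindings found for its email.

    A row is duplicated when the lookup returns several results and
    eliminated when it returns none.
    """
    out: List[Row] = []
    for row in rows:
        for names in lookup(row["email"]):
            out.append({**row, **names})
    return out


# Hypothetical stand-in for the answers a foaf server might give.
FOAF: Dict[str, List[Row]] = {
    "mailto:joe@example.com": [
        {"first": "Joe", "last": "Lamda"},
        {"first": "Joe", "last": "Lambda"},
    ],
    "mailto:bob@example.com": [
        {"first": "Bob", "last": "Robertson"},
    ],
}


def foaf_lookup(email: str) -> List[Row]:
    """Return every (first, last) combination known for this email."""
    return FOAF.get(email, [])


# Bindings as they might come back from the Annotea server.
annotea_rows: List[Row] = [
    {"date": "20040311", "email": "mailto:joe@example.com"},
    {"date": "20040309", "email": "mailto:bob@example.com"},
    {"date": "20040301", "email": "mailto:nobody@example.com"},  # made up
]

table = extend_bindings(annotea_rows, foaf_lookup)
for row in table:
    print(row["date"], row["email"], row["first"], row["last"])
```

The same function could be applied again at each federator/proxy level,
since its output has the same shape as its input.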
Received on Wednesday, 24 March 2004 07:50:26 UTC