Re: Use case: federatedAnnotFoaf (and a bit about queries in RDF) from Eric Prud'hommeaux on 2004-03-24 (public-rdf-dawg@w3.org from January to March 2004)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Wed, 24 Mar 2004 07:45:48 -0500
To: Patrick Stickler <patrick.stickler@nokia.com>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-ID: <20040324124548.GB13553@w3.org>
Patrick asserts that use case federatedAnnotFoaf is out of scope (look
for XXX below). I intended it to me a fairly detailed example of a
fedarated query; one we could use to track what information (and
expressivity) is needed where in order to do such federation. I would
like to hear from others in the WG on this.

On Wed, Mar 24, 2004 at 11:47:33AM +0200, Patrick Stickler wrote:
> 
> 
> On Mar 23, 2004, at 13:00, ext Eric Prud'hommeaux wrote:
> 
> >
> >On Mon, Mar 22, 2004 at 11:38:33AM +0200, Patrick Stickler wrote:
> >>(a) expressing queries and query results in RDF
> >
> >Expressing queries in RDF:
> >
> >In LiteralDB+OWL, (a new name for PS-6 in deference to DanC's naming
> >comments), you described a scenario where a server could use
> >owl:sameAs, rdfs:subPropertyOf, owl:<class constraints> to manipulate
> >the graph of an RDF query *because* it was expressed in RDF. To my
> >eye, this treads on dangerous territory -- expressing the query as a
> >simple graph results in it being an answer to the question,
> 
> It wouldn't be an answer to any question if it isn't syndicated into
> a knowledge store against which the question is asked (i.e. poor
> organization/management/engineering can result in lots of "dangers")
> 
> If the query isn't within the focus of interest, then don't put it into
> the knowledge base that *is* the focus of interest.
> 
> >...
> 
> >which would give back the rather unsatisfactory bindings:
> >       ?who:   <http:...query5#who>
> >       ?first: <http:...query5#first>
> >       ?last:  <http:...query5#last>
> >
> >By what mechanism could this happen? Maybe queries are stored in a
> >queue. A triple store compulsively scoops up stuff in the queue and
> >writes it down. A query engine pops off the queue that look like
> >queries. It finds this query, asks a lot of resources, ends of getting
> >back an answer from the compulsive scooper.
> 
> Like I said, poor organization/management/engineering...
> 
> If a harvestor is gobbling up stuff with little to no discrimination,
> and/or if the same queue is being used by a query engine *and* a
> harvester gathering knowledge that may fall within the focus of
> executed queries, then that's just poor engineering plain and simple.
> 
> >
> >The assertions in that RDF form of the query are not actually
> >assertions. But they look like assertions so we'd have to keep them
> >insulated from the RDF world all their life.
> 
> Sure. And we can have a nice non-normative section of the spec that
> covers all sorts of "don't do this" scenarios.
> 
> >(For those notstalgic
> >about historical spam, "Poor little graph31825 can't leave his
> >bubble. Please send postcards...") While it may be handy to use OWL
> >and RDFS inferencing tools, to manipulate RDF-like forms of this
> >query, I think the risk of graph "assertions" like this is very high.
> 
> I don't. As long as folks realize that graphs containing queries should
> not (usually) be mixed with graphs containing general assertions, all
> will be well.
> 
> If some folks are careless, sloppy, or ignorant and merge such graphs
> then strangeness could result (though I'm not convinced
> that anything bad would actually happen, just that query results
> could be of degraded utility).
> 
> A query graph is essentially a claim. There exist some target resources
> which have certain characteristics, etc. and execution of the query
> is figuring out how to make the claims true, and providing all the
> evidence.
> 
> So if you merge a query graph with your main knowledge base, the claims
> are still valid -- you're saying "some" resource exists that has... -- 
> yet
> since all you have are bnodes, you don't know exactly which one it is,
> and any query executed against those query-based claims would be *true*,
> just not very informative.
> 
> >
> >An alternative would be to reify the query,
> 
> Ugh. Please no.
> 
> And there's no need. A query expressed in RDF is making certain claims.
> And those claims are true no matter what other graphs they get 
> syndicated
> into. Whether you *should* syndicate those claims into other graphs is 
> the
> real issue here, not the fact that the query is expressed in RDF.
> 
> Indiscriminate syndication will always lead to headaches. Be careful 
> what you eat!
> 
> >
> >Expressing results in RDF:
> >
> >It is not necessary that RDF query results be expressed as statements
> >for query federation.
> 
> Never said it was necessary, only that it was very useful, because from
> start to finish an agent is able to work with RDF graphs rather than
> multiple serializations.
> 
> >Let's look at a fairly flushed out federation
> >scenario.
> >
> >federatedAnnotFoaf:
> >Client query: the name and email addrs of everyone who has created 
> >Annotea
> >annotations:
> >
> >    ?annot dc:created     ?when
> >    ?annot dc:creator     ?who
> >    ?who   a:Email        ?email
> >    ?who   foaf:givenName ?first
> >    ?who   foaf:surname   ?last
> >
> >We send this query to http://www.w3.org/?DAWG and it break the query
> >up into the pieces that it knows there is an agent to handle.
> 
> We've now dipped below the specifics of what the DAWG spec would
> define, so everthing up to XXX below is now out of scope...
> 
> > It sends
> >
> >ask(?annot dc:created     ?when
> >    ?annot dc:creator     ?who
> >    ?who   a:Email        ?email)
> >collect (?email ?when)
> >
> >to the Annotea server. The server gives back a list of email addres
> >and dates those accounts created annotations. (Annotea account names
> >are email addresses.) Let's assume first entry in this list is
> >mailto:joe@example.com .
> >
> >The query federator knows that a:Email and foaf:mbox have ranges of
> >the same data type (may 'cause one is a subPropertyOf the other) and
> >knows (maybe some heuristic based on a service advertisement) that a
> >foaf server is more likely to know foaf:mbox.
> >
> >For each of the email addresses that came back from the Annotea
> >server, the unifier composes a new query that it sends to a foaf
> >server:
> >
> >    ?who   a:Email        <mailto:joe@example.com>
> >    ?who   foaf:givenName ?first
> >    ?who   foaf:surname   ?last
> >
> >The server gives back all the combinations of first and last name for
> >joe@example.com (probably 1, modula some problems spelling Joe
> >Lambda's name).
> 
> >The federator of the query drops these results into
> >the bindings table, eliminating or duplicating rows when the number of
> >results is not 1:
> >
> >    date      email
> >    20040311  mailto:joe@example.com
> >    20040309  mailto:bob@example.com
> >    ...       ...
> >
> >becomes
> >
> >    date      email                   first   last
> >    20040311  mailto:joe@example.com  Joe     Lamda
> >    20040311  mailto:joe@example.com  Joe     Lambda
> >    20040309  mailto:bob@example.com  Bob     Robertson
> >    ...       ...
> 
> XXX
> 
> At which point, the original federator recieving the original query
> returns the final set of bindings -- which could just as well be
> expressed in RDF using the Result Set Vocabulary, so that the
> requesting agent need not have to parse yet another serialization.
> 
> Thus, for DAWG to specify that variable bindings (if such are requested)
> be communicated in query results as RDF does in no way prevent or even
> complicate any of the above scenario you present above.
> 
> >
> >This can continue down through as many levels of federator/proxy as
> >were involved in delivering the query. Every agent involved, including
> >the client's, has the capacity to *extract* a graph given the query
> >that it originally say and a set of bindings. This can provide the RDF
> >analog of relatoinal closure.
> 
> And at each level, the same could be achieved if those bindings were
> expressed in RDF rather than some other encoding.
> 
> Sorry, I fail to see any issues with expressing bindings in RDF in
> the scenario you are presenting.
> 
> Specifically, how does returning the equivalent of
> 
> >    date      email                   first   last
> >    20040311  mailto:joe@example.com  Joe     Lamda
> >    20040311  mailto:joe@example.com  Joe     Lambda
> >    20040309  mailto:bob@example.com  Bob     Robertson
> >    ...       ...
> 
> expressed in RDF using something akin to the Result Set Vocabulary
> cause you problems or in any way preventing you from doing what you
> have described above?
> 
> Patrick
> 
> --
> 
> Patrick Stickler
> Nokia, Finland
> patrick.stickler@nokia.com

-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741 (does not work in Asia)

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
Received on Wednesday, 24 March 2004 07:50:26 UTC