Re: Use case: federatedAnnotFoaf (and a bit about queries in RDF) from Patrick Stickler on 2004-03-24 (public-rdf-dawg@w3.org from January to March 2004)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Wed, 24 Mar 2004 11:47:33 +0200
To: "ext Eric Prud'hommeaux" <eric@w3.org>
Cc: RDF Data Access Working Group <public-rdf-dawg@w3.org>
Message-Id: <464FB932-7D78-11D8-858C-000A95EAFCEA@nokia.com>
On Mar 23, 2004, at 13:00, ext Eric Prud'hommeaux wrote:

>
> On Mon, Mar 22, 2004 at 11:38:33AM +0200, Patrick Stickler wrote:
>> (a) expressing queries and query results in RDF
>
> Expressing queries in RDF:
>
> In LiteralDB+OWL, (a new name for PS-6 in deference to DanC's naming
> comments), you described a scenario where a server could use
> owl:sameAs, rdfs:subPropertyOf, owl:<class constraints> to manipulate
> the graph of an RDF query *because* it was expressed in RDF. To my
> eye, this treads on dangerous territory -- expressing the query as a
> simple graph results in it being an answer to the question,

It wouldn't be an answer to any question if it isn't syndicated into
a knowledge store against which the question is asked (i.e. poor
organization/management/engineering can result in lots of "dangers")

If the query isn't within the focus of interest, then don't put it into
the knowledge base that *is* the focus of interest.

> ...

> which would give back the rather unsatisfactory bindings:
>        ?who:   <http:...query5#who>
>        ?first: <http:...query5#first>
>        ?last:  <http:...query5#last>
>
> By what mechanism could this happen? Maybe queries are stored in a
> queue. A triple store compulsively scoops up stuff in the queue and
> writes it down. A query engine pops off the queue that look like
> queries. It finds this query, asks a lot of resources, ends of getting
> back an answer from the compulsive scooper.

Like I said, poor organization/management/engineering...

If a harvestor is gobbling up stuff with little to no discrimination,
and/or if the same queue is being used by a query engine *and* a
harvester gathering knowledge that may fall within the focus of
executed queries, then that's just poor engineering plain and simple.

>
> The assertions in that RDF form of the query are not actually
> assertions. But they look like assertions so we'd have to keep them
> insulated from the RDF world all their life.

Sure. And we can have a nice non-normative section of the spec that
covers all sorts of "don't do this" scenarios.

> (For those notstalgic
> about historical spam, "Poor little graph31825 can't leave his
> bubble. Please send postcards...") While it may be handy to use OWL
> and RDFS inferencing tools, to manipulate RDF-like forms of this
> query, I think the risk of graph "assertions" like this is very high.

I don't. As long as folks realize that graphs containing queries should
not (usually) be mixed with graphs containing general assertions, all
will be well.

If some folks are careless, sloppy, or ignorant and merge such graphs
then strangeness could result (though I'm not convinced
that anything bad would actually happen, just that query results
could be of degraded utility).

A query graph is essentially a claim. There exist some target resources
which have certain characteristics, etc. and execution of the query
is figuring out how to make the claims true, and providing all the
evidence.

So if you merge a query graph with your main knowledge base, the claims
are still valid -- you're saying "some" resource exists that has... -- 
yet
since all you have are bnodes, you don't know exactly which one it is,
and any query executed against those query-based claims would be *true*,
just not very informative.

>
> An alternative would be to reify the query,

Ugh. Please no.

And there's no need. A query expressed in RDF is making certain claims.
And those claims are true no matter what other graphs they get 
syndicated
into. Whether you *should* syndicate those claims into other graphs is 
the
real issue here, not the fact that the query is expressed in RDF.

Indiscriminate syndication will always lead to headaches. Be careful 
what you eat!

>
> Expressing results in RDF:
>
> It is not necessary that RDF query results be expressed as statements
> for query federation.

Never said it was necessary, only that it was very useful, because from
start to finish an agent is able to work with RDF graphs rather than
multiple serializations.

> Let's look at a fairly flushed out federation
> scenario.
>
> federatedAnnotFoaf:
> Client query: the name and email addrs of everyone who has created 
> Annotea
> annotations:
>
>     ?annot dc:created     ?when
>     ?annot dc:creator     ?who
>     ?who   a:Email        ?email
>     ?who   foaf:givenName ?first
>     ?who   foaf:surname   ?last
>
> We send this query to http://www.w3.org/?DAWG and it break the query
> up into the pieces that it knows there is an agent to handle.

We've now dipped below the specifics of what the DAWG spec would
define, so everthing up to XXX below is now out of scope...

>  It sends
>
> ask(?annot dc:created     ?when
>     ?annot dc:creator     ?who
>     ?who   a:Email        ?email)
> collect (?email ?when)
>
> to the Annotea server. The server gives back a list of email addres
> and dates those accounts created annotations. (Annotea account names
> are email addresses.) Let's assume first entry in this list is
> mailto:joe@example.com .
>
> The query federator knows that a:Email and foaf:mbox have ranges of
> the same data type (may 'cause one is a subPropertyOf the other) and
> knows (maybe some heuristic based on a service advertisement) that a
> foaf server is more likely to know foaf:mbox.
>
> For each of the email addresses that came back from the Annotea
> server, the unifier composes a new query that it sends to a foaf
> server:
>
>     ?who   a:Email        <mailto:joe@example.com>
>     ?who   foaf:givenName ?first
>     ?who   foaf:surname   ?last
>
> The server gives back all the combinations of first and last name for
> joe@example.com (probably 1, modula some problems spelling Joe
> Lambda's name).

> The federator of the query drops these results into
> the bindings table, eliminating or duplicating rows when the number of
> results is not 1:
>
>     date      email
>     20040311  mailto:joe@example.com
>     20040309  mailto:bob@example.com
>     ...       ...
>
> becomes
>
>     date      email                   first   last
>     20040311  mailto:joe@example.com  Joe     Lamda
>     20040311  mailto:joe@example.com  Joe     Lambda
>     20040309  mailto:bob@example.com  Bob     Robertson
>     ...       ...

XXX

At which point, the original federator recieving the original query
returns the final set of bindings -- which could just as well be
expressed in RDF using the Result Set Vocabulary, so that the
requesting agent need not have to parse yet another serialization.

Thus, for DAWG to specify that variable bindings (if such are requested)
be communicated in query results as RDF does in no way prevent or even
complicate any of the above scenario you present above.

>
> This can continue down through as many levels of federator/proxy as
> were involved in delivering the query. Every agent involved, including
> the client's, has the capacity to *extract* a graph given the query
> that it originally say and a set of bindings. This can provide the RDF
> analog of relatoinal closure.

And at each level, the same could be achieved if those bindings were
expressed in RDF rather than some other encoding.

Sorry, I fail to see any issues with expressing bindings in RDF in
the scenario you are presenting.

Specifically, how does returning the equivalent of

>     date      email                   first   last
>     20040311  mailto:joe@example.com  Joe     Lamda
>     20040311  mailto:joe@example.com  Joe     Lambda
>     20040309  mailto:bob@example.com  Bob     Robertson
>     ...       ...

expressed in RDF using something akin to the Result Set Vocabulary
cause you problems or in any way preventing you from doing what you
have described above?

Patrick

--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com
Received on Wednesday, 24 March 2004 05:02:39 UTC