Re: Use case: federatedAnnotFoaf (and a bit about queries in RDF)

On Mar 24, 2004, at 14:45, ext Eric Prud'hommeaux wrote:

>
>
> Patrick asserts that use case federatedAnnotFoaf is out of scope (look
> for XXX below). I intended it to me a fairly detailed example of a
> fedarated query; one we could use to track what information (and
> expressivity) is needed where in order to do such federation. I would
> like to hear from others in the WG on this.

Let me clarify my position here.

I believe that (a) the machinery of managing/orchestrating a
federated/distributed query is out of scope, but (b) a standardized
query solution such as specified by DAWG can nonetheless facilitate
execution of a federated/distributed query by reducing the burden
on the agent managing/orchestrating the execution of such a
federated/distributed query.

However, the details of the managment/orchistration is, IMO, out of
scope, and therefore the bulk of your example, which detailed what
happened after the initial DAWG conformant server recieved the query
and before it returned the results should, insofar as the DAWG
recommendation is concerned, be fully opaque to the DAWG spec. That was
the main point I was trying to make with my 'XXX' notation.

Now, if you had a use case which highlighted how a consistent,
standardized DAWG solution facilitates the management/orchestration
of a federated query executed accross multiple DAWG compliant
knowledge sources -- with a clear indication that the machinery
specific to the management/orchestration of the distributed
query is itself out of scope, then fine, I'm all for that.

Maybe that's what you had attempted to do, but I couldn't see it,
particularly since I took your example (use case) as showing why results
expressed in RDF was a bad thing rather than why federated
query execution is a good thing... and the two issues are IMO
completely orthogonal.

It would probably help alot if use cases were not mixed
in with discussion/debate about other use cases/issues/etc.

Patrick


>
> On Wed, Mar 24, 2004 at 11:47:33AM +0200, Patrick Stickler wrote:
>>
>>
>> On Mar 23, 2004, at 13:00, ext Eric Prud'hommeaux wrote:
>>
>>>
>>> On Mon, Mar 22, 2004 at 11:38:33AM +0200, Patrick Stickler wrote:
>>>> (a) expressing queries and query results in RDF
>>>
>>> Expressing queries in RDF:
>>>
>>> In LiteralDB+OWL, (a new name for PS-6 in deference to DanC's naming
>>> comments), you described a scenario where a server could use
>>> owl:sameAs, rdfs:subPropertyOf, owl:<class constraints> to manipulate
>>> the graph of an RDF query *because* it was expressed in RDF. To my
>>> eye, this treads on dangerous territory -- expressing the query as a
>>> simple graph results in it being an answer to the question,
>>
>> It wouldn't be an answer to any question if it isn't syndicated into
>> a knowledge store against which the question is asked (i.e. poor
>> organization/management/engineering can result in lots of "dangers")
>>
>> If the query isn't within the focus of interest, then don't put it 
>> into
>> the knowledge base that *is* the focus of interest.
>>
>>> ...
>>
>>> which would give back the rather unsatisfactory bindings:
>>>       ?who:   <http:...query5#who>
>>>       ?first: <http:...query5#first>
>>>       ?last:  <http:...query5#last>
>>>
>>> By what mechanism could this happen? Maybe queries are stored in a
>>> queue. A triple store compulsively scoops up stuff in the queue and
>>> writes it down. A query engine pops off the queue that look like
>>> queries. It finds this query, asks a lot of resources, ends of 
>>> getting
>>> back an answer from the compulsive scooper.
>>
>> Like I said, poor organization/management/engineering...
>>
>> If a harvestor is gobbling up stuff with little to no discrimination,
>> and/or if the same queue is being used by a query engine *and* a
>> harvester gathering knowledge that may fall within the focus of
>> executed queries, then that's just poor engineering plain and simple.
>>
>>>
>>> The assertions in that RDF form of the query are not actually
>>> assertions. But they look like assertions so we'd have to keep them
>>> insulated from the RDF world all their life.
>>
>> Sure. And we can have a nice non-normative section of the spec that
>> covers all sorts of "don't do this" scenarios.
>>
>>> (For those notstalgic
>>> about historical spam, "Poor little graph31825 can't leave his
>>> bubble. Please send postcards...") While it may be handy to use OWL
>>> and RDFS inferencing tools, to manipulate RDF-like forms of this
>>> query, I think the risk of graph "assertions" like this is very high.
>>
>> I don't. As long as folks realize that graphs containing queries 
>> should
>> not (usually) be mixed with graphs containing general assertions, all
>> will be well.
>>
>> If some folks are careless, sloppy, or ignorant and merge such graphs
>> then strangeness could result (though I'm not convinced
>> that anything bad would actually happen, just that query results
>> could be of degraded utility).
>>
>> A query graph is essentially a claim. There exist some target 
>> resources
>> which have certain characteristics, etc. and execution of the query
>> is figuring out how to make the claims true, and providing all the
>> evidence.
>>
>> So if you merge a query graph with your main knowledge base, the 
>> claims
>> are still valid -- you're saying "some" resource exists that has... --
>> yet
>> since all you have are bnodes, you don't know exactly which one it is,
>> and any query executed against those query-based claims would be 
>> *true*,
>> just not very informative.
>>
>>>
>>> An alternative would be to reify the query,
>>
>> Ugh. Please no.
>>
>> And there's no need. A query expressed in RDF is making certain 
>> claims.
>> And those claims are true no matter what other graphs they get
>> syndicated
>> into. Whether you *should* syndicate those claims into other graphs is
>> the
>> real issue here, not the fact that the query is expressed in RDF.
>>
>> Indiscriminate syndication will always lead to headaches. Be careful
>> what you eat!
>>
>>>
>>> Expressing results in RDF:
>>>
>>> It is not necessary that RDF query results be expressed as statements
>>> for query federation.
>>
>> Never said it was necessary, only that it was very useful, because 
>> from
>> start to finish an agent is able to work with RDF graphs rather than
>> multiple serializations.
>>
>>> Let's look at a fairly flushed out federation
>>> scenario.
>>>
>>> federatedAnnotFoaf:
>>> Client query: the name and email addrs of everyone who has created
>>> Annotea
>>> annotations:
>>>
>>>    ?annot dc:created     ?when
>>>    ?annot dc:creator     ?who
>>>    ?who   a:Email        ?email
>>>    ?who   foaf:givenName ?first
>>>    ?who   foaf:surname   ?last
>>>
>>> We send this query to http://www.w3.org/?DAWG and it break the query
>>> up into the pieces that it knows there is an agent to handle.
>>
>> We've now dipped below the specifics of what the DAWG spec would
>> define, so everthing up to XXX below is now out of scope...
>>
>>> It sends
>>>
>>> ask(?annot dc:created     ?when
>>>    ?annot dc:creator     ?who
>>>    ?who   a:Email        ?email)
>>> collect (?email ?when)
>>>
>>> to the Annotea server. The server gives back a list of email addres
>>> and dates those accounts created annotations. (Annotea account names
>>> are email addresses.) Let's assume first entry in this list is
>>> mailto:joe@example.com .
>>>
>>> The query federator knows that a:Email and foaf:mbox have ranges of
>>> the same data type (may 'cause one is a subPropertyOf the other) and
>>> knows (maybe some heuristic based on a service advertisement) that a
>>> foaf server is more likely to know foaf:mbox.
>>>
>>> For each of the email addresses that came back from the Annotea
>>> server, the unifier composes a new query that it sends to a foaf
>>> server:
>>>
>>>    ?who   a:Email        <mailto:joe@example.com>
>>>    ?who   foaf:givenName ?first
>>>    ?who   foaf:surname   ?last
>>>
>>> The server gives back all the combinations of first and last name for
>>> joe@example.com (probably 1, modula some problems spelling Joe
>>> Lambda's name).
>>
>>> The federator of the query drops these results into
>>> the bindings table, eliminating or duplicating rows when the number 
>>> of
>>> results is not 1:
>>>
>>>    date      email
>>>    20040311  mailto:joe@example.com
>>>    20040309  mailto:bob@example.com
>>>    ...       ...
>>>
>>> becomes
>>>
>>>    date      email                   first   last
>>>    20040311  mailto:joe@example.com  Joe     Lamda
>>>    20040311  mailto:joe@example.com  Joe     Lambda
>>>    20040309  mailto:bob@example.com  Bob     Robertson
>>>    ...       ...
>>
>> XXX
>>
>> At which point, the original federator recieving the original query
>> returns the final set of bindings -- which could just as well be
>> expressed in RDF using the Result Set Vocabulary, so that the
>> requesting agent need not have to parse yet another serialization.
>>
>> Thus, for DAWG to specify that variable bindings (if such are 
>> requested)
>> be communicated in query results as RDF does in no way prevent or even
>> complicate any of the above scenario you present above.
>>
>>>
>>> This can continue down through as many levels of federator/proxy as
>>> were involved in delivering the query. Every agent involved, 
>>> including
>>> the client's, has the capacity to *extract* a graph given the query
>>> that it originally say and a set of bindings. This can provide the 
>>> RDF
>>> analog of relatoinal closure.
>>
>> And at each level, the same could be achieved if those bindings were
>> expressed in RDF rather than some other encoding.
>>
>> Sorry, I fail to see any issues with expressing bindings in RDF in
>> the scenario you are presenting.
>>
>> Specifically, how does returning the equivalent of
>>
>>>    date      email                   first   last
>>>    20040311  mailto:joe@example.com  Joe     Lamda
>>>    20040311  mailto:joe@example.com  Joe     Lambda
>>>    20040309  mailto:bob@example.com  Bob     Robertson
>>>    ...       ...
>>
>> expressed in RDF using something akin to the Result Set Vocabulary
>> cause you problems or in any way preventing you from doing what you
>> have described above?
>>
>> Patrick
>>
>> --
>>
>> Patrick Stickler
>> Nokia, Finland
>> patrick.stickler@nokia.com
>
> -- 
> -eric
>
> office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
>                         Shonan Fujisawa Campus, Keio University,
>                         5322 Endo, Fujisawa, Kanagawa 252-8520
>                         JAPAN
>         +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
> cell:   +1.857.222.5741 (does not work in Asia)
>
> (eric@w3.org)
> Feel free to forward this message to any list for any purpose other 
> than
> email address distribution.
>
>

--

Patrick Stickler
Nokia, Finland
patrick.stickler@nokia.com

Received on Wednesday, 24 March 2004 08:55:33 UTC