- From: Danny Ayers <danny666@virgilio.it>
- Date: Fri, 19 Apr 2002 15:13:26 +0200
- To: "Dan Brickley" <danbri@w3.org>, "Jeremy Carroll" <jjc@hplb.hpl.hp.com>
- Cc: <www-rdf-interest@w3.org>
>If a graph navigation API is too granular for dealing with remote services
>that aren't easily conceptualised as simple triple queries, maybe we
>should be doing this at the rdf query level. Peeling apart an RDF query
>(in one of the 'gimme bindings for this variable-name-decorated graph'
>languages, Squish/RDFdbQL, Algae, RDQL etc), and sending a subquery to the
>specialised Web service.

Yep, that's why I thought Jena would be a good fit - not only Java (easy Google interfacing), it's also got RDQL.

>I've started to hack on this (in Ruby, having dumped Perl :)
>
>rough scribbles:
>http://www.w3.org/2001/12/rubyrdf/squish/service/webfetch_tests.rb

Looking good (though I can't comment on the Ruby).

>...currently gets a result set through (by hand) doing part of the query
>against a local RDF graph, and part by calling a (different, scraped
>still) Google backlinks API. This is an obvious candidate for automation
>based on Web service description, and seems to offer a nice curve from
>'simple stuff do-able now' to 'PhD territory distributed query'. I've no
>plans to go near the latter end! I do want to make demos that show some
>basic Web service composition techniques though, ie. a single RDF query
>serviced by consulting multiple Web services and binding the results
>together, where the decomposition into sub-tasks is done by reading
>RDF service descriptions (WSDL++++?).

I'm not sure anyone needs to go near the latter end - as long as the interfaces to the services are reasonably compatible (SOAP-RDF, I guess), then the moment the first service connects to the second, you're in distributed query territory. Major potential for network effects!

(At least) a couple of issues need ironing out - rules for stopping/preventing loops, though time-outs & passing an 'already visited' path along with queries may be starting points (I bet this is in a web services standard somewhere already).
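To make the "peel the query apart and farm subqueries out" idea concrete, here is a rough sketch in plain Ruby (no RDF library). Triple patterns whose predicate belongs to the goo: service are routed to a stubbed remote backlinks lookup, everything else goes to a local in-memory graph, and bindings are threaded through pattern by pattern. All the URIs, the goo:backlinks predicate, and the stub's data are illustrative assumptions, not any real Google or Jena API.

```ruby
# Sketch: split an RDF query between a local graph and a (stubbed)
# remote backlinks service, joining bindings pattern by pattern.
# All URIs and service data below are made up for illustration.

def var?(term)
  term.is_a?(String) && term.start_with?('?')
end

# Tiny local triple store: '?x'-style strings act as variables.
class LocalGraph
  def initialize(triples)
    @triples = triples
  end

  # Returns one binding hash per matching triple.
  def query(pattern)
    @triples.filter_map do |triple|
      bindings = {}
      matched = pattern.zip(triple).all? do |p, t|
        var?(p) ? (bindings[p] = t; true) : p == t
      end
      bindings if matched
    end
  end
end

# Stub for the scraped Google service: only answers when the subject
# is already bound (the BOUND BOUND UNBOUND restriction).
class BacklinksService
  BACKLINKS = {
    'http://example.org/page1' => ['http://blog.example/a',
                                   'http://blog.example/b']
  }

  def query(pattern)
    s, _p, o = pattern
    raise ArgumentError, 'subject must be bound' if var?(s)
    (BACKLINKS[s] || []).map { |link| { o => link } }
  end
end

# Walk the patterns in order, substituting the bindings found so far
# and routing each pattern to the right backend by predicate.
def run(patterns, local, service)
  solutions = [{}]
  patterns.each do |pattern|
    solutions = solutions.flat_map do |row|
      concrete = pattern.map { |t| row.fetch(t, t) }
      backend = concrete[1] == 'goo:backlinks' ? service : local
      backend.query(concrete).map { |more| row.merge(more) }
    end
  end
  solutions
end

local = LocalGraph.new([['http://example.org/page1', 'dc:title', 'Page One']])
service = BacklinksService.new

query = [
  ['?page', 'dc:title', 'Page One'],    # answered by the local graph
  ['?page', 'goo:backlinks', '?from']   # farmed out to the service
]
p run(query, local, service)
# each solution binds both ?page and ?from
```

In a real system the hard-coded predicate dispatch would be replaced by reading RDF service descriptions, which is exactly the automation step discussed above.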
There's also the small matter of ensuring a good quality/quantity ratio back from the queries (not unrelated to the UNBOUND BOUND UNBOUND issue you mention below).

>the current hack is as follows:
[snip]
>This finds us one result,
[snip]

Which I reckon is a reasonable proof-of-concept ;-)

>Neither Google nor the local RSS feed has enough information alone to
>answer the question. The common data model, and common naming system
>(URIs) help us get to the answer. ASIDE: Interestingly, having a common
>'Ontology' between Google and RSS/events was mostly irrelevant to the
>problem. We got a match not because the two data sources shared an
>ontology, but because they both used the same URI to name an individual
>thing.

Interesting. Now what about scraping URLs from blogs...

>I want to try a couple of things next:
>
> (1) make the query engine understand when bits of a query can be
>farmed out to remote services; look at requirements on web service
>description that make this deployable in the wild
>
>      eg: the Google lookup services (scraped; not in their SOAP API yet)
>
>      a) map onto a property (eg. goo:backlinks) and
>      b) expect BOUND, BOUND, UNBOUND for subject/predicate/object in
>         your query
>      ie. you can ask for 'page1.html goo:backlinks ?p' and get
>          back (several) values for ?p
>      but you can't (unlike other RDF data sources) ask it for
>          UNBOUND BOUND UNBOUND and expect a dump of all their backlinks
>
> (2) investigate how this relates to the (also handy) goal of sitting
>these services behind a graph API, and hiding (partially) their
>remoteness from users. ie. redo my old Perl hack properly.

Again, I fancy Jena as a concentrator.

>We can create backends for most RDF APIs that do the trick of going off to
>Google when certain kinds of questions are asked. But we also need in some
>contexts to expose this behaviour, since application code will need to be
>sensitive to such goings on, eg. for the purposes of asking questions in
>a sensible order.
>
>ie.
>if I am a query engine that does the job of implementing rdf query
>against a plain 'match these possibly blanked-out triples', I can't be
>entirely agnostic about what's going on behind the RDF API. Or I can, but
>if I ask the triple questions in the wrong order, I'll miss out on answers.
>We need to know that the backend will only be able to answer
>'bound goo:backlinks unbound' or 'bound goo:backlinks bound' but not
>'unbound goo:backlinks unbound'.
>
>Same sort of thing goes for substring searches etc., if they're being
>plugged in secretly behind an RDF graph API and applications are
>trying to do query on top of those (instead of passing entire queries and
>subqueries through to systems closer to the data).

Hmm - this raises the idea of micro- and macro-reasoning, which could potentially be done using exactly the same inference tools, only at a different level of granularity?

>I guess it'd be healthy to come up with some more practical use cases for
>queries where part (but not all) of the work is done by Google. Then map
>these onto properties, eg. goo:backlinks, goo:goodMatchForQueryString,
>goo:relatedPage, goo:assignedDmozCategory etc etc.

Yep - that's the kind of mapping I had in mind, almost direct from the API. There's also other metadata (HTML + scraped) available from the returned links that could be repackaged as triples.

Simple, practical use cases definitely needed ('I made a search engine out of Google' doesn't sound very convincing ;-)

Cheers,
Danny.
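As a footnote to the ordering problem discussed above, here is a minimal plain-Ruby sketch of one way a query engine could cope: each service advertises which subject/predicate/object binding shapes it can answer, and a naive planner reorders the triple patterns so that a pattern is only asked once its required positions are bound. The capability vocabulary (:bound/:unbound) and the goo:backlinks entry are assumptions for illustration, not a real Jena or RDQL interface.

```ruby
# Sketch: backends declare answerable binding shapes; a greedy planner
# reorders triple patterns so each is asked only when answerable.
# The capability table and goo:backlinks predicate are illustrative.

def var?(term)
  term.is_a?(String) && term.start_with?('?')
end

# The shape of a pattern, given which variables are already bound.
def shape(pattern, bound_vars)
  pattern.map { |t| var?(t) && !bound_vars.include?(t) ? :unbound : :bound }
end

# The scraped Google service needs a bound subject; a plain local
# graph can answer any shape.
CAPABILITIES = {
  'goo:backlinks' => [[:bound, :bound, :unbound], [:bound, :bound, :bound]],
  :default        => :any
}

def answerable?(pattern, bound_vars)
  caps = CAPABILITIES.fetch(pattern[1], CAPABILITIES[:default])
  caps == :any || caps.include?(shape(pattern, bound_vars))
end

# Greedy reorder: repeatedly pick the first pattern answerable with
# the variables bound so far; fail if the query can't be sequenced.
def order(patterns)
  remaining = patterns.dup
  bound = []
  plan = []
  until remaining.empty?
    pick = remaining.find { |pat| answerable?(pat, bound) }
    raise 'query not answerable with these backends' unless pick
    remaining.delete(pick)
    bound |= pick.select { |t| var?(t) }
    plan << pick
  end
  plan
end

query = [
  ['?page', 'goo:backlinks', '?from'],  # can't go first: subject unbound
  ['?page', 'dc:title', 'Page One']     # local pattern that binds ?page
]
p order(query)
# the dc:title pattern is moved ahead of the goo:backlinks lookup
```

An 'unbound goo:backlinks unbound' query on its own would make the planner give up, which is exactly the behaviour the application needs exposed rather than hidden behind the graph API.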
Received on Friday, 19 April 2002 09:18:54 UTC