
Re: The Power of Virtuoso Sponger Technology

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sun, 18 Oct 2009 14:17:41 -0400
Message-ID: <4ADB5BC5.8080204@openlinksw.com>
To: public-lod@w3.org
Frederick Giasson wrote:
> Hi all,
>
>>>> The Web of Linked
>>>> Data shouldn't be about mass crawling (search engine style) 
>>>> etc...      
>>>
>>> It has to be. How would you answer a query like "all offers for a book
>>> written by a German author" without crawling the relevant data sets?
>
> First question would be: which dataset has this information? Does 
> Amazon have it, or does it need to be linked to other people's datasets 
> where you can find such information? (Which brings up all the questions 
> of entity disambiguation, etc.)

Fred,

Disambiguation is handled in our case via the faceted search engine 
component of Virtuoso.
>
> In any case, there are multiple ways to end up with more or less the 
> same result. Tell me if I am right, but I think that the current set 
> of related cartridges only gets data from a book URL? So, it is just 
> converting data about a particular book, for a given URL, using some 
> API (Amazon in this case).
Even if you start with an Amazon URL, you will have pathways not only to 
a graph hosted in an Amazon data space; you will also have interesting 
pathways to O'Reilly, eBay, and many other places. Naturally, we also 
have the LOD Cloud Cache, Sindice, and other data spaces that play 
various parts in the processing pipeline.
>
> What about search URLs, using search APIs from the same services? 
Yahoo!, Bing, Google (even), and others are all part of the cocktail of 
services for which we've developed lookup- and inference-rule-driven 
Meta Cartridges.
> I can certainly imagine a cartridge that does just this: searching 
> for items, and returning the result sets in RDF using some ontologies. 
> And then you use the current cartridge to get all the information 
> about the items you care about in the result set.
Yes, of course, and doing lookups against the LOD Cloud Cache and other 
sources.
>
> One thing is sure: the expressiveness of your queries is bound 
> to the expressiveness of the search API you query. So this is not the 
> answer to all problems.
Yes, query expressiveness is vital, hence my references to SPARQL and 
OWL in my prior response.  Just add inference rules to that when 
thinking about Meta Cartridges.
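
To make the earlier example concrete, here is a hypothetical SPARQL sketch of "all offers for a book written by a German author". All prefixes and property names (DBpedia-style book/author terms, a GoodRelations-style offers vocabulary) are illustrative assumptions on my part, not an actual Virtuoso/Sponger schema:

```
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX gr:  <http://purl.org/goodrelations/v1#>

SELECT ?book ?offer ?price
WHERE {
  # Books and their authors, DBpedia-style (illustrative vocabulary)
  ?book   a dbo:Book ;
          dbo:author ?author .
  ?author dbo:nationality <http://dbpedia.org/resource/Germany> .
  # Offers linked to those books, GoodRelations-style (illustrative)
  ?offer  gr:includes ?book ;
          gr:hasPriceSpecification/gr:hasCurrencyValue ?price .
}
```

The point of the sketch: once book, author, and offer data are exposed as linked graphs, one declarative query spans what would otherwise be several per-site API calls.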

Basically, we are packing the smart technology behind the proxy/wrapper 
URIs.
>
> But one question: is it realistic to think that anyone could query all 
> Amazon and eBay sites (US, Canada, and all the other countries) to 
> convert everything? And if that ends up being the case, how could 
> syncing and maintenance take place?
Exactly! And why should anyone really want to do this? It is possible to 
walk the Web in a very smart way, like a Stingray in a sense, and the 
Sponger, combined with the Virtuoso engine's innards, allows us to see 
the Web as a Federation of Data Spaces.

And even if some glutton of a service pulled this off, what about the 
Context Halo that encompasses all data access and integration 
endeavors, including "change sensitivity"? Example (based on Locale 
variety): how do you deal with the following within the context of a 
query, when the person seeks: all Books by an Author who is actually 
German, with a preference for books associated with specific Subject 
Matter, and a price preference in local currency?
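
The locale-sensitive case above can be sketched by extending the same kind of query. Again, every prefix, class, and property here is an illustrative assumption (DBpedia-style subject categories, GoodRelations-style price specifications), not the actual schema:

```
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX gr:  <http://purl.org/goodrelations/v1#>

SELECT ?book ?offer ?price ?currency
WHERE {
  # Author nationality, not merely a German-language book (illustrative)
  ?book   a dbo:Book ;
          dbo:author ?author ;
          dct:subject ?subject .        # the seeker's subject-matter preference
  ?author dbo:nationality <http://dbpedia.org/resource/Germany> .
  # Offer with an explicit currency (illustrative vocabulary)
  ?offer  gr:includes ?book ;
          gr:hasPriceSpecification ?spec .
  ?spec   gr:hasCurrencyValue ?price ;
          gr:hasCurrency ?currency .
  FILTER (?currency = "EUR")            # the seeker's local currency
}
```

This is where the Context Halo bites: nationality vs. language, subject classification, and currency each live in different data spaces, so a single mass-crawled copy goes stale the moment any of them changes.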
>
> It really depends on the use cases, but there is much that can be done 
> by leveraging all APIs in systems such as the Virtuoso Sponger. I 
> think that what you are talking about here will only happen when these 
> services want it to happen.
In our case, what I describe is something we do want to happen re. 
Sponger-based Linked Data graphs :-)

Kingsley
>
>
> Thanks,
>
>
> Take care,
>
>
> Fred
>


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Sunday, 18 October 2009 18:18:13 UTC
