Re: The Power of Virtuoso Sponger Technology from Kingsley Idehen on 2009-10-18 (public-lod@w3.org from October 2009)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sun, 18 Oct 2009 14:03:04 -0400
CC: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <4ADB5858.6020103@openlinksw.com>
Hugh Glaser wrote:
> Hi Guys,
> I am puzzled by the whole discussion, so will try to summarise to find out
> if I have some misunderstanding.
>
> It really is "just" about finding where the URIs are, and search engines are
> the game in town. We need to make it really easy for people to find the
> Linked Data URIs they need. Wrappers make things a bit harder.
>
> Juan asked if "Sindice crawled the whole regular web and checked the
> Spongers for each URL (sic!)".
> I read this as: "Can I use Sindice to find Linked Data URIs provided by the
> Spongers?" Or to put it yet another way, "Does Sindice index the part of the
> Semantic Web provided by the Spongers?"
>
> One way to do this would be to do what Juan suggests - model what the
> Spongers are doing, and then infer what the Linked Data URIs would be, based
> on the URLs of the underlying web pages, having crawled them.
>
> But there seems to me a much simpler and more principled way - the Sponger
> should do it.
> Spongers should provide Semantic Sitemaps (and of course voiD descriptions),
> so that Sindice can index (not *crawl*, which I think has led to some of
> the confusion) the sites.
>   

> How might this be done?
> Well, certainly where the Sponger is connected to a particular site which
> has an ordinary Sitemap, it could/should process it as part of the
> connection with a site, and then re-publish the Semantic Sitemap. For sites
> that don't have Sitemaps, it may/will be somewhat harder.
> I may be misunderstanding Spongers as well, but it all seems pretty clean
> and straightforward to me.
>
> Great stuff, of course.
>   
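Hugh's suggestion above, that a Sponger read a site's ordinary Sitemap and republish it for indexers, can be sketched roughly as follows. This is a minimal Python illustration, not Sponger code: it parses a standard sitemaps.org XML file and maps each page URL to a proxy URI under a hypothetical prefix `PROXY_PREFIX`. That URI pattern is an assumption, not URIBurner's actual scheme.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical proxy URI prefix (an assumption, not URIBurner's real pattern).
PROXY_PREFIX = "http://proxy.example.org/about/id/"


def page_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL listed in an ordinary XML Sitemap."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return [loc.text.strip() for loc in locs if loc.text]


def proxy_uri(page_url: str) -> str:
    """Mint a hypothetical Linked Data proxy URI for a web page URL."""
    scheme, _, rest = page_url.partition("://")
    return f"{PROXY_PREFIX}{scheme}/{rest}"


def linked_data_uris(sitemap_xml: str) -> list[str]:
    """Map every page in the Sitemap to the proxy URI an indexer could be given."""
    return [proxy_uri(url) for url in page_urls(sitemap_xml)]


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/product/1</loc></url>
  <url><loc>http://www.example.com/product/2</loc></url>
</urlset>"""

for uri in linked_data_uris(sample):
    print(uri)
```

A republished Semantic Sitemap would also need dataset-level entries (for example the SPARQL endpoint location) from the Semantic Sitemap extension; that part is not shown here.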
Hugh,

Quick Sponger Glossary:

Sponger -- The Data Access Manager Layer
Basic Cartridges/Drivers/Providers -- The components that extract data
and transform it into RDF-model-based Linked Data graphs
Meta Cartridges -- Smarter Cartridges that perform lookups and apply
inference rules etc., adding their output to the basic Linked Data graphs
(*these aren't part of the Open Source Edition of Virtuoso*).
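The layering in this glossary (a Sponger that routes each fetched resource to a cartridge, which turns it into RDF) can be pictured with the toy Python sketch below. The `Sponger` class, the `Cartridge` type and the match-by-content-type rule are illustrative assumptions, not Virtuoso's actual cartridge interface; a meta cartridge would be a further stage adding triples to what a basic cartridge returns.

```python
from dataclasses import dataclass
from typing import Callable

Triple = tuple[str, str, str]


@dataclass
class Cartridge:
    """Hypothetical basic cartridge: handles one content type and emits triples."""
    content_type: str
    extract: Callable[[str, str], list[Triple]]  # (source_url, body) -> triples


def html_title_cartridge(source_url: str, body: str) -> list[Triple]:
    """Toy extractor: turn an HTML page's <title> into a single dcterms:title triple."""
    start, end = body.find("<title>"), body.find("</title>")
    if start == -1 or end == -1:
        return []
    title = body[start + len("<title>"):end].strip()
    return [(source_url, "http://purl.org/dc/terms/title", title)]


class Sponger:
    """Toy data access layer: dispatches a resource to the cartridge for its type."""

    def __init__(self) -> None:
        self._cartridges: dict[str, Cartridge] = {}

    def register(self, cartridge: Cartridge) -> None:
        self._cartridges[cartridge.content_type] = cartridge

    def sponge(self, source_url: str, content_type: str, body: str) -> list[Triple]:
        # No matching cartridge means no graph is produced for this resource.
        cartridge = self._cartridges.get(content_type)
        return cartridge.extract(source_url, body) if cartridge else []


sponger = Sponger()
sponger.register(Cartridge("text/html", html_title_cartridge))
print(sponger.sponge("http://example.com/", "text/html", "<html><title> Example </title></html>"))
```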

A Sponger-generated Linked Data graph can optionally include VoiD
descriptions (why wouldn't it, bearing in mind our proximity to VoiD?).
We disabled this temporarily while updating the Sponger Engine etc. You
must have seen VoiD graphs in earlier Sponger proxy URIs, right? They
will be re-enabled very soon :-)

All sponged data ends up in the Quad Store of its host Virtuoso instance 
(so <http://uriburner.com/sparql> and <http://uriburner.com/fct> are in 
place as per usual re. Virtuoso instances).
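Because sponged data lands in the instance's quad store, any SPARQL Protocol client can query it. The minimal Python sketch below only builds a GET request URL for the endpoint named above; the `query` and `default-graph-uri` parameters come from the SPARQL Protocol, while whether that 2009 endpoint is still reachable is not assumed. Sending the request is left to any HTTP client.

```python
from typing import Optional
from urllib.parse import urlencode

ENDPOINT = "http://uriburner.com/sparql"  # endpoint named in the message


def sparql_get_url(endpoint: str, query: str, default_graph: Optional[str] = None) -> str:
    """Build a SPARQL Protocol GET URL with the query percent-encoded."""
    params = [("query", query)]
    if default_graph is not None:
        params.append(("default-graph-uri", default_graph))
    return f"{endpoint}?{urlencode(params)}"


q = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
url = sparql_get_url(ENDPOINT, q, default_graph="http://www.example.com/product/1")
print(url)
```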

A Virtuoso Sponger instance can optionally ping PTSW each time it creates
a Linked Data graph from its Web Resource RDFization activity, so
Sindice and other engines that already subscribe to PTSW also have
access to the Sponger-generated Linked Data.
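Each such notification amounts to one HTTP request announcing a new document URL. The sketch below only builds those request URLs; the endpoint `http://ping.example.org/rest/` and the `url` parameter name are hypothetical placeholders rather than PTSW's documented interface, and a real Sponger instance handles this through its own configuration.

```python
from urllib.parse import urlencode

# Hypothetical ping endpoint and parameter name; substitute whatever the
# real notification service documents.
PING_ENDPOINT = "http://ping.example.org/rest/"


def ping_url(graph_source_url: str, endpoint: str = PING_ENDPOINT) -> str:
    """Build the GET request that announces one newly created Linked Data graph."""
    return f"{endpoint}?{urlencode({'url': graph_source_url})}"


# One ping per graph produced by an RDFization run (URLs are illustrative).
new_graphs = ["http://www.example.com/product/1", "http://www.example.com/product/2"]
outbox = [ping_url(g) for g in new_graphs]
for request in outbox:
    print(request)
```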

Sponger proxy/wrapper URIs are Data Source Names. If you look at the
graph closely, you should see how we state clearly what we are doing:
note the "owl:sameAs" assertion to the original data source, and the
"owl:shameAs" pattern, which is about minting a hint URI back to the
data source we've sponged. What we will be adding is some additional
provenance data, now that we have a good shared ontology in place [6].
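The owl:sameAs link from a proxy URI to the original data source can be written out as a single N-Triples statement. The Python sketch below does only that; the proxy URI pattern under `PROXY_PREFIX` is hypothetical (not URIBurner's actual scheme), and neither the "owl:shameAs" hint pattern nor the planned provenance data is modelled.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical proxy URI prefix (an assumption, not URIBurner's real pattern).
PROXY_PREFIX = "http://proxy.example.org/about/id/"

# Printable characters N-Triples forbids inside an IRI reference
# (control characters are not checked here).
_FORBIDDEN = set('<>"{}|^`\\ ')


def proxy_for(source_url: str) -> str:
    """Mint a hypothetical proxy URI that embeds the sponged source URL."""
    scheme, _, rest = source_url.partition("://")
    return f"{PROXY_PREFIX}{scheme}/{rest}"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, rejecting characters N-Triples forbids."""
    if _FORBIDDEN & set(value):
        raise ValueError(f"cannot write {value!r} as an N-Triples IRI")
    return f"<{value}>"


def same_as_ntriple(source_url: str) -> str:
    """One N-Triples statement: proxy URI owl:sameAs the original data source."""
    return f"{iri(proxy_for(source_url))} {iri(OWL_SAME_AS)} {iri(source_url)} ."


print(same_as_ntriple("http://www.example.com/product/1"))
```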


As stated above, the existence of <http://uriburner.com/fct> implies
that the faceted search and find engine is also live, and any client
application can use the REST or SOAP services it provides to perform
disambiguated search and find queries. This means Sindice can look up
Sponger instances in the same way the Sponger looks up Sindice via its
Web Services.

When you use Virtuoso with the Sponger Middleware enabled, Web Document
URLs basically become Named Graph IRIs re. SPARQL (the point Martin
emphasized in his post).
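In practice, a SPARQL query can then name a web page in its FROM clause and let the Sponger fetch and RDFize it on demand. The sketch below only assembles such a query string; the `DEFINE get:soft "replace"` line is an assumption based on Virtuoso's SPARQL IRI-dereferencing pragmas and should be checked against the Virtuoso documentation for the version in use.

```python
def sponging_query(document_url: str, limit: int = 20) -> str:
    """Compose a SPARQL query that uses a web document URL as a named graph IRI."""
    if any(ch in document_url for ch in '<>" {}|^`\\'):
        raise ValueError(f"{document_url!r} cannot be used as an IRI reference")
    return "\n".join([
        # Assumed Virtuoso pragma: (re)fetch and sponge the FROM graph before querying.
        'DEFINE get:soft "replace"',
        "SELECT DISTINCT ?s ?p ?o",
        f"FROM <{document_url}>",
        "WHERE { ?s ?p ?o }",
        f"LIMIT {int(limit)}",
    ])


print(sponging_query("http://www.example.com/product/1"))
```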

When you use the OpenLink Data Explorer (ODE) [5], you can also bind the
browser to any of the instances listed below [2][3][4], and exploit the
effect of sponging by simply invoking the View Page Metadata option (from
the main or context menu, or via the URIBurner Bookmarklet).

Links:

1. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtSponger
2. http://uriburner.com -- instance of Virtuoso with Sponger Middleware enabled
3. http://bbc.openlinksw.com -- instance of Virtuoso with Sponger Middleware enabled
4. http://lod.openlinksw.com -- instance of Virtuoso with Sponger Middleware enabled
5. http://ode.openlinksw.com -- OpenLink Data Explorer
6. http://trdf.sourceforge.net/provenance/ns-20090825.html -- Provenance Ontology

Kingsley
> Best
> Hugh
>
>   
>>> On Sat, Oct 17, 2009 at 3:32 PM, Juan Sequeda <juanfederico@gmail.com> wrote:
>>>   
>>>       
>>>> But Sindice could at least crawl Amazon.
>>>> It would be great to use sig.ma to create a "meshup" with the amazon data.
>>>>
>>>>
>>>> Juan Sequeda, Ph.D Student
>>>> Dept. of Computer Sciences
>>>> The University of Texas at Austin
>>>> www.juansequeda.com
>>>> www.semanticwebaustin.org
>>>>
>>>>
>>>> On Sat, Oct 17, 2009 at 9:28 AM, Martin Hepp (UniBW)
>>>> <hepp@ebusiness-unibw.org> wrote:
>>>>     
>>>>         
>>>>> I don't think so, because this would require that Sindice crawled the
>>>>> whole regular web and checked the Spongers for each URL (sic!).
>>>>>
>>>>> Juan Sequeda wrote:
>>>>>
>>>>> Does Sindice crawl this (or any other semantic web search engines)?
>>>>> Juan Sequeda, Ph.D Student
>>>>> Dept. of Computer Sciences
>>>>> The University of Texas at Austin
>>>>> www.juansequeda.com
>>>>> www.semanticwebaustin.org
>>>>>
>>>>>           


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Sunday, 18 October 2009 18:03:33 UTC