Re: The Power of Virtuoso Sponger Technology

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sat, 17 Oct 2009 19:26:39 -0400
Message-ID: <4ADA52AF.7040306@openlinksw.com>
To: "public-lod@w3.org" <public-lod@w3.org>
Giovanni Tummarello wrote:
> With respect to crawling and "scraping" or "sponging" or .. "trying to
> guess" based on partial fragments of structured information, I can say
> 3 things
>
> a) No, we're not doing it at the moment; we are only covering those
> who chose to publish structured semantics. Some book stuff shows up in
> Sig.ma .. e.g. http://sig.ma/search?q=frank+van+harmelen&sources=100
> bookfinder, our jerome digital library installation, but the triples
> they provide are scarce and don't contribute much.  It would take so
> little for this to improve on their side, I believe.
>
> b) No, we are not religious about this. We have talked about it
> several times; it might make sense to try to understand as much of the
> web as possible and index it. Maybe we'll do it in the future for
> selected fractions of the web to show how it looks.
>
> c) Crawling should be just one means of acquiring the semantic web. In
> the case of BestBuy or other large retailers, where prices may change
> every day, crawling as a means to emulate a simple.. call to a web
> service seems really not the smart thing to do. Will data providers
> really support with data dumps?
>
> cheers
> Giovanni
>   
Juan,

I am hoping that the response above clarifies matters, esp. point C.

Crawling the old way is futile when the "change sensitivity" aspect of a 
given unit of data is high.

Georgi: even the count of German book authors, and the prices of their 
books across a plethora of retailers, with a wide range of prices and 
availability, is very sensitive to change.

Georgi/Juan:

Mechanically, there is crawling, but it simply isn't the old-style 
approach (data warehousing) of yore, as exemplified by Google, 
Yahoo!, ASK, and others.
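
To make the contrast concrete, here is a minimal sketch (Python) of the
on-demand lookup described in Martin's message quoted below. It only
builds the request URL for the http://uriburner.com/sparql endpoint,
using the URI of the existing HTML page as the default graph. The
function name build_sponger_request is made up for illustration, and
the "should-sponge=soft" parameter is an assumption about what the
"Retrieve remote RDF for all missing source graphs" option sends;
check the endpoint's form before relying on it.

```python
from urllib.parse import urlencode

# Endpoint named in Martin's message.
ENDPOINT = "http://uriburner.com/sparql"


def build_sponger_request(page_url: str, limit: int = 50) -> str:
    """Return a GET URL that asks the endpoint to describe page_url.

    Hypothetical helper: it builds the URL only and performs no
    network access.
    """
    # Same query as in Martin's example, with a configurable LIMIT.
    query = f"SELECT * WHERE {{?s ?p ?o}} LIMIT {limit}"
    params = {
        # "query" and "default-graph-uri" are SPARQL Protocol parameters.
        "default-graph-uri": page_url,
        "query": query,
        # Assumption: the value behind the "Retrieve remote RDF for all
        # missing source graphs" option on the endpoint's form.
        "should-sponge": "soft",
    }
    return ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_sponger_request(
        "http://www.amazon.com/Semantic-Web-Real-World-Applications-Industry/dp/0387485309"
    ))
```

Fetching that URL at query time, rather than reading a stored copy, is
the point: the description is produced when it is asked for.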

Kingsley
>
> On Sat, Oct 17, 2009 at 3:32 PM, Juan Sequeda <juanfederico@gmail.com> wrote:
>   
>> But Sindice could at least crawl Amazon.
>> It would be great to use sig.ma to create a "meshup" with the amazon data.
>>
>>
>> Juan Sequeda, Ph.D Student
>> Dept. of Computer Sciences
>> The University of Texas at Austin
>> www.juansequeda.com
>> www.semanticwebaustin.org
>>
>>
>> On Sat, Oct 17, 2009 at 9:28 AM, Martin Hepp (UniBW)
>> <hepp@ebusiness-unibw.org> wrote:
>>     
>>> I don't think so, because this would require that Sindice crawled the
>>> whole regular web and checked the Spongers for each URL (sic!).
>>>
>>> Juan Sequeda wrote:
>>>
>>> Does Sindice crawl this (or any other semantic web search engines)?
>>> Juan Sequeda, Ph.D Student
>>> Dept. of Computer Sciences
>>> The University of Texas at Austin
>>> www.juansequeda.com
>>> www.semanticwebaustin.org
>>>
>>>
>>> On Sat, Oct 17, 2009 at 4:24 AM, Martin Hepp (UniBW) <
>>> hepp@ebusiness-unibw.org> wrote:
>>>
>>>
>>>
>>> Dear all:
>>>
>>> I just found out that the Virtuoso Sponger technology is even more
>>> powerful than I thought.
>>>
>>> Briefly: "Spongers" create rich GoodRelations (and other RDF) meta-data
>>> for existing Web pages on-the-fly. Unlike traditional
>>> screen-scraping approaches, Spongers reuse public APIs and other
>>> techniques, so the data has an unprecedented degree of structure.
>>>
>>> Now, this can be directly used in arbitrary queries... by simply using
>>> the URI of the *existing* HTML Web page in the FROM clause of a SPARQL
>>> query.
>>>
>>> Example:
>>>
>>>
>>>
>>>
>>> http://www.amazon.com/Semantic-Web-Real-World-Applications-Industry/dp/0387485309
>>>
>>> is a Web page in plain HTML offering a book. Amazon does not yet produce
>>> GoodRelations meta-data on their pages.
>>>
>>> If you go to
>>>
>>>    http://uriburner.com/sparql
>>>
>>> and paste the URI in the "Default Graph URI " field and select "Retrieve
>>> remote RDF for all missing source graphs", then a query like
>>>
>>>   "SELECT * WHERE {?s ?p ?o} LIMIT 50"
>>>
>>> returns a fully-fledged GoodRelations description for that page - as if
>>> Amazon were already supporting GoodRelations for each of its > 4 million
>>> items!
>>>
>>> There are spongers for BestBuy, eBay, Zillow, and many other types of
>>> resources.
>>>
>>> Wow!
>>>
>>> Congrats to Kingsley and his team!
>>>
>>> Best wishes
>>>
>>> Martin Hepp
>>>
>>> --
>>> --------------------------------------------------------------
>>> martin hepp
>>> e-business & web science research group
>>> universitaet der bundeswehr muenchen
>>>
>>> e-mail:  hepp@ebusiness-unibw.org
>>> phone:   +49-(0)89-6004-4217
>>> fax:     +49-(0)89-6004-4620
>>> www:     http://www.unibw.de/ebusiness/ (group)
>>>         http://www.heppnetz.de/ (personal)
>>> skype:   mfhepp
>>> twitter: mfhepp
>>>
>>> Check out GoodRelations for E-Commerce on the Web of Linked Data!
>>> =================================================================
>>>
>>> Webcast:
>>> http://www.heppnetz.de/projects/goodrelations/webcast/
>>>
>>> Recipe for Yahoo SearchMonkey:
>>> http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey
>>>
>>> Talk at the Semantic Technology Conference 2009:
>>> "Semantic Web-based E-Commerce: The GoodRelations Ontology"
>>>
>>>
>>> http://www.slideshare.net/mhepp/semantic-webbased-ecommerce-the-goodrelations-ontology-1535287
>>>
>>> Overview article on Semantic Universe:
>>>
>>>
>>> http://www.semanticuniverse.com/articles-semantic-web-based-e-commerce-webmasters-get-ready.html
>>>
>>> Project page:
>>> http://purl.org/goodrelations/
>>>
>>> Resources for developers:
>>> http://www.ebusiness-unibw.org/wiki/GoodRelations
>>>
>>> Tutorial materials:
>>> CEC'09 2009 Tutorial: The Web of Data for E-Commerce: A Hands-on
>>> Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
>>>
>>>
>>> http://www.ebusiness-unibw.org/wiki/Web_of_Data_for_E-Commerce_Tutorial_IEEE_CEC%2709
>>>
>>>       
>>     
>
>
>   


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Saturday, 17 October 2009 23:27:11 UTC
