Re: The Power of Virtuoso Sponger Technology

From: Giovanni Tummarello <g.tummarello@gmail.com>
Date: Sat, 17 Oct 2009 21:19:00 +0100
Message-ID: <210271540910171319q6eb734c0n1782c65726db2693@mail.gmail.com>
To: Juan Sequeda <juanfederico@gmail.com>
Cc: hepp@ebusiness-unibw.org, "public-lod@w3.org" <public-lod@w3.org>
With respect to crawling and "scraping" or "sponging" or... "trying to
guess" based on partial fragments of structured information, I can say
three things:

a) No, we're not doing it at the moment; we only cover those
who chose to publish structured semantics. Some book data shows up in
Sig.ma, e.g. http://sig.ma/search?q=frank+van+harmelen&sources=100
(bookfinder, our jerome digital library installation), but the triples
they provide are scarce and don't contribute much. It would take so
little for this to improve on their side, I believe.

b) No, we are not religious about this. We have talked about it
several times; it might make sense to try to understand as much of the
web as possible and index it. Maybe we'll do it in the future for
selected fractions of the web to show how it looks.

c) Crawling should be just one means of acquiring the semantic web. In
the case of BestBuy or other large retailers, where prices may change
every day, crawling as a means to emulate a simple call to a web
service really does not seem the smart thing to do. Will data providers
really support this with data dumps?

cheers
Giovanni


On Sat, Oct 17, 2009 at 3:32 PM, Juan Sequeda <juanfederico@gmail.com> wrote:
> But Sindice could at least crawl Amazon.
> It would be great to use sig.ma to create a "meshup" with the amazon data.
>
>
> Juan Sequeda, Ph.D Student
> Dept. of Computer Sciences
> The University of Texas at Austin
> www.juansequeda.com
> www.semanticwebaustin.org
>
>
> On Sat, Oct 17, 2009 at 9:28 AM, Martin Hepp (UniBW)
> <hepp@ebusiness-unibw.org> wrote:
>>
>> I don't think so, because this would require that Sindice crawled the
>> whole regular web and checked the Spongers for each URL (sic!).
>>
>> Juan Sequeda wrote:
>>
>> Does Sindice crawl this (or any other semantic web search engines)?
>> Juan Sequeda, Ph.D Student
>> Dept. of Computer Sciences
>> The University of Texas at Austin
>> www.juansequeda.com
>> www.semanticwebaustin.org
>>
>>
>> On Sat, Oct 17, 2009 at 4:24 AM, Martin Hepp (UniBW) <
>> hepp@ebusiness-unibw.org> wrote:
>>
>>
>>
>> Dear all:
>>
>> I just found out that the Virtuoso Sponger technology is even more
>> powerful than I thought.
>>
>> Briefly: "Spongers" create rich GoodRelations (and other RDF) meta-data
>> for existing Web pages on-the-fly. Unlike traditional
>> screen-scraping approaches, Spongers reuse public APIs and other
>> techniques, so the data has an unprecedented degree of structure.
>>
>> Now, this can be directly used in arbitrary queries... by simply using
>> the URI of the *existing* HTML Web page in the FROM clause of a SPARQL
>> query.
>>
>> Example:
>>
>> http://www.amazon.com/Semantic-Web-Real-World-Applications-Industry/dp/0387485309
>>
>> is a Web page in plain HTML offering a book. Amazon does not yet produce
>> GoodRelations meta-data on their pages.
>>
>> If you go to
>>
>>    http://uriburner.com/sparql
>>
>> and paste the URI in the "Default Graph URI " field and select "Retrieve
>> remote RDF for all missing source graphs", then a query like
>>
>>   "SELECT * WHERE {?s ?p ?o} LIMIT 50"
>>
>> returns a fully-fledged GoodRelations description for that page - as if
>> Amazon was already supporting GoodRelations for each of its > 4 million
>> items!
>>
>> There are spongers for BestBuy, eBay, Zillow, and many other types of
>> resources.
>>
>> Wow!
>>
>> Congrats to Kingsley and his team!
>>
>> Best wishes
>>
>> Martin Hepp
>>
>> --
>> --------------------------------------------------------------
>> martin hepp
>> e-business & web science research group
>> universitaet der bundeswehr muenchen
>>
>> e-mail:  hepp@ebusiness-unibw.org
>> phone:   +49-(0)89-6004-4217
>> fax:     +49-(0)89-6004-4620
>> www:     http://www.unibw.de/ebusiness/ (group)
>>         http://www.heppnetz.de/ (personal)
>> skype:   mfhepp
>> twitter: mfhepp
>>
>> Check out GoodRelations for E-Commerce on the Web of Linked Data!
>> =================================================================
>>
>> Webcast:
>> http://www.heppnetz.de/projects/goodrelations/webcast/
>>
>> Recipe for Yahoo SearchMonkey:
>> http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey
>>
>> Talk at the Semantic Technology Conference 2009:
>> "Semantic Web-based E-Commerce: The GoodRelations Ontology"
>>
>>
>> http://www.slideshare.net/mhepp/semantic-webbased-ecommerce-the-goodrelations-ontology-1535287
>>
>> Overview article on Semantic Universe:
>>
>>
>> http://www.semanticuniverse.com/articles-semantic-web-based-e-commerce-webmasters-get-ready.html
>>
>> Project page:
>> http://purl.org/goodrelations/
>>
>> Resources for developers:
>> http://www.ebusiness-unibw.org/wiki/GoodRelations
>>
>> Tutorial materials:
>> CEC'09 2009 Tutorial: The Web of Data for E-Commerce: A Hands-on
>> Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
>>
>>
>> http://www.ebusiness-unibw.org/wiki/Web_of_Data_for_E-Commerce_Tutorial_IEEE_CEC%2709
>>
>
>
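
For readers following along: the query Martin describes in the quoted
message can also be sent over HTTP instead of through the form. The
following is only a minimal sketch. It assumes the endpoint accepts the
standard SPARQL Protocol parameters "query" and "default-graph-uri";
the helper name build_sparql_url and its extra_params argument are
hypothetical. The form's "Retrieve remote RDF for all missing source
graphs" option is sent as an additional parameter whose name is not
given in this thread, so it is left to extra_params rather than guessed.
The code only builds and prints the request URL; it does not contact
the server.

```python
from urllib.parse import urlencode

# Values taken from the quoted message.
ENDPOINT = "http://uriburner.com/sparql"
PAGE_URI = ("http://www.amazon.com/"
            "Semantic-Web-Real-World-Applications-Industry/dp/0387485309")
QUERY = "SELECT * WHERE {?s ?p ?o} LIMIT 50"


def build_sparql_url(endpoint, query, default_graph_uri, extra_params=None):
    """Return a GET URL for a SPARQL Protocol query (hypothetical helper).

    'query' and 'default-graph-uri' are the standard SPARQL Protocol
    parameter names. Any endpoint-specific options (for example the
    sponging switch from the form) are an assumption of the caller and
    go in extra_params.
    """
    params = {"query": query, "default-graph-uri": default_graph_uri}
    if extra_params:
        params.update(extra_params)
    # urlencode percent-encodes the query text and the page URI.
    return endpoint + "?" + urlencode(params)


print(build_sparql_url(ENDPOINT, QUERY, PAGE_URI))
```

The printed URL can be opened in a browser or fetched with any HTTP
client. Whether it returns data without the sponging option depends on
the server configuration.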
Received on Saturday, 17 October 2009 20:19:53 UTC