- From: Giovanni Tummarello <giovanni.tummarello@deri.org>
- Date: Sun, 18 Oct 2009 15:33:36 +0100
- To: martin.hepp@ebusiness-unibw.org
- Cc: Juan Sequeda <juanfederico@gmail.com>, hepp@ebusiness-unibw.org, "public-lod@w3.org" <public-lod@w3.org>
I agree with this: a combination of the two, without getting into unrealistic service descriptions, is exactly the question. It's great to be talking about this. I'd gladly have a chat about all this at ISWC with those who will be there.

Cheers
Giovanni

On Sun, Oct 18, 2009 at 8:37 AM, Martin Hepp (UniBW) <martin.hepp@ebusiness-unibw.org> wrote:
> Guys,
> the Web of Data cannot rely on mass data crawling of the whole Web but must
> combine cached data with federated on-demand queries. Structured data
> requires much faster update cycles than typical text-based Web indices. For
> example, Google and Yahoo can rely on the fact that "http://www.cnn.com" is
> relevant for "news". That will not change within minutes. And both Google
> and Yahoo need up to several weeks to visit your page again.
>
> When it comes to structured price and availability information, your data
> may become outdated within hours, if not seconds. Think of eBay auctions,
> hotel or flight availability, etc.
>
> So it will boil down to technology that combines (1) crawling and caching
> rather stable data sets with (2) distributing queries and parts of queries
> among the right SPARQL endpoints (whatever actual DB technology they
> expose).
>
> You can keep a text index of the whole Web, if crawling cycles in the order
> of magnitude of weeks are fine. For structured, linked data that exposes
> dynamic database content, "dumb" crawling and caching will not scale.
>
> If the DB technology is able to involve the right set of endpoints for parts
> of the query, why would you need a complete replication of all databases in
> the world inside one huge repository?
>
> That repository will be a million-node cluster anyway. Why not directly use
> the millions of nodes that provide the data and cache just the endpoint
> meta-data?
>
> Martin
>
>
> Giovanni Tummarello wrote:
>
> With respect to crawling and "scraping" or "sponging" or ... "trying to
> guess" based on partial fragments of structured information, I can say
> three things:
>
> a) No, we're not doing it at the moment; we are only covering those
> who chose to publish structured semantics. Some book stuff shows up in
> Sig.ma, e.g. http://sig.ma/search?q=frank+van+harmelen&sources=100
> (bookfinder, our Jerome digital library installation), but the triples
> they provide are scarce and don't contribute much. It would take so
> little for this to improve on their side, I believe.
>
> b) No, we are not religious about this. We have talked about it
> several times; it might make sense to try to understand as much of the
> web as possible and index it. Maybe we'll do it in the future for
> selected fractions of the web to show how it looks.
>
> c) Crawling should be just one means of acquiring the Semantic Web. In
> the case of BestBuy or other large retailers, where prices possibly change
> every day, crawling as a means to emulate a simple call to a web
> service seems really not the smart thing to do. Will data providers
> really support this with data dumps?
>
> cheers
> Giovanni
>
>
> On Sat, Oct 17, 2009 at 3:32 PM, Juan Sequeda <juanfederico@gmail.com>
> wrote:
>
> But Sindice could at least crawl Amazon.
> It would be great to use sig.ma to create a "meshup" with the amazon data.
>
> Juan Sequeda, Ph.D Student
> Dept. of Computer Sciences
> The University of Texas at Austin
> www.juansequeda.com
> www.semanticwebaustin.org
>
>
> On Sat, Oct 17, 2009 at 9:28 AM, Martin Hepp (UniBW)
> <hepp@ebusiness-unibw.org> wrote:
>
> I don't think so, because this would require that Sindice crawled the
> whole regular web and checked the Spongers for each URL (sic!).
>
> Juan Sequeda wrote:
>
> Does Sindice crawl this (or do any other semantic web search engines)?
>
>
> On Sat, Oct 17, 2009 at 4:24 AM, Martin Hepp (UniBW) <
> hepp@ebusiness-unibw.org> wrote:
>
> Dear all:
>
> I just found out that the Virtuoso Sponger technology is even more
> powerful than I thought.
>
> Briefly: "Spongers" create rich GoodRelations (and other RDF) meta-data
> for existing Web pages on-the-fly. Unlike traditional
> screen-scraping approaches, Spongers reuse public APIs and other
> techniques, so the data has an unprecedented degree of structure.
>
> Now, this can be directly used in arbitrary queries... by simply using
> the URI of the *existing* HTML Web page in the FROM clause of a SPARQL
> query.
>
> Example:
>
> http://www.amazon.com/Semantic-Web-Real-World-Applications-Industry/dp/0387485309
>
> is a Web page in plain HTML offering a book. Amazon does not yet produce
> GoodRelations meta-data on their pages.
>
> If you go to
>
> http://uriburner.com/sparql
>
> and paste the URI in the "Default Graph URI" field and select "Retrieve
> remote RDF for all missing source graphs", then a query like
>
> "SELECT * WHERE {?s ?p ?o} LIMIT 50"
>
> returns a fully-fledged GoodRelations description for that page - as if
> Amazon was already supporting GoodRelations for each of its more than
> 4 million items!
>
> There are spongers for BestBuy, eBay, Zillow, and many other types of
> resources.
>
> Wow!
>
> Congrats to Kingsley and his team!
>
> Best wishes
>
> Martin Hepp
>
> --
> --------------------------------------------------------------
> martin hepp
> e-business & web science research group
> universitaet der bundeswehr muenchen
>
> e-mail: hepp@ebusiness-unibw.org
> phone: +49-(0)89-6004-4217
> fax: +49-(0)89-6004-4620
> www: http://www.unibw.de/ebusiness/ (group)
> http://www.heppnetz.de/ (personal)
> skype: mfhepp
> twitter: mfhepp
>
> Check out GoodRelations for E-Commerce on the Web of Linked Data!
> =================================================================
>
> Webcast:
> http://www.heppnetz.de/projects/goodrelations/webcast/
>
> Recipe for Yahoo SearchMonkey:
> http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey
>
> Talk at the Semantic Technology Conference 2009:
> "Semantic Web-based E-Commerce: The GoodRelations Ontology"
> http://www.slideshare.net/mhepp/semantic-webbased-ecommerce-the-goodrelations-ontology-1535287
>
> Overview article on Semantic Universe:
> http://www.semanticuniverse.com/articles-semantic-web-based-e-commerce-webmasters-get-ready.html
>
> Project page:
> http://purl.org/goodrelations/
>
> Resources for developers:
> http://www.ebusiness-unibw.org/wiki/GoodRelations
>
> Tutorial materials:
> CEC'09 2009 Tutorial: The Web of Data for E-Commerce: A Hands-on
> Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
> http://www.ebusiness-unibw.org/wiki/Web_of_Data_for_E-Commerce_Tutorial_IEEE_CEC%2709
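The combination the thread converges on, (1) crawling and caching rather stable data sets and (2) sending queries or parts of queries to the right SPARQL endpoints on demand, can be illustrated with a small routing sketch. It is not how Sindice, Sig.ma, or Virtuoso work. It assumes that each data source carries a hypothetical staleness tolerance (`max_age_s`); `Source`, `HybridRouter`, and the injected `fetch` callable (a stand-in for a real SPARQL Protocol request) are illustrative names only.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Source:
    """Metadata kept about one endpoint (hypothetical structure)."""
    endpoint: str      # SPARQL endpoint URL
    max_age_s: float   # how old a cached answer may be before re-querying


@dataclass
class HybridRouter:
    """Answer sub-queries from a local cache when it is fresh enough,
    otherwise send them to the source's live endpoint and cache the result."""
    sources: Dict[str, Source]
    fetch: Callable[[str, str], object]          # stand-in for a SPARQL Protocol call
    clock: Callable[[], float] = time.monotonic  # injectable so the sketch is testable
    _cache: Dict[Tuple[str, str], Tuple[float, object]] = field(
        default_factory=dict, init=False, repr=False)

    def answer(self, topic: str, subquery: str) -> Tuple[str, object]:
        src = self.sources[topic]
        key = (src.endpoint, subquery)
        now = self.clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] <= src.max_age_s:
            # Cached copy is within tolerance: reuse it (the "crawl and cache" path).
            return "cache", hit[1]
        # Missing or too old: query the endpoint on demand (the "federated" path).
        result = self.fetch(src.endpoint, subquery)
        self._cache[key] = (now, result)
        return "live", result
```

With a tolerance of a week, a book catalogue would mostly be served from the cache, while a price source with a tolerance of zero sends every later request back to its live endpoint, which mirrors the split Martin draws between stable data and price or availability data. Only endpoint metadata and recent answers are held centrally; nothing forces a full replica of every source.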
Received on Sunday, 18 October 2009 14:34:31 UTC
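Martin's uriburner.com recipe amounts to a SPARQL Protocol request in which the Amazon page URI is supplied as the default graph (or, in the variant he mentions, named in the FROM clause). The sketch below builds such a request in Python. `query` and `default-graph-uri` are standard SPARQL Protocol parameters; the mapping of the form option "Retrieve remote RDF for all missing source graphs" to Virtuoso's `should-sponge=soft` parameter is an assumption to check against the Virtuoso documentation, and the endpoint from 2009 may no longer be online. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Values taken from the thread; the endpoint may no longer answer.
ENDPOINT = "http://uriburner.com/sparql"
PAGE = "http://www.amazon.com/Semantic-Web-Real-World-Applications-Industry/dp/0387485309"
QUERY = "SELECT * WHERE {?s ?p ?o} LIMIT 50"


def build_request_url(endpoint: str, query: str,
                      default_graph: Optional[str] = None,
                      sponge: Optional[str] = "soft") -> str:
    """Return a SPARQL Protocol GET URL.

    `query` and `default-graph-uri` are standard SPARQL Protocol parameters.
    ASSUMPTION: `should-sponge=soft` is the Virtuoso parameter behind the
    "Retrieve remote RDF for all missing source graphs" option.
    Pass sponge=None to query only data the endpoint already holds.
    """
    params = [("query", query)]
    if default_graph is not None:
        params.append(("default-graph-uri", default_graph))
    if sponge is not None:
        params.append(("should-sponge", sponge))
    return endpoint + "?" + urlencode(params)


def query_with_from(page_uri: str) -> str:
    """Variant from the thread: name the HTML page in the FROM clause instead."""
    return f"SELECT * FROM <{page_uri}> WHERE {{ ?s ?p ?o }} LIMIT 50"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Perform the request and decode SPARQL JSON results (needs network access)."""
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `build_request_url(ENDPOINT, QUERY, PAGE)` reproduces the form-based recipe, and `build_request_url(ENDPOINT, query_with_from(PAGE))` the FROM-clause variant; `fetch_json` then runs either request, provided the service still responds.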