- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Sat, 17 Oct 2009 19:26:39 -0400
- To: "public-lod@w3.org" <public-lod@w3.org>
Giovanni Tummarello wrote:
> With respect to crawling and "scraping" or "sponging" or .. "trying to
> guess" based on partial fragments of structured information I can say
> 3 things:
>
> a) No, we're not doing it at the moment, we are only covering those
> who chose to put structured semantics. Some book stuff shows up in
> Sig.ma .. e.g. http://sig.ma/search?q=frank+van+harmelen&sources=100
> bookfinder, our jerome digital library installation, but the triples
> they provide are scarce and don't contribute much. It would take so
> little for this to improve on their side, I believe.
>
> b) No, we are not religious about this. We have talked about it
> several times, it might make sense to try to understand as much of the
> web as possible and index it. Maybe we'll do it in the future for
> selected fractions of the web to show how it looks.
>
> c) Crawling should be just one means of acquiring the semantic web. In
> the case of bestbuy or other large retailers, where prices change possibly
> every day, crawling as a means to emulate a simple call to a web
> service seems really not the smart thing to do. Will data providers
> really support with data dumps?
>
> cheers
> Giovanni
>
Juan,

I am hoping that the response above clarifies matters, esp. point C.
Crawling the old way is futile when the "change sensitivity" aspect of
a given unit of data is high.

Georgi: even the count of German book authors, the prices of their
books, across a plethora of retailers, with a wide range of prices and
availability, is very sensitive to change.

Georgi/Juan: Mechanically, there is crawling, but essentially it simply
isn't the old style approach (data warehousing) of yore as exemplified
by Google, Yahoo!, ASK, and others.

Kingsley
>
> On Sat, Oct 17, 2009 at 3:32 PM, Juan Sequeda <juanfederico@gmail.com> wrote:
>
>> But Sindice could at least crawl Amazon.
>> It would be great to use sig.ma to create a "mashup" with the amazon data.
>>
>> Juan Sequeda, Ph.D Student
>> Dept. of Computer Sciences
>> The University of Texas at Austin
>> www.juansequeda.com
>> www.semanticwebaustin.org
>>
>> On Sat, Oct 17, 2009 at 9:28 AM, Martin Hepp (UniBW)
>> <hepp@ebusiness-unibw.org> wrote:
>>
>>> I don't think so, because this would require that Sindice crawled the
>>> whole regular web and checked the Spongers for each URL (sic!).
>>>
>>> Juan Sequeda wrote:
>>>
>>> Does Sindice crawl this (or any other semantic web search engines)?
>>> Juan Sequeda, Ph.D Student
>>> Dept. of Computer Sciences
>>> The University of Texas at Austin
>>> www.juansequeda.com
>>> www.semanticwebaustin.org
>>>
>>> On Sat, Oct 17, 2009 at 4:24 AM, Martin Hepp (UniBW) <
>>> hepp@ebusiness-unibw.org> wrote:
>>>
>>> Dear all:
>>>
>>> I just found out that the Virtuoso Sponger technology is even more
>>> powerful than I thought.
>>>
>>> Briefly: "Spongers" create rich GoodRelations (and other RDF) meta-data
>>> for existing Web pages on-the-fly. Unlike traditional
>>> screen-scraping approaches, Spongers reuse public APIs and other
>>> techniques, so the data has an unprecedented degree of structure.
>>>
>>> Now, this can be directly used in arbitrary queries... by simply using
>>> the URI of the *existing* HTML Web page in the FROM clause of a SPARQL
>>> query.
>>>
>>> Example:
>>>
>>> http://www.amazon.com/Semantic-Web-Real-World-Applications-Industry/dp/0387485309
>>>
>>> is a Web page in plain HTML offering a book. Amazon does not yet produce
>>> GoodRelations meta-data on their pages.
>>>
>>> If you go to
>>>
>>> http://uriburner.com/sparql
>>>
>>> and paste the URI in the "Default Graph URI" field and select "Retrieve
>>> remote RDF for all missing source graphs", then a query like
>>>
>>> "SELECT * WHERE {?s ?p ?o} LIMIT 50"
>>>
>>> returns a fully-fledged GoodRelations description for that page - as if
>>> Amazon was already supporting GoodRelations for each of its > 4 million
>>> items!
>>>
>>> There are spongers for BestBuy, eBay, Zillow, and many other types of
>>> resources.
>>>
>>> Wow!
>>>
>>> Congrats to Kingsley and his team!
>>>
>>> Best wishes
>>>
>>> Martin Hepp
>>>
>>> --
>>> --------------------------------------------------------------
>>> martin hepp
>>> e-business & web science research group
>>> universitaet der bundeswehr muenchen
>>>
>>> e-mail: hepp@ebusiness-unibw.org
>>> phone: +49-(0)89-6004-4217
>>> fax: +49-(0)89-6004-4620
>>> www: http://www.unibw.de/ebusiness/ (group)
>>> http://www.heppnetz.de/ (personal)
>>> skype: mfhepp
>>> twitter: mfhepp
>>>
>>> Check out GoodRelations for E-Commerce on the Web of Linked Data!
>>> =================================================================
>>>
>>> Webcast:
>>> http://www.heppnetz.de/projects/goodrelations/webcast/
>>>
>>> Recipe for Yahoo SearchMonkey:
>>> http://www.ebusiness-unibw.org/wiki/GoodRelations_and_Yahoo_SearchMonkey
>>>
>>> Talk at the Semantic Technology Conference 2009:
>>> "Semantic Web-based E-Commerce: The GoodRelations Ontology"
>>> http://www.slideshare.net/mhepp/semantic-webbased-ecommerce-the-goodrelations-ontology-1535287
>>>
>>> Overview article on Semantic Universe:
>>> http://www.semanticuniverse.com/articles-semantic-web-based-e-commerce-webmasters-get-ready.html
>>>
>>> Project page:
>>> http://purl.org/goodrelations/
>>>
>>> Resources for developers:
>>> http://www.ebusiness-unibw.org/wiki/GoodRelations
>>>
>>> Tutorial materials:
>>> CEC'09 2009 Tutorial: The Web of Data for E-Commerce: A Hands-on
>>> Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
>>> http://www.ebusiness-unibw.org/wiki/Web_of_Data_for_E-Commerce_Tutorial_IEEE_CEC%2709
>>
>

-- 
Regards,

Kingsley Idehen
Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Received on Saturday, 17 October 2009 23:27:11 UTC
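Martin's recipe in the thread (paste the Amazon page URI into the "Default Graph URI" field, enable "Retrieve remote RDF for all missing source graphs", run `SELECT * WHERE {?s ?p ?o} LIMIT 50`) can also be expressed as a plain HTTP GET against the endpoint. The Python below is a minimal sketch under stated assumptions, not OpenLink's documented client. It relies on the standard SPARQL protocol parameters `query` and `default-graph-uri`. The `should-sponge=soft` parameter is an assumption about how Virtuoso's form encodes the sponging option and should be checked against Virtuoso's documentation. The helper names `build_request_url` and `goodrelations_triples` are hypothetical. The uriburner.com endpoint is the one named in the 2009 message and may no longer respond. The sketch only builds the URL and filters already-fetched results; it does no network I/O.

```python
"""Sketch: querying a Sponger-enabled Virtuoso SPARQL endpoint, per Martin's recipe.

Assumptions (not stated in the thread): the endpoint accepts the standard SPARQL
protocol GET parameters `query` and `default-graph-uri`; `should-sponge=soft`
is ASSUMED to be the Virtuoso value behind "Retrieve remote RDF for all missing
source graphs"; results arrive in the W3C SPARQL JSON results format.
"""
import json
from urllib.parse import urlencode

# Endpoint and page URI are taken from the 2009 message; the endpoint may be gone.
ENDPOINT = "http://uriburner.com/sparql"
PAGE_URI = "http://www.amazon.com/Semantic-Web-Real-World-Applications-Industry/dp/0387485309"
GR_NS = "http://purl.org/goodrelations/v1#"  # GoodRelations vocabulary namespace


def build_request_url(endpoint: str, page_uri: str, limit: int = 50,
                      sponge: str = "soft") -> str:
    """Build a GET URL that uses the plain HTML page URI as the default graph."""
    query = f"SELECT * WHERE {{?s ?p ?o}} LIMIT {limit}"
    params = {
        "query": query,                 # standard SPARQL protocol parameter
        "default-graph-uri": page_uri,  # standard SPARQL protocol parameter
        "should-sponge": sponge,        # ASSUMPTION: Virtuoso-specific sponging switch
    }
    return f"{endpoint}?{urlencode(params)}"


def goodrelations_triples(results_json: str) -> list[tuple[str, str, str]]:
    """Return (s, p, o) rows whose predicate lies in the GoodRelations namespace.

    Expects SPARQL JSON results with every row binding the variables s, p and o.
    """
    data = json.loads(results_json)
    rows = []
    for binding in data["results"]["bindings"]:
        s, p, o = (binding[v]["value"] for v in ("s", "p", "o"))
        if p.startswith(GR_NS):
            rows.append((s, p, o))
    return rows


if __name__ == "__main__":
    # Print the request URL; fetching it is left to the caller.
    print(build_request_url(ENDPOINT, PAGE_URI))
```

To use it against a live endpoint, one would fetch the printed URL with an `Accept: application/sparql-results+json` header (the media type the SPARQL 1.1 protocol uses for JSON results) and pass the response body to `goodrelations_triples`. Keeping URL construction separate from result parsing means the sponging parameter can be dropped or changed without touching the filtering logic.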