- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Wed, 02 Sep 2009 12:57:39 -0400
- To: Giovanni Tummarello <giovanni.tummarello@deri.org>
- CC: "Myers, Jay" <Jay.Myers@bestbuy.com>, martin.hepp@ebusiness-unibw.org, public-lod@w3.org, Sindice developers list <sindice-dev@lists.deri.org>
Giovanni Tummarello wrote:
> Jay,
>
> actually, as Kingsley was suggesting already, the truly best way to
> expose this data would be by embedding RDFa in the actual web pages
> that bestbuy has.
> One would get:
>
> a) the exact same benefits as publishing the files alone (after all,
> the RDF is just a transformation away)
> b) the certainty of the metadata being the same as what the user sees
> c) getting away from ambiguity of identifiers: the page would be used
> as the identifier for the item, period. Much easier for other people to
> find identifiers and link to them
> d) totally ready for structured snippets, Yahoo SearchMonkey etc. and
> future semantic search engine optimizations.
> e) a true enabler for client-side applications, e.g. a Firefox plugin
> which acts as a side shopping "assistant", e.g. allowing rich searching,
> faceted comparison in the browser history or all sorts of user-centric
> advanced use of structured data (e.g. a la "piggybank", for the
> semantic web historians)
>
Amen! But s/piggybank/ode/g :-)

> all this just some RDFa away :-) . Is it thinkable that this can
> happen? After all, it's totally invisible for the user.
>
My guess is that it will happen. Note that <http://stores.bestbuy.com> already has some RDFa in place :-)

> of course the dumps would still be very useful!! (for the purpose of
> not recrawling) and so the sitemap/semantic sitemap.
> for entities that bestbuy does not intend to expose as pages (e.g. a
> URI about a company) the pure RDF/XML would still be useful.
>
Hmm, but the description (About) of the company is already exposed, so even that's just a case of marking up the existing HTML based "About" page with RDFa :-)

> Hoping that others also agree on these benefits.
>
URIBurner home page is updated, and I am hoping the virtues of HTML representation of Metadata become clearer, especially as you can deliver these benefits via proxy/wrapper style HTTP URIs.
Kingsley
> thanks again for your efforts
> Giovanni
>
>
> On Tue, Sep 1, 2009 at 3:43 PM, Myers, Jay <Jay.Myers@bestbuy.com> wrote:
>
>> All,
>>
>> Thanks for the insight. As far as the sitemap is concerned, I used the
>> current sitemap protocol (http://www.sitemaps.org/schemas/sitemap/0.9).
>> Since we are publishing around 452K documents, it seemed like the correct
>> route to use sitemap index files, as one file would certainly contain over
>> 50,000 URIs and be over 10MB. I’m not aware of another method in which to
>> publish this amount of data in a sitemap :-)
>>
>> At this point, we have no SPARQL endpoint, we are simply publishing product
>> data out via RDF. I am hoping that this effort will be noticed
>> by senior leadership, convincing them to sponsor a greater, more complete
>> effort that could serve as a model for big business. Any suggestions on this
>> would be welcome.
>>
>> Thanks,
>>
>> Jay
>>
>> Jay Myers
>> Lead Web Development Engineer
>> Online Solutions, BestBuy.com
>> jay.myers@bestbuy.com
>> (w) 612-291-4007
>> (c) 612-296-5836
>> (twitter) @jaymyers
>> (skype) jaymmyers
>>
>> ________________________________
>>
>> From: Martin Hepp (UniBW) [mailto:martin.hepp@ebusiness-unibw.org]
>> Sent: Tuesday, September 01, 2009 8:14 AM
>> To: giovanni.tummarello@deri.org
>> Cc: public-lod@w3.org
>> Subject: Re: ANN: BestBuy.com starts publishing full catalog as RDF/XML
>> using GoodRelations - 27 million triples
>>
>> Hi Giovanni:
>>
>> Giovanni Tummarello wrote:
>>
>> >> Hi Martin, all,
>> >>
>> >> the sitemap exposed is not a Semantic Sitemap
>> >>
>> >> Semantic Sitemap: http://products.semweb.bestbuy.com/sitemap.xml
>> >>
>> >> but simply gives the location of the dumps.
>>
>> As far as I see, the sitemap at
>>
>> http://products.semweb.bestbuy.com/sitemap.xml
>>
>> gives the locations of the compressed semantic sitemaps:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
>>   <sitemap>
>>     <loc>http://products.semweb.bestbuy.com/sitemap1.xml.gz</loc>
>>     <lastmod>2009-07-31T18:23:17+00:00</lastmod>
>>   </sitemap>
>>
>> Each one of those seems to be a proper semantic sitemap
>> E.g.
>>
>> http://products.semweb.bestbuy.com/sitemap1.xml.gz
>>
>> -->
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
>>         xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
>>   <sc:dataset>
>>     <sc:datasetLabel>Sitemap data for Best Buy Co., Inc., products. Data
>>     based on http://purl.org/goodrelations/</sc:datasetLabel>
>>     <sc:datasetURI>http://products.semweb.bestbuy.com/</sc:datasetURI>
>>     <sc:linkedDataPrefix
>>       slicing="subject-object">http://products.semweb.bestbuy.com/</sc:linkedDataPrefix>
>>     <sc:sampleURI>http://products.semweb.bestbuy.com/products/9380001/semanticweb.rdf</sc:sampleURI>
>>     <sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/43900/semanticweb.rdf</sc:dataDumpLocation>
>>     <sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/48521/semanticweb.rdf</sc:dataDumpLocation>
>>     <sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/48530/semanticweb.rdf</sc:dataDumpLocation>
>>     <sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/54256/semanticweb.rdf</sc:dataDumpLocation>
>>
>> >> in theory if this information is exposed as linked data then one would
>> >> like to have a semantic sitemap exposed,
>>
>> As said - I understand BestBuy is using the main sitemap to bundle the
>> individual semantic sitemaps. Note that they are dealing with 450,000
>> documents. A single sitemap file would be pretty large.
>>
>> >> which includes other details
>> >> e.g. a sparql endpoint, some information on the datasets etc. [1]
>>
>> There is, to my knowledge, no SPARQL endpoint offered by BestBuy.com, but
>> you can soon simply use the Linked Open Commerce dataspace at
>>
>> http://loc.openlinksw.com/sparql
>>
>> This will contain a current copy of the bestbuy graphs.
>>
>> >> has this been considered and decided against?
>>
>> As far as I know, the combination of a sitemap and 23 semantic sitemaps was
>> a pragmatic decision. If it causes major problems, Jay Myers from BestBuy
>> will for sure be open to suggestions for improvements.
>>
>> >> should we just live with
>> >> it and fit sindice to do some guesswork and process those instead? (i
>> >> am not necessarily against this last solution really.. )
>>
>> You simply have to fetch and un-gzip the 23 semantic sitemaps at
>>
>> http://products.semweb.bestbuy.com/sitemap<n>.xml.gz
>>
>> with <n> being a number from 1 to 23.
>>
>> Note that
>>
>> http://products.semweb.bestbuy.com/sitemap5.xml.gz
>>
>> seems to have a syntactical problem (fix is already requested).
>>
>> >> In other words are you suggesting the use of semantic sitemaps
>>
>> We usually recommend using semantic sitemaps. But actually I think that a
>> consolidated dataspace like the LOC will become more important in the
>> future, because it creates too much overhead for each agent and application
>> to crawl and consolidate the whole Web of Linked Data on his/her own.
>>
>> >> or
>> >> should we just come to terms with this? The disadvantage is that a linked
>> >> data browser that wants to use an index to find information will be
>> >> able to do so less reliably (hope that our guesswork works)
>>
>> As said - I understand (without a thorough analysis, though), that BestBuy's
>> usage of a single sitemap and multiple semantic sitemaps is okay.
>>
>> >> Giovanni
>> >>
>> >> [1] http://sw.deri.org/2007/07/sitemapextension/
>> >>
>> >> On Mon, Aug 31, 2009 at 8:08 PM, Martin Hepp
>> >> (UniBW) <martin.hepp@ebusiness-unibw.org> wrote:
>> >>
>> >> Dear all:
>> >>
>> >> BestBuy.com has just started to serve a complete RDF/XML dump of their
>> >> products and price information to the Web of Linked Data, using the
>> >> GoodRelations vocabulary for e-commerce. The data dump is updated on a
>> >> daily basis and contains detailed descriptions for roughly 450,000
>> >> individual items. With about 60 triples per item, this totals to about
>> >> 27 million RDF triples.
>> >>
>> >> Semantic Sitemap: http://products.semweb.bestbuy.com/sitemap.xml
>> >>
>> >> Examples:
>> >> a) Software:
>> >> http://products.semweb.bestbuy.com/products/8182593/semanticweb.rdf
>> >>
>> >> b) "Hardgoods":
>> >> http://products.semweb.bestbuy.com/products/8794691/semanticweb.rdf
>> >>
>> >> c) Movies:
>> >> http://products.semweb.bestbuy.com/products/7590289/semanticweb.rdf
>> >>
>> >> d) Games:
>> >> http://products.semweb.bestbuy.com/products/9223752/semanticweb.rdf
>> >>
>> >> Unlike many existing large RDF transcripts, the data is very dynamic,
>> >> holding the daily prices for all items.
>> >> According to Wikipedia, BestBuy.com is the largest specialty retailer of
>> >> consumer electronics in the United States, accounting for 19% of the market.
>> >>
>> >> It is likely the first Fortune 500 company to start publishing offer
>> >> details on the Web of Linked Data.
>> >>
>> >> Congratulations to Jay Myers from BestBuy.com for this excellent
>> >> contribution, and a big thanks to Andreas Radinger and Alex Stolz for
>> >> their support,
>> >>
>> >> Best wishes
>> >>
>> >> Martin Hepp
>> >>
>> >> --
>> >> --------------------------------------------------------------
>> >> martin hepp
>> >> e-business & web science research group
>> >> universitaet der bundeswehr muenchen
>> >>
>> >> e-mail: mhepp@computer.org
>> >> phone: +49-(0)89-6004-4217
>> >> fax: +49-(0)89-6004-4620
>> >> www: http://www.unibw.de/ebusiness/ (group)
>> >> http://www.heppnetz.de/ (personal)
>> >> skype: mfhepp
>> >> twitter: mfhepp
>> >>
>> >> Check out GoodRelations for E-Commerce on the Web of Linked Data!
>> >> =================================================================
>> >>
>> >> Webcast:
>> >> http://www.heppnetz.de/projects/goodrelations/webcast/
>> >>
>> >> Recipe for Yahoo SearchMonkey:
>> >> http://tr.im/rAbN
>> >>
>> >> Talk at the Semantic Technology Conference 2009:
>> >> "Semantic Web-based E-Commerce: The GoodRelations Ontology"
>> >> http://tinyurl.com/semtech-hepp
>> >>
>> >> Overview article on Semantic Universe:
>> >> http://tinyurl.com/goodrelations-universe
>> >>
>> >> Project page:
>> >> http://purl.org/goodrelations/
>> >>
>> >> Resources for developers:
>> >> http://www.ebusiness-unibw.org/wiki/GoodRelations
>> >>
>> >> Tutorial materials:
>> >> CEC'09 2009 Tutorial: The Web of Data for E-Commerce: A Hands-on
>> >> Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
>> >> http://tr.im/grcec09
>>
>
>

--

Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software     Web: http://www.openlinksw.com
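
For anyone wanting to try the recipe Martin gives in the quoted thread (fetch and un-gzip the numbered semantic sitemaps, then read out the dump locations), here is a minimal Python sketch. The URL pattern, the count of 23, and the two XML namespaces are taken from the thread; the function names are made up for illustration, and the hosts may well no longer serve these files, so treat it as an outline rather than a tested crawler.

```python
import gzip
import urllib.request
import xml.etree.ElementTree as ET

# Namespaces copied from the sitemap excerpts quoted in the thread.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"

# URL pattern and range (1 to 23) as stated in the thread.
SITEMAP_PATTERN = "http://products.semweb.bestbuy.com/sitemap{n}.xml.gz"


def numbered_sitemap_urls(count=23):
    """Build the sitemap<n>.xml.gz URLs for n = 1..count."""
    return [SITEMAP_PATTERN.format(n=n) for n in range(1, count + 1)]


def index_locations(index_xml):
    """Return the <loc> URLs listed in a sitemap index document."""
    root = ET.fromstring(index_xml.encode("utf-8"))
    return [el.text.strip() for el in root.iter("{%s}loc" % SITEMAP_NS) if el.text]


def dump_locations(sitemap_xml):
    """Return the <sc:dataDumpLocation> URLs from one semantic sitemap."""
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    tag = "{%s}dataDumpLocation" % SC_NS
    return [el.text.strip() for el in root.iter(tag) if el.text]


def fetch_gzipped(url, timeout=30):
    """Download a .gz resource and return its decompressed text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return gzip.decompress(resp.read()).decode("utf-8")


def collect_dump_locations(urls):
    """Gather dump locations from every sitemap; record the ones that fail.

    A sitemap that does not download, decompress, or parse (the thread
    mentions a syntax problem in sitemap5) is noted and skipped.
    """
    found, broken = [], []
    for url in urls:
        try:
            found.extend(dump_locations(fetch_gzipped(url)))
        except (ET.ParseError, OSError, EOFError) as exc:
            broken.append((url, str(exc)))
    return found, broken
```

Calling `collect_dump_locations(numbered_sitemap_urls())` would walk all 23 files and return the RDF dump URLs together with a list of sitemaps that could not be read, so a single malformed file does not stop the crawl.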
Received on Wednesday, 2 September 2009 16:58:25 UTC