Re: ANN: BestBuy.com starts publishing full catalog as RDF/XML using GoodRelations - 27 million triples

Actually, my bad :-)
We fed the sitemap into Sindice as-is and the processor didn't
work; I assumed it was just a set of links to the dumps.
We'll fix that on our side and report back on the outcome.
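For reference, the distinction our processor missed can be read off the root element. A sitemaps.org index is a <sitemapindex> whose <loc> entries point at further sitemap files. A Semantic Sitemap is a <urlset> that carries sc:dataset elements. Below is a minimal Python sketch using only the standard library; it is not Sindice's actual code, classify_sitemap is a hypothetical helper, and the sc: namespace URI is the one used in BestBuy's files quoted below.

```python
import xml.etree.ElementTree as ET

# sitemaps.org namespace, and the Semantic Sitemap extension namespace
# exactly as it appears in BestBuy's files.
SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def _texts(root, ns, name):
    # Stripped text of every descendant element {ns}name that has text.
    return [e.text.strip() for e in root.iter(f"{{{ns}}}{name}") if e.text]


def classify_sitemap(xml_text):
    """Tell a sitemap index apart from a Semantic Sitemap (hypothetical helper).

    Returns ("index", child sitemap URLs) for a <sitemapindex>,
    ("semantic", data dump URLs) for a <urlset> carrying <sc:dataset>,
    or ("plain", page URLs) for an ordinary <urlset>.
    """
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{SM_NS}}}sitemapindex":
        # An index only points at further sitemaps: fetch and classify those next.
        return "index", _texts(root, SM_NS, "loc")
    if root.find(f"{{{SC_NS}}}dataset") is not None:
        return "semantic", _texts(root, SC_NS, "dataDumpLocation")
    return "plain", _texts(root, SM_NS, "loc")
```

Applied recursively, an "index" result simply means: fetch each listed file, decompress it if needed, and classify it again.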

cheers
Giovanni


On Tue, Sep 1, 2009 at 2:13 PM, Martin Hepp
(UniBW)<martin.hepp@ebusiness-unibw.org> wrote:
> Hi Giovanni:
>
> Giovanni Tummarello wrote:
>
>> Hi Martin, all,
>>
>>  the sitemap exposed is not a Semantic Sitemap
>>
>>> Semantic Sitemap: http://products.semweb.bestbuy.com/sitemap.xml
>>
>> but simply gives the location of the dumps.
>
>
>
> As far as I see, the sitemap at
>
> http://products.semweb.bestbuy.com/sitemap.xml
>
> gives the locations of the compressed semantic sitemaps:
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
>     <sitemap>
>         <loc>http://products.semweb.bestbuy.com/sitemap1.xml.gz</loc>
>         <lastmod>2009-07-31T18:23:17+00:00</lastmod>
>     </sitemap>
>
>
> Each one of those seems to be a proper semantic sitemap, e.g.
>
> http://products.semweb.bestbuy.com/sitemap1.xml.gz
>
> -->
>
> <?xml version="1.0" encoding="UTF-8"?>
> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
>         xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
>     <sc:dataset>
>         <sc:datasetLabel>Sitemap data for Best Buy Co., Inc., products. Data based on http://purl.org/goodrelations/</sc:datasetLabel>
>         <sc:datasetURI>http://products.semweb.bestbuy.com/</sc:datasetURI>
>         <sc:linkedDataPrefix slicing="subject-object">http://products.semweb.bestbuy.com/</sc:linkedDataPrefix>
>         <sc:sampleURI>http://products.semweb.bestbuy.com/products/9380001/semanticweb.rdf</sc:sampleURI>
>         <sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/43900/semanticweb.rdf</sc:dataDumpLocation>
>         <sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/48521/semanticweb.rdf</sc:dataDumpLocation>
>         <sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/48530/semanticweb.rdf</sc:dataDumpLocation>
>         <sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/54256/semanticweb.rdf</sc:dataDumpLocation>
>         ...
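A sketch of pulling the dataset description out of a file like the one above, using Python's standard xml.etree.ElementTree. read_datasets is a hypothetical helper. It also looks for sc:sparqlEndpointLocation, which (as we read the extension spec in [1]) is how a Semantic Sitemap advertises an endpoint; BestBuy's file does not include one.

```python
import xml.etree.ElementTree as ET

# Namespace of the Semantic Sitemap extension, as used in BestBuy's files.
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def sc(name):
    """Clark-notation tag for an sc: element, e.g. '{...}datasetURI'."""
    return f"{{{SC_NS}}}{name}"


def read_datasets(xml_text):
    """Return one dict per <sc:dataset> with the fields a crawler needs."""
    root = ET.fromstring(xml_text)
    datasets = []
    for ds in root.iter(sc("dataset")):

        def texts(name, ds=ds):
            # Stripped text of the direct children named sc:<name>.
            return [e.text.strip() for e in ds.findall(sc(name)) if e.text]

        def first(name):
            values = texts(name)
            return values[0] if values else None

        prefix = ds.find(sc("linkedDataPrefix"))
        datasets.append({
            "label": first("datasetLabel"),
            "uri": first("datasetURI"),
            "prefix": prefix.text.strip() if prefix is not None and prefix.text else None,
            "slicing": prefix.get("slicing") if prefix is not None else None,
            "samples": texts("sampleURI"),
            "dumps": texts("dataDumpLocation"),
            "sparql": first("sparqlEndpointLocation"),  # absent in BestBuy's file
        })
    return datasets
```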
>
>
>
>> In theory, if this information is exposed as linked data, one would
>> like to have a semantic sitemap exposed,
>
> As said - I understand BestBuy is using the main sitemap to bundle the
> individual semantic sitemaps. Note that they are dealing with 450,000
> documents. A single sitemap file would be pretty large.
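Some background on the size question: the sitemaps.org protocol caps a single sitemap file at 50,000 URLs and also limits its uncompressed size (10 MB in the protocol as published in 2009). It is not clear to us whether the 50,000 cap is meant to cover sc:dataDumpLocation entries, so the hypothetical sketch below (not BestBuy's generator) honors both limits when packing entries into files, then builds the index. With 450,000 entries the count limit alone forces at least nine files; why BestBuy settled on 23 is not stated in the thread.

```python
from xml.sax.saxutils import escape

MAX_ENTRIES = 50_000            # per-file URL cap in the sitemaps.org protocol
MAX_BYTES = 10 * 1024 * 1024    # 10 MB uncompressed, as published in 2009


def shard(lines, max_entries=MAX_ENTRIES, max_bytes=MAX_BYTES, overhead=4096):
    """Greedily pack serialized entries into files that respect both caps.

    `overhead` reserves bytes per file for the XML header and the
    <sc:dataset> metadata each file repeats (an assumed figure).
    """
    chunks, current, size = [], [], overhead
    for line in lines:
        n = len(line.encode("utf-8"))
        # Start a new file if this entry would break either limit.
        if current and (len(current) >= max_entries or size + n > max_bytes):
            chunks.append(current)
            current, size = [], overhead
        current.append(line)
        size += n
    if current:
        chunks.append(current)
    return chunks


def sitemap_index(base, n_files):
    """A <sitemapindex> listing base + 'sitemap<k>.xml.gz' for k = 1..n_files."""
    entries = "".join(
        f"    <sitemap><loc>{escape(base)}sitemap{k}.xml.gz</loc></sitemap>\n"
        for k in range(1, n_files + 1)
    )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + entries + "</sitemapindex>\n")
```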
>
>> which includes other details,
>> e.g. a SPARQL endpoint, some information on the datasets, etc. [1]
>
>
>
> There is, to my knowledge, no SPARQL endpoint offered by BestBuy.com, but
> you can soon simply use the Linked Open Commerce dataspace at
>
> http://loc.openlinksw.com/sparql
>
> This will contain a current copy of the BestBuy graphs.
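A minimal sketch of querying such an endpoint with the Python standard library, using the SPARQL protocol's GET binding (the query text goes in the "query" parameter). The endpoint URL is the one Martin gives and may no longer be online. The GoodRelations query is illustrative only and assumes the BestBuy offers are modelled with gr:hasPriceSpecification and gr:hasCurrencyValue.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint named in the thread (it may no longer be online).
LOC_ENDPOINT = "http://loc.openlinksw.com/sparql"

# Illustrative GoodRelations query; the data shape is an assumption.
QUERY = """PREFIX gr: <http://purl.org/goodrelations/v1#>
SELECT ?offering ?price ?currency WHERE {
  ?offering a gr:Offering ;
            gr:hasPriceSpecification ?spec .
  ?spec gr:hasCurrencyValue ?price ;
        gr:hasCurrency ?currency .
} LIMIT 10"""


def sparql_get_url(endpoint, query):
    """Encode a query for the SPARQL protocol's GET binding."""
    return endpoint + "?" + urlencode({"query": query})


def run_query(endpoint, query, timeout=30):
    """Send the query and return the bindings of a JSON results document."""
    req = Request(sparql_get_url(endpoint, query),
                  headers={"Accept": "application/sparql-results+json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]
```

Each binding maps a variable name to a dict with "type" and "value", so a price would be read as row["price"]["value"].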
>
>> Has this been considered and decided against?
>
> As far as I know, the combination of a sitemap and 23 semantic sitemaps was
> a pragmatic decision. If it causes major problems, Jay Myers from BestBuy
> will for sure be open to suggestions for improvements.
>
>> Or should we just live with
>> it and adapt Sindice to do some guesswork and process those instead? (I
>> am not necessarily against this last solution, really.)
>
>
> You simply have to fetch and un-gzip the 23 semantic sitemaps at
>
> http://products.semweb.bestbuy.com/sitemap<n>.xml.gz
>
> with <n> being a number from 1 to 23.
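In code, that amounts to the following sketch (Python standard library only). The count of 23 is what the thread reports; reading the locations from sitemap.xml itself, as in the index shown above, would be more robust than hard-coding it, and the files may no longer be served. The helper names are hypothetical.

```python
import gzip
from urllib.request import urlopen

BASE = "http://products.semweb.bestbuy.com/"
N_SITEMAPS = 23  # count reported in the thread at the time of writing


def sitemap_urls(base=BASE, n=N_SITEMAPS):
    """base + 'sitemap1.xml.gz' ... base + 'sitemap<n>.xml.gz'."""
    return [f"{base}sitemap{k}.xml.gz" for k in range(1, n + 1)]


def maybe_gunzip(data):
    # gzip streams start with the magic bytes 0x1f 0x8b; pass anything else
    # through unchanged in case an HTTP layer already decompressed it.
    return gzip.decompress(data) if data[:2] == b"\x1f\x8b" else data


def fetch_sitemap(url, timeout=60):
    """Download one .xml.gz semantic sitemap and return its XML bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return maybe_gunzip(resp.read())
```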
>
> Note that
>
> http://products.semweb.bestbuy.com/sitemap5.xml.gz
>
> seems to have a syntactical problem (a fix has already been requested).
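Since one malformed file should not abort a whole crawl, a harvester can parse each sitemap independently and set failures aside for a later retry. A minimal sketch; parse_all is a hypothetical helper, and the location it reports is whatever ElementTree's ParseError gives in its position attribute.

```python
import xml.etree.ElementTree as ET


def parse_all(documents):
    """Parse (name, xml_bytes) pairs; return (parsed, failures).

    A malformed file is recorded in `failures` instead of aborting the run.
    """
    parsed, failures = {}, {}
    for name, data in documents:
        try:
            parsed[name] = ET.fromstring(data)
        except ET.ParseError as err:
            line, column = err.position  # where expat gave up
            failures[name] = f"line {line}, column {column}: {err}"
    return parsed, failures
```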
>
>> In other words, are you suggesting the use of semantic sitemaps
>
> We usually recommend using semantic sitemaps. But actually I think that a
> consolidated dataspace like the LOC will become more important in the
> future, because it creates too much overhead for each agent and application
> to crawl and consolidate the whole Web of Linked Data on its own.
>
>> or
>> should we just come to terms with this? The disadvantage is that a linked
>> data browser that wants to use an index to find information will be
>> able to do so less reliably (hope that our guesswork works).
>
>
> As said - I understand (without a thorough analysis, though) that BestBuy's
> usage of a single sitemap and multiple semantic sitemaps is okay.
>
>> Giovanni
>>
>> [1] http://sw.deri.org/2007/07/sitemapextension/
>
> On Mon, Aug 31, 2009 at 8:08 PM, Martin Hepp
> (UniBW)<martin.hepp@ebusiness-unibw.org> wrote:
>
>
> Dear all:
>
> BestBuy.com has just started to serve a complete RDF/XML dump of their
> products and price information to the Web of Linked Data, using the
> GoodRelations vocabulary for e-commerce. The data dump is updated on a
> daily basis and contains detailed descriptions for roughly 450,000
> individual items. With about 60 triples per item, this totals about
> 27 million RDF triples.
>
> Semantic Sitemap: http://products.semweb.bestbuy.com/sitemap.xml
>
> Examples:
> a) Software:
> http://products.semweb.bestbuy.com/products/8182593/semanticweb.rdf
>
> b) "Hardgoods":
> http://products.semweb.bestbuy.com/products/8794691/semanticweb.rdf
>
> c) Movies:
> http://products.semweb.bestbuy.com/products/7590289/semanticweb.rdf
>
> d) Games:
> http://products.semweb.bestbuy.com/products/9223752/semanticweb.rdf
>
> Unlike many existing large RDF conversions, the data is very dynamic,
> holding the daily prices for all items.
> According to Wikipedia, BestBuy.com is the largest specialty retailer of
> consumer electronics in the United States, accounting for 19% of the market.
>
> It is likely the first Fortune 500 company to start publishing offer
> details on the Web of Linked Data.
>
> Congratulations to Jay Myers from BestBuy.com for this excellent
> contribution, and a big thanks to Andreas Radinger and Alex Stolz for
> their support,
>
> Best wishes
>
> Martin Hepp
>
> --
> --------------------------------------------------------------
> martin hepp
> e-business & web science research group
> universitaet der bundeswehr muenchen
>
> e-mail:  mhepp@computer.org
> phone:   +49-(0)89-6004-4217
> fax:     +49-(0)89-6004-4620
> www:     http://www.unibw.de/ebusiness/ (group)
>         http://www.heppnetz.de/ (personal)
> skype:   mfhepp
> twitter: mfhepp
>
> Check out GoodRelations for E-Commerce on the Web of Linked Data!
> =================================================================
>
> Webcast:
> http://www.heppnetz.de/projects/goodrelations/webcast/
>
> Recipe for Yahoo SearchMonkey:
> http://tr.im/rAbN
>
> Talk at the Semantic Technology Conference 2009:
> "Semantic Web-based E-Commerce: The GoodRelations Ontology"
> http://tinyurl.com/semtech-hepp
>
> Overview article on Semantic Universe:
> http://tinyurl.com/goodrelations-universe
>
> Project page:
> http://purl.org/goodrelations/
>
> Resources for developers:
> http://www.ebusiness-unibw.org/wiki/GoodRelations
>
> Tutorial materials:
> CEC'09 2009 Tutorial: The Web of Data for E-Commerce: A Hands-on
> Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
> http://tr.im/grcec09
>

Received on Tuesday, 1 September 2009 13:30:21 UTC