Re: ANN: BestBuy.com starts publishing full catalog as RDF/XML using GoodRelations - 27 million triples

Giovanni Tummarello wrote:
> Jay,
>
> actually, as Kingsley was suggesting already, the truly best way to
> expose this data would be by embedding RDFa in the actual web pages
> that bestbuy has.
> One would get :
>
> a) the exact same benefits as publishing the files alone (after all,
> the RDF is just a transformation away)
> b) the certainty that the metadata matches what the user sees
> c) getting away from ambiguity of identifiers, the page would be used
> as identifier for the item, period. much easier for other people to
> find identifiers and link to them
> d) totally ready for structured snippets, yahoo searchmonkey etc and
> future semantic search engine optimizations.
> e) a true enabler for client side applications, e.g. a firefox plugin
> which acts as a side shopping "assistant", e.g. allowing rich searching,
> faceted comparison in the browser history or all sorts of user centric
> advanced use of structured data (e.g. a la "piggybank", for the
> semantic web historians)
>   
Amen! But s/piggybank/ode/g :-)

> all this just some RDFa away :-) . Is it thinkable that this can
> happen? After all, it's totally invisible for the user.
>   
My guess is that it will happen. Note that <http://stores.bestbuy.com>
already has some RDFa in place :-)
> of course the dumps would still be very useful!! (for the purpose of
> not recrawling) and so the sitemap/semantic sitemap.
> for entities that bestbuy does not intend to expose as pages (e.g. a
> URI about a company) the pure RDF/XML would still be useful.
>   
Hmm, but the company description (About) is already exposed, so even
that's just a case of marking up the existing HTML-based "About" page
with RDFa :-)
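
A minimal sketch of what such markup could look like, assuming RDFa 1.0 attributes and the GoodRelations vocabulary. The function name, the "#offering" fragment, and the use of rdfs:label are illustrative assumptions, not a description of Best Buy's actual pages:

```python
from html import escape

GR = "http://purl.org/goodrelations/v1#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def offering_rdfa(page_uri, label, price, currency):
    """Render an HTML fragment that describes one offer via RDFa attributes.

    Hypothetical helper: the page URI (plus an assumed '#offering'
    fragment) serves as the identifier of the offer.
    """
    subject = escape(f"{page_uri}#offering", quote=True)
    return (
        f'<div xmlns:gr="{GR}" xmlns:rdfs="{RDFS}" xmlns:xsd="{XSD}"\n'
        f'     about="{subject}" typeof="gr:Offering">\n'
        f'  <span property="rdfs:label">{escape(label)}</span>\n'
        f'  <div rel="gr:hasPriceSpecification">\n'
        f'    <div typeof="gr:UnitPriceSpecification">\n'
        f'      <span property="gr:hasCurrency" '
        f'content="{escape(currency, quote=True)}"></span>\n'
        f'      <span property="gr:hasCurrencyValue" '
        f'datatype="xsd:float">{price:.2f}</span>\n'
        f'    </div>\n'
        f'  </div>\n'
        f'</div>'
    )


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(offering_rdfa("http://example.com/products/1", "Sample item", 19.99, "USD"))
```

The visible text (label, price) and the metadata come from the same elements, which is the point about the user and the crawler seeing the same data.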
> Hoping that others also agree on these benefits.
>   
URIBurner home page is updated, and I am hoping the virtues of HTML 
representation of Metadata become clearer, especially as you can deliver 
these benefits via proxy/wrapper style HTTP URIs.


Kingsley

> thanks again for your efforts
> Giovanni
>
>
> On Tue, Sep 1, 2009 at 3:43 PM, Myers, Jay<Jay.Myers@bestbuy.com> wrote:
>   
>> All,
>>
>>
>>
>> Thanks for the insight. As far as the sitemap is concerned, I used the
>> current sitemap protocol (http://www.sitemaps.org/schemas/sitemap/0.9).
>> Since we are publishing around 452K documents, it seemed like the correct
>> route to use sitemap index files, as one file would certainly contain over
>> 50,000 URIs and be over 10MB. I’m not aware of another method in which to
>> publish this amount of data in a sitemap :-)
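
The splitting described above can be sketched as follows, assuming only the 50,000-URI limit cited in the message. The function names, the example base URL, and the sitemapN.xml.gz naming are illustrative assumptions, not the script actually used at Best Buy:

```python
from math import ceil

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit cited in the message


def plan_sitemaps(total_docs, per_file=MAX_URLS_PER_SITEMAP):
    """Return (start, end) index ranges, one per sitemap file."""
    if per_file <= 0:
        raise ValueError("per_file must be positive")
    n_files = ceil(total_docs / per_file)
    return [
        (i * per_file, min((i + 1) * per_file, total_docs))
        for i in range(n_files)
    ]


def sitemap_index(base_url, n_files):
    """Build a sitemap index document pointing at n_files gzipped sitemaps."""
    entries = "\n".join(
        f"    <sitemap>\n"
        f"        <loc>{base_url}sitemap{i}.xml.gz</loc>\n"
        f"    </sitemap>"
        for i in range(1, n_files + 1)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</sitemapindex>"
    )


if __name__ == "__main__":
    ranges = plan_sitemaps(452_000)
    print(len(ranges), "sitemap files needed at the 50,000-URI limit")
    print(sitemap_index("http://example.org/", len(ranges)))
```

A smaller per_file value yields more, smaller files, which also helps stay under the size limit per file.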
>>
>>
>>
>> At this point, we have no SPARQL endpoint, we are simply publishing product
>> data out via RDF. I am hoping that this effort will be noticed
>> by senior leadership, convincing them to sponsor a greater, more complete
>> effort that could serve as a model for big business. Any suggestions on this
>> would be welcome.
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Jay
>>
>>
>>
>> Jay Myers
>>
>> Lead Web Development Engineer
>>
>> Online Solutions, BestBuy.com
>>
>> jay.myers@bestbuy.com
>>
>> (w) 612-291-4007
>>
>> (c) 612-296-5836
>>
>> (twitter) @jaymyers
>>
>> (skype) jaymmyers
>>
>>
>>
>>
>>
>> ________________________________
>>
>> From: Martin Hepp (UniBW) [mailto:martin.hepp@ebusiness-unibw.org]
>> Sent: Tuesday, September 01, 2009 8:14 AM
>> To: giovanni.tummarello@deri.org
>> Cc: public-lod@w3.org
>> Subject: Re: ANN: BestBuy.com starts publishing full catalog as RDF/XML
>> using GoodRelations - 27 million triples
>>
>>
>>
>> Hi Giovanni:
>>
>> Giovanni Tummarello wrote:
>>
>> Hi Martin, all,
>>
>>
>>
>>  the sitemap exposed is not a Semantic Sitemap
>>
>>
>>
>> Semantic Sitemap: http://products.semweb.bestbuy.com/sitemap.xml
>>
>>
>>
>> but simply gives the location of the dumps.
>>
>>
>>
>>
>>
>> As far as I see, the sitemap at
>>
>> http://products.semweb.bestbuy.com/sitemap.xml
>>
>> gives the locations of the compressed semantic sitemaps:
>>
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
>>     <sitemap>
>>         <loc>http://products.semweb.bestbuy.com/sitemap1.xml.gz</loc>
>>         <lastmod>2009-07-31T18:23:17+00:00</lastmod>
>>     </sitemap>
>>
>>
>> Each one of those seems to be a proper semantic sitemap
>> E.g.
>>
>> http://products.semweb.bestbuy.com/sitemap1.xml.gz
>>
>> -->
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
>> xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
>>     <sc:dataset>
>>         <sc:datasetLabel>Sitemap data for Best Buy Co., Inc., products. Data
>> based on http://purl.org/goodrelations/</sc:datasetLabel>
>>         <sc:datasetURI>http://products.semweb.bestbuy.com/</sc:datasetURI>
>>         <sc:linkedDataPrefix
>> slicing="subject-object">http://products.semweb.bestbuy.com/</sc:linkedDataPrefix>
>>
>> <sc:sampleURI>http://products.semweb.bestbuy.com/products/9380001/semanticweb.rdf</sc:sampleURI>
>>
>> <sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/43900/semanticweb.rdf</sc:dataDumpLocation>
>>
>> <sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/48521/semanticweb.rdf</sc:dataDumpLocation>
>>
>> <sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/48530/semanticweb.rdf</sc:dataDumpLocation>
>>
>> <sc:dataDumpLocation>http://products.semweb.bestbuy.com/products/54256/semanticweb.rdf</sc:dataDumpLocation>
>>
>>
>>
>>
>> in theory if this information is exposed as linked data then one would
>>
>> like to have a semantic sitemap exposed,
>>
>> As said - I understand BestBuy is using the main sitemap to bundle the
>> individual semantic sitemaps. Note that they are dealing with 450,000
>> documents. A single sitemap file would be pretty large.
>>
>>
>> which includes other details
>>
>> e.g. a sparql endpoint some information on the datasets etc. [1]
>>
>>
>>
>>
>>
>> There is, to my knowledge, no SPARQL endpoint offered by BestBuy.com, but
>> you can soon simply use the Linked Open Commerce dataspace at
>>
>> http://loc.openlinksw.com/sparql
>>
>> This will contain a current copy of the bestbuy graphs.
>>
>> has this been considered and decided against?
>>
>> As far as I know, the combination of a sitemap and 23 semantic sitemaps was
>> a pragmatic decision. If it causes major problems, Jay Myers from BestBuy
>> will for sure be open to suggestions for improvements.
>>
>> should we just live with
>>
>> it and fit sindice to do some guesswork and process those instead? (i
>>
>> am not necessarily against this last solution really.. )
>>
>>
>>
>> You simply have to fetch and un-gzip the 23 semantic sitemaps at
>>
>> http://products.semweb.bestbuy.com/sitemap<n>.xml.gz
>>
>> with <n> being a number from 1 to 23.
>>
>> Note that
>>
>> http://products.semweb.bestbuy.com/sitemap5.xml.gz
>>
>> seems to have a syntactical problem (fix is already requested).
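
A rough sketch of that fetch-and-un-gzip step, assuming the URL pattern and the sc: namespace shown earlier in this thread. The function names are hypothetical, and the download itself is left to the caller since the host may not be reachable:

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace and base URL as they appear in the sitemap excerpts in this thread.
SC = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"
BASE = "http://products.semweb.bestbuy.com/"


def sitemap_urls(first=1, last=23):
    """List the semantic sitemap URLs sitemap<n>.xml.gz for n in first..last."""
    return [f"{BASE}sitemap{n}.xml.gz" for n in range(first, last + 1)]


def dump_locations(gz_bytes):
    """Un-gzip one semantic sitemap and return its sc:dataDumpLocation values."""
    root = ET.fromstring(gzip.decompress(gz_bytes))
    return [
        el.text.strip()
        for el in root.iter(f"{{{SC}}}dataDumpLocation")
        if el.text
    ]


if __name__ == "__main__":
    # Local stand-in for a downloaded file, so the sketch runs offline.
    sample = gzip.compress(
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" '
        b'xmlns:sc="' + SC.encode() + b'"><sc:dataset>'
        b"<sc:dataDumpLocation>" + BASE.encode() +
        b"products/43900/semanticweb.rdf</sc:dataDumpLocation>"
        b"</sc:dataset></urlset>"
    )
    print(sitemap_urls()[:2])
    print(dump_locations(sample))
```

The bytes for each URL could be obtained with urllib.request.urlopen(url).read(). A file with a syntax problem would raise ET.ParseError in dump_locations, so a crawler would probably want to catch that and move on to the next file.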
>>
>>
>>
>> In other words are you suggesting the use of semantic sitemaps
>>
>> We usually recommend using semantic sitemaps. But actually I think that a
>> consolidated dataspace like the LOC will become more important in the
>> future, because it creates too much overhead for each agent and application
>> to crawl and consolidate the whole Web of Linked Data on his/her own.
>>
>>
>> or
>>
>> should we just come to terms with this? The disadvantage is that linked
>>
>> data browsers that want to use an index to find information will be
>>
>> able to do so less reliably (hope that our guesswork works)
>>
>>
>>
>> As said - I understand (without a thorough analysis, though), that BestBuy's
>> usage of a single sitemap and multiple semantic sitemaps is okay.
>>
>>
>>
>> Giovanni
>>
>>
>>
>> [1] http://sw.deri.org/2007/07/sitemapextension/
>>
>>
>>
>> On Mon, Aug 31, 2009 at 8:08 PM, Martin Hepp
>>
>> (UniBW)<martin.hepp@ebusiness-unibw.org> wrote:
>>
>>
>>
>> Dear all:
>>
>>
>>
>> BestBuy.com has just started to serve a complete RDF/XML dump of their
>>
>> products and price information to the Web of Linked Data, using the
>>
>> GoodRelations vocabulary for e-commerce. The data dump is updated on a
>>
>> daily basis and contains detailed descriptions for roughly 450,000
>>
>> individual items. With about 60 triples per item, this totals to about
>>
>> 27 million RDF triples.
>>
>>
>>
>> Semantic Sitemap: http://products.semweb.bestbuy.com/sitemap.xml
>>
>>
>>
>> Examples:
>>
>> a) Software:
>>
>> http://products.semweb.bestbuy.com/products/8182593/semanticweb.rdf
>>
>>
>>
>> b) "Hardgoods":
>>
>> http://products.semweb.bestbuy.com/products/8794691/semanticweb.rdf
>>
>>
>>
>> c) Movies:
>>
>> http://products.semweb.bestbuy.com/products/7590289/semanticweb.rdf
>>
>>
>>
>> d) Games:
>>
>> http://products.semweb.bestbuy.com/products/9223752/semanticweb.rdf
>>
>>
>>
>> Unlike many existing large RDF transcripts, the data is very dynamic,
>>
>> holding the daily prices for all items.
>>
>> According to Wikipedia, BestBuy.com is the largest specialty retailer of
>>
>> consumer electronics in the United States accounting for 19% of the market.
>>
>>
>>
>> It is likely the first Fortune 500 company to start publishing offer
>>
>> details on the Web of Linked Data.
>>
>>
>>
>> Congratulations to Jay Myers from BestBuy.com for this excellent
>>
>> contribution, and a big thanks to Andreas Radinger and Alex Stolz for
>>
>> their support,
>>
>>
>>
>> Best wishes
>>
>>
>>
>> Martin Hepp
>>
>>
>>
>> --
>>
>> --------------------------------------------------------------
>>
>> martin hepp
>>
>> e-business & web science research group
>>
>> universitaet der bundeswehr muenchen
>>
>>
>>
>> e-mail:  mhepp@computer.org
>>
>> phone:   +49-(0)89-6004-4217
>>
>> fax:     +49-(0)89-6004-4620
>>
>> www:     http://www.unibw.de/ebusiness/ (group)
>>
>>         http://www.heppnetz.de/ (personal)
>>
>> skype:   mfhepp
>>
>> twitter: mfhepp
>>
>>
>>
>> Check out GoodRelations for E-Commerce on the Web of Linked Data!
>>
>> =================================================================
>>
>>
>>
>> Webcast:
>>
>> http://www.heppnetz.de/projects/goodrelations/webcast/
>>
>>
>>
>> Recipe for Yahoo SearchMonkey:
>>
>> http://tr.im/rAbN
>>
>>
>>
>> Talk at the Semantic Technology Conference 2009:
>>
>> "Semantic Web-based E-Commerce: The GoodRelations Ontology"
>>
>> http://tinyurl.com/semtech-hepp
>>
>>
>>
>> Overview article on Semantic Universe:
>>
>> http://tinyurl.com/goodrelations-universe
>>
>>
>>
>> Project page:
>>
>> http://purl.org/goodrelations/
>>
>>
>>
>> Resources for developers:
>>
>> http://www.ebusiness-unibw.org/wiki/GoodRelations
>>
>>
>>
>> Tutorial materials:
>>
>> CEC'09 2009 Tutorial: The Web of Data for E-Commerce: A Hands-on
>>
>> Introduction to the GoodRelations Ontology, RDFa, and Yahoo! SearchMonkey
>>
>> http://tr.im/grcec09
>>
>>     
>
>
>   


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com

Received on Wednesday, 2 September 2009 16:58:25 UTC