Re: E-commerce Product Description Dataset

From: Martin Hepp <martin.hepp@ebusiness-unibw.org>
Date: Tue, 7 Jan 2014 19:41:45 +0100
Cc: SchemaDot Org <public-vocabs@w3.org>
Message-Id: <B02B9C7A-7277-448E-991C-275EAF6B4CE7@ebusiness-unibw.org>
To: Barbara Starr <BarbaraStarr2009@gmail.com>, Olivier Austina <olivier.austina@gmail.com>
In short: There are no freely available, large-scale e-commerce datasets, mostly because:

1. It is a huge effort.
2. The data is dynamic, so you have to do it constantly.
3. There are IPR issues with releasing the results of such crawls. My group, for instance, has several huge crawls for internal research purposes, but we cannot simply put them online without the risk of being sued, e.g. for copyright infringement on product texts. Obtaining watertight legal clearance is beyond our means.

The http://webdatacommons.org/ dataset is an attempt, but since it relies on the http://commoncrawl.org/ crawl, it misses most of the data, because that crawl does not go deep enough into e-commerce sites.

So if you want to use such data for research purposes or for your startup idea, you will have to crawl and consolidate the data on your own.
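
As a rough illustration of what "crawl and consolidate" can involve, here is a minimal sketch of the extraction and merging steps only. It assumes that the pages have already been fetched and that they embed schema.org Product data as JSON-LD (many shops use Microdata or RDFa instead, which would need a different parser). The names JsonLdExtractor, extract_products, and consolidate are hypothetical, and merging on the "sku" property is just one possible choice.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_products(html):
    """Return all top-level JSON-LD objects in the page whose @type is Product."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    products = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except ValueError:
            continue  # skip malformed markup instead of aborting the run
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Product":
                products.append(item)
    return products


def consolidate(products):
    """Merge records sharing the same sku (assumed key); later values win."""
    merged = {}
    for product in products:
        key = product.get("sku")
        if key is None:
            continue  # no identifier to merge on
        merged.setdefault(key, {}).update(product)
    return merged
```

Because the data is dynamic, a step like this would have to be rerun on fresh crawls, and a real pipeline would also need deduplication across shops and handling of nested or graph-shaped markup, which this sketch ignores.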


On Jan 1, 2014, at 9:38 PM, Barbara Starr wrote:

> Or even here, if you are comfortable with SPARQL, etc.: http://linkedopencommerce.com/
> Hope that helps :)
> On Jan 1, 2014, at 9:18 AM, Olivier Austina <olivier.austina@gmail.com> wrote:
>> Hi,
>> I am looking for an e-commerce product description dataset. I am not targeting a specific product or a specific ontology for the dataset. It can be from schema.org, GoodRelations, or others. Any dataset is welcome. Thank you.
>> Regards
>> Olivier

martin hepp
e-business & web science research group
universitaet der bundeswehr muenchen

e-mail:  hepp@ebusiness-unibw.org
phone:   +49-(0)89-6004-4217
fax:     +49-(0)89-6004-4620
www:     http://www.unibw.de/ebusiness/ (group)
         http://www.heppnetz.de/ (personal)
skype:   mfhepp 
twitter: mfhepp

Check out GoodRelations for E-Commerce on the Web of Linked Data!
* Project Main Page: http://purl.org/goodrelations/
Received on Tuesday, 7 January 2014 18:42:08 UTC
