Re: ANN: GoodRelations - E-Commerce on the Web of Data - New Datasets and Applications

Hi Libby,

> That's rather fabulous! Can you give some information about how often 
> this dataset is updated, and what's its geographical and product type 
> reach?
Thanks! This particular data set is a rather static collection and has a 
bias towards US products. It will soon be complemented by a more dynamic 
and European-centric second data set.

In the long run, we will have to convince professional providers of 
commodity master data (e.g. GS1) to release their data following our 
structure. Currently, this is not possible due to licensing restrictions 
(there are look-up services like GEPIR, but none of them allows 
redistribution of the data).

The upcoming second data set will be based on a community process, i.e., 
shop owners enter labels for EAN/UPCs in a Wiki.

Since EAN/UPCs must (theoretically) not be reused, the current data set 
should be pretty reliable, though not necessarily very complete.

I see the main benefit of the current data set in
- using it as a showcase of how small businesses can fetch product master 
data from the Semantic Web and
- showing how data on the same commodity from multiple sources can be 
easily linked on the basis of having the same

property value.
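
A minimal sketch of the second point, using plain Python dictionaries rather than RDF. The record layout, the "ean" and "label" keys, the source names, and the sample codes are all made up for illustration; they are not taken from the data set.

```python
from collections import defaultdict


def link_by_ean(sources):
    """Group product labels from several sources by their shared EAN/UPC.

    `sources` maps a source name to a list of {"ean": ..., "label": ...}
    records. Only codes described by at least two sources are returned.
    """
    by_ean = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            by_ean[record["ean"]][source_name] = record["label"]
    return {ean: labels for ean, labels in by_ean.items() if len(labels) > 1}


# Made-up sample records; the codes and labels are purely illustrative.
sources = {
    "master_data": [
        {"ean": "0000000000017", "label": "Example Cola 330 ml"},
        {"ean": "0000000000024", "label": "Example Tea 20 bags"},
    ],
    "shop_offers": [
        {"ean": "0000000000017", "label": "Cola can, 0.33 l"},
    ],
}

links = link_by_ean(sources)
for ean, labels in links.items():
    print(ean, labels)
```

Only the code that appears in both sources is linked; the second master-data entry has no counterpart and is left out.
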
>> Individual commodity descriptions can be retrieved as follows:
>> Example:
> This seems to give me multiple product descriptions - am I 
> misunderstanding?
The whole data set is currently divided into 100 RDF files (soon to be 
1000), which are served via a somewhat complicated .htaccess 

The reason is that the large amount of instance data would otherwise 
require 1 million very small files (a few triples each), which may cause 
problems with several file systems. Also, since we want as much of our 
data as possible to stay within OWL DL (I know not everybody in the 
community shares that goal), one file per item would cause a lot of 
redundancy due to ontology imports / header data in each single file.

But as far as I can see, the current approach should not have major side 
effects - you get back additional triples, but the size of the files 
being served is limited. Currently, we serve 4 MB file chunks. We will 
shortly reduce that to 400 - 800 KB. That seems reasonable to me.


> Libby

Received on Wednesday, 20 May 2009 09:36:17 UTC