- From: Martin Hepp (UniBW) <martin.hepp@ebusiness-unibw.org>
- Date: Wed, 20 May 2009 11:35:36 +0200
- To: Libby Miller <libby@nicecupoftea.org>
- CC: semantic-web at W3C <semantic-web@w3c.org>, "public-lod@w3.org" <public-lod@w3.org>
- Message-ID: <4A13CEE8.30604@ebusiness-unibw.org>
Hi Libby,

> That's rather fabulous! Can you give some information about how often
> this dataset is updated, and what's its geographical and product type
> reach?

Thanks! This particular data set is a rather static collection and has a bias towards US products. It will soon be complemented by a more dynamic and European-centric second data set. In the long run, we will have to convince professional providers of commodity master data (e.g. GS1) to release their data following our structure. Currently, this is not possible due to licensing restrictions (there are look-up services like GEPIR, but none of them allows redistribution of the data).

The upcoming second data set will be based on a community process, i.e. shop owners enter labels for EAN/UPCs in a Wiki. Since EAN/UPCs are (at least in theory) never reused, the current data set should be pretty reliable, though not necessarily very complete.

I see the main benefit of the current data set in
- using it as a showcase of how small businesses can fetch product master data from the Semantic Web, and
- showing how data on the same commodity from multiple sources can easily be linked on the basis of sharing the same http://purl.org/goodrelations/v1.html#hasEAN_UCC-13 property value.

>> Individual commodity descriptions can be retrieved as follows:
>>
>> http://openean.kaufkauf.net/id/EanUpc_<UPC/EAN>
>>
>> Example:
>>
>> http://openean.kaufkauf.net/id/EanUpc_0001067792600
>
> This seems to give me multiple product descriptions - am I
> misunderstanding?

The whole data set is currently divided into 100 RDF files (this will soon be changed to 1,000), which are served via a somewhat complicated .htaccess configuration. The reason is that the large amount of instance data would otherwise require about 1 million very small files (a few triples each), which may cause problems with several file systems. Also, since we want as much of our data as possible to stay within OWL DL (I know that not everybody in the community shares that goal), one file per product would cause a lot of redundancy, because each single file would have to repeat the ontology imports / header data. But as far as I can see, the current approach should not have major side effects: you get back additional triples, but the size of the files being served is limited. Currently we serve 4 MB chunks; we will shortly reduce that to 400-800 KB. That seems reasonable to me.

Best
Martin

> Libby
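PS: In case it is useful, here is a rough, untested sketch of how a client could fetch and link this data with Python and rdflib. Only the URI pattern, the example EAN, and the hasEAN_UCC-13 property come from the explanation above; the gr: namespace form of the property URI, the function names, and the assumption that the EAN value is stored as a plain literal are merely illustrative.

# Rough, untested sketch (Python / rdflib). Only the URI pattern, the example
# EAN, and the hasEAN_UCC-13 property are taken from the mail; the namespace
# form of the property and the literal handling are assumptions.
from rdflib import Graph, Literal, Namespace

GR = Namespace("http://purl.org/goodrelations/v1#")   # assumed gr: namespace form
HAS_EAN = GR["hasEAN_UCC-13"]

def fetch_chunk(ean):
    """Dereference the openean URI for one EAN/UPC. The server answers with a
    whole chunk file, so the graph will contain triples about many products."""
    g = Graph()
    g.parse("http://openean.kaufkauf.net/id/EanUpc_" + ean)
    return g

def describe(graph, ean):
    """Filter the chunk down to the resources carrying the requested
    hasEAN_UCC-13 value (assumes the EAN is stored as a plain literal)."""
    for subj in graph.subjects(HAS_EAN, Literal(ean)):
        for pred, obj in graph.predicate_objects(subj):
            yield subj, pred, obj

def same_commodity(g1, g2):
    """Pair resources from two sources that share the same hasEAN_UCC-13
    value, i.e. the linking idea described in the second bullet point."""
    by_ean = {}
    for subj, ean in g1.subject_objects(HAS_EAN):
        by_ean.setdefault(str(ean), []).append(subj)
    for subj, ean in g2.subject_objects(HAS_EAN):
        for other in by_ean.get(str(ean), []):
            yield other, subj

if __name__ == "__main__":
    ean = "0001067792600"              # example EAN from above
    chunk = fetch_chunk(ean)
    for triple in describe(chunk, ean):
        print(triple)

The same_commodity helper is only meant to illustrate the linking idea: two graphs from different sources can be joined purely on the shared EAN value, without any prior agreement on resource URIs.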
Received on Wednesday, 20 May 2009 09:36:16 UTC