- From: Ghislain Atemezing <auguste.atemezing@eurecom.fr>
- Date: Sat, 9 Aug 2014 11:32:11 +0200
- To: public-dwbp-wg@w3.org
- Cc: atemezin@eurecom.fr
- Message-Id: <F9A3C474-60EC-4164-BEE3-9293AB70145E@eurecom.fr>
Hi all,

I’ve just come across this issue with the Geonames dump in RDF, which seems to be a fairly “normal” situation. The issue is well described here [1]; as it says:

"Geonames is a great resource for geographical information. Helpfully they publish data exports in a variety of formats, allowing others to process and manipulate the data locally. Unfortunately the RDF data dump that is available from: [http://download.geonames.org/export/dump/all-geonames-rdf.txt.zip] is a little idiosyncratic. Rather than provide a single ntriples or even RDF/XML file the dump consists of a text file that consists of alternating lines like this:

...feature URI....
rdf:RDF...RDF/XML description of feature..../rdf:RDF

This means you need to script up unpacking the file in order to load it into a triple store."

As you can imagine, this raises several issues:

1- Users/consumers have to write scripts to “harmonize” the dump into clean triples.
2- The provider claims [2] to have 8514201 features and about 125 million RDF triples (2013 08 27).
2-1: How can we ensure these original numbers are preserved after the data is loaded into a third-party endpoint? For example, I looked at LOD cache and Factforge to find Geonames features:
#- Results for http://lod.openlinksw.com/sparql: see http://goo.gl/VFfQ4x (4 989 694 / 5 539 694 features)
#- Results for factforge.net: http://goo.gl/VC2YuE (8.060.727 features). This seems more “realistic” compared with the original dump.
3- Trust issue: which endpoint should I trust when I don’t have enough resources to build a script and load the Geonames dump locally?

Given the issues above, do you think this could be a “valid” Use Case for this group to deal with? WDYT?
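
For illustration, the unpacking step described in the quote above could be sketched roughly as follows. This is only a minimal sketch: it assumes the dump strictly alternates one feature-URI line with one line holding a complete RDF/XML document, as the quote describes. The names iter_features and count_features and the two sample URIs are made up for the example, and the real file may need more careful handling.

```python
import io
from typing import Iterator, TextIO, Tuple


def iter_features(lines: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (feature_uri, rdfxml) pairs from a stream of alternating lines.

    Assumption: line 1 is a feature URI, line 2 is the RDF/XML document
    describing that feature, line 3 is the next URI, and so on.
    """
    while True:
        uri = lines.readline()
        if not uri:
            break  # end of file
        if not uri.strip():
            continue  # tolerate stray blank lines between records
        xml = lines.readline()
        if not xml:
            raise ValueError(f"feature {uri.strip()!r} has no RDF/XML line")
        yield uri.strip(), xml.strip()


def count_features(lines: TextIO) -> int:
    """Count the features in the stream without keeping them in memory."""
    return sum(1 for _ in iter_features(lines))


# Hypothetical two-feature sample in the alternating-line layout.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
sample = (
    "http://sws.geonames.org/1/\n"
    f'<rdf:RDF xmlns:rdf="{RDF_NS}"></rdf:RDF>\n'
    "http://sws.geonames.org/2/\n"
    f'<rdf:RDF xmlns:rdf="{RDF_NS}"></rdf:RDF>\n'
)

print(count_features(io.StringIO(sample)))  # prints 2
```

Run against the unzipped dump (opened as a text file instead of the in-memory sample), a count like this could be compared with the number of features the provider states in [2], and with what a third-party endpoint reports.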

Best,
Ghislain

[1] https://github.com/ldodds/geonames
[2] http://www.geonames.org/ontology/documentation.html

-------------
Ghislain Atemezing
EURECOM, Multimedia Communication Department
Campus SophiaTech
450, route des Chappes, 06410 Biot, France
email: auguste.atemezing@eurecom.fr & ghislain.atemezing@gmail.com
Tel: +33 (0)4- 9300 8178
Fax: +33 (0)4- 9000 8200
Web: http://www.eurecom.fr/~atemezin
Google+: http://google.com/+GhislainATEMEZING
Twitter: @gatemezing
Received on Saturday, 9 August 2014 09:32:42 UTC