- From: Frans Knibbe <frans.knibbe@geodan.nl>
- Date: Fri, 20 Nov 2015 11:06:57 +0100
- To: SDW WG Public List <public-sdw-wg@w3.org>
- Message-ID: <CAFVDz40LyM_w9VDschxpKEDi45Wtb4HBB_dN=a8tcEfX_7Xkyg@mail.gmail.com>
Oops, I missed Andrea's message. Sorry for the duplicate. Frans

2015-11-20 10:59 GMT+01:00 Frans Knibbe <frans.knibbe@geodan.nl>:

Hello all, especially BP editors,

Some results from the GeoKnow project <http://geoknow.eu/Welcome.html> were shared on the geosemweb list (the W3C version). I thought it would be good to forward the message, because it is about practices that could be investigated in the search for best practices.

Regards,
Frans

---------- Forwarded message ----------
From: W3C Community Development Team <team-community-process@w3.org>
Date: 2015-11-20 9:20 GMT+01:00
Subject: GeoKnow Public Datasets [via Geospatial Semantic Web Community Group]
To: public-geosemweb@w3.org

In this blog post we want to present three public datasets that were improved or created in the GeoKnow project.

LinkedGeoData
Size: 177 GB zipped Turtle file
URL: http://linkedgeodata.org/

LinkedGeoData is the RDF version of OpenStreetMap (OSM), covering geospatial data for the entire planet. As of September 2014 the zipped XML file from OSM held 36 GB of data, while the zipped LGD files in Turtle format amount to 177 GB. A detailed description of the dataset can be found in deliverable D1.3.2, Continuous Report on Performance Evaluation.

Technically, LinkedGeoData is a set of SQL files, database-to-RDF (RDB2RDF) mappings, and bash scripts. The actual RDF conversion is carried out by the SPARQL-to-SQL rewriter Sparqlify. You can view the Sparqlify mappings for LinkedGeoData here. The maintenance and improvement of the mappings required to transform OSM data to RDF has been ongoing throughout the project. This dataset has been used in several use cases, and especially in all benchmarking tasks within GeoKnow.

Wikimapia
URL: http://wikimapia.org/api/

Wikimapia is a crowdsourced, open-content, collaborative mapping initiative where users can contribute mapping information. This dataset already existed before the project started; however, it was only accessible through Wikimapia's API and provided in XML or JSON format. Within GeoKnow, we downloaded several sets of geospatial entities from Wikimapia, including both spatial and non-spatial attributes for each entity, and transformed them into RDF data. The process we followed is described next.

We considered a set of cities throughout the world (Athens, London, Leipzig, Berlin, New York) and downloaded the whole content provided by Wikimapia for the geospatial entities included in those geographical areas. Most of these cities were preferred since they are the base cities of several partners in the project, while the remaining two were selected randomly, with the aim of reaching our target of more than 100,000 spatial entities from Wikimapia. Apart from geometries, Wikimapia provides a very rich set of metadata (non-spatial properties) for each entity, e.g. tags and categories describing the geospatial entities, topological relations with nearby entities, and comments of the users. The aforementioned dumps were transformed into RDF triples in a straightforward way: (a) defining intermediate resources (functioning as blank nodes) where information was organised in more than one level, (b) flattening the information of deeper levels where possible, in order to simplify the structure of the dataset, and (c) transforming tags into OWL classes.
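To make steps (a) and (c) concrete before the detailed description that follows, here is a minimal Python/rdflib sketch, assuming a hypothetical namespace and a Wikimapia place already parsed into a dict; all names here are illustrative, not taken from the actual GeoKnow tooling:

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    WM = Namespace("http://example.org/wikimapia/")  # assumed namespace

    def place_to_triples(g, place):
        """Convert one parsed Wikimapia place (a dict) into RDF triples."""
        s = WM[f"place/{place['id']}"]
        # (c) every tag becomes an OWL class; the place is an instance of it
        for tag in place.get("tags", []):
            cls = WM[f"class/{tag['id']}"]
            g.add((cls, RDF.type, OWL.Class))
            g.add((cls, RDFS.label, Literal(tag["title"])))
            g.add((s, RDF.type, cls))
        # (a) nested attributes hang off an intermediate resource whose
        # name concatenates the attribute with the place ID
        edit = WM[f"place/{place['id']}/editInfo"]
        g.add((s, WM["editInfo"], edit))
        for key, value in place.get("edit_info", {}).items():
            g.add((edit, WM[key], Literal(value)))

    g = Graph()
    place_to_triples(g, {"id": 12345,
                         "tags": [{"id": 46, "title": "church"}],
                         "edit_info": {"user_id": 7, "date": "2014-09-01"}})
    print(g.serialize(format="nt"))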
Specifically, we developed a parsing tool to communicate with the Wikimapia API and construct appropriate N-Triples from the dataset. The tool takes as input a bounding box in the form of WGS84 coordinates (min long, min lat, max long, max lat). We chose five initial bounding boxes, one for each of the cities mentioned above; each bounding box was defined in such a way that it covered the whole area of the selected city. Each bounding box was then further divided by the tool into a grid of smaller bounding boxes, in order to overcome the upper limit per area on the number of entities returned by the Wikimapia API (a sketch of this subdivision follows at the end of this section).

For each place returned, we transformed all properties into RDF triples. Every tag was assigned an OWL class and an appropriate label, corresponding to the textual description in the initial Wikimapia XML file. Each place became an instance of the classes provided by its tags. For the rest of the returned Wikimapia attributes, we created a custom property in a uniform way for each attribute of the returned Wikimapia XML file. The properties resulting from the Wikimapia XML attributes point to their literal values. For example, we construct properties for each place's language id, Wikipedia link, URL, title, description, edit info, location info, global administrative areas, available languages, and geometry information. Where these attributes follow a deeper tree structure, we assign the properties to intermediate custom nodes named by concatenating the property with the place ID; these nodes function as blank nodes and connect the initial entity with a set of properties and their respective values. This process resulted in an initial geospatial RDF dataset containing, for each entity, the polygon geometry that represents it, along with a wealth of non-spatial properties. The dataset contains 102,019 geospatial entities and 4,629,223 triples.

On top of that, in order to create a synthetically interlinked pair of datasets, we split the Wikimapia RDF dataset, duplicating the geometries and dividing them between two datasets in the following way. For each polygon geometry, we created a point geometry located at the centroid of the polygon and then shifted the point by a random (but bounded) factor. The polygon was left in the first dataset, while the point was transferred to the second dataset. The rest of the properties were distributed between the two datasets as follows: the first dataset consists of metadata containing the main information about the Wikimapia places and edit information about users, timestamps, deletion state, and editors; the second dataset consists of metadata concerning basic info, location, and language information. This way, the two sub-datasets essentially refer to the same Wikimapia entities, differing only in geometric and metadata information. Each of the two sub-datasets contains 102,019 geospatial entities; the first contains 1,225,049 triples and the second 4,633,603 triples.
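A minimal sketch of the two geometric operations described above (the bounding-box subdivision and the shifted centroid), assuming simple WGS84 (lon, lat) tuples; the function names and the shift bound are assumptions, not the project's actual code:

    import random

    def split_bbox(min_lon, min_lat, max_lon, max_lat, n):
        """Divide a bounding box into an n x n grid of smaller boxes,
        so that each API request stays under the per-area entity limit."""
        dx = (max_lon - min_lon) / n
        dy = (max_lat - min_lat) / n
        return [(min_lon + i * dx, min_lat + j * dy,
                 min_lon + (i + 1) * dx, min_lat + (j + 1) * dy)
                for i in range(n) for j in range(n)]

    def shifted_centroid(polygon, max_shift=0.001):
        """Return the vertex average of a polygon (a simple stand-in for
        the true centroid), shifted by a bounded random amount."""
        xs = [lon for lon, lat in polygon]
        ys = [lat for lon, lat in polygon]
        cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
        return (cx + random.uniform(-max_shift, max_shift),
                cy + random.uniform(-max_shift, max_shift))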
Seven Greek INSPIRE-compliant data themes of Annex I
URL: http://geodata.gov.gr/sparql/

For the INSPIRE-to-RDF use case, we selected seven data themes from Annex I, which are described in the table below. Although all metadata in geodata.gov.gr is fully compatible with the INSPIRE regulations, the data is not, because it has been integrated from several diverse sources, which have rarely followed the proper standards. Thus, due to data variety, provenance, and excessive volume, its transformation into INSPIRE-compliant datasets is a time-consuming and demanding task.

The first step was the alignment of the data to INSPIRE Annex I. To this end, we utilised the Humboldt Alignment Editor, a powerful open-source tool with a graphical interface and a high-level language for expressing custom alignments. Such a transformation can be used to turn a non-harmonised data source into an INSPIRE-compliant dataset. It only requires a source schema (an .xsd for the local GML file) and a target one (an .xsd implementing an INSPIRE data schema). As soon as the schema mapping was defined, the source GML data was loaded and the INSPIRE-aligned GML file was produced.

The second step was the transformation into RDF. This process was quite straightforward, given a set of suitable XSL stylesheets. We developed all these transformations in XSLT 2.0, implementing one parametrised stylesheet per selected data theme. By default, all geometries were encoded in WKT serialisations according to GeoSPARQL. The produced RDF triples were finally loaded into both Virtuoso and Parliament RDF stores and made available at http://geodata.gov.gr/sparql as a proof of concept (a minimal query sketch follows the table below).

INSPIRE data theme        | Greek dataset                                                | Features | Triples
[GN] Geographical names   | Settlements, towns, and localities in Greece                 | 13,259   | 304,957
[AU] Administrative units | All Greek municipalities after the most recent restructuring ("Kallikratis") | 326 | 9,454
[AD] Addresses            | Street addresses in Kalamaria municipality                   | 10,776   | 277,838
[CP] Cadastral parcels    | Building blocks in Kalamaria (data from the official Greek Cadastre are not available through geodata.gov.gr) | 965 | 13,510
[TN] Transport networks   | Urban road network in Kalamaria                              | 2,584    | 59,432
[HY] Hydrography          | All rivers and water streams in Greece                       | 4,299    | 120,372
[PS] Protected sites      | All areas of natural preservation in Greece according to the EU Natura 2000 network | 419 | 10,894
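To illustrate how the published triples can be consumed, here is a minimal query sketch against the endpoint, assuming the geometries are exposed through the standard GeoSPARQL geo:hasGeometry / geo:asWKT pattern (the predicates actually used by the dataset may differ):

    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper("http://geodata.gov.gr/sparql")
    endpoint.setQuery("""
        PREFIX geo: <http://www.opengis.net/ont/geosparql#>
        SELECT ?feature ?wkt WHERE {
          ?feature geo:hasGeometry ?geom .
          ?geom geo:asWKT ?wkt .
        }
        LIMIT 10
    """)
    endpoint.setReturnFormat(JSON)
    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["feature"]["value"], row["wkt"]["value"])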
----------
This post sent on Geospatial Semantic Web Community Group

'GeoKnow Public Datasets'
https://www.w3.org/community/geosemweb/2015/11/20/geoknow-public-datasets/

Learn more about the Geospatial Semantic Web Community Group:
https://www.w3.org/community/geosemweb

Received on Friday, 20 November 2015 10:07:35 UTC