A real world example: Dutch registry of buildings and addresses

Hello list,

I have just finished (I think) a renewed publication of a dataset that
could serve as a nice real-world example of the application of the Core
Location Vocabulary (LOCN).
The dataset is the Dutch registry of buildings and addresses. It 
consists of about 573 million triples. The URI of the dataset is 
http://lod.geodan.nl/basisreg/bag/. This URI should be enough to enable 
usage of the dataset as it should provide the data necessary for further 
exploration. The dataset is bilingual: all terms in the main vocabulary 
have explanations in Dutch and English.

I would be happy to receive any comments from this group on this
dataset or the associated vocabulary. I hope I have done some things
right, but there is probably some room for improvement.

Anyway, I would like to list some of the issues that I have encountered 
that have something to do with the core location vocabulary. I would 
love to know what you think about these!

About *metadata*: The dataset URI (http://lod.geodan.nl/basisreg/bag/) 
resolves to dataset metadata. Because this dataset contains location 
data (locations, addresses, geometries) I think some special metadata 
are called for.

_Issue 1:_ I feel that it is important to let it be known that a
dataset is of a geographical nature, i.e., that a consumer can expect
to find data about locations in it. As far as I know, there is no
well-established way of making such a statement. For this dataset, I
specified <http://www.w3.org/ns/locn> as one of the main vocabularies
used (using void:vocabulary) and I specified the spatial extent of the
data (using dcterms:spatial). What do you think?

_Issue 2:_ Spatial extent: The spatial extent of the dataset is
specified by both a geometry and a DBpedia reference to the Netherlands.
I think that is sufficient.
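
As an illustration of how a consumer could look for the statements
mentioned in issues 1 and 2, the following query asks for the
vocabularies and the spatial extent declared for the dataset. It is
only a sketch: it assumes that the dataset metadata can also be queried
through the SPARQL endpoint (http://lod.geodan.nl/sparql), which has
not been verified here.

prefix void: <http://rdfs.org/ns/void#>
prefix dcterms: <http://purl.org/dc/terms/>
select ?vocabulary ?extent
where {
     # assumption: the metadata triples are loaded in the endpoint
     <http://lod.geodan.nl/basisreg/bag/> void:vocabulary ?vocabulary .
     optional {
          <http://lod.geodan.nl/basisreg/bag/> dcterms:spatial ?extent
     }
}

A result row with ?vocabulary bound to <http://www.w3.org/ns/locn>
would be a hint, though not a formal statement, that the dataset
contains location data.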

_Issue 3:_ CRS: I can think of no way to specify the CRS used in the 
data. An extension of LOCN to enable this would be welcome, I think.

_Issue 4:_ Level of detail / spatial resolution: This would be
applicable to the subsets (which are named graphs) within the dataset. I
think that information could be useful to consumers, but I cannot think
of a way to express it.

About *geometry*:

_Issue 5:_ The geometries in the source data use the Dutch national CRS. 
I have transformed them to WGS84 lon/lat for several reasons:
a) The triple store used (Virtuoso) does not support other CRSs yet
b) I really do not like WKT literals with prefixed CRS URIs, as mandated 
by GeoSPARQL
c) WGS84 is more common; internationally in particular, it will be more
useful.

The only drawback I can think of is that this transformation would not
be precise enough for very detailed geometries. Because these data are
European, it would be better to use ETRS89. The current standard is far
more useful for American data than for data from other continents!

_Issue 6:_ The publication is powered by Virtuoso 7.1. This means that
topological functions can be used in SPARQL. The following example asks
for the name of the town in which a point (which could be your current
location) is located, using the function st_within(). The SPARQL
endpoint is http://lod.geodan.nl/sparql, as specified in the metadata.

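# xsd: is declared explicitly; bif: is a Virtuoso built-in prefix
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix bag: <http://lod.geodan.nl/vocab/bag#>
select ?name
from <http://lod.geodan.nl/basisreg/bag/woonplaats/>
where {
     ?wpmut a bag:Woonplaatsmutatie .
     ?wpmut bag:lastKnown "true"^^xsd:boolean .
     ?wpmut bag:geometrie ?geom .
     ?wpmut bag:naam ?name .
     # st_point takes longitude first, then latitude (WGS84)
     filter (bif:st_within(?geom, bif:st_point (6.56,53.21)))
}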

It is not perfect yet: topological functions operate on the bounding
boxes of geometries, not on the geometries themselves. Also, it is not
yet possible to use GeoSPARQL expressions. According to people at
OpenLink, these issues will be resolved soon, in an upcoming version of
Virtuoso.
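
For comparison, here is a sketch of how the same question might be
asked with GeoSPARQL once it is supported. It has not been tested
against this endpoint, and it assumes that the geometries would be
exposed as geo:wktLiteral values, which is not the case in the current
publication. Note that geof:sfWithin(a, b) tests whether a lies within
b, so the point comes first here.

prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix geo: <http://www.opengis.net/ont/geosparql#>
prefix geof: <http://www.opengis.net/def/function/geosparql/>
prefix bag: <http://lod.geodan.nl/vocab/bag#>
select ?name
from <http://lod.geodan.nl/basisreg/bag/woonplaats/>
where {
     ?wpmut a bag:Woonplaatsmutatie .
     ?wpmut bag:lastKnown "true"^^xsd:boolean .
     # assumption: ?geom is a geo:wktLiteral in this hypothetical setup
     ?wpmut bag:geometrie ?geom .
     ?wpmut bag:naam ?name .
     # a WKT literal without a CRS URI defaults to CRS84 (lon lat)
     filter (geof:sfWithin("POINT(6.56 53.21)"^^geo:wktLiteral, ?geom))
}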

About application of *LOCN*:

_Issue 7:_ If you take a look at the vocabulary I made for this dataset
(http://lod.geodan.nl/vocab/bag or http://lod.geodan.nl/vocab/bag.ttl),
you can see that I tried to apply LOCN. Mostly, classes are defined as
subclasses of LOCN classes and properties are defined as subproperties
of LOCN properties. But without special measures, one cannot use LOCN
terms in SPARQL queries. The following example returns nothing because
I have not created explicit triples for LOCN classes, and neither have
I made inference rules
(http://docs.openlinksw.com/virtuoso/rdfsparqlrule.html). So I wonder
whether it is really worthwhile to use LOCN, or to use it in the way
that I have.

prefix locn: <http://www.w3.org/ns/locn#>
select *
from <http://lod.geodan.nl/basisreg/bag/ligplaats/>
where {
     ?s a locn:Location .
}

Or, to put it differently: what is the added value of LOCN in this
case? And how could that added value be increased?
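
One possible approach, sketched here under assumptions: without
inference rules, the subclass relations can be followed in the query
itself with a SPARQL 1.1 property path. This only works if the
vocabulary triples are visible to the query. The graph name
<http://lod.geodan.nl/vocab/bag> used below is an assumption, not
something confirmed for this endpoint, and the sketch also assumes that
the classes used in the ligplaats graph are declared as direct or
indirect subclasses of locn:Location.

prefix locn: <http://www.w3.org/ns/locn#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?s ?class
from <http://lod.geodan.nl/basisreg/bag/ligplaats/>
# hypothetical graph that holds the vocabulary triples
from <http://lod.geodan.nl/vocab/bag>
where {
     ?s a ?class .
     # zero or more rdfs:subClassOf steps up to locn:Location
     ?class rdfs:subClassOf* locn:Location .
}
limit 10

The trade-off is that every consumer has to write the path expression
themselves, whereas explicit LOCN triples or server-side inference
rules would let the plain query above work unchanged.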


Regards,
Frans


------------------------------------------------------------------------
Frans Knibbe
Geodan
President Kennedylaan 1
1079 MB Amsterdam (NL)

T +31 (0)20 - 5711 347
E frans.knibbe@geodan.nl
www.geodan.nl | disclaimer: http://www.geodan.nl/disclaimer
------------------------------------------------------------------------

Received on Friday, 9 May 2014 15:41:15 UTC