Re: A real world example: Dutch registry of buildings and addresses

On 2014-05-10 1:37, Andrea Perego wrote:
> Interesting work, Frans! Thanks a lot for sharing it.
And thank you for the comments! Mine are inline too.
>
> My comments inline.
>
>> [snip]
>>
>> About metadata: The dataset URI (http://lod.geodan.nl/basisreg/bag/)
>> resolves to dataset metadata. Because this dataset contains location data
>> (locations, addresses, geometries) I think some special metadata are called
>> for.
>>
>> Issue 1: I feel that it is important to let it be known that a dataset is
>> of a geographical nature, i.e., a consumer could expect data about locations
>> in the data. As far as I know, there is no well established way of making
>> such a statement. For this dataset, I specified <http://www.w3.org/ns/locn>
>> as one of the main vocabularies used (using void:vocabulary) and I specified
>> the spatial extent of the data (using dcterms:spatial). WDYT?
> Not sure this is enough. The use of dcterms:spatial and of the LOCN
> vocabulary does not mean that the dataset is about "location data".
>
> In VoID there's the following example:
>
> :Geonames a void:Dataset;
>      dcterms:subject <http://dbpedia.org/resource/Location> .
>
> See http://www.w3.org/TR/void/#subject
OK, that seems a good suggestion. I have added it to the metadata.
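
For illustration, a minimal sketch of how the three statements discussed
above could sit together in the dataset description. The dcterms:spatial
value is a placeholder I am assuming here; the published metadata may
use a different resource or a geometry.

```turtle
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://lod.geodan.nl/basisreg/bag/> a void:Dataset ;
    # LOCN listed as one of the main vocabularies used
    void:vocabulary <http://www.w3.org/ns/locn> ;
    # spatial extent of the data (placeholder value, an assumption)
    dcterms:spatial <http://dbpedia.org/resource/Netherlands> ;
    # topic of the dataset, following the VoID example quoted above
    dcterms:subject <http://dbpedia.org/resource/Location> .
```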
>> [snip]
>>
>> Issue 3: CRS: I can think of no way to specify the CRS used in the data. An
>> extension of LOCN to enable this would be welcome, I think.
> +1 from me.
I suppose it is up to the volunteers to get it done :-)
>
>> Issue 4: Level of Detail / Spatial resolution: This would be applicable to
>> the subsets (which are named graphs) within the dataset. I think that
>> information could be useful to consumers, but I can not think of a way to
>> express this.
> Probably, this can be addressed by a specific extension of the LOCN
> vocabulary, as for point (3).
That would be nice. If we can get some heads together, it should not be
that hard. I think some work on quantifying spatial resolution has
already been done at the OGC.
>> About geometry:
>>
>> Issue 5: The geometries in the source data use the Dutch national CRS. I
>> have transformed them to WGS84 lon/lat for several reasons:
>> a) The triple store used (Virtuoso) does not support other CRSs yet
>> b) I really do not like WKT literals with prefixed CRS URIs, as mandated by
>> GeoSPARQL
>> c) the CRS is more common, especially internationally it will be more
>> useful.
>>
>> The only drawback I can think of is that this transformation would not do
>> with very detailed geometries. Because these data are European, it would be
>> better to use ETRS89. The current standard is far more useful for American
>> data than for data from other continents!
> I wonder whether you had any feedback from the Virtuoso team on
> whether they plan to support other CRSs and how - I mean, if following
> GeoSPARQL or (also) with the CRS specified separately from the
> geometry.
I understand they are going to include the proj4 library in an upcoming
version of Virtuoso Open Source. A new version could be released soon
and it will probably have a lot of improvements for geographical data. I
don't know whether that version will already have full CRS support.
As for the how, I think that is an interesting question. I will ask and
report back here if I get an answer. I guess GeoSPARQL is hard to
ignore (we saw in London how widespread its use already is) and they
will probably have to support it, but I think it would be nice if other
representations were allowed too.
>> [snip]
>>
>> About application of LOCN:
>>
>> Issue 7: If you take a look at the vocabulary I made for this dataset
>> (http://lod.geodan.nl/vocab/bag or http://lod.geodan.nl/vocab/bag.ttl), you
>> can see that I tried to apply LOCN. Mostly, classes are defined as being
>> subclasses of LOCN classes and properties are defined as being subproperties
>> of LOCN properties. But without special measures, one can not use LOCN terms
>> in SPARQL queries. The following example returns nothing because I have not
>> created explicit triples for locn classes, and neither have I made
>> inference rules. So I wonder if it is really worthwhile to use LOCN, or to
>> use it in the way that I have.
>>
>> [snip]
>>
>> Or to put in different words: what is the added value of LOCN in this case?
>> And how could that added value be increased?
> Frans, could you please provide some details about your design choices
> for the vocabulary? E.g., if I understand correctly, you are modelling
> also changes to addresses and address components. I wonder whether
> this would be in scope of the LOCN vocabulary or of a specific
> extension.
The registry that the dataset is based on contains historical data. 
Existing data are not changed, but new records are added. Records have 
start dates and end dates that enable historical queries or getting the 
current record. For publication of the registry as Linked Data, I have 
introduced the concept of a mutation (bag:Mutatie), which is a set of 
data that corresponds to a certain time interval. For convenience, I 
have added the property bag:lastKnown, which is a boolean that indicates 
whether a mutation is the last known valid mutation. If one is not 
interested in historical data, but wants the most recent data, this 
property can be used as a filter.
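
To make that a bit more concrete, here is a rough sketch of two
mutations, one historical and one current. Only bag:Mutatie and
bag:lastKnown are terms from the vocabulary; the bag: namespace URI, the
ex: resources, the ex:startDate and ex:endDate properties and the dates
themselves are made up for this illustration.

```turtle
@prefix bag: <http://lod.geodan.nl/vocab/bag#> .   # assumed namespace URI
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .               # hypothetical names

# Older record: it has an end date and is no longer the last known one.
ex:mutation-1 a bag:Mutatie ;
    ex:startDate  "2010-03-01"^^xsd:date ;
    ex:endDate    "2012-06-15"^^xsd:date ;
    bag:lastKnown false .

# Current record: no end date yet, flagged as the last known mutation.
ex:mutation-2 a bag:Mutatie ;
    ex:startDate  "2012-06-15"^^xsd:date ;
    bag:lastKnown true .
```

A consumer who only wants current data would keep the mutations where
bag:lastKnown is true; a historical query would compare a date against
the start and end dates instead.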

I believe that modelling changes in datasets can be kept separate from 
the topic(s) of the dataset. Whether the dataset contains data on 
addresses, locations, people or whatever, I think (and hope) that 
version management or managing lineage is a subject that can be kept 
completely separate.
>
> A few more specific comments.
>
> 1. Naming errors:
> - "a rdfs:class" instead of "a rdfs:Class"
> - "a rdfs:property" instead of "a rdf:Property"
> - "a locn:Location" instead of "a dcterms:Location"
>
> 2. It seems that :Woonplaats is both a class and property instance ("a
> locn:postName"):
>
> :Woonplaats
>    a rdfs:class;
>    a locn:Location;
>    a geosparql:SpatialObject;
>    a locn:postName;
>
> 3. Always about :Woonplaats (but this applies also to other classes),
> why is it an instance, and not a subclass of dcterms:Location?
Thank you for spotting the mistakes! I have corrected them. And I 
started looking for a more thorough way of validating vocabularies...
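
For the archive, a declaration along the following lines would address
the three points above. This is a sketch, not a copy of the published
vocabulary: the namespace URI for the default prefix is assumed and
:woonplaatsNaam is a hypothetical property name.

```turtle
@prefix rdf:       <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:      <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms:   <http://purl.org/dc/terms/> .
@prefix locn:      <http://www.w3.org/ns/locn#> .
@prefix geosparql: <http://www.opengis.net/ont/geosparql#> .
@prefix :          <http://lod.geodan.nl/vocab/bag#> .   # assumed namespace URI

# The class is a subclass of the general classes, not an instance of them.
:Woonplaats a rdfs:Class ;
    rdfs:subClassOf dcterms:Location , geosparql:SpatialObject .

# The name is a separate property (hypothetical name), tied to LOCN
# through rdfs:subPropertyOf instead of typing the class with it.
:woonplaatsNaam a rdf:Property ;
    rdfs:subPropertyOf locn:postName .
```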
>
> Cheers,
>
> Andrea


------------------------------------------------------------------------
Frans Knibbe
Geodan
------------------------------------------------------------------------

Received on Tuesday, 13 May 2014 09:31:05 UTC