Re: A real world example: Dutch registry of buildings and addresses

Hi Frans,

I do have a suggestion, and it will make your life much easier. You have a date already. I suggest you add a "Gaussian Timestamp" as a "version" as well. Gauss's computation of Easter is well known; OpenOffice has an EASTERSUNDAY() function. The calculation, based on harmonics, is accurate to within 3 days (one of them will be a Sunday). The point is that Easter does not cause New Year's (or Christmas); to think otherwise is the post hoc ergo propter hoc fallacy. If you are a business, the "cause" of a quarterly report is that it is time for a quarterly report. The same goes for monthly, weekly, etc. reports. A calendar quarter is 365.25 / 4 = 1461/16 = 91.3125 days long (the fractional 0.3125 day is 7 hours 30 minutes, which is why the timestamps below fall at odd hours). The fractional phase (Hour Angle) matters to Gauss's computation.

It varies a bit from year to year. That is not the point. The point is that all 573 million triples have the same "birthday", so the user knows that saved queries might need an update; however, the user *does not need to ask you* whether the data needs an update, since they can work out the schedule for themselves. This does not smooth your data in any way.

                 days elapsed    timestamp (UTC)
New Year's            0.0000     2014-01-01T00:00:00Z
1st Q. report        91.3125     2014-04-02T07:30:00Z
Mid-year            182.6250     2014-07-02T15:00:00Z
3rd Q. report       273.9375     2014-10-01T22:30:00Z
Annual report       365.2500     2015-01-01T06:00:00Z
(start over)

The scheme is described in less detail (though with more humour) here:
[1] http://www.rustprivacy.org/2014/balance/reports/
[2] http://www.rustprivacy.org/2014/balance/reports/StratML.pdf
--------------------------------------------
On Fri, 5/9/14, Frans Knibbe | Geodan <frans.knibbe@geodan.nl> wrote:

 Subject: A real world example: Dutch registry of buildings and addresses
 To: "public-locadd@w3.org Mailing list" <public-locadd@w3.org>
 Date: Friday, May 9, 2014, 10:37 AM

 Hello list,

 I have just finished (I think) a renewed publication of a dataset that could serve as a nice real world example of application of the core location vocabulary.

 The dataset is the Dutch registry of buildings and addresses. It consists of about 573 million triples. The URI of the dataset is http://lod.geodan.nl/basisreg/bag/. This URI should be enough to enable usage of the dataset as it should provide the data necessary for further exploration. The dataset is bilingual: all terms in the main vocabulary have explanations in Dutch and English.
 
 
     
 
 I would be happy with any comments from this group on this data set, or the associated vocabulary. I hope I have done some things right, but probably there is some room for improvement.
 
     
 
 Anyway, I would like to list some of the issues that I have encountered that have something to do with the core location vocabulary. I would love to know what you think about these!
 
     
 
 About metadata: The dataset URI (http://lod.geodan.nl/basisreg/bag/) resolves to dataset metadata. Because this dataset contains location data (locations, addresses, geometries) I think some special metadata are called for.
 
     
 
 Issue 1: I feel that it is important to let it be known that a dataset is of a geographical nature, i.e., a consumer could expect data about locations in the data. As far as I know, there is no well established way of making such a statement. For this dataset, I specified <http://www.w3.org/ns/locn> as one of the main vocabularies used (using void:vocabulary) and I specified the spatial extent of the data (using dcterms:spatial). WDYT?
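
 A consumer could then check for these statements with a simple ASK query. A sketch (untested; it assumes the dataset metadata is also loaded in the store behind the SPARQL endpoint, which may not be the case):

 prefix void: <http://rdfs.org/ns/void#>
 prefix dcterms: <http://purl.org/dc/terms/>
 ask {
     <http://lod.geodan.nl/basisreg/bag/> void:vocabulary <http://www.w3.org/ns/locn> ;
                                          dcterms:spatial ?extent .
 }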
 
     
 
 Issue 2: Spatial Extent: The spatial extent of the dataset is specified by both a geometry and a DBpedia reference to the Netherlands. I think that is sufficient.
 
     
 
 Issue 3: CRS: I can think of no way to specify the CRS used in the data. An extension of LOCN to enable this would be welcome, I think.
 
     
 
 Issue 4: Level of Detail / Spatial resolution: This would be applicable to the subsets (which are named graphs) within the dataset. I think that information could be useful to consumers, but I cannot think of a way to express this.
 
     
 
 About geometry:

 Issue 5: The geometries in the source data use the Dutch national CRS. I have transformed them to WGS84 lon/lat for several reasons:

 a) The triple store used (Virtuoso) does not support other CRSs yet.

 b) I really do not like WKT literals with prefixed CRS URIs, as mandated by GeoSPARQL (see the example below).

 c) This CRS is more common; especially internationally it will be more useful.
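
 To illustrate (b): in GeoSPARQL, a geometry in the Dutch national CRS (EPSG:28992) would have to carry the CRS URI inside the WKT literal, along these lines (illustrative coordinates, with geo: being <http://www.opengis.net/ont/geosparql#>):

 "<http://www.opengis.net/def/crs/EPSG/0/28992> POINT(155000 463000)"^^geo:wktLiteral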
 
     
 
 The only drawback I can think of is that this transformation would not do for very detailed geometries. Because these data are European, it would be better to use ETRS89. The current standard (WGS84) is far more useful for American data than for data from other continents!
 
     
 
 Issue 6: The publication is powered by Virtuoso 7.1. This means there are capabilities for using topological functions in SPARQL. The following example asks for the name of the town in which a point (which could be your current location) is located, using the function st_within(). The SPARQL endpoint is http://lod.geodan.nl/sparql, as specified in the metadata.
 
     
 
 prefix xsd: <http://www.w3.org/2001/XMLSchema#>
 prefix bag: <http://lod.geodan.nl/vocab/bag#>
 select ?name
 from <http://lod.geodan.nl/basisreg/bag/woonplaats/>
 where {
     ?wpmut a bag:Woonplaatsmutatie .
     ?wpmut bag:lastKnown "true"^^xsd:boolean .
     ?wpmut bag:geometrie ?geom .
     ?wpmut bag:naam ?name .
     filter (bif:st_within(?geom, bif:st_point(6.56, 53.21)))
 }
 
     
 
 It is not perfect yet: topological functions operate on bounding boxes of geometries, not the geometries themselves. Also, it is not yet possible to use GeoSPARQL expressions. According to people at OpenLink, these issues will be resolved soon, in a next version of Virtuoso.
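
 For comparison, once GeoSPARQL expressions are supported, the same question might look roughly like this (a sketch, untested; it assumes the geometries would be exposed as geo:wktLiteral values):

 prefix xsd:  <http://www.w3.org/2001/XMLSchema#>
 prefix bag:  <http://lod.geodan.nl/vocab/bag#>
 prefix geo:  <http://www.opengis.net/ont/geosparql#>
 prefix geof: <http://www.opengis.net/def/function/geosparql/>
 select ?name
 from <http://lod.geodan.nl/basisreg/bag/woonplaats/>
 where {
     ?wpmut a bag:Woonplaatsmutatie .
     ?wpmut bag:lastKnown "true"^^xsd:boolean .
     ?wpmut bag:geometrie ?geom .
     ?wpmut bag:naam ?name .
     filter (geof:sfWithin("POINT(6.56 53.21)"^^geo:wktLiteral, ?geom))
 }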
 
     
 
 About application of LOCN:

 Issue 7: If you take a look at the vocabulary I made for this dataset (http://lod.geodan.nl/vocab/bag or http://lod.geodan.nl/vocab/bag.ttl), you can see that I tried to apply LOCN. Mostly, classes are defined as being subclasses of LOCN classes and properties are defined as being subproperties of LOCN properties. But without special measures, one cannot use LOCN terms in SPARQL queries. The following example returns nothing because I have not created explicit triples for locn classes, and neither have I made inference rules. So I wonder if it is really worthwhile to use LOCN, or to use it in the way that I have.
 
     
 
 prefix locn: <http://www.w3.org/ns/locn#>
 select *
 from <http://lod.geodan.nl/basisreg/bag/ligplaats/>
 where {
     ?s a locn:Location .
 }
 
     
 
 Or to put it in different words: what is the added value of LOCN in this case? And how could that added value be increased?
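
 One possible middle road, without materialising extra triples or setting up inference rules, would be to follow the subclass relations in the query itself with a SPARQL 1.1 property path. A sketch (untested; it assumes the bag vocabulary is also loaded as a graph in the same store and included in the query's dataset):

 prefix locn: <http://www.w3.org/ns/locn#>
 prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 select ?s
 from <http://lod.geodan.nl/basisreg/bag/ligplaats/>
 from <http://lod.geodan.nl/vocab/bag>
 where {
     ?s a ?class .
     ?class rdfs:subClassOf* locn:Location .
 }

 That would at least let a generic LOCN-aware client find instances through the vocabulary mappings, which is perhaps where the added value lies.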
 
     
 
     
 
 Regards,
 Frans

 Frans Knibbe
 Geodan
 President Kennedylaan 1
 1079 MB Amsterdam (NL)

 T +31 (0)20 - 5711 347
 E frans.knibbe@geodan.nl
 www.geodan.nl | disclaimer

Received on Friday, 9 May 2014 19:59:27 UTC