RE: A real world example: Dutch registry of buildings and addresses from Makx Dekkers on 2014-05-28 (public-locadd@w3.org from May 2014)

From: Makx Dekkers <mail@makxdekkers.com>
Date: Wed, 28 May 2014 14:26:53 +0200
To: <public-locadd@w3.org>
Cc: <public-egov-ig@w3.org>
Message-ID: <001f01cf7a70$1d87ab30$58970190$@makxdekkers.com>
There is a controlled vocabulary for the AccrualPolicy at DCMI:
http://dublincore.org/groups/collections/accrual-policy/

However, it is a very simple set of terms and does not satisfy the more
complex requirements that Frans identifies.

 

Please note that the DCMI Wiki states explicitly at
http://wiki.dublincore.org/index.php/User_Guide/Publishing_Metadata#dcterms:accrualPolicy
that "the property may only be used with non-literal values". The
example shows the use of a blank node with a text label, not a string
value.

 

You can always create your own vocabulary and use the term URIs as
values for the property.
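
As a rough illustration of that suggestion, here is a minimal Python sketch that writes out a Turtle fragment in which the dataset points at a term URI from a home-made vocabulary. The example.org namespace, the term name "AppendOnly" and its label are hypothetical placeholders, not part of any existing vocabulary; typing the term as skos:Concept is just one possible modelling choice.

```python
# Sketch only: the vocabulary namespace and term below are hypothetical.
VOCAB = "http://example.org/def/accrual#"
DATASET = "http://lod.geodan.nl/basisreg/bag/"


def accrual_policy_turtle(dataset_uri, term_uri, label):
    """Return Turtle that defines a policy term and attaches it to a dataset."""
    return "\n".join([
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        # The term is a resource with a label, not a plain string value.
        f"<{term_uri}> a skos:Concept ;",
        f'    skos:prefLabel "{label}"@en .',
        "",
        f"<{dataset_uri}> dcterms:accrualPolicy <{term_uri}> .",
    ])


turtle = accrual_policy_turtle(
    DATASET,
    VOCAB + "AppendOnly",
    "Append only: existing data are never changed",
)
print(turtle)
```

The value of dcterms:accrualPolicy is then a URI that others can dereference and reuse, which keeps it in line with the "non-literal values" rule quoted above.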

 

Makx.

 

 

 

From: Frans Knibbe | Geodan [mailto:frans.knibbe@geodan.nl] 
Sent: Wednesday, May 28, 2014 1:41 PM
To: Gannon Dick
Cc: public-locadd@w3.org Mailing list; public-egov-ig@w3.org
Subject: Re: A real world example: Dutch registry of buildings and
addresses

 

On 2014-05-13 22:51, Gannon Dick wrote:

 
 
===============
       In this dataset existing (historical) data are never changed, but at
       some time more recent data might be added. Thinking about this I
       realise that this is useful information for users that really should be
       published in the metadata somehow.
 
===============
Here is the "somehow". It is a SKOS/RDF style List of timestamps,
partially processed - moving the next report date to rdf:first. Either
DCMI <http://purl.org/dc/terms/accrualPolicy> or
<http://purl.org/dc/terms/accrualPeriodicity> looks about right for a
meta element framework.

Thank you. Those metadata elements seem to be close to covering the
information that I would like to provide. The terms deal with 'accrual',
the addition of new data to the dataset. At the moment I can't make
definitive statements about the periodicity, but perhaps the term
accrualPolicy could be used to express the fact that existing data will
not be changed when the dataset is updated. In SQL terms, I would say
that the database can be expected to change only by INSERT, not by
UPDATE or DELETE. I don't know if an expression exists that captures
this kind of dataset. A comment for accrualPolicy states "Recommended
best practice is to use a value from a controlled vocabulary". Could it
be that such a vocabulary exists? Otherwise, the Dublin Core wiki
provides an example of using a text literal
<http://wiki.dublincore.org/index.php/User_Guide/Publishing_Metadata#dcterms:accrualPolicy>.
I guess I could always resort to that.
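
To make the "INSERT only" idea concrete, here is a small Python sketch of how a consumer might check it between two releases. It assumes each release is available as a set of (subject, predicate, object) tuples; the ex: identifiers and values are made up for illustration.

```python
def is_append_only(old_triples, new_triples):
    """True if every triple of the old release is still present in the new one."""
    return set(old_triples) <= set(new_triples)


# Hypothetical releases of a tiny dataset.
old = {("ex:b1", "ex:status", "in use")}
new_ok = old | {("ex:b2", "ex:status", "planned")}   # only additions
new_bad = {("ex:b1", "ex:status", "demolished")}     # an existing fact was replaced

print(is_append_only(old, new_ok))   # additions only
print(is_append_only(old, new_bad))  # an UPDATE happened
```

An accrual policy of this kind is what would let a user keep cached query results and only fetch what was added later.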




 
 
http://www.rustprivacy.org/2014/balance/gts/
 
A formal provenance would be nice, but this "naturally occurring" system
is much better than nothing.
 
 
Cheers,
Gannon
 
       
 
       
 
       On 2014-05-09 21:58, Gannon Dick wrote:
 
     
     
       Hi Frans,
 
 I do have a suggestion, and it will make your life much
 easier.  You have a date already.  I suggest you add a
 "Gaussian Timestamp" as a "version" as
 well.  Gauss's computation of Easter is well known. 
 Open Office has an EASTERSUNDAY() function. The calculation,
 based on harmonics, is accurate to within 3 days (One of
 them will be a Sunday).  The point is that Easter does not
 cause New Years (or Christmas).  That is the post hoc ergo
 propter hoc fallacy.  If you are a business, the
 "cause" of a quarterly report is that it is time
 for a quarterly report.  Same thing for monthly, weekly etc.
 reports.  A calendar quarter is ((365.25 x 4)/16) =
 (1461/16) = 91.3125 days long.  The fractional phase (Hour
 Angle) matters to Gauss's computation.
 
 It varies a bit from year to year.  That is not the point. 
 The point is that 53 million triples all have the same
 "birthday" so the user knows queries saved might
 need an update - however the user *does not need to ask you*
 if the data needs an update since they can figure out the
 schedule for themselves.  This does not smooth your data in
 any way.
 
 New Years          1.0000     2014-01-01T00:00:00Z
 1st Q. report     91.3125     2014-04-02T07:30:00Z
 Mid-Year         182.6250     2014-07-02T15:00:00Z
 3rd Q. report    273.9375     2014-10-01T22:30:00Z
 Annual report    365.2500     2015-01-01T06:00:00Z
 (start over)
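
The timestamps in the table above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes each report falls a whole number of 91.3125-day quarters after 1 January, 00:00 UTC; the function names are my own.

```python
from datetime import datetime, timedelta, timezone

# One calendar quarter as defined in the message: (365.25 x 4) / 16 days.
QUARTER_DAYS = (365.25 * 4) / 16


def report_timestamps(year):
    """Return the five timestamps: New Year plus four quarter offsets."""
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    return [start + timedelta(days=QUARTER_DAYS * k) for k in range(5)]


def iso_z(ts):
    """Format a UTC datetime the way the table does."""
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


for ts in report_timestamps(2014):
    print(iso_z(ts))
```

Because the schedule follows from the year alone, a user can work out the next expected update without asking the publisher.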
 
 The scheme is described in less detail here (funnier though)
 [1] http://www.rustprivacy.org/2014/balance/reports/
 [2] http://www.rustprivacy.org/2014/balance/reports/StratML.pdf
 --------------------------------------------
 On Fri, 5/9/14, Frans Knibbe | Geodan <frans.knibbe@geodan.nl> wrote:

  Subject: A real world example: Dutch registry of buildings and addresses
  To: "public-locadd@w3.org Mailing list" <public-locadd@w3.org>
  Date: Friday, May 9, 2014, 10:37 AM
  
  
    
  
      
    
    
      Hello list,
  
      
  
      I have just finished (I think) a renewed publication of a dataset
      that could serve as a nice real world example of application of the
      core location vocabulary.

      The dataset is the Dutch registry of buildings and addresses. It
      consists of about 573 million triples. The URI of the dataset is
      http://lod.geodan.nl/basisreg/bag/. This URI should be enough to
      enable usage of the dataset as it should provide the data necessary
      for further exploration. The dataset is bilingual: all terms in the
      main vocabulary have explanations in Dutch and English.
  
  
      
  
      I would be happy with any comments from this group on this data set,
      or the associated vocabulary. I hope I have done some things right,
      but probably there is some room for improvement.

      Anyway, I would like to list some of the issues that I have
      encountered that have something to do with the core location
      vocabulary. I would love to know what you think about these!
  
      
  
      About metadata: The dataset URI (http://lod.geodan.nl/basisreg/bag/)
      resolves to dataset metadata. Because this dataset contains location
      data (locations, addresses, geometries) I think some special
      metadata are called for.
  
      
  
      Issue 1: I feel that it is important to let it be known that
      a dataset is of a geographical nature, i.e., a consumer could expect
      data about locations in the data. As far as I know, there is no well
      established way of making such a statement. For this dataset, I
      specified <http://www.w3.org/ns/locn> as one of the main
      vocabularies used (using void:vocabulary) and I specified the
      spatial extent of the data (using dcterms:spatial). WDYT?
  
      
  
      Issue 2: Spatial Extent: The spatial extent of the dataset is
      specified by both a geometry and a DBpedia reference to the
      Netherlands. I think that is sufficient.

      Issue 3: CRS: I can think of no way to specify the CRS used
      in the data. An extension of LOCN to enable this would be welcome, I
      think.

      Issue 4: Level of Detail / Spatial resolution: This would be
      applicable to the subsets (which are named graphs) within the
      dataset. I think that information could be useful to consumers, but
      I cannot think of a way to express this.
  
      
  
      About geometry:
  
      
  
      Issue 5: The geometries in the source data use the Dutch
      national CRS. I have transformed them to WGS84 lon/lat for several
      reasons:

      a) The triple store used (Virtuoso) does not support other CRSs yet

      b) I really do not like WKT literals with prefixed CRS URIs, as
      mandated by GeoSPARQL

      c) WGS84 is more common; especially internationally it will be
      more useful.

      The only drawback I can think of is that this transformation would
      not do for very detailed geometries. Because these data are
      European, it would be better to use ETRS89. The current standard is
      far more useful for American data than for data from other
      continents!
  
      
  
      Issue 6: The publication is powered by Virtuoso 7.1. This
      means there are capabilities for using topological functions in
      SPARQL. The following example asks the name of the town in which a
      point (which could be your current location) is located, using the
      function st_within(). The SPARQL endpoint is
      http://lod.geodan.nl/sparql, as specified in the metadata.
  
      
  
      prefix bag: <http://lod.geodan.nl/vocab/bag#>
      prefix xsd: <http://www.w3.org/2001/XMLSchema#>

      select ?name
      from <http://lod.geodan.nl/basisreg/bag/woonplaats/>
      where {
          ?wpmut a bag:Woonplaatsmutatie .
          ?wpmut bag:lastKnown "true"^^xsd:boolean .
          ?wpmut bag:geometrie ?geom .
          ?wpmut bag:naam ?name
          filter (bif:st_within(?geom, bif:st_point(6.56, 53.21)))
      }
  
      
  
      It is not perfect yet: topological functions operate on bounding
      boxes of geometries, not the geometries themselves. Also, it is not
      yet possible to use GeoSPARQL expressions. According to people at
      OpenLink, these issues will be resolved soon, in a next version of
      Virtuoso.
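
The practical effect of testing against bounding boxes rather than the geometries themselves can be shown with a small, self-contained Python sketch. It does not use Virtuoso; the L-shaped polygon and the test points are invented, and the exact test is a standard ray-casting check written for illustration.

```python
def bbox_contains(polygon, point):
    """Test the point against the polygon's bounding box only."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    x, y = point
    return min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)


def polygon_contains(polygon, point):
    """Ray casting: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical L-shaped "town"; its bounding box is the square (0,0)-(4,4).
town = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
in_notch = (3, 3)    # inside the bounding box, outside the L shape
in_town = (0.5, 3)   # inside both

print(bbox_contains(town, in_notch), polygon_contains(town, in_notch))
print(bbox_contains(town, in_town), polygon_contains(town, in_town))
```

A point in the notch of the L is reported as "within" by the bounding-box test even though it lies outside the shape, which is the kind of false positive a query like the one above could return near irregular town boundaries.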
  
      
  
      About application of LOCN:
  
      
  
      Issue 7: If you take a look at the vocabulary I made for this
      dataset (http://lod.geodan.nl/vocab/bag or
      http://lod.geodan.nl/vocab/bag.ttl), you can see that I tried
      to apply LOCN. Mostly, classes are defined as being subclasses of
      LOCN classes and properties are defined as being subproperties of
      LOCN properties. But without special measures, one cannot use LOCN
      terms in SPARQL queries. The following example returns nothing
      because I have not created explicit triples for LOCN classes, and
      neither have I made inference rules. So I wonder if it is really
      worthwhile to use LOCN, or to use it in the way that I have.
  
      
  
      prefix locn: <http://www.w3.org/ns/locn#>

      select *
      from <http://lod.geodan.nl/basisreg/bag/ligplaats/>
      where {
          ?s a locn:Location .
      }
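
One of the "special measures" mentioned above would be to materialize the inferred types before loading the data. The Python sketch below shows the idea on plain tuples, without any triple store. The class name bag:Ligplaats, its subclass link to locn:Location and the ex:lp1 resource are assumptions made for illustration; the actual vocabulary may differ.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def superclasses(cls, schema):
    """All classes reachable from cls via rdfs:subClassOf (transitive)."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in schema:
            if s == current and p == SUBCLASS and o not in found:
                found.add(o)
                todo.append(o)
    return found


def materialize_types(data, schema):
    """Add an explicit rdf:type triple for every inferred superclass."""
    inferred = set(data)
    for s, p, o in data:
        if p == RDF_TYPE:
            inferred |= {(s, RDF_TYPE, sup) for sup in superclasses(o, schema)}
    return inferred


# Hypothetical schema and data.
schema = {("bag:Ligplaats", SUBCLASS, "locn:Location")}
data = {("ex:lp1", RDF_TYPE, "bag:Ligplaats")}

for triple in sorted(materialize_types(data, schema)):
    print(triple)
```

With the extra rdf:type triples stored, a query for locn:Location would match without the endpoint having to apply inference rules at query time, at the cost of a larger dataset.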
  
      
  
      Or to put it in different words: what is the added value of LOCN in
      this case? And how could that added value be increased?
  
      
  
      
  
      Regards,
  
      Frans
  
      
  
      
  
      
          
          Frans Knibbe
  
          Geodan
  
          President Kennedylaan 1
  
          1079 MB Amsterdam (NL)
  
          
  
          T +31 (0)20 - 5711 347
  
          E frans.knibbe@geodan.nl <mailto:frans.knibbe@geodan.nl> 
  
          www.geodan.nl <http://www.geodan.nl>  | disclaimer
  
          
    
  
  
  
  
  
 
     
     
 
     
 
     
         
 
         
   
 
 
 
 
 

 


  _____
Received on Wednesday, 28 May 2014 12:27:36 UTC