Re: A real world example: Dutch registry of buildings and addresses

On 2014-05-13 22:51, Gannon Dick wrote:
>
> ===============
>   In this dataset existing (historical) data are never changed, but at
>   some time more recent data might be added. Thinking about this I
>   realise that this is useful information for users that really should be
>   published in the metadata somehow.
>
> ===============
> Here is the "somehow". It is a SKOS/RDF style list of timestamps, partially processed - moving the next report date to rdf:first.  Either DCMI <http://purl.org/dc/terms/accrualPolicy> or <http://purl.org/dc/terms/accrualPeriodicity> looks about right for a meta element framework.
Thank you. Those metadata elements seem to be close to covering the
information that I would like to provide. The terms deal with 'accrual',
the addition of new data to the dataset. At the moment I can't make
definitive statements about the periodicity, but perhaps the term
accrualPolicy could be used to express the fact that existing data will
not be changed when the dataset is updated. In SQL terms, I would say
that the database can be expected to change only by INSERT, not by
UPDATE or DELETE. I don't know if an expression exists that captures
this kind of dataset. A comment for accrualPolicy states "Recommended
best practice is to use a value from a controlled vocabulary". Could it
be that such a vocabulary exists? Otherwise, the Dublin Core wiki
provides an example of using a text literal
<http://wiki.dublincore.org/index.php/User_Guide/Publishing_Metadata#dcterms:accrualPolicy>.
I guess I could always resort to that.
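
To make the INSERT-only expectation a bit more concrete, here is a minimal
sketch in Python of how a consumer could check it against two successive
releases of a dataset. It assumes each release can be loaded as a set of
(subject, predicate, object) tuples. The function names and the
example.org URIs are made up for illustration; they are not part of the
dataset or of any vocabulary.

```python
from typing import FrozenSet, Iterable, Tuple

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def diff_snapshots(
    old: Iterable[Triple], new: Iterable[Triple]
) -> Tuple[FrozenSet[Triple], FrozenSet[Triple]]:
    """Return (added, removed) triples between two snapshots."""
    old_set, new_set = frozenset(old), frozenset(new)
    return new_set - old_set, old_set - new_set


def is_append_only(old: Iterable[Triple], new: Iterable[Triple]) -> bool:
    """True if the update only added triples (INSERT), never removed any.

    An UPDATE shows up as a removal plus an addition, and a DELETE as a
    removal alone, so a single removed triple is enough to fail the check.
    """
    _, removed = diff_snapshots(old, new)
    return not removed


if __name__ == "__main__":
    # Hypothetical data; these URIs are not taken from the real dataset.
    release_1 = {
        ("http://example.org/building/1", "http://example.org/status", "planned")
    }
    release_2 = release_1 | {
        ("http://example.org/building/2", "http://example.org/status", "in use")
    }
    added, removed = diff_snapshots(release_1, release_2)
    print(f"added={len(added)} removed={len(removed)}")
    print("append-only:", is_append_only(release_1, release_2))
```

Holding two full releases in memory would not scale to a dataset of this
size, so this only illustrates the property that an accrualPolicy
statement would be promising.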

>
> http://www.rustprivacy.org/2014/balance/gts/
>
> A formal provenance would be nice, but this "naturally occurring" system is much better than nothing.
>
>
> Cheers,
> Gannon
>
>   On 2014-05-09 21:58, Gannon Dick wrote:
>
>   Hi Frans,
>
>   I do have a suggestion, and it will make your life much
>   easier.  You have a date already.  I suggest you add a
>   "Gaussian Timestamp" as a "version" as
>   well.  Gauss's computation of Easter is well known.
>   Open Office has an EASTERSUNDAY() function. The calculation,
>   based on harmonics, is accurate to within 3 days (One of
>   them will be a Sunday).  The point is that Easter does not
>   cause New Years (or Christmas).  That is the post hoc ergo
>   propter hoc fallacy.  If you are a business, the
>   "cause" of a quarterly report is that it is time
>   for a quarterly report.  Same thing for monthly, weekly etc.
>   reports.  A calendar quarter is ((365.25 x 4)/16) =
>   (1461/16) = 91.3125 days long.  The fractional phase (Hour
>   Angle) matters to Gauss's computation.
>   
>   It varies a bit from year to year.  That is not the point.
>   The point is that 53 million triples all have the same
>   "birthday" so the user knows queries saved might
>   need an update - however the user *does not need to ask you*
>   if the data needs an update since they can figure out the
>   schedule for themselves.  This does not smooth your data in
>   any way.
>   
>   New Years             1.0000     2014-01-01T00:00:00Z
>   1st Q. report        91.3125     2014-04-02T07:30:00Z
>   Mid-Year            182.6250     2014-07-02T15:00:00Z
>   3rd Q. report       273.9375     2014-10-01T22:30:00Z
>   Annual report       365.2500     2015-01-01T06:00:00Z
>   (start over)
>   
>   The scheme is described in less detail here (funnier though)
>   [1] http://www.rustprivacy.org/2014/balance/reports/
>   [2] http://www.rustprivacy.org/2014/balance/reports/StratML.pdf
>   --------------------------------------------
>   On Fri, 5/9/14, Frans Knibbe | Geodan <frans.knibbe@geodan.nl> wrote:
>
>    Subject: A real world example: Dutch registry of buildings and addresses
>    To: "public-locadd@w3.org Mailing list" <public-locadd@w3.org>
>    Date: Friday, May 9, 2014, 10:37 AM
>
>    Hello list,
>
>    I have just finished (I think) a renewed publication of a dataset
>    that could serve as a nice real world example of application of the
>    core location vocabulary.
>
>    The dataset is the Dutch registry of buildings and addresses. It
>    consists of about 573 million triples. The URI of the dataset is
>    http://lod.geodan.nl/basisreg/bag/. This URI should be enough to
>    enable usage of the dataset as it should provide the data necessary
>    for further exploration. The dataset is bilingual: all terms in the
>    main vocabulary have explanations in Dutch and English.
>
>    I would be happy with any comments from this group on this data set,
>    or the associated vocabulary. I hope I have done some things right,
>    but probably there is some room for improvement.
>
>    Anyway, I would like to list some of the issues that I have
>    encountered that have something to do with the core location
>    vocabulary. I would love to know what you think about these!
>
>    About metadata: The dataset URI (http://lod.geodan.nl/basisreg/bag/)
>    resolves to dataset metadata. Because this dataset contains location
>    data (locations, addresses, geometries) I think some special
>    metadata are called for.
>
>    Issue 1: I feel that it is important to let it be known that
>    a dataset is of a geographical nature, i.e., a consumer could expect
>    data about locations in the data. As far as I know, there is no well
>    established way of making such a statement. For this dataset, I
>    specified <http://www.w3.org/ns/locn> as one of the main
>    vocabularies used (using void:vocabulary) and I specified the
>    spatial extent of the data (using dcterms:spatial). WDYT?
>
>    Issue 2: Spatial Extent: The spatial extent of the dataset is
>    specified by both a geometry and a dbpedia reference to the
>    Netherlands. I think that is sufficient.
>
>    Issue 3: CRS: I can think of no way to specify the CRS used
>    in the data. An extension of LOCN to enable this would be welcome, I
>    think.
>
>    Issue 4: Level of Detail / Spatial resolution: This would be
>    applicable to the subsets (which are named graphs) within the
>    dataset. I think that information could be useful to consumers, but
>    I can not think of a way to express this.
>
>    About geometry:
>
>    Issue 5: The geometries in the source data use the Dutch
>    national CRS. I have transformed them to WGS84 lon/lat for several
>    reasons:
>
>    a) The triple store used (Virtuoso) does not support other CRSs yet
>    b) I really do not like WKT literals with prefixed CRS URIs, as
>       mandated by GeoSPARQL
>    c) WGS84 is more common; especially internationally it will be
>       more useful.
>
>    The only drawback I can think of is that this transformation would
>    not do for very detailed geometries. Because these data are
>    European, it would be better to use ETRS89. The current standard is
>    far more useful for American data than for data from other
>    continents!
>
>    Issue 6: The publication is powered by Virtuoso 7.1. This
>    means there are capabilities for using topological functions in
>    SPARQL. The following example asks the name of the town in which a
>    point (which could be your current location) is located, using the
>    function st_within(). The SPARQL endpoint is
>    http://lod.geodan.nl/sparql, as specified in the metadata.
>
>    prefix bag: <http://lod.geodan.nl/vocab/bag#>
>    select ?name
>    from <http://lod.geodan.nl/basisreg/bag/woonplaats/>
>    where {
>        ?wpmut a bag:Woonplaatsmutatie .
>        ?wpmut bag:lastKnown "true"^^xsd:boolean .
>        ?wpmut bag:geometrie ?geom .
>        ?wpmut bag:naam ?name
>        filter (bif:st_within(?geom, bif:st_point(6.56,53.21)))
>    }
>
>    It is not perfect yet: topological functions operate on bounding
>    boxes of geometries, not the geometries themselves. Also, it is not
>    yet possible to use GeoSPARQL expressions. According to people at
>    Openlink, these issues will be resolved soon, in a next version of
>    Virtuoso.
>
>    About application of LOCN:
>
>    Issue 7: If you take a look at the vocabulary I made for this
>    dataset (http://lod.geodan.nl/vocab/bag or
>    http://lod.geodan.nl/vocab/bag.ttl), you can see that I tried
>    to apply LOCN. Mostly, classes are defined as being subclasses of
>    LOCN classes and properties are defined as being subproperties of
>    LOCN properties. But without special measures, one can not use LOCN
>    terms in SPARQL queries. The following example returns nothing
>    because I have not created explicit triples for locn classes, and
>    neither have I made inference rules. So I wonder if it is really
>    worthwhile to use LOCN, or to use it in the way that I have.
>
>    prefix locn: <http://www.w3.org/ns/locn#>
>    select *
>    from <http://lod.geodan.nl/basisreg/bag/ligplaats/>
>    where {
>        ?s a locn:Location .
>    }
>
>    Or to put it in different words: what is the added value of LOCN in
>    this case? And how could that added value be increased?
>
>    Regards,
>    Frans
>
>    Frans Knibbe
>    Geodan
>    President Kennedylaan 1
>    1079 MB Amsterdam (NL)
>
>    T +31 (0)20 - 5711 347
>    E frans.knibbe@geodan.nl
>    www.geodan.nl | disclaimer


------------------------------------------------------------------------
Frans Knibbe
Geodan
President Kennedylaan 1
1079 MB Amsterdam (NL)

T +31 (0)20 - 5711 347
E frans.knibbe@geodan.nl
www.geodan.nl <http://www.geodan.nl> | disclaimer 
<http://www.geodan.nl/disclaimer>
------------------------------------------------------------------------

Received on Wednesday, 28 May 2014 11:41:16 UTC