Re: A real world example: Dutch registry of buildings and addresses

Hi Frans,

I do have a suggestion, and it will make your life much easier. You have a date already. I suggest you add a "Gaussian Timestamp" as a "version" as well. Gauss's computation of Easter is well known; OpenOffice has an EASTERSUNDAY() function. The calculation, based on harmonics, is accurate to within 3 days (one of them will be a Sunday). The point is that Easter does not cause New Year's (or Christmas); to think otherwise is the post hoc ergo propter hoc fallacy. If you are a business, the "cause" of a quarterly report is that it is time for a quarterly report. The same goes for monthly, weekly, etc. reports. A calendar quarter is 365.25 / 4 = 1461/16 = 91.3125 days long (the fractional 0.3125 day is 7 hours 30 minutes, which is why the timestamps below fall at odd hours). The fractional phase (Hour Angle) matters to Gauss's computation.

It varies a bit from year to year. That is not the point. The point is that all 573 million triples have the same "birthday", so the user knows that saved queries might need an update; however, the user *does not need to ask you* whether the data needs an update, since they can work out the schedule for themselves. This does not smooth your data in any way.

                 days elapsed    timestamp (UTC)
New Year's            0.0000     2014-01-01T00:00:00Z
1st Q. report        91.3125     2014-04-02T07:30:00Z
Mid-year            182.6250     2014-07-02T15:00:00Z
3rd Q. report       273.9375     2014-10-01T22:30:00Z
Annual report       365.2500     2015-01-01T06:00:00Z
(start over)

The scheme is described in less detail (though with more humour) here:
[1] http://www.rustprivacy.org/2014/balance/reports/
[2] http://www.rustprivacy.org/2014/balance/reports/StratML.pdf
--------------------------------------------
On Fri, 5/9/14, Frans Knibbe | Geodan <frans.knibbe@geodan.nl> wrote:

 Subject: A real world example: Dutch registry of buildings and addresses
 To: "public-locadd@w3.org Mailing list" <public-locadd@w3.org>
 Date: Friday, May 9, 2014, 10:37 AM

 Hello list,

 I have just finished (I think) a renewed publication of a dataset that could serve as a nice real world example of application of the core location vocabulary.

 The dataset is the Dutch registry of buildings and addresses. It consists of about 573 million triples. The URI of the dataset is http://lod.geodan.nl/basisreg/bag/. This URI should be enough to enable usage of the dataset as it should provide the data necessary for further exploration. The dataset is bilingual: all terms in the main vocabulary have explanations in Dutch and English.
 
 
     
 
 I would be happy with any comments from this group on this data set, or the associated vocabulary. I hope I have done some things right, but probably there is some room for improvement.
 
     
 
 Anyway, I would like to list some of the issues that I have encountered that have something to do with the core location vocabulary. I would love to know what you think about these!
 
     
 
 About metadata: The dataset URI (http://lod.geodan.nl/basisreg/bag/) resolves to dataset metadata. Because this dataset contains location data (locations, addresses, geometries) I think some special metadata are called for.
 
     
 
 Issue 1: I feel that it is important to let it be known that a dataset is of a geographical nature, i.e., a consumer could expect data about locations in the data. As far as I know, there is no well established way of making such a statement. For this dataset, I specified <http://www.w3.org/ns/locn> as one of the main vocabularies used (using void:vocabulary) and I specified the spatial extent of the data (using dcterms:spatial). WDYT?
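
 A consumer could then check for these statements with a simple ASK query. A sketch (untested; it assumes the dataset metadata is also loaded in the store behind the SPARQL endpoint, which may not be the case):

 prefix void: <http://rdfs.org/ns/void#>
 prefix dcterms: <http://purl.org/dc/terms/>
 ask {
     <http://lod.geodan.nl/basisreg/bag/> void:vocabulary <http://www.w3.org/ns/locn> ;
                                          dcterms:spatial ?extent .
 }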
 
     
 
 Issue 2: Spatial Extent: The spatial extent of the dataset is specified by both a geometry and a DBpedia reference to the Netherlands. I think that is sufficient.
 
     
 
 Issue 3: CRS: I can think of no way to specify the CRS used in the data. An extension of LOCN to enable this would be welcome, I think.
 
     
 
 Issue 4: Level of Detail / Spatial resolution: This would be applicable to the subsets (which are named graphs) within the dataset. I think that information could be useful to consumers, but I cannot think of a way to express this.
 
     
 
 About geometry:

 Issue 5: The geometries in the source data use the Dutch national CRS. I have transformed them to WGS84 lon/lat for several reasons:

 a) The triple store used (Virtuoso) does not support other CRSs yet.

 b) I really do not like WKT literals with prefixed CRS URIs, as mandated by GeoSPARQL (see the example below).

 c) This CRS is more common; especially internationally it will be more useful.
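
 To illustrate (b): in GeoSPARQL, a geometry in the Dutch national CRS (EPSG:28992) would have to carry the CRS URI inside the WKT literal, along these lines (illustrative coordinates, with geo: being <http://www.opengis.net/ont/geosparql#>):

 "<http://www.opengis.net/def/crs/EPSG/0/28992> POINT(155000 463000)"^^geo:wktLiteral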
 
     
 
 The only drawback I can think of is that this transformation would not do for very detailed geometries. Because these data are European, it would be better to use ETRS89. The current standard (WGS84) is far more useful for American data than for data from other continents!
 
     
 
 Issue 6: The publication is powered by Virtuoso 7.1. This means there are capabilities for using topological functions in SPARQL. The following example asks for the name of the town in which a point (which could be your current location) is located, using the function st_within(). The SPARQL endpoint is http://lod.geodan.nl/sparql, as specified in the metadata.
 
     
 
 prefix xsd: <http://www.w3.org/2001/XMLSchema#>
 prefix bag: <http://lod.geodan.nl/vocab/bag#>
 select ?name
 from <http://lod.geodan.nl/basisreg/bag/woonplaats/>
 where {
     ?wpmut a bag:Woonplaatsmutatie .
     ?wpmut bag:lastKnown "true"^^xsd:boolean .
     ?wpmut bag:geometrie ?geom .
     ?wpmut bag:naam ?name .
     filter (bif:st_within(?geom, bif:st_point(6.56, 53.21)))
 }
 
     
 
 It is not perfect yet: topological functions operate on bounding boxes of geometries, not the geometries themselves. Also, it is not yet possible to use GeoSPARQL expressions. According to people at OpenLink, these issues will be resolved soon, in a next version of Virtuoso.
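
 For comparison, once GeoSPARQL expressions are supported, the same question might look roughly like this (a sketch, untested; it assumes the geometries would be exposed as geo:wktLiteral values):

 prefix xsd:  <http://www.w3.org/2001/XMLSchema#>
 prefix bag:  <http://lod.geodan.nl/vocab/bag#>
 prefix geo:  <http://www.opengis.net/ont/geosparql#>
 prefix geof: <http://www.opengis.net/def/function/geosparql/>
 select ?name
 from <http://lod.geodan.nl/basisreg/bag/woonplaats/>
 where {
     ?wpmut a bag:Woonplaatsmutatie .
     ?wpmut bag:lastKnown "true"^^xsd:boolean .
     ?wpmut bag:geometrie ?geom .
     ?wpmut bag:naam ?name .
     filter (geof:sfWithin("POINT(6.56 53.21)"^^geo:wktLiteral, ?geom))
 }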
 
     
 
 About application of LOCN:

 Issue 7: If you take a look at the vocabulary I made for this dataset (http://lod.geodan.nl/vocab/bag or http://lod.geodan.nl/vocab/bag.ttl), you can see that I tried to apply LOCN. Mostly, classes are defined as being subclasses of LOCN classes and properties are defined as being subproperties of LOCN properties. But without special measures, one cannot use LOCN terms in SPARQL queries. The following example returns nothing because I have not created explicit triples for locn classes, and neither have I made inference rules. So I wonder if it is really worthwhile to use LOCN, or to use it in the way that I have.
 
     
 
 prefix locn: <http://www.w3.org/ns/locn#>
 select *
 from <http://lod.geodan.nl/basisreg/bag/ligplaats/>
 where {
     ?s a locn:Location .
 }
 
     
 
 Or to put it in different words: what is the added value of LOCN in this case? And how could that added value be increased?
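
 One possible middle road, without materialising extra triples or setting up inference rules, would be to follow the subclass relations in the query itself with a SPARQL 1.1 property path. A sketch (untested; it assumes the bag vocabulary is also loaded as a graph in the same store and included in the query's dataset):

 prefix locn: <http://www.w3.org/ns/locn#>
 prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 select ?s
 from <http://lod.geodan.nl/basisreg/bag/ligplaats/>
 from <http://lod.geodan.nl/vocab/bag>
 where {
     ?s a ?class .
     ?class rdfs:subClassOf* locn:Location .
 }

 That would at least let a generic LOCN-aware client find instances through the vocabulary mappings, which is perhaps where the added value lies.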
 
     
 
     
 
 Regards,
 Frans

 Frans Knibbe
 Geodan
 President Kennedylaan 1
 1079 MB Amsterdam (NL)

 T +31 (0)20 - 5711 347
 E frans.knibbe@geodan.nl
 www.geodan.nl | disclaimer

Received on Friday, 9 May 2014 19:59:27 UTC