- From: Gannon Dick <gannon_dick@yahoo.com>
- Date: Tue, 13 May 2014 13:51:54 -0700 (PDT)
- To: "public-locadd@w3.org Mailing list" <public-locadd@w3.org>, Frans Knibbe | Geodan <frans.knibbe@geodan.nl>
- Cc: public-egov-ig@w3.org
--------------------------------------------
On Tue, 5/13/14, Frans Knibbe | Geodan <frans.knibbe@geodan.nl> wrote:
Subject: Re: A real world example: Dutch registry of buildings and addresses
The dates that are recorded in the dataset that I
described are administrative dates, they are not dates that record
the occurrence of some natural phenomenon. They can be
viewed as a date stamp that a civil servant stamps on a document
at some time during a work day.
===============
Yes, what I would propose is that all of these "date records" (the detritus of Bureaucracy) have the common meta property that they are already grouped in one quarter, or week or year. That list of report dates can easily be determined in advance and the next one served with the data set.
The fractional "day number" I use is a (day of the year) DOT (phase of the year), not (day of the year).(time of day) ... because time of day is decoupled as in "If we had only had the Annual Meeting an hour later, the Stockholders would have been much happier we lost all their money".
If you want to make meta data "seasonally aware" you have to help it along.
===============
In this dataset existing (historical) data are never changed, but at
some time more recent data might be added. Thinking about this I
realise that this is useful information for users that really should be
published in the metadata somehow.
===============
Here is the "somehow". It is a SKOS/RDF style List of timestamps, partially processed - moving the next report date to rdf:first. Either DCMI <http://purl.org/dc/terms/accrualPolicy> or <http://purl.org/dc/terms/accrualPeriodicity> looks about right for a meta element framework.
http://www.rustprivacy.org/2014/balance/gts/
A formal provenance would be nice, but this "naturally occurring" system is much better than nothing.
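One possible reading of "moving the next report date to rdf:first" is a rotation of the ordered timestamp list so that the next upcoming date comes first. The sketch below assumes that reading. The function name `next_first` is made up for illustration, and the timestamps are the ones from the table in the quoted message of 2014-05-09.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def next_first(timestamps, now):
    """Rotate an ordered list so the first entry is the next timestamp after `now`.

    Assumption: `timestamps` is sorted ascending and uses the UTC "Z" notation.
    """
    for i, text in enumerate(timestamps):
        stamp = datetime.strptime(text, ISO_FORMAT).replace(tzinfo=timezone.utc)
        if stamp > now:
            # Upcoming dates first, already passed dates moved to the tail.
            return timestamps[i:] + timestamps[:i]
    return list(timestamps)  # every date has passed; order left unchanged


report_dates = [
    "2014-04-02T07:30:00Z",
    "2014-07-02T15:00:00Z",
    "2014-10-01T22:30:00Z",
    "2015-01-01T06:00:00Z",
]
today = datetime(2014, 5, 13, tzinfo=timezone.utc)
print(next_first(report_dates, today)[0])
```

A consumer that reads only the head of the list then knows when a saved query might need a refresh without having to ask the publisher.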
Cheers,
Gannon
On 2014-05-09 21:58, Gannon Dick wrote:
Hi Frans,
I do have a suggestion, and it will make your life much
easier. You have a date already. I suggest you add a
"Gaussian Timestamp" as a "version" as
well. Gauss's computation of Easter is well known.
Open Office has an EASTERSUNDAY() function. The calculation,
based on harmonics, is accurate to within 3 days (One of
them will be a Sunday). The point is that Easter does not
cause New Years (or Christmas). That is the post hoc ergo
propter hoc fallacy. If you are a business, the
"cause" of a quarterly report is that it is time
for a quarterly report. Same thing for monthly, weekly etc.
reports. A calendar quarter is ((365.25 x 4)/16) =
(1461/16) = 91.3125 days long. The fractional phase (Hour
Angle) matters to Gauss's computation.
It varies a bit from year to year. That is not the point.
The point is that 53 million triples all have the same
"birthday" so the user knows queries saved might
need an update - however the user *does not need to ask you*
if the data needs an update since they can figure out the
schedule for themselves. This does not smooth your data in
any way.
Event            Day number   Timestamp (UTC)
New Years          1.0000     2014-01-01T00:00:00Z
1st Q. report     91.3125     2014-04-02T07:30:00Z
Mid-Year         182.6250     2014-07-02T15:00:00Z
3rd Q. report    273.9375     2014-10-01T22:30:00Z
Annual report    365.2500     2015-01-01T06:00:00Z
(start over)
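The four report timestamps above can be reproduced with a few lines of Python. This is a minimal sketch, assuming each day number is counted as elapsed days from 2014-01-01T00:00:00Z (that assumption matches the four report rows listed). The function name `report_timestamps` is illustrative only.

```python
from datetime import datetime, timedelta, timezone

DAYS_PER_CYCLE = 1461               # 365.25 x 4, one four-year cycle
QUARTER_DAYS = DAYS_PER_CYCLE / 16  # 91.3125 days per calendar quarter

LABELS = ["1st Q. report", "Mid-Year", "3rd Q. report", "Annual report"]


def report_timestamps(year):
    """Return (label, day number, UTC timestamp) for the four reports of `year`.

    Assumption: day numbers are elapsed days since 1 January 00:00 UTC.
    """
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    rows = []
    for quarter, label in enumerate(LABELS, start=1):
        day_number = quarter * QUARTER_DAYS
        stamp = start + timedelta(days=day_number)
        rows.append((label, day_number, stamp.strftime("%Y-%m-%dT%H:%M:%SZ")))
    return rows


for label, day_number, stamp in report_timestamps(2014):
    print(f"{label:<15}{day_number:>10.4f}  {stamp}")
```

Because the schedule depends only on the year, a data consumer can compute it locally instead of asking the publisher.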
The scheme is described in less detail here (funnier though)
[1] http://www.rustprivacy.org/2014/balance/reports/
[2] http://www.rustprivacy.org/2014/balance/reports/StratML.pdf
--------------------------------------------
On Fri, 5/9/14, Frans Knibbe | Geodan <frans.knibbe@geodan.nl> wrote:
Subject: A real world example: Dutch registry of buildings and addresses
To: "public-locadd@w3.org Mailing list" <public-locadd@w3.org>
Date: Friday, May 9, 2014, 10:37 AM
Hello list,
I have just finished (I think) a renewed publication of a dataset
that could serve as a nice real world example of application of the
core location vocabulary.
The dataset is the Dutch registry of buildings and addresses. It
consists of about 573 million triples. The URI of the dataset is
http://lod.geodan.nl/basisreg/bag/. This URI should be enough to
enable usage of the dataset as it should provide the data necessary
for further exploration. The dataset is bilingual: all terms in the
main vocabulary have explanations in Dutch and English.
I would be happy with any comments from this group on this data set,
or the associated vocabulary. I hope I have done some things right,
but probably there is some room for improvement.
Anyway, I would like to list some of the issues that I have
encountered that have something to do with the core location
vocabulary. I would love to know what you think about these!
About metadata: The dataset URI (http://lod.geodan.nl/basisreg/bag/)
resolves to dataset metadata. Because this dataset contains location
data (locations, addresses, geometries) I think some special
metadata are called for.
Issue 1: I feel that it is important to let it be known that
a dataset is of a geographical nature, i.e., a consumer could expect
data about locations in the data. As far as I know, there is no well
established way of making such a statement. For this dataset, I
specified <http://www.w3.org/ns/locn> as one of the main
vocabularies used (using void:vocabulary) and I specified the
spatial extent of the data (using dcterms:spatial). WDYT?
Issue 2: Spatial Extent: The spatial extent of the dataset is
specified by both a geometry and a dbpedia reference to the
Netherlands. I think that is sufficient.
Issue 3: CRS: I can think of no way to specify the CRS used
in the data. An extension of LOCN to enable this would be welcome, I
think.
Issue 4: Level of Detail / Spatial resolution: This would be
applicable to the subsets (which are named graphs) within the
dataset. I think that information could be useful to consumers, but
I cannot think of a way to express this.
About geometry:
Issue 5: The geometries in the source data use the Dutch
national CRS. I have transformed them to WGS84 lon/lat for several
reasons:
a) The triple store used (Virtuoso) does not support other CRSs yet
b) I really do not like WKT literals with prefixed CRS URIs, as
mandated by GeoSPARQL
c) WGS84 is more common, so the data will be more useful, especially
internationally.
The only drawback I can think of is that this transformation would
not do for very detailed geometries. Because these data are
European, it would be better to use ETRS89. The current standard is
far more useful for American data than for data from other
continents!
Issue 6: The publication is powered by Virtuoso 7.1. This
means there are capabilities for using topological functions in
SPARQL. The following example asks the name of the town in which a
point (which could be your current location) is located, using the
function st_within(). The SPARQL endpoint is
http://lod.geodan.nl/sparql, as specified in the metadata.
prefix bag: <http://lod.geodan.nl/vocab/bag#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?name
from <http://lod.geodan.nl/basisreg/bag/woonplaats/>
where {
  ?wpmut a bag:Woonplaatsmutatie .
  ?wpmut bag:lastKnown "true"^^xsd:boolean .
  ?wpmut bag:geometrie ?geom .
  ?wpmut bag:naam ?name .
  # bif: is Virtuoso's built-in function prefix; the point is (lon, lat)
  filter (bif:st_within(?geom, bif:st_point(6.56, 53.21)))
}
It is not perfect yet: topological functions operate on bounding
boxes of geometries, not the geometries themselves. Also, it is not
yet possible to use GeoSPARQL expressions. According to people at
Openlink, these issues will be resolved soon, in a next version of
Virtuoso.
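The practical effect of testing against bounding boxes instead of the geometries themselves can be illustrated with a small sketch. The L-shaped outline and the test point below are hypothetical and not taken from the dataset. The polygon test is a plain ray-casting check, which is not necessarily what any triple store implements.

```python
def bbox_contains(polygon, point):
    """True if `point` lies inside the axis-aligned bounding box of `polygon`."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    x, y = point
    return min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)


def polygon_contains(polygon, point):
    """Ray casting: count how often a ray going right from `point` crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical L-shaped town outline as (lon, lat) pairs.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
point = (3, 3)  # lies in the empty corner of the L

print(bbox_contains(l_shape, point))     # bounding box test: a false positive
print(polygon_contains(l_shape, point))  # exact test: correctly rejected
```

For concave or elongated outlines a bounding-box test can therefore report a point as being within a town whose actual boundary does not contain it.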
About application of LOCN:
Issue 7: If you take a look at the vocabulary I made for this
dataset (http://lod.geodan.nl/vocab/bag or
http://lod.geodan.nl/vocab/bag.ttl), you can see that I tried
to apply LOCN. Mostly, classes are defined as being subclasses of
LOCN classes and properties are defined as being subproperties of
LOCN properties. But without special measures, one can not use LOCN
terms in SPARQL queries. The following example returns nothing
because I have not created explicit triples for locn classes, and
neither have I made inference rules. So I wonder if it is really
worthwhile to use LOCN, or to use it in the way that I have.
prefix locn: <http://www.w3.org/ns/locn#>
select *
from <http://lod.geodan.nl/basisreg/bag/ligplaats/>
where {
  ?s a locn:Location .
}
Or to put it in different words: what is the added value of LOCN in
this case? And how could that added value be increased?
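Why the LOCN query in Issue 7 comes back empty, and what a subclass inference rule would change, can be sketched on a toy triple set. The class name bag:Ligplaats, the instance ex:ligplaats1 and both function names are hypothetical and chosen only for illustration. The rule applied is the usual RDFS one: an instance of a subclass is also an instance of its superclasses.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical triples: one vocabulary statement and one instance statement.
triples = {
    ("bag:Ligplaats", SUBCLASS_OF, "locn:Location"),
    ("ex:ligplaats1", RDF_TYPE, "bag:Ligplaats"),
}


def superclasses(cls, triples):
    """Return `cls` plus every class reachable from it via rdfs:subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def instances_of(cls, triples, infer):
    """Subjects typed as `cls`, optionally following the subclass hierarchy."""
    result = set()
    for s, p, o in triples:
        if p != RDF_TYPE:
            continue
        types = superclasses(o, triples) if infer else {o}
        if cls in types:
            result.add(s)
    return result


print(instances_of("locn:Location", triples, infer=False))  # no explicit triples
print(instances_of("locn:Location", triples, infer=True))   # with subclass rule
```

On a store that supports SPARQL 1.1, a property path such as `?s a/rdfs:subClassOf* locn:Location` expresses the same idea at query time, provided the vocabulary triples are available in the queried graph.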
Regards,
Frans
Frans Knibbe
Geodan
President Kennedylaan 1
1079 MB Amsterdam (NL)
T +31 (0)20 - 5711 347
E frans.knibbe@geodan.nl
www.geodan.nl | disclaimer
Received on Tuesday, 13 May 2014 20:52:22 UTC