Re: CRS specification (was: Re: ISA Core Location Vocabulary) from Kostis Kyzirakos on 2014-01-03 (public-locadd@w3.org from January 2014)

From: Kostis Kyzirakos <Kostis.Kyzirakos@cwi.nl>
Date: Fri, 3 Jan 2014 16:34:50 +0100
To: "Frans Knibbe | Geodan" <frans.knibbe@geodan.nl>
Cc: LocAdd W3C CG Public Mailing list <public-locadd@w3.org>
Message-ID: <CAJUi=VGAwvzqP6VupgA3UxOHWVft=paGrCB8bbmh=0yBvd4eYw@mail.gmail.com>
Hi,
Please find some answers inline.

Cheers,
Kostis

===================================================
Kostis E. Kyzirakos, Ph.D.
Centrum voor Wiskunde en Informatica
DB Architectures (DA)
Office L320
Science Park 123
1098 XG Amsterdam  (NL)
tel: +31 (20) 592-4039
mobile: +31 (0) 6422-95345
e-mail:  kostis@cwi.nl
===================================================


On Fri, Jan 3, 2014 at 11:48 AM, Frans Knibbe | Geodan <frans.knibbe@geodan.nl> wrote:

>  Hello,
>
> I agree that a sequence of coordinates should be associated with a CRS. In
> my opinion, that is exactly what happens in the example I gave:
>
>
> ex1:myGeometry
>     a ex2:geometry ;
>     ex2:asWKT "POLYGON((97372 487152,97372 580407,149636 580407,149636 487152,97372 487152))"^^ex2:wktLiteral ;
>     ex2:CRS <http://www.opengis.net/def/crs/EPSG/0/28992> .
>
>  This is based on the viewpoint of a geometry consisting of a sequence of
> coordinates and a CRS.  They are both properties of a geometry.
>

What happens, though, if you merge the following graphs?

ex1:myGeometry ex2:asWKT "POLYGON((97372 487152,97372 580407,149636 580407,149636 487152,97372 487152))"^^ex2:wktLiteral ;
               ex2:CRS <http://www.opengis.net/def/crs/EPSG/0/28992> .

ex1:myGeometry ex2:asWKT "POLYGON((4.54103559631648 52.369221013436,4.52469206625503 53.2071950151372,5.30692041653441 53.210266927542,5.30843905756704 52.3722183594399,4.54103559631648 52.369221013436))"^^ex2:wktLiteral ;
               ex2:CRS <http://www.opengis.net/def/crs/EPSG/0/4326> .

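After the merge, ex1:myGeometry has two ex2:asWKT values and two ex2:CRS
values, and nothing in the graph records which CRS belongs to which
coordinate string. One could argue that we can avoid such problems by using
different URIs for different serializations. However, combining different
datasets then becomes problematic, since constraints are introduced...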
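To make the merge problem concrete, here is a minimal Python sketch that models each graph as a plain set of (subject, predicate, object) triples, using the ex1:/ex2: placeholder names from the example. No RDF library is assumed; since no blank nodes are involved, the merge is simply a set union.

```python
# Each graph is modelled as a plain set of (subject, predicate, object)
# triples; ex1:/ex2: are the placeholder prefixes used in the example.
RD_NEW = "http://www.opengis.net/def/crs/EPSG/0/28992"
WGS84 = "http://www.opengis.net/def/crs/EPSG/0/4326"

WKT_RD = "POLYGON((97372 487152,97372 580407,149636 580407,149636 487152,97372 487152))"
WKT_LONLAT = (
    "POLYGON((4.54103559631648 52.369221013436,4.52469206625503 53.2071950151372,"
    "5.30692041653441 53.210266927542,5.30843905756704 52.3722183594399,"
    "4.54103559631648 52.369221013436))"
)

graph_a = {
    ("ex1:myGeometry", "ex2:asWKT", WKT_RD),
    ("ex1:myGeometry", "ex2:CRS", RD_NEW),
}
graph_b = {
    ("ex1:myGeometry", "ex2:asWKT", WKT_LONLAT),
    ("ex1:myGeometry", "ex2:CRS", WGS84),
}

# With no blank nodes involved, merging the two graphs is a set union.
merged = graph_a | graph_b


def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


wkts = objects(merged, "ex1:myGeometry", "ex2:asWKT")
crss = objects(merged, "ex1:myGeometry", "ex2:CRS")

# Nothing in the merged graph says which CRS belongs to which coordinate
# string, so every combination is an equally valid reading.
candidate_pairings = [(w, c) for w in wkts for c in crss]
print(len(wkts), len(crss), len(candidate_pairings))  # 2 2 4
```

Two of the four pairings read metre coordinates as degrees or degrees as metres, and the merged graph alone gives no way to rule them out.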


> This is somewhat similar to the decision made in NeoGeo to keep the
> coordinates as separate objects.
>

This representation is excellent for specific application domains, e.g.,
when computing shortest paths. IMHO, the problem with this approach lies in
querying such data across a broader range of domains rather than a single
one. How can you express queries like: find all archaeological sites
(polygon) that are within a municipality (polygon) that neighbors the
municipality of Athens (polygon) and are near a beach (linestring)?


> In some scenarios, it may be more convenient to model a text as an ordered
> sequence of character elements. In most cases, the text will be used as a
> whole, without any need to process the individual characters. So the first
> example is more convenient than the second. Now let us try to put the word
> "Αθήνα" in perspective by including its script and language:
>
> ex1:myFeature
>   a ex2:name ;
>   ex3:spelling  "Αθήνα"^^xsd:string ;
>   ex3:language "Greek" ;
>   ex3:script "Greek" .
>
> The extra properties provide data that are vital for the correct
> interpretation of the string in some cases. So why not put them all in the
> same literal?
>
> ex1:myFeature ex2:name "script='Greek';language='Greek';'Αθήνα'" .
>

This is not a correct analogy, because "POINT(0 0)" has no meaning on its
own, while "Αθήνα" does. You can add as much information as you want, but
each RDF term has to be self-contained. You cannot rely on a set of triples
to interpret an RDF term. At least, that is the assumption all formal
treatments of RDF make!
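As an illustration of a self-contained term: GeoSPARQL 1.0 lets the lexical form of a geo:wktLiteral begin with the CRS URI in angle brackets, and assumes CRS84 (http://www.opengis.net/def/crs/OGC/1.3/CRS84) when that prefix is absent. The minimal Python sketch below, which is not a full WKT parser, interprets such a literal without consulting any other triple.

```python
import re

# CRS that GeoSPARQL 1.0 assumes when a wktLiteral has no leading CRS URI.
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# Optional "<uri>" prefix and whitespace, then the WKT text itself.
WKT_LITERAL = re.compile(r"^\s*(?:<([^>]+)>\s*)?(\S.*?)\s*$", re.DOTALL)


def split_wkt_literal(lexical_form: str) -> tuple[str, str]:
    """Return (crs_uri, wkt) using only the literal itself, no other triples."""
    match = WKT_LITERAL.match(lexical_form)
    if match is None:
        raise ValueError(f"not a WKT literal: {lexical_form!r}")
    crs, wkt = match.groups()
    return (crs or CRS84, wkt)


print(split_wkt_literal("POINT(0 0)"))
print(split_wkt_literal("<http://www.opengis.net/def/crs/EPSG/0/28992> POINT(97372 487152)"))
```

Because the CRS travels with the coordinates, a merge like the one discussed earlier cannot separate them.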

> It is easy to see that this is not the most convenient way of expressing
> the text. For example, it needs some processing before it can be used to
> form a human readable text. Similarly, I don't think it is convenient to
> put the specification of the CRS together with a coordinate sequence in the
> same literal. Here is a list of reasons why I think it is inconvenient:
>
>    1. Most (all?) current GIS software takes coordinate strings and CRS
>    specifications separately.
>
This is not (exactly) correct. GIS software implements standards and, as
Clemens also pointed out, different approaches are used in existing
standards. ESRI shapefiles are essentially a collection of files, one of
which describes the CRS (you cannot mix CRSs in the same file). On the other
hand, GML documents, GeoSPARQL documents and KML files have this information
either hard-coded or defined at different granularities within their
content (as Clemens also pointed out).

>
>    1. It should be possible to specify the CRS at the level of a data set
>    or a collection of geometries.
>
We should be careful here. One could argue that we should be able to
define the CRS at the level of a triple, at the level of an rdf:set, at the
level of a named graph and so on (see, for example, the relevant research on
provenance). I think that the simplest way to go is to define it at the
finest level of granularity, which is a triple. This allows all the other
cases to be covered as well. After all, RDF is not known for being laconic :D

Anyhow, WKT and GML provide geometry collections such as MULTIPOINT,
MULTILINESTRING and MULTIPOLYGON, so you can specify a single CRS for a
whole geometry collection.

>
>    1. It should be made easy for storage media to index the CRS.
>
Since I have been heavily involved in the implementation of a geospatial
RDF store (http://strabon.di.uoa.gr), I fully agree with this. However,
making storage easy is only possible when RDF terms are self-contained.
That said, when designing a vocabulary we should not be driven by
implementations.

>
>    1. It should be possible to easily select data based on CRS in SPARQL
>    queries.
>
GeoSPARQL (geof:getSRID) and stSPARQL already provide a function for doing so.

>
>    1. Having multiple specifications of the same CRS for a single
>    geometry should be possible.
>    2. Having multiple specifications of the same coordinate sequence for
>    a single geometry should be possible.
>
I understand that it is desirable to use multiple serializations that use
different CRSs, but I do not understand what exactly you mean here. Can you
elaborate on this?

>
>    1. There should not be a single authority for specifying CRS's
>    (especially if we want specifications to last until the sun goes nova).
>
+1. This is why using just a URI is a good choice!

>
>    1. Next to CRS there are other geometry properties that could be
>    important, like level of detail. Do they need to be put in the same literal
>    too? That would make things even messier.
>
Do you need this information to interpret the coordinates?

>
>    1. It should be possible to select only the CRS or only the
>    coordinates (for specific use cases).
>
GeoSPARQL and stSPARQL offer a function for selecting the CRS of a
geometry, and you can use a simple regex to get just the coordinates. As
you say, this is for specific use cases, so why reinvent the wheel rather
than use existing standards?
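The "simple regex" route might look like the following Python sketch. It assumes a GeoSPARQL-style literal with an optional <crs-uri> prefix and two-dimensional coordinates written as plain decimals; Z/M values, EMPTY geometries and exponent notation are deliberately not handled.

```python
import re

# Removes an optional leading "<crs-uri>" from a GeoSPARQL-style WKT literal.
CRS_PREFIX = re.compile(r"^\s*<[^>]*>\s*")
# One "x y" pair of plain decimal numbers (2D only, no exponents).
XY_PAIR = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def coordinates(literal: str) -> list[tuple[float, float]]:
    """Return the (x, y) pairs of a 2D WKT literal, ignoring any CRS prefix."""
    wkt = CRS_PREFIX.sub("", literal, count=1)
    return [(float(x), float(y)) for x, y in XY_PAIR.findall(wkt)]


lit = (
    "<http://www.opengis.net/def/crs/EPSG/0/28992> "
    "POLYGON((97372 487152,97372 580407,149636 580407,149636 487152,97372 487152))"
)
print(coordinates(lit)[:2])  # [(97372.0, 487152.0), (97372.0, 580407.0)]
```

Stripping the CRS prefix first matters: otherwise the digits in the EPSG code could be mistaken for coordinates.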

>
>    1. Processing the coordinates requires removing the CRS specification
>    from the string, which is undesirable extra processing.
>
On the contrary! When storing a spatial literal, you start by reading the
CRS, create the appropriate precision model, and then use, for example, a
WKT parser for the rest. If you have to keep triples in memory or in
secondary storage until all the required parts are gathered, things can get
extremely costly very easily (for example, performance will vary according
to the ordering of the triples within a file!).
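The ordering effect can be illustrated with a small Python sketch (hypothetical subjects ex1:g0 to ex1:g999; this is not how Strabon is implemented). It counts how many incomplete geometries a single-pass loader must buffer when the coordinates and the CRS arrive in separate triples. With a self-contained literal that count would always be zero, since each literal can be parsed as soon as it is read.

```python
from collections.abc import Iterable

Triple = tuple[str, str, str]


def peak_buffer(triples: Iterable[Triple]) -> int:
    """Peak number of incomplete geometries a one-pass loader must hold.

    Models the separate-triples design: a geometry can only be parsed once
    both its ex2:asWKT and its ex2:CRS triple have arrived.
    """
    pending: dict[str, dict[str, str]] = {}
    peak = 0
    for subject, predicate, obj in triples:
        if predicate not in ("ex2:asWKT", "ex2:CRS"):
            continue
        parts = pending.setdefault(subject, {})
        parts[predicate] = obj
        if len(parts) == 2:
            # Both halves known: the geometry could now be parsed and stored.
            del pending[subject]
        peak = max(peak, len(pending))
    return peak


# Hypothetical input: n point geometries, each with a WKT and a CRS triple.
n = 1000
wkt_triples = [(f"ex1:g{i}", "ex2:asWKT", "POINT(0 0)") for i in range(n)]
crs_triples = [(f"ex1:g{i}", "ex2:CRS", "http://www.opengis.net/def/crs/EPSG/0/4326")
               for i in range(n)]

interleaved = [t for pair in zip(wkt_triples, crs_triples) for t in pair]
grouped = wkt_triples + crs_triples  # e.g. a file sorted by predicate

print(peak_buffer(interleaved), peak_buffer(grouped))  # 1 1000
```

The same data needs a buffer of one geometry or of every geometry, depending only on the order in which the triples appear.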

>
>    1. It should be possible to make statements about the CRS.
>    2. It should be possible to dereference a CRS.
>
I fully agree with you, but I think that this is not required in the
context of this group. This should be the work of another group, since many
vocabularies for representing CRSs have to be unified, which is far from an
easy task (and definitely out of scope for this group).


>  And I am quite sure this list is not exhaustive. Is it possible to have
> an overview of advantages of concatenating the CRS and the coordinates?
>

The most important advantage of having the CRS inside a spatial literal is
that it relies on RDF and nothing more, which makes it a good choice for the
Linked Open Data world.

What you propose has to be encoded as an OWL 2 ontology with the
appropriate cardinality restrictions (e.g., that a geometry has exactly one
CRS), and users are then forced to adopt this ontology. The linked data
story so far (even though I do not entirely agree with this :) ) has shown
that the only way to achieve adoption is by imposing as few restrictions as
possible.


> Put in a more general perspective: If geographical data are going to exist
> in the web of Linked Data, it is good to depart from historical constructs
> if better solutions are on offer. Using a URI to identify a CRS is a very
> good step in the right direction. But why make it impossible to use that
> same URI in an RDF triple?
>

Linked geospatial data are already here (one way or another) :) For
example, a few months ago we published more than 100GB of linked geospatial
data just by publishing two datasets (http://datahub.io/organization/teleios).

I fully agree with you that we should not reinvent the wheel. That's why I
am arguing in favor of using existing standards for this. So far, I have
not been convinced that anything is missing, functionality-wise, from
existing standards like GeoSPARQL. We have pointed out some small issues
here and there (spatial aggregates, a transformation function) and some more
important ones (the temporal dimension of spatial data), but I am not
convinced that there is anything fundamentally wrong with it.

If we follow the evolution of historical constructs, we can see that OGC
started by separating serializations from CRSs (although it precisely
defined a mechanism for associating CRSs with geometries), and then moved on
and combined them (GML and GeoSPARQL). I agree that it would be nice to be
able to define the CRS for a set of geographic features, but this would
require defining an OWL 2 ontology (similar to the schemas defined in GML)
for this purpose, thus shooting ourselves in the foot regarding the adoption
of the proposed vocabulary.
Received on Friday, 3 January 2014 15:35:45 UTC