Re: CRS specification (was: Re: ISA Core Location Vocabulary) from Frans Knibbe | Geodan on 2014-01-03 (public-locadd@w3.org from January 2014)

From: Frans Knibbe | Geodan <frans.knibbe@geodan.nl>
Date: Fri, 03 Jan 2014 18:21:38 +0100
To: Kostis Kyzirakos <Kostis.Kyzirakos@cwi.nl>
CC: LocAdd W3C CG Public Mailing list <public-locadd@w3.org>
Message-ID: <52C6F1A2.50000@geodan.nl>
Hello Kostis,

Thank you for your elaborate reply! I am afraid I will have to answer 
inline too...

Regards,
Frans

On 2014-01-03 16:34, Kostis Kyzirakos wrote:
> Hi,
> Please find some answers inline.
>
> Cheers,
> Kostis
>
> ===================================================
> Kostis E. Kyzirakos, Ph.D.
> Centrum voor Wiskunde en Informatica
> DB Architectures (DA)
> Office L320
> Science Park 123
> 1098 XG Amsterdam  (NL)
> tel: +31 (20) 592-4039
> mobile: +31 (0) 6422-95345
> e-mail: kostis@cwi.nl
> ===================================================
>
>
> On Fri, Jan 3, 2014 at 11:48 AM, Frans Knibbe | Geodan 
> <frans.knibbe@geodan.nl> wrote:
>
>     Hello,
>
>     I agree that a sequence of coordinates should be associated with a
>     CRS. In my opinion, that is exactly what happens in the example I
>     gave:
>
>
>     ex1:myGeometry
>         a ex2:geometry ;
>         ex2:asWKT "POLYGON((97372 487152,97372 580407,149636 580407,149636 487152,97372 487152))"^^ex2:wktLiteral ;
>         ex2:CRS <http://www.opengis.net/def/crs/EPSG/0/28992> .
>
>     This is based on the viewpoint of a geometry consisting of a
>     sequence of coordinates and a CRS. They are both properties of a
>     geometry.
>
>
> What happens though if you merge the following graphs:
>
> ex1:myGeometry ex2:asWKT "POLYGON((97372 487152,97372 580407,149636 580407,149636 487152,97372 487152))"^^ex2:wktLiteral ;
>                ex2:CRS <http://www.opengis.net/def/crs/EPSG/0/28992> .
>
> ex1:myGeometry ex2:asWKT "POLYGON((4.54103559631648 52.369221013436,4.52469206625503 53.2071950151372,5.30692041653441 53.210266927542,5.30843905756704 52.3722183594399,4.54103559631648 52.369221013436))"^^ex2:wktLiteral ;
>                ex2:CRS <http://www.opengis.net/def/crs/EPSG/0/4326> .
>
> One could argue that we can avoid such problems by using different URIs 
> for different serializations. However, combining different datasets 
> then becomes problematic, since constraints are introduced...
I would say the two geometries in your examples are different 
geometries, so they cannot be merged (i.e. they should not have the same 
URI). Could you explain how a combination of different datasets becomes 
problematic in that case?
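A toy illustration of the merge concern, assuming each graph is modelled as a Python set of (subject, predicate, object) tuples with no blank nodes, so that an RDF merge reduces to set union. The WKT literals are shortened and the `ex1:`/`ex2:` names are the placeholders used in the examples above.

```python
from itertools import product

GEOM = "ex1:myGeometry"
RD_NEW = "http://www.opengis.net/def/crs/EPSG/0/28992"  # Amersfoort / RD New
WGS84 = "http://www.opengis.net/def/crs/EPSG/0/4326"

# Two independently published descriptions of the same geometry URI
# (WKT literals shortened for readability).
graph_a = {
    (GEOM, "ex2:asWKT", "POLYGON((97372 487152, ...))"),
    (GEOM, "ex2:CRS", RD_NEW),
}
graph_b = {
    (GEOM, "ex2:asWKT", "POLYGON((4.54103559631648 52.369221013436, ...))"),
    (GEOM, "ex2:CRS", WGS84),
}

# With IRIs and literals only (no blank nodes), merging RDF graphs is set union.
merged = graph_a | graph_b

def objects(graph, subject, predicate):
    """Return all objects for subject/predicate, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

wkts = objects(merged, GEOM, "ex2:asWKT")
crss = objects(merged, GEOM, "ex2:CRS")

# The merged triples no longer record which WKT string belongs to which CRS:
# every combination is equally supported by the data.
candidate_pairings = list(product(wkts, crss))
print(len(wkts), len(crss), len(candidate_pairings))  # 2 2 4
```

Only two of the four pairings are correct, and nothing in the merged graph says which. Giving each serialization its own geometry URI, as suggested in the reply above, keeps each pair intact.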

>     This is somewhat similar to the decision made in NeoGeo to keep
>     the coordinates separate objects.
>
>
> This representation is excellent for specific application domains, 
> e.g., when computing shortest paths. IMHO the problem with this 
> approach is querying such data across a broader domain rather than 
> within a single one. How can you express queries like: find all 
> archaeological sites (polygons) that are within a municipality 
> (polygon) that neighbors the municipality of Athens (polygon) and 
> are near a beach (linestring)?
I was not trying to say that the NeoGeo way is better. For the majority 
of use cases making the coordinate sequence one element is fine. I was 
just mentioning NeoGeo to illustrate that the decision to define a 
certain basic building block can be quite arbitrary.
>
>     In some scenarios, it may be more convenient to model a text as an
>     ordered sequence of character elements. In most cases, the text
>     will be used as a whole, without any need to process the
>     individual characters. So the first example is more convenient
>     than the second. Now let us try to put the word "Αθήνα" in
>     perspective by including its script and language:
>
>     ex1:myFeature
>       a ex2:name ;
>       ex3:spelling  "Αθήνα"^^xsd:string ;
>       ex3:language "Greek" ;
>       ex3:script "Greek" .
>
>     The extra properties provide data that are vital for the correct
>     interpretation of the string in some cases. So why not put them
>     all in the same literal?
>
>     ex1:myFeature ex2:name "script='Greek';language='Greek';'Αθήνα'" .
>
>
> This is not a correct analogy, because "POINT(0 0)" has no meaning on 
> its own, while "Αθήνα" has some meaning on its own. You can add as 
> much information as you want, but each RDF term has to be 
> self-contained. You cannot rely on a set of triples to interpret an 
> RDF term. At least, this is what all formal treatments of RDF do!
The string "POINT(0 0)" does have meaning. You can tell it is a point 
with known coordinates. You just don't know the CRS, so you cannot draw 
the point on a map, for instance. Similarly, if you tried to use the 
word "Αθήνα" in a text document without knowing its language and script, 
things would probably also go wrong because vital data are missing.

>
>     It is easy to see that this is not the most convenient way of
>     expressing the text. For example, it needs some processing before
>     it can be used to form a human readable text. Similarly, I don't
>     think it is convenient to put the specification of the CRS
>     together with a coordinate sequence in the same literal. Here is a
>     list of reasons why I think it is inconvenient:
>
>      1. Most (all?) current GIS software takes coordinate strings and
>         CRS specifications separately.
>
> This is not (exactly) correct. GIS software implements standards. As 
> Clemens also pointed out, existing standards use different 
> approaches. ESRI shapefiles are essentially a collection of files, one 
> of which describes the CRS (you cannot mix CRSs in the same file). On 
> the other hand, GML documents, GeoSPARQL documents and KML files have 
> this information either hard-coded or defined at different 
> granularities within their content.
Let's take the example of drawing geographical data on a map. In all 
software that I know of, the CRS is specified for the map as a whole, 
not for individual geometries. Or take the example of storing 
geographical data in a database table. The CRS is usually specified at 
the column level.
>
>      1. It should be possible to specify the CRS at the level of a
>         data set or a collection of geometries.
>
> We should be careful here. One could argue that we should be able to 
> define the CRS at the level of a triple, at the level of an rdf:set, 
> at the level of a named graph and so on (see for example the relevant 
> research on provenance). I think that the simplest way to go is to 
> define it at the finest level of granularity, which is a triple. This 
> allows all other cases to be covered as well. After all, RDF is not 
> known for being laconic :D
So that means the CRS should be a separate URI, right?
>
> Anyhow, WKT and GML provide geometry collections like MULTIPOINT, 
> MULTILINESTRING, MULTIPOLYGON etc., so you specify a single CRS 
> for a whole geometry collection.
Yes, but those are fixed collections. I think it should be possible to 
assign a CRS to dynamic collections, like the result set of a query for 
instance. And I think it is important to be able to assign the CRS at 
the level of a dataset (things like void:Dataset or dcat:Dataset).
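A sketch of how a dataset-level CRS and a per-geometry CRS could coexist when the CRS is a plain URI: a geometry's own `ex2:CRS` statement wins, otherwise the CRS of the dataset it belongs to is used. The `ex2:CRS` and `ex2:inDataset` properties, the `ex1:` identifiers and the override rule itself are assumptions made for this illustration; the rule is not taken from VoID or DCAT.

```python
# Hypothetical triples: a dataset-level default CRS plus one geometry-level override.
TRIPLES = {
    ("ex1:dataset", "ex2:CRS", "http://www.opengis.net/def/crs/EPSG/0/28992"),
    ("ex1:geomA", "ex2:inDataset", "ex1:dataset"),
    ("ex1:geomB", "ex2:inDataset", "ex1:dataset"),
    ("ex1:geomB", "ex2:CRS", "http://www.opengis.net/def/crs/EPSG/0/4326"),
}

def value(subject, predicate):
    """Return the single object for subject/predicate, or None if absent."""
    matches = [o for s, p, o in TRIPLES if s == subject and p == predicate]
    if len(matches) > 1:
        raise ValueError(f"{subject} has several {predicate} values: {matches}")
    return matches[0] if matches else None

def effective_crs(geometry):
    """Prefer a CRS stated on the geometry itself, else fall back to its dataset's CRS."""
    own = value(geometry, "ex2:CRS")
    if own is not None:
        return own
    dataset = value(geometry, "ex2:inDataset")
    return value(dataset, "ex2:CRS") if dataset is not None else None

print(effective_crs("ex1:geomA"))  # inherited from ex1:dataset (EPSG 28992)
print(effective_crs("ex1:geomB"))  # stated on the geometry itself (EPSG 4326)
```

The same lookup serves both granularities because the CRS is an ordinary object in both places; how a fallback like this should be specified would still have to be agreed on.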
>
>      1. It should be made easy for storage media to index the CRS.
>
> Since I have been heavily involved in the implementation of a 
> geospatial RDF store (http://strabon.di.uoa.gr), I fully agree with 
> this. Making storage easy, however, is only possible when RDF terms are 
> self-contained. Still, when designing a vocabulary we should not be 
> driven by implementations.
True, but all else being equal, it doesn't hurt to think about the 
developers who have to make things work. Anyway, this argument is 
similar to others, in that if we treat data as expected, i.e. as a URI 
instead of part of a string, it will fit much better in the existing 
world of Linked Data.

>      1. It should be possible to easily select data based on CRS in
>         SPARQL queries.
>
>  GeoSPARQL and stSPARQL already provide a function for doing so.
True, but I consider that a workaround. And it is one thing to have a 
specification, and another one to have it working. A getsrid function 
would need to be implemented by all SPARQL software. A plain URI works 
right away.
>
>      1. Having multiple specifications of the same CRS for a single
>         geometry should be possible.
>      2. Having multiple specifications of the same coordinate sequence
>         for a single geometry should be possible.
>
> I understand that it is desirable to use multiple serializations that 
> use different CRSs, but I do not understand what exactly you mean here. 
> Can you elaborate on this?
Yes. Perhaps it is easiest to give an example:

ex1:myGeometry
     ex2:shape "POLYGON((97372 487152,97372 580407,149636 580407,149636 487152,97372 487152))"^^ex2:wktLiteral;
     ex2:shape "Polyshape {97372 487152,97372 580407,149636 580407,149636 487152}"^^ex2:SomeOtherFormat;
     ex2:CRS <http://www.opengis.net/def/crs/EPSG/0/28992> ;
     ex2:CRS <http://www.other.org/srs/12345> ;
.

Note that this is one geometry, because the CRSs and the coordinate 
strings are equivalent. It is just their notation that differs.

Data publication like this could be useful because consumer A might want 
WKT while consumer B wants SomeOtherFormat. Likewise, consumer A may 
like using OGC/EPSG specifications of CRS's while consumer B prefers 
another authority.
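Under that model each consumer could simply keep the values it understands. A minimal sketch, assuming the geometry's statements have been loaded into (predicate, value, datatype) tuples; `pick` is a hypothetical helper, and `ex2:SomeOtherFormat` and the `www.other.org` authority are the placeholders from the example above, not real vocabularies.

```python
# Values attached to ex1:myGeometry as (predicate, lexical value, datatype);
# the datatype is None for IRIs.
GEOMETRY = [
    ("ex2:shape", "POLYGON((97372 487152,97372 580407,149636 580407,"
                  "149636 487152,97372 487152))", "ex2:wktLiteral"),
    ("ex2:shape", "Polyshape {97372 487152,97372 580407,149636 580407,"
                  "149636 487152}", "ex2:SomeOtherFormat"),
    ("ex2:CRS", "http://www.opengis.net/def/crs/EPSG/0/28992", None),
    ("ex2:CRS", "http://www.other.org/srs/12345", None),
]

def pick(geometry, shape_datatype, crs_prefix):
    """Return the (shape, CRS) pair a consumer prefers, or None if unavailable."""
    shapes = [v for p, v, dt in geometry if p == "ex2:shape" and dt == shape_datatype]
    crss = [v for p, v, dt in geometry if p == "ex2:CRS" and v.startswith(crs_prefix)]
    return (shapes[0], crss[0]) if shapes and crss else None

# Consumer A wants WKT and OGC/EPSG CRS identifiers.
print(pick(GEOMETRY, "ex2:wktLiteral", "http://www.opengis.net/def/crs/"))
# Consumer B wants SomeOtherFormat and the other authority.
print(pick(GEOMETRY, "ex2:SomeOtherFormat", "http://www.other.org/srs/"))
```

Mixing and matching is only safe because every shape value and every CRS value describe the same geometry; that is the invariant the merging example earlier in the thread puts under pressure.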

>      1. There should not be a single authority for specifying CRS's
>         (especially if we want specifications to last until the sun
>         goes nova).
>
> +1. This is why using just a URI is a good choice!
I am kind of confused here. Are you actually saying that you agree with 
this point and that the CRS should be specified separately?

>      1. Next to CRS there are other geometry properties that could be
>         important, like level of detail. Do they need to be put in the
>         same literal too? That would make things even messier.
>
> Do you need this information to interpret the coordinates?
Yes. In some cases.

And in some cases the CRS is irrelevant for interpretation. Perhaps a 
consumer wants to know if the coordinates are known. Or she (I will go 
along with that :-)) would like to know the number of coordinates (to 
get a measure of shape complexity). Or she would like to know the 
geometry type (is it a point or a line or a polygon or a multipolygon?). 
Or she would like to know if the geometry is 2D or 3D....
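All of these can be read from the coordinate string alone, without knowing the CRS. A rough sketch, assuming simple OGC WKT (no GEOMETRYCOLLECTION, no EMPTY geometries) and treating three untagged ordinates as 3D; `describe_wkt` is a hypothetical helper, and a real application would use a proper WKT parser rather than these regular expressions.

```python
import re

# Matches "<TYPE> [Z|M|ZM] (<body>)"; GEOMETRYCOLLECTION and EMPTY are not handled.
WKT_PATTERN = re.compile(r"\s*([A-Za-z]+)(?:\s+(ZM|Z|M))?\s*\((.*)\)\s*", re.S)

def describe_wkt(wkt):
    """Summarise a simple WKT string without needing its CRS."""
    match = WKT_PATTERN.fullmatch(wkt)
    if match is None:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    geom_type, tag, body = match.groups()
    # Drop all parentheses; what remains is comma-separated coordinate tuples.
    coords = [c.split() for c in re.sub(r"[()]", " ", body).split(",")]
    sizes = {len(c) for c in coords}
    if len(sizes) != 1 or 0 in sizes:
        raise ValueError("inconsistent or empty coordinate tuples")
    ordinates = sizes.pop()
    return {
        "type": geom_type.upper(),
        "coordinates": len(coords),
        "ordinates_per_coordinate": ordinates,
        # An explicit Z/ZM tag, or three untagged ordinates, is read as 3D here.
        "is_3d": tag in ("Z", "ZM") or (tag is None and ordinates == 3),
    }

print(describe_wkt("POLYGON((97372 487152,97372 580407,149636 580407,"
                   "149636 487152,97372 487152))"))
# {'type': 'POLYGON', 'coordinates': 5, 'ordinates_per_coordinate': 2, 'is_3d': False}
```

The coordinate count includes the repeated closing vertex of each polygon ring, so it is a rough complexity measure rather than a count of distinct points.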

>      1. It should be possible to select only the CRS or only the
>         coordinates (for specific use cases).
>
> GeoSPARQL and stSPARQL offer a function for selecting the CRS of a 
> geometry, and you can use a simple regex to get just the coordinates. 
> As you say, this is for specific use cases, so why reinvent the 
> wheel instead of using existing standards?
I am all for using existing standards. As far as I know (but I could be 
wrong), WKT does not have a CRS specification. It was only added by 
GeoSPARQL. WKT is widely used, it is probably the best supported 
geometry encoding around. So I am all for keeping WKT (although I am 
doubtful about conflating the coordinate sequence and the geometry type).
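For comparison, the regex-based extraction Kostis mentions can be sketched in a few lines. This assumes the GeoSPARQL 1.0 convention that a wktLiteral may start with a CRS IRI in angle brackets and that CRS84 applies when none is given; `split_wkt_literal` is a made-up name, it is not the GeoSPARQL `geof:getSRID` function, and it is more lenient about whitespace than the specification.

```python
import re

# Default CRS that GeoSPARQL 1.0 assumes when a wktLiteral has no leading IRI.
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# Optional "<IRI>" prefix, optional whitespace, then the plain WKT text.
LITERAL_PATTERN = re.compile(r"\s*(?:<([^>]*)>\s*)?(.*?)\s*", re.S)

def split_wkt_literal(lexical):
    """Split a GeoSPARQL-style wktLiteral into (CRS IRI, WKT text)."""
    crs, wkt = LITERAL_PATTERN.fullmatch(lexical).groups()
    return (crs or CRS84, wkt)

lit = ("<http://www.opengis.net/def/crs/EPSG/0/28992> "
       "POLYGON((97372 487152,97372 580407,149636 580407,149636 487152,97372 487152))")
print(split_wkt_literal(lit)[0])               # http://www.opengis.net/def/crs/EPSG/0/28992
print(split_wkt_literal("POINT(4.9 52.37)")[0])  # falls back to CRS84
```

The extraction itself is cheap; the disagreement in this thread is about whether every consumer and SPARQL engine should have to do it, versus exposing the CRS as its own URI.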
>
>      1. Processing the coordinates requires removing the CRS
>         specification from the string, which is undesirable extra
>         processing.
>
> On the contrary! When storing a spatial literal, you start by reading 
> the CRS, create the appropriate precision model, and then use, for 
> example, a WKT parser for the rest. If you have to keep triples in 
> memory or secondary storage until all required parts are gathered, 
> things can get extremely costly very easily (for example, performance 
> will vary according to the ordering of the triples within a file!).
Well, that could be the case in some software, but in the software that 
I have experience with, the CRS is not specified at the geometry level 
but at some higher level. First some kind of container must exist, and 
it has a CRS property. Then the coordinate sequences are used to create 
geometric objects.
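Both points can be compared with a toy streaming loader. It assumes a hypothetical store that indexes a geometry only once both its WKT and its CRS are known, optionally with a container-level CRS supplied up front; `load`, `container_crs` and the `ex1:gN` identifiers are invented for this illustration and do not describe how Strabon or any particular GIS works.

```python
def load(triples, container_crs=None):
    """Stream (subject, predicate, object) triples and index complete geometries.

    A geometry is complete once both its WKT and its CRS are known. If a
    container-level CRS is supplied up front, the WKT alone is enough.
    Returns (number of indexed geometries, peak number of buffered geometries).
    """
    fields = {"ex2:asWKT": "wkt", "ex2:CRS": "crs"}
    pending = {}  # subject -> partially assembled record
    indexed = {}
    peak = 0
    for s, p, o in triples:
        key = fields.get(p)
        if key is None:
            continue  # ignore triples that do not describe geometry
        record = pending.setdefault(s, {"crs": container_crs} if container_crs else {})
        record[key] = o
        if "wkt" in record and "crs" in record:
            indexed[s] = pending.pop(s)
        peak = max(peak, len(pending))
    return len(indexed), peak

CRS_URI = "http://www.opengis.net/def/crs/EPSG/0/28992"
geoms = [f"ex1:g{i}" for i in range(1000)]
wkt = [(g, "ex2:asWKT", "POINT(0 0)") for g in geoms]
crs = [(g, "ex2:CRS", CRS_URI) for g in geoms]
interleaved = [t for pair in zip(wkt, crs) for t in pair]

print(load(interleaved))                  # (1000, 1): each WKT is followed by its CRS
print(load(wkt + crs))                    # (1000, 1000): every WKT waits for its CRS
print(load(wkt, container_crs=CRS_URI))   # (1000, 0): CRS known before any geometry
```

The buffering cost depends on triple order when the CRS travels as a separate per-geometry triple, and disappears when it is either inside the literal or declared once on a container beforehand.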
>
>      1. It should be possible to make statements about the CRS.
>      2. It should be possible to dereference a CRS.
>
> I fully agree with you, but I think that this is not required in the 
> context of this group. This should be the work of another group, since 
> many vocabularies for representing CRS have to be unified, which is 
> definitely not an easy task (and definitely out of scope for this group).
Unification of vocabularies and the semantics of CRS's may be out of 
scope, but such work will be hindered if the CRS is hidden away in a 
text string.
>
>     And I am quite sure this list is not exhaustive. Is it possible to
>     have an overview of advantages of concatenating the CRS and the
>     coordinates?
>
>
> The most important aspect of having the CRS inside a spatial literal, 
> is that it relies on RDF and nothing more thus making it a good choice 
> for the linked open data world.
I am sorry, but I don't understand what you wrote. In my mind, having 
the CRS as a separate URI is much more in the spirit of RDF and linked 
open data.

>
> What you propose has to be encoded as an OWL 2 ontology with the 
> appropriate cardinality restrictions, and then the users are forced to 
> adopt this ontology. The linked data story so far (even though I do 
> not entirely agree with this :) ) has shown that the only way to 
> achieve adoption is by enforcing as few restrictions as possible.
Could you explain why OWL 2 is necessary?

>     Put in a more general perspective: If geographical data are going
>     to exist in the web of Linked Data, it is good to depart from
>     historical constructs if better solutions are on offer. Using a
>     URI to identify a CRS is a very good step in the right direction.
>     But why make it impossible to use that same URI in an RDF triple?
>
>
> Linked geospatial data are already here (one way or another) :) For 
> example, a few months ago we published more than 100GB of linked 
> geospatial data just by publishing two datasets 
> (http://datahub.io/organization/teleios). I fully agree with you that 
> we should not reinvent the wheel. That's why I am arguing in favor of 
> using existing standards for this. So far, I have not been convinced 
> that functionality-wise there is something missing from existing 
> standards like GeoSPARQL. We have pointed out some small issues here 
> and there (spatial aggregates, a transformation function) or some more 
> important issues (temporal dimension of spatial data), but I am not 
> convinced that there is something fundamentally wrong with it. If we 
> follow the evolution of historical constructs, we can see that OGC 
> started by separating serializations from CRS (although it did define 
> a precise mechanism for associating CRS and geometries), and then 
> moved on and combined them (GML and GeoSPARQL).
GML doesn't really count in this respect. It is not a data format, but a 
way of modelling data, comparable with RDF. By nature, GML incorporates 
everything. As far as I know, it is only GeoSPARQL that glued the CRS to 
the coordinate sequence.
> I agree that it would be nice to be able to define the CRS for a set 
> of geographic features, but this would mean that we should define an 
> OWL 2 ontology (similarly to schemas defined in GML) for this purpose, 
> thus shooting ourselves in the foot regarding the adoption of the
> proposed vocabulary.
Received on Friday, 3 January 2014 17:22:12 UTC