Re: CRS specification (was: Re: ISA Core Location Vocabulary) from Frans Knibbe | Geodan on 2014-01-06 (public-locadd@w3.org from January 2014)

From: Frans Knibbe | Geodan <frans.knibbe@geodan.nl>
Date: Mon, 06 Jan 2014 16:38:32 +0100
To: Kostis Kyzirakos <Kostis.Kyzirakos@cwi.nl>
CC: LocAdd W3C CG Public Mailing list <public-locadd@w3.org>
Message-ID: <52CACDF8.1020703@geodan.nl>
Hello Kostis,

Yes, it is interesting! I continue below. I did remove some portions of 
older discussion, hopefully without destroying proper contexts.

Regards,
Frans

On 2014-01-06 14:40, Kostis Kyzirakos wrote:
> It is always interesting to exchange opinions :)
>
>>         I agree that a sequence of coordinates should be associated
>>         with a CRS. In my opinion, that is exactly what happens in
>>         the example I gave:
>>
>>
>>         ex1:myGeometry
>>             a ex2:geometry ;
>>             ex2:asWKT "POLYGON((97372 487152,97372 580407,149636
>>         580407,149636 487152,97372 487152))"^^ex2:wktLiteral ;
>>             ex2:CRS <http://www.opengis.net/def/crs/EPSG/0/28992> ;
>>
>>         This is based on the viewpoint of a geometry consisting of a
>>         sequence of coordinates and a CRS.  They are both properties
>>         of a geometry.
>>
>>
>>     What happens though if you merge the following graphs:
>>
>>     ex1:myGeometry ex2:asWKT "POLYGON((97372 487152,97372
>>     580407,149636 580407,149636 487152,97372 487152))"^^ex2:wktLiteral ;
>>                               ex2:CRS
>>     <http://www.opengis.net/def/crs/EPSG/0/28992> .
>>
>>     ex1:myGeometry ex2:asWKT " POLYGON((4.54103559631648
>>     52.369221013436,4.52469206625503
>>     53.2071950151372,5.30692041653441
>>     53.210266927542,5.30843905756704
>>     52.3722183594399,4.54103559631648
>>     52.369221013436))"^^ex2:wktLiteral ;
>>                               ex2:CRS
>>     <http://www.opengis.net/def/crs/EPSG/0/4326> .
>>
>>     One could argue that we can avoid such problems by using
>>     different URI for different serializations. However, combination
>>     of different datasets becomes problematic since constraints are
>>     introduced...
>     I would say the two geometries in your examples are different
>     geometries. So they can not be merged (have the same URI). Could
>     you explain how a combination of different datasets becomes
>     problematic in that case?
>
>
> Actually it is the same geometry expressed in different CRS, and it is 
> perfectly valid to have all the above statements in an RDF graph.
Well, that depends on the definition of what a geometry is, doesn't it?  
If we define a geometry to be a certain set of coordinates in a certain 
CRS, then changing either the coordinates or the CRS gives us a 
different geometry. In my mind, that is the most logical and useful way of 
defining the concept of geometry.
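
To make the merge scenario concrete, here is a minimal sketch in Python. It models triples as plain tuples and a graph as a set, which is an assumption for illustration only (no RDF library is involved), and the URIs ex1:myGeometryRD and ex1:myGeometryWGS84 are hypothetical names. It suggests that with one shared URI the merged graph no longer records which coordinate string belongs to which CRS, while with one URI per geometry the pairing stays unambiguous.

```python
# Triples are modelled as (subject, predicate, object) tuples; a graph is a set.
RD = "http://www.opengis.net/def/crs/EPSG/0/28992"
WGS84 = "http://www.opengis.net/def/crs/EPSG/0/4326"
WKT_RD = ("POLYGON((97372 487152,97372 580407,149636 580407,"
          "149636 487152,97372 487152))")
WKT_WGS84 = ("POLYGON((4.54103559631648 52.369221013436,"
             "4.52469206625503 53.2071950151372,"
             "5.30692041653441 53.210266927542,"
             "5.30843905756704 52.3722183594399,"
             "4.54103559631648 52.369221013436))")


def graph(subject, wkt, crs):
    """Build the two triples that describe one geometry."""
    return {(subject, "ex2:asWKT", wkt), (subject, "ex2:CRS", crs)}


def objects(triples, subject, predicate):
    """Return all objects for a given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def candidate_pairings(triples, subject):
    """Every (WKT, CRS) combination the graph allows for one subject."""
    wkts = objects(triples, subject, "ex2:asWKT")
    crss = objects(triples, subject, "ex2:CRS")
    return [(w, c) for w in wkts for c in crss]


# Merging is set union. Same URI for both descriptions:
shared = graph("ex1:myGeometry", WKT_RD, RD) | graph("ex1:myGeometry", WKT_WGS84, WGS84)
# One (hypothetical) URI per geometry:
separate = (graph("ex1:myGeometryRD", WKT_RD, RD)
            | graph("ex1:myGeometryWGS84", WKT_WGS84, WGS84))

print(len(candidate_pairings(shared, "ex1:myGeometry")))    # 4, two of them wrong
print(len(candidate_pairings(separate, "ex1:myGeometryRD")))  # 1
```

So the question is less whether the merged graph is valid RDF (it is) and more whether the two descriptions should share a URI in the first place.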

Consider a geographical feature (with a single URI) that has multiple 
sets of coordinates in different CRS's. Probably only one set was truly 
measured and the others were calculated by a transformation function. 
Those other sets of coordinates can contain all kinds of errors 
introduced by the transformation and its parameters. Can we then truly 
state that the coordinates are equal? Or would that be saying something 
like 1.00000000 = 1.0000001?

> You could also use different triples to refer to different 
> approximations of the same geographic feature (e.g., a city can be 
> represented by a point, a bounding box, a very detailed polygon etc). The 
> problem is that after storing them, you cannot get back which CRS 
> corresponds to each spatial literal.
True, if you only have the literal, e.g. "POLYGON((97372 487152,97372 
580407,149636 580407,149636 487152,97372 487152))", you can not get the 
CRS. But is that a problem? If you have the triple you can get the CRS 
(provided one is specified).
> My opinion is that the serialization of a geometry should not be 
> discussed in the context of this working group. Links to existing (or 
> future!) standards have to be at the correct places (this is what this 
> vocab does in my opinion).
I agree. But I also think the CRS should not be part of the 
serialization of a geometry.

By the way, I think it is perfectly OK not to specify any CRS and to 
assume a standard CRS in that case (like WGS84). The simpler, the 
better.  In fact, I hope that in the future we will have one standard 
CRS for the whole world. Not having to project geographical data to flat 
space any longer helps...
>
>>     This is not a correct analogy. Because "POINT(0 0)" has no
>>     meaning on its own, while "Αθήνα" has some meaning on its own.
>>     You can add as much information as you want, but each RDF term
>>     has to be self-contained. You cannot rely on a set of triples to
>>     interpret an RDF term. At least this is what all formal
>>     treatments of RDF do!
>     The string "POINT(0 0)" does have meaning. You can tell it is a
>     point with known coordinates. You just don't know the CRS, so you
>     can not draw the point on a map, for instance. Similarly, if you
>     would like to use the word "Αθήνα" in a text document things would
>     probably also go wrong because vital data are missing.
>
>
> This has nothing to do with drawing something on a map. Can you say if 
> "POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))"^^geo:wktLiteral contains 
> "POINT(5 5)"^^geo:wktLiteral if you do not know the underlying CRS 
> (ignore that GeoSPARQL defines a default CRS for this example)? No, 
> you cannot. If you had two literals like "1"^^xsd:integer and 
> "2"^^xsd:integer, could you say whether the first is smaller than the 
> second? Sure. But why?
Clearly adding data about an object increases the things you can do 
with the object. I was only trying to demonstrate that a string like 
"POINT(0 0)" does have meaning. Adding data about the CRS would allow 
one to do more things. Adding a LOD, for example, would allow one to do 
even more. In my view, this does not prove that the CRS can not be 
decoupled. Besides that, decoupling the CRS is not the same as losing 
the CRS.

Would there be a problem with defining a function like "contains" to 
work on classes instead of data types?  I honestly don't know...
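
As a rough illustration of what "contains" on a class could look like, here is a minimal Python sketch. The Geometry class, its field names and the contains function are hypothetical and not taken from GeoSPARQL or stSPARQL. The containment test is a standard ray-casting check that treats the coordinates as planar, which is itself an assumption that only holds for a projected CRS such as EPSG:28992.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    """Hypothetical geometry object: CRS and coordinates are separate properties."""
    crs: str
    coords: tuple  # sequence of (x, y) pairs; polygon rings are closed


def contains_planar(ring, point):
    """Ray-casting point-in-polygon test on planar (x, y) coordinates."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contains(polygon, point):
    """Answer only when both geometries are stated in the same CRS."""
    if polygon.crs != point.crs:
        raise ValueError("different CRSs; transform one geometry first")
    return contains_planar(list(polygon.coords), point.coords[0])


RD = "http://www.opengis.net/def/crs/EPSG/0/28992"
square = Geometry(RD, ((0, 0), (10, 0), (10, 10), (0, 10), (0, 0)))
inner = Geometry(RD, ((5, 5),))
outer = Geometry(RD, ((15, 5),))

print(contains(square, inner))  # True
print(contains(square, outer))  # False
```

In this sketch the function reads the CRS from the object rather than from the literal, so the coordinate string itself can stay free of any CRS information.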

>
> In general, when defining a datatype, you need to properly define the 
> lexical space, the value space and the lexical-to-value mapping 
> function. The lexical space is defined by the WKT grammar in this 
> example. However, in order to map elements of the lexical space to the 
> value space, you need to be able to interpret the coordinates meaning 
> you need to know the underlying CRS. This is what I meant when I was 
> saying that RDF terms have to be self-contained. Otherwise, you can 
> not give precise semantics which is crucial when you are extending a 
> standard like RDF with new datatypes.
>
>     Let's take the example of drawing geographical data on a map. In
>     all software that I know of, the CRS is specified for the map as a
>     whole, not for individual geometries. Or take the example of
>     storing geographical data in a database table. The CRS is usually
>     specified at the column level.
>
>
> Usually you define a CRS per layer when drawing data on a map (e.g., 
> ArcGIS and QGIS allow you to do so). In PostGIS you are allowed to 
> have different CRS in the same column. However, if you choose to do 
> so, you cannot index the column or use this column in a topological 
> function. In any case, you are describing an application and not a 
> vocabulary, which is not the best motivation IMHO.
I wrote that to illustrate the point that current GIS software takes 
geometry and CRS separately. It was one of the many reasons why I think 
the CRS should be kept separate, and I think the point still stands.
>
>>
>>     Anyhow, WKT and GML provide geometry collections like
>>     MULTIPOINT, MULTILINESTRING, MULTIPOLYGON etc. so you can
>>     specify a single CRS for a geometry collection.
>     Yes, but those are fixed collections. I think it should be
>     possible to assign a CRS to dynamic collections, like the result
>     set of a query for instance. And I think it is important to be
>     able to assign the CRS at the level of a dataset (things like
>     void:Dataset or dcat:Dataset).
>
>
> Why should we disallow results that contain different CRS? Why should 
> we impose more restrictions than necessary?
> On the contrary, we should encourage usage of all available 
> information and not impose any restrictions on this matter, unless the 
> user explicitly requests so.
How would making it possible to assign CRS's to groups of geometries be 
a restriction? I am afraid I don't get it.
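
To show what I have in mind, here is a minimal Python sketch of CRS lookup with a group-level fallback. Triples are modelled as plain tuples, and the names resolve_crs, ex1:rdDataset, ex1:geomA and ex1:geomB are hypothetical. The WGS84 default is an assumption in line with my remark above about assuming a standard CRS when none is given.

```python
# Assumed default when no CRS is stated anywhere (WGS84, EPSG:4326).
DEFAULT_CRS = "http://www.opengis.net/def/crs/EPSG/0/4326"
RD = "http://www.opengis.net/def/crs/EPSG/0/28992"


def resolve_crs(triples, geometry, dataset=None):
    """Look for a CRS on the geometry, then on its dataset, then use the default."""
    def stated(subject):
        return next(
            (o for s, p, o in triples if s == subject and p == "ex2:CRS"),
            None,
        )

    crs = stated(geometry)
    if crs is None and dataset is not None:
        crs = stated(dataset)
    return crs if crs is not None else DEFAULT_CRS


triples = {
    # CRS stated once for the whole (hypothetical) dataset
    ("ex1:rdDataset", "ex2:CRS", RD),
    # ex1:geomA states nothing and inherits from the dataset
    ("ex1:geomA", "ex2:asWKT", "POINT(97372 487152)"),
    # ex1:geomB states its own CRS, which takes precedence
    ("ex1:geomB", "ex2:asWKT", "POINT(4.54 52.37)"),
    ("ex1:geomB", "ex2:CRS", DEFAULT_CRS),
}

print(resolve_crs(triples, "ex1:geomA", "ex1:rdDataset") == RD)           # True
print(resolve_crs(triples, "ex1:geomB", "ex1:rdDataset") == DEFAULT_CRS)  # True
```

A group-level CRS in this sketch is only a fallback. Individual geometries can still carry their own CRS, so result sets with mixed CRSs remain possible.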

>>          1. Having multiple specifications of the same CRS for a
>>             single geometry should be possible.
>>          2. Having multiple specifications of the same coordinate
>>             sequence for a single geometry should be possible.
>>
>>     I understand that it is desirable to use multiple serializations
>>     that use different CRS, but I do not understand what exactly you
>>     mean here. Can you elaborate on this?
>     Yes. Perhaps it is easiest to give an example:
>
>     ex1:myGeometry
>         ex2:shape "POLYGON((97372 487152,97372 580407,149636
>     580407,149636 487152,97372 487152))"^^ex2:wktLiteral;
>         ex2:shape "Polyshape {97372 487152,97372 580407,149636
>     580407,149636 487152}"^^ex2:SomeOtherFormat;
>
>         ex2:CRS <http://www.opengis.net/def/crs/EPSG/0/28992> ;
>         ex2:CRS <http://www.other.org/srs/12345> ;
>     .
>
>     Note that this is one geometry, because the CRSs and the
>     coordinate strings are equal. It is just their notation that differs.
>
>     Data publication like this could be useful because consumer A
>     might want WKT while consumer B wants SomeOtherFormat. Likewise,
>     consumer A may like using OGC/EPSG specifications of CRS's while
>     consumer B prefers another authority.
>
>
> Which triple contains the EPSG:28992 serialization and which one the 
> srs:12345 one?
Both. This describes one set of coordinates (with two different 
notations) and one CRS (with two different notations/authorities).
> Why do you enforce that a spatial feature must have exactly one 
> spatial extent?
I do not. A spatial feature is not the same as a geometry. If you think 
I was implying that then there has probably been a lot of misunderstanding.
> Isn't it possible for a municipality for example to be represented by 
> a very detailed polygon that can be used for map construction, a 
> bounding box that can be used for approximate answers and a point that 
> can be used for drawing a dot on a map?
Sure, that is possible. In my mind, the municipality (the spatial 
feature) can have lots of different geometries. The geometries can 
differ in multiple ways: they can have different complexity, a different 
spatial resolution, a different time of measurement or a different CRS. 
It is up to the consumer to select whatever geometry is best suited for 
her purposes.

>>          1. There should not be a single authority for specifying
>>             CRS's (especially if we want specifications to last until
>>             the sun goes nova).
>>
>>     +1. This is why using just a URI is a good choice!
>     I am kind of confused here. Are you actually saying that you agree
>     with this point and that the CRS should be specified separately?
>
>
> I agree that CRS should be identified by URI, but not in another 
> triple separated from the geometry serialization :) A CRS could be used 
> in different triples in order to provide metadata about the geometry.
>
>     And in some cases the CRS is irrelevant for interpretation.
>     Perhaps a consumer wants to know if the coordinates are known. Or
>     she (I will go along with that :-)) would like to know the number
>     of coordinates (to get a measure of shape complexity). Or she
>     would like to know the geometry type (is it a point or a line or a
>     polygon or a multipolygon?). Or she would like to know if the
>     geometry is 2D or 3D....
>
>
> What you are describing is metadata about the geometry. This 
> information can be asserted using additional triples.
You are saying that the CRS is essential and therefore not 'metadata', 
right? And that things that are essential can not exist in separate 
triples?

I agree that the CRS is very important. But in some cases it is not 
needed. But more importantly, why can't something very important 
exist in a separate triple?

>
>>          1. It should be possible to select only the CRS or only the
>>             coordinates (for specific use cases).
>>
>>     GeoSPARQL and stSPARQL offer a function for selecting the CRS of
>>     a geometry, and you can use a simple regex to get just the
>>     coordinates. As you say, this is for specific use cases, so why
>>     reinvent the wheel instead of using existing standards?
>     I am all for using existing standards. As far as I know (but I
>     could be wrong), WKT does not have a CRS specification. It was
>     only added by GeoSPARQL. WKT is widely used, it is probably the
>     best supported geometry encoding around. So I am all for keeping
>     WKT (although I am doubtful about conflating the coordinate
>     sequence and the geometry type).
>
>
> The OGC-SFA standard has two parts. In the first part it defines the 
> grammar for representing points, linestrings, polygons etc and a 
> grammar for representing CRS. For example, the following is the WKT 
> representation of WGS84:
>
> GEOGCS["WGS 84",
>        DATUM["WGS_1984",
>              SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],
>              AUTHORITY["EPSG","6326"]],
>        PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],
>        UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],
>        AUTHORITY["EPSG","4326"]]
>
> In the second part of the standard, it defines the SQL realization of 
> the first part. This is where it defines, for example, that a spatial 
> RDBMS should identify feature tables, geometry columns, CRS, etc.

Ok, but is there something in the original standards that says it is not 
possible to use a WKT representation of a geometry without a 
specification of a CRS? As far as I know, if you agree to use WKT as a 
notation for geometry, it is all right to do so without concatenating a 
CRS URI.
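
For comparison, here is a minimal Python sketch of what consuming the combined form involves. It assumes the GeoSPARQL-style lexical form in which an optional CRS URI in angle brackets precedes the WKT text, and it assumes CRS84 as the default when the URI is absent (which is, to my knowledge, what GeoSPARQL specifies). The function name split_wkt_literal is hypothetical.

```python
import re

# Assumed default CRS for a literal without a leading URI (GeoSPARQL convention).
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# Optional "<uri>" followed by whitespace, then the WKT text itself.
_LITERAL = re.compile(r"^\s*(?:<([^>]+)>\s+)?(.+?)\s*$", re.DOTALL)


def split_wkt_literal(lexical):
    """Split a combined literal into (CRS URI, plain WKT)."""
    match = _LITERAL.match(lexical)
    if match is None:
        raise ValueError("empty literal")
    crs, wkt = match.groups()
    return (crs or CRS84), wkt


combined = "<http://www.opengis.net/def/crs/EPSG/0/28992> POINT(97372 487152)"
crs, wkt = split_wkt_literal(combined)
print(crs)  # http://www.opengis.net/def/crs/EPSG/0/28992
print(wkt)  # POINT(97372 487152)

# Plain WKT, as in the original standard, passes through unchanged.
print(split_wkt_literal("POINT(0 0)")[1])  # POINT(0 0)
```

The second element is ordinary WKT that any existing WKT parser should accept. The first element is the part that, in my proposal, would live in its own triple.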
>
>>          1. Processing the coordinates requires removing the CRS
>>             specification from the string, which is undesirable extra
>>             processing.
>>
>>     On the contrary! When storing a spatial literal, you start by
>>     reading the CRS, create the appropriate precision model, and then
>>     use a WKT parser for example for the rest. If you have to keep in
>>     memory or secondary storage triples until all required parts are
>>     gathered things can get extremely costly very easily (for example
>>     performance will vary according to the ordering of the triples
>>     within a file!).
>     Well, that could be the case in some software, but in the software
>     that I have experience with the CRS is not specified at the
>     geometry level, but at some higher level. First some kind of
>     container must exist, and it has a CRS property. Then the
>     coordinate sequences are used to create geometric objects.
>
>
> Regardless of the implementation, consider the implication of having to 
> gather all parts of a geometry that is dispersed among different 
> triples in order to interpret it.
I would not describe the properties of a geometry, modelled as child 
objects, as parts being dispersed. The properties would be exactly 
the triples one gets when the URI of the geometry is dereferenced.
> Either way, we should not define something based on current or future 
> implementations.
What about trying not to reinvent the wheel?

>>     The most important aspect of having the CRS inside a spatial
>>     literal, is that it relies on RDF and nothing more thus making it
>>     a good choice for the linked open data world.
>     I am sorry, but I don't understand what you wrote. In my mind,
>     having the CRS as a separate URI is much more in the spirit of RDF
>     and linked open data.
>
>
> I am not talking about the spirit of linked data. I am talking about 
> being RDF compliant. stSPARQL and GeoSPARQL made the same design 
> choice of representing geometric information as a new kind of typed 
> literals. To do so, both introduced new datatypes and defined some 
> functions to operate on these datatypes. In order to define formally a 
> datatype (for example as we do in stSPARQL), you have to define 
> formally the lexical space, the value space and a lexical to value 
> mapping function. You cannot define this datatype without knowing the CRS.
What about just using WKT (or some other notation of coordinate 
sequences) as a datatype and modelling geometry as a class? This seems 
to be an essential question, already asked above. Is there a need to 
model geometry as a datatype? Again, I honestly don't know.
>
>>     What you propose has to be encoded as an OWL 2 ontology with the
>>     appropriate cardinality restrictions, and then the users are
>>     forced to adopt this ontology. The linked data story so far (even
>>     though I do not entirely agree with this :) ) has shown that the
>>     only way to achieve adoption is by enforcing as few restrictions
>>     as possible.
>     Could you explain why OWL 2 is necessary?
>
>
> You need OWL 2 to formally define a complex class and then say that a 
> geometry consists of exactly two parts, one that contains the 
> coordinate sequence and another one that contains the CRS. You cannot 
> make such statements using RDFS or OWL.
In my mind, the CRS can be optional. If it is not specified, we assume 
some standard CRS (like in basic geo).
>
>>         Put in a more general perspective: If geographical data are
>>         going to exist in the web of Linked Data, it is good to
>>         depart from historical constructs if better solutions are on
>>         offer. Using a URI to identify a CRS is a very good step in
>>         the right direction. But why make it impossible to use that
>>         same URI in an RDF triple?
>>
>>
>>     Linked geospatial data are already here (one way or another) :)
>>     For example, a few months ago we published more than 100GB of
>>     linked geospatial data just by publishing two datasets
>>     (http://datahub.io/organization/teleios). I fully agree with you
>>     that we should not reinvent the wheel. That's why I am arguing in
>>     favor of using existing standards for this. So far, I have not
>>     been convinced that functionality-wise there is something missing
>>     from existing standards like GeoSPARQL. We have pointed out some
>>     small issues here and there (spatial aggregates, a transformation
>>     function) or some more important issues (temporal dimension of
>>     spatial data), but I am not convinced that there is something
>>     fundamentally wrong with it. If we follow the evolution of
>>     historical constructs, we can see that OGC started by separating
>>     serializations from CRS (however it defined precisely a
>>     mechanism for associating CRS and geometries), and then moved on
>>     and combined them (GML and GeoSPARQL).
>     GML doesn't really count in this respect. It is not a data format,
>     but a way of modelling data, comparable with RDF. By nature, GML
>     incorporates everything. As far as I know, it is only GeoSPARQL
>     that glued the CRS to the coordinate sequence.
>
>
> WKT and GML are both data exchange formats. So is GeoSPARQL in some 
> sense. The SQL implementation of OGC-SFA allows you to say some things 
> about spatial features using the relational data model, while GML and 
> GeoSPARQL allow you to say more things about them using XML and RDF 
> respectively. What I meant is that GML defines a specific schema where 
> you cannot have a geometry serialization without a CRS. In RDF you can 
> do the same only if holding the complete information inside a single 
> RDF term. Otherwise, you need an OWL 2 ontology as I was arguing before.
Received on Monday, 6 January 2014 15:39:06 UTC