Re: CRS specification (was: Re: ISA Core Location Vocabulary) from Kostis Kyzirakos on 2014-01-06 (public-locadd@w3.org from January 2014)

From: Kostis Kyzirakos <Kostis.Kyzirakos@cwi.nl>
Date: Mon, 6 Jan 2014 20:09:17 +0100
To: "Frans Knibbe | Geodan" <frans.knibbe@geodan.nl>
Cc: LocAdd W3C CG Public Mailing list <public-locadd@w3.org>
Message-ID: <CAJUi=VHvWtQpB5wJyY_uB5E-CpKoYXOMm0Fa2TO8ZOsKPQRveA@mail.gmail.com>
Hi,


> Well, that depends on the definition of what a geometry is, doesn't it?
> If we define a geometry to be a certain set of coordinates in a certain
> CRS, then changing either the coordinates or the CRS gives us a different
> geometry. In my mind, that is most logical and useful way of defining the
> concept of geometry.
>

A set of, let's say, point coordinates should uniquely identify a point on
the earth. This requires both the coordinates and the CRS.
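
To make this concrete, here is a minimal sketch (Python, standard library
only) that reads the same two coordinate pairs first as planar Cartesian
units and then as longitude/latitude degrees on a sphere. The function
names and the mean earth radius of 6371 km are assumptions made for
illustration; a real system would use a geodesy library and the ellipsoid
of the actual CRS.

```python
import math

def planar_distance(p, q):
    """Euclidean distance; only meaningful if the coordinates are planar."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def spherical_distance_km(p, q, radius_km=6371.0):
    """Great-circle (haversine) distance, reading coordinates as (lon, lat) degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

a, b = (0.0, 0.0), (10.0, 10.0)
print(planar_distance(a, b))        # in units of whatever planar CRS is assumed
print(spherical_distance_km(a, b))  # in kilometres on a sphere
```

The two results differ by orders of magnitude and are not even in the same
unit, which is the point: the numbers alone do not say where the point is.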


> Consider a geographical feature (with a single URI) that has multiple sets
> of coordinates in different CRS's. Probably only one set was truly measured
> and the others were calculated by a transformation function. Those other
> sets of coordinates can contain all kinds of errors introduced by the
> transformation and its parameters. Can we then truly state that the
> coordinates are equal? Or would that be saying something like 1.00000000 =
> 1.0000001?
>

It is a known fact that transforming between CRSs comes with a penalty in
terms of accuracy (larger or smaller, depending on the transformation).
This, however, does not mean that a geometry should have a unique
serialization. On the contrary: GeoSPARQL, for example, defines two
properties, geo:hasGeometry and geo:hasDefaultGeometry, for users to
choose from.
On the other hand, there are cases where a geometric object is
represented by very different approximations (e.g., a polygon or a point),
and this is perfectly valid in some scenarios. So, we cannot enforce a
one-and-only serialization for a geometric object.

>    You could also use different triples to refer to different
> approximations of the same geographic feature (e.g., a city can be
> represented by a point, a bounding box, a very detailed polygon etc). The
> problem is that after storing them, you cannot get back which CRS
> corresponds to each spatial literal.
>
> True, if you only have the literal, e.g. "POLYGON((97372 487152,97372
> 580407,149636 580407,149636 487152,97372 487152))", you can not get the
> CRS. But is that a problem? If you have the triple you can get the CRS
> (provided one is specified).
>

You cannot rely on two triples to interpret a single RDF term. See my
comment on the formal definition of a datatype. On the other hand, since we
adopt the open world assumption we certainly cannot assume that only a
single pair of coordinates and CRS will exist in a knowledge base.
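
A small sketch of the ambiguity I mean, in Python. The ex1:/ex2: names
follow the example quoted further down in this thread; the coordinate
values and the choice of the second CRS are made up for illustration.

```python
from itertools import product

# Hypothetical triples: one geometry node, two serializations, two CRS statements.
triples = [
    ("ex1:myGeometry", "ex2:shape", "POINT(121687 487484)"),
    ("ex1:myGeometry", "ex2:shape", "POINT(4.9 52.37)"),
    ("ex1:myGeometry", "ex2:CRS", "http://www.opengis.net/def/crs/EPSG/0/28992"),
    ("ex1:myGeometry", "ex2:CRS", "http://www.opengis.net/def/crs/OGC/1.3/CRS84"),
]

def candidate_interpretations(triples, subject):
    """Return every (shape, CRS) pairing the triples allow for one subject."""
    shapes = [o for s, p, o in triples if s == subject and p == "ex2:shape"]
    crss = [o for s, p, o in triples if s == subject and p == "ex2:CRS"]
    # Nothing in the graph ties a particular shape to a particular CRS,
    # so every combination is a possible reading.
    return list(product(shapes, crss))

pairs = candidate_interpretations(triples, "ex1:myGeometry")
print(len(pairs))  # 4
```

Only two of the four pairings are the intended ones, and nothing in the
triples says which.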

>    My opinion is that the serialization of a geometry should not be
> discussed in the context of this working group. Links to existing (or
> future!) standards have to be at the correct places (this is what this
> vocab does in my opinion).
>
> I agree. But I also think the CRS should not be part of the serialization
> of a geometry.
>
> By the way, I think it is perfectly OK not to specify any CRS and to
> assume a standard CRS in that case (like WGS84). The simpler, the better.
> In fact, I hope that in the future we will have one standard CRS for the
> whole world. Not having to project geographical data to flat space any
> longer helps...
>

Well, I do not agree with you on this. We will definitely have multiple
CRSs, and there is a good reason for this. Each CRS considers a different
approximation of the earth. For example, WGS84 considers the earth to be
spheroidal, which is perfect in some cases and an absolute disaster in
others. It may sound a bit funny at first, but there is also the need in
some cases to model extraterrestrial coordinates :)


>      This is not a correct analogy. Because "POINT(0 0)" has no meaning
>> on its own, while "Αθήνα" has some meaning on its own. You can add as much
>> information as you want, but each RDF term has to be self-contained. You
>> cannot rely on a set of triples to interpret an RDF term. At least this is
>> what all formal treatments of RDF do!
>>
>>  The string "POINT(0 0)" does have meaning. You can tell it is a point
>> with known coordinates. You just don't know the CRS, so you can not draw
>> the point on a map, for instance. Similarly, if you would like to use the
>> word "Αθήνα" in a text document things would probably also go wrong because
>> vital data are missing.
>>
>
>  This has nothing to do with drawing something on a map. Can you say if
> "POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))"^^geo:wktLiteral contains "POINT(5
> 5)"^^geo:wktLiteral if you do not know the underlying CRS (ignore that
> GeoSPARQL defines a default CRS for this example)? No, you cannot. If you
> had two literals like "1"^^xsd:integer and "2"^^xsd:integer, could you say
> whether the first is smaller than the second? Sure. But why?
>
> Clearly adding data about an object increases the things you can do
> with the object. I was only trying to demonstrate that a string like
> "POINT(0 0)" does have meaning. Adding data about the CRS would allow one
> to do more things. Adding a LOD, for example, would allow one to do even
> more. In my view, this does not prove that the CRS can not be decoupled.
> Besides that, decoupling the CRS is not the same as losing the CRS.
>
> Would there be a problem with defining a function like "contains" to work
> on classes instead of data types?  I honestly don't know...
>

As far as I know, SPARQL extension functions can operate on a single RDF
term or on a set of RDF terms (for aggregates). In order to have a
well-defined RDF literal, the following comment about datatype definition
has to be respected.

> In general, when defining a datatype, you need to properly define the
> lexical space, the value space and the lexical-to-value mapping function.
> The lexical space is defined by the WKT grammar in this example. However,
> in order to map elements of the lexical space to the value space, you need
> to be able to interpret the coordinates meaning you need to know the
> underlying CRS. This is what I meant when I was saying that RDF terms have
> to be self-contained. Otherwise, you can not give precise semantics which
> is crucial when you are extending a standard like RDF with new datatypes.
>
>
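
As a rough sketch of what such a lexical-to-value mapping could look like,
here is a Python function for a GeoSPARQL-style literal in which an
optional CRS URI precedes the WKT text. It handles only 2D points, and the
names PointValue and lexical_to_value are invented for this example; the
default CRS URI is the one GeoSPARQL assumes when no CRS is given.

```python
import re
from typing import NamedTuple, Tuple

# CRS that GeoSPARQL assumes when a wktLiteral carries no CRS URI.
DEFAULT_CRS = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

class PointValue(NamedTuple):
    """An element of the value space: coordinates together with their CRS."""
    crs: str
    coords: Tuple[float, float]

# Lexical space (simplified): optional "<crs-uri> " followed by "POINT(x y)".
_LITERAL = re.compile(
    r"^\s*(?:<(?P<crs>[^>]+)>\s+)?"
    r"POINT\s*\(\s*(?P<x>[-+.\deE]+)\s+(?P<y>[-+.\deE]+)\s*\)\s*$",
    re.IGNORECASE,
)

def lexical_to_value(lexical: str) -> PointValue:
    """Map a lexical form to a value; the CRS is always part of the result."""
    m = _LITERAL.match(lexical)
    if m is None:
        raise ValueError(f"not in the lexical space: {lexical!r}")
    crs = m.group("crs") or DEFAULT_CRS
    return PointValue(crs, (float(m.group("x")), float(m.group("y"))))

print(lexical_to_value("POINT(0 0)"))
print(lexical_to_value(
    "<http://www.opengis.net/def/crs/EPSG/0/28992> POINT(97372 487152)"))
```

The same coordinate text with a different CRS maps to a different value,
and the literal alone is enough to decide that.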


>               Why should we disallow results that contain different CRS?
> Why should we impose more restrictions than necessary?
>  On the contrary, we should encourage usage of all available information
> and not impose any restrictions on this matter, unless the user explicitly
> requests so.
>
> How would making it possible to assign CRS's to groups of geometries be a
> restriction? I am afraid I don't get it.
>

I do not see a way to do so in RDF. This is why I said earlier that an OWL
2 ontology is needed for enforcing such rules.

>     Yes. Perhaps it is easiest to give an example:
>>
>> ex1:myGeometry
>>     ex2:shape "POLYGON((97372 487152,97372 580407,149636 580407,149636
>> 487152,97372 487152))"^^ex2:wktLiteral;
>>     ex2:shape "Polyshape {97372 487152,97372 580407,149636 580407,149636
>> 487152}"^^ex2:SomeOtherFormat;
>>
>>     ex2:CRS <http://www.opengis.net/def/crs/EPSG/0/28992>;
>>     ex2:CRS <http://www.other.org/srs/12345>;
>>  .
>>
>> Note that this is one geometry, because the CRSs and the coordinate
>> strings are equal. It is just their notation that differs.
>>
>> Data publication like this could be useful because consumer A might want
>> WKT while consumer B wants SomeOtherFormat. Likewise, consumer A may like
>> using OGC/EPSG specifications of CRS's while consumer B prefers another
>> authority.
>>
>
>  Which triple contains the EPSG:28992 serialization and which one the
> srs:12345 one?
>
> Both. This describes one set of coordinates (with two different notations)
> and one CRS (with two different notations/authorities).
>
>   Why do you enforce that a spatial feature must have exactly one spatial
> extent?
>
> I do not. A spatial feature is not the same as a geometry. If you think I
> was implying that then there has probably been a lot of misunderstanding.
>
>    Isn't it possible for a municipality for example to be represented by
> a very detailed polygon that can be used for map construction, a bounding
> box that can be used for approximate answers and a point that can be used
> for drawing a dot on a map?
>
> Sure, that is possible. In my mind, the municipality (the spatial feature)
> can have lots of different geometries. The geometries can differ in
> multiple ways: they can have different complexity, a different spatial
> resolution, a different time of measurement or a different CRS. It is up to
> the consumer to select whatever geometry is best suited for her purposes.
>
>      And in some cases the CRS is irrelevant for interpretation. Perhaps
> a consumer wants to know if the coordinates are known. Or she (I will go
> along with that :-)) would like to know the number of coordinates (to get a
> measure of shape complexity). Or she would like to know the geometry type
> (is it a point or a line or a polygon or a multipolygon?). Or she would
> like to know if the geometry is 2D or 3D....
>
>>
>  What you are describing is metadata about the geometry. This information
> can be asserted using additional triples.
>
> You are saying that the CRS is essential and therefore not 'metadata',
> right? And that things that are essential can not exist in separate
> triples?
>
> I agree that the CRS is very important. But in some cases it is not
> needed. But more importantly, why can't something very important not exist
> in a separate triple?
>

Because then it is not a well-defined RDF literal.

>     The OGC-SFA standard has two parts. In the first part it defines the
> grammar for representing points, linestrings, polygons etc and a grammar
> for representing CRS. For example, the following is the WKT representation
> of WGS84:
>
> GEOGCS["WGS 84",
>        DATUM["WGS_1984", SPHEROID["WGS
> 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],
>        PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]],
>        UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],
>        AUTHORITY["EPSG","4326"]]
>
>  In the second part of the standard, it defines the SQL realization of
> the first part. This is where it defines, for example, that a spatial
> RDBMS should identify feature tables, geometry columns, CRS, etc.
>
>  Ok, but is there something in the original standards that says it is not
> possible to use a WKT representation of a geometry without a specification
> of a CRS? As far as I know, if you agree to use WKT as a notation for
> geometry, it is all right to do so without concatenating a CRS URI.
>
>
AFAIK, if you do so, you cannot use it inside a spatial function. Perhaps
it depends on the implementation. For example, if a CRS is not defined,
PostGIS falls back to the "unknown" SRID (-1 in older releases, 0 since
PostGIS 2.0), which it treats as a plain Cartesian coordinate space,
meaning that a geometry is always associated with a CRS.


>      On the contrary! When storing a spatial literal, you start by
>> reading the CRS, create the appropriate precision model, and then use a WKT
>> parser for example for the rest. If you have to keep in memory or secondary
>> storage triples until all required parts are gathered things can get
>> extremely costly very easily (for example performance will vary according
>> to the ordering of the triples within a file!).
>>
>>  Well, that could be the case in some software, but in the software that
>> I have experience with the CRS is not specified at the geometry level, but
>> at some higher level. First some kind of container must exist, and it has a
>> CRS property. Then the coordinate sequences are used to create geometric
>> objects.
>>
>
>  Regardless of the implementation, consider the implication of having to
> gather all parts of a geometry that is dispersed among different triples in
> order to interpret it.
>
> I would not easily refer to all properties of a geometry being child
> objects as all parts being dispersed. The properties would be exactly the
> triples one gets when the URI of the geometry is dereferenced.
>
>    Either way, we should not define something based on current or future
> implementations.
>
> What about trying not to reinvent the wheel?
>

Standards are not the same thing as implementations :)
I meant that the locn vocabulary should not define its own way of
representing geometries. This is the work of other working groups. So the
locn vocab should delegate this work to others :)

>             The most important aspect of having the CRS inside a spatial
>> literal, is that it relies on RDF and nothing more thus making it a good
>> choice for the linked open data world.
>>
>>  I am sorry, but I don't understand what you wrote. In my mind, having
>> the CRS as a separate URI is much more in the spirit of RDF and linked open
>> data.
>>
>
>  I am not talking about the spirit of linked data. I am talking about
> being RDF compliant. stSPARQL and GeoSPARQL made the same design choice of
> representing geometric information as a new kind of typed literal. To do
> so, both introduced new datatypes and defined some functions to operate on
> these datatypes. In order to define formally a datatype (for example as we
> do in stSPARQL), you have to define formally the lexical space, the value
> space and a lexical to value mapping function. You cannot define this
> datatype without knowing the CRS.
>
> What about just using WKT (or some other notation of coordinate sequences)
> as a datatype and modelling geometry as a class? This seems to be an
> essential question, already asked above. Is there a need to model geometry
> as a datatype? Again, I honestly don't know.
>

If you want to define afterwards a set of SPARQL extension functions that
provide the geospatial functionality that RDBMSs have provided for quite
a while, this is the only logical choice I see.

>         You need OWL 2 to formally define a complex class and then say
> that a geometry consists of exactly two parts, one that contains the
> coordinate sequence and another one that contains the CRS. You cannot make
> such statements using RDFS or OWL.
>
> In my mind, the CRS can be optional. If it is not specified, we assume
> some standard CRS (like in basic geo).
>

Defaulting to a CRS is a good choice. The essential part here is that a
default CRS means that a geometric object is always associated with a CRS.

Cheers,
Kostis
Received on Monday, 6 January 2014 19:10:17 UTC