Re: CRS specification (was: Re: ISA Core Location Vocabulary) from Kostis Kyzirakos on 2014-01-06 (public-locadd@w3.org from January 2014)

From: Kostis Kyzirakos <Kostis.Kyzirakos@cwi.nl>
Date: Mon, 6 Jan 2014 14:40:52 +0100
To: "Frans Knibbe | Geodan" <frans.knibbe@geodan.nl>
Cc: LocAdd W3C CG Public Mailing list <public-locadd@w3.org>
Message-ID: <CAJUi=VE1M75o=JdOxFj0JhKrvuN_wRM0iSdGKvz6=hwErK1dZw@mail.gmail.com>
It is always interesting to exchange opinions :)

>   I agree that a sequence of coordinates should be associated with a CRS.
>> In my opinion, that is exactly what happens in the example I gave:
>>
>>
>> ex1:myGeometry
>>     a ex2:geometry ;
>>     ex2:asWKT "POLYGON((97372 487152,97372 580407,149636 580407,149636
>> 487152,97372 487152))"^^ex2:wktLiteral ;
>>     ex2:CRS <http://www.opengis.net/def/crs/EPSG/0/28992> ;
>>
>>      This is based on the viewpoint of a geometry consisting of a
>> sequence of coordinates and a CRS.  They are both properties of a geometry.
>>
>
>  What happens though if you merge the following graphs:
>
> ex1:myGeometry ex2:asWKT "POLYGON((97372 487152,97372 580407,149636
> 580407,149636 487152,97372 487152))"^^ex2:wktLiteral ;
>                           ex2:CRS
> <http://www.opengis.net/def/crs/EPSG/0/28992> .
>
> ex1:myGeometry ex2:asWKT " POLYGON((4.54103559631648
> 52.369221013436,4.52469206625503 53.2071950151372,5.30692041653441
> 53.210266927542,5.30843905756704 52.3722183594399,4.54103559631648
> 52.369221013436))"^^ex2:wktLiteral ;
>                           ex2:CRS
> <http://www.opengis.net/def/crs/EPSG/0/4326> .
>
>  One could argue that we can avoid such problems by using different URI
> for different serializations. However, combination of different datasets
> becomes problematic since constraints are introduced...
>
> I would say the two geometries in your examples are different geometries.
> So they can not be merged (have the same URI). Could you explain how a
> combination of different datasets becomes problematic in that case?
>

Actually it is the same geometry expressed in different CRS, and it is
perfectly valid to have all the above statements in an RDF graph.
You could also use different triples to refer to different approximations
of the same geographic feature (e.g., a city can be represented by a point,
a bounding box, a very detailed polygon, etc.). The problem is that after
storing them, you cannot recover which CRS corresponds to which spatial
literal. My opinion is that the serialization of a geometry should not be
discussed in the context of this working group. Links to existing (or
future!) standards have to be in the correct places (this is what this
vocab does in my opinion).
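
A minimal Python sketch of the merge problem described above. It models an
RDF graph as a plain set of (subject, predicate, object) tuples rather than
using an RDF library, and reuses the hypothetical ex1:/ex2: names from this
thread. The "self-contained" variant assumes the GeoSPARQL convention of
putting the CRS URI in angle brackets at the start of the literal.

```python
# An RDF graph modelled as a set of (subject, predicate, object) tuples.
# ex1:/ex2: are the hypothetical prefixes used in this thread.
GEOM = "ex1:myGeometry"
CRS_RD = "http://www.opengis.net/def/crs/EPSG/0/28992"
CRS_WGS84 = "http://www.opengis.net/def/crs/EPSG/0/4326"
WKT_RD = ("POLYGON((97372 487152,97372 580407,149636 580407,"
          "149636 487152,97372 487152))")
WKT_WGS84 = ("POLYGON((4.54103559631648 52.369221013436,"
             "4.52469206625503 53.2071950151372,"
             "5.30692041653441 53.210266927542,"
             "5.30843905756704 52.3722183594399,"
             "4.54103559631648 52.369221013436))")

graph_a = {(GEOM, "ex2:asWKT", WKT_RD), (GEOM, "ex2:CRS", CRS_RD)}
graph_b = {(GEOM, "ex2:asWKT", WKT_WGS84), (GEOM, "ex2:CRS", CRS_WGS84)}

# Merging RDF graphs is a set union: nothing records which triples
# originally travelled together.
merged = graph_a | graph_b


def objects(graph, subject, predicate):
    """Return all objects for a given subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


wkts = objects(merged, GEOM, "ex2:asWKT")
crss = objects(merged, GEOM, "ex2:CRS")

# Every (literal, CRS) pairing is equally consistent with the merged graph.
candidate_pairings = [(w, c) for w in wkts for c in crss]
print(len(candidate_pairings), "possible pairings, only 2 are correct")

# With the CRS inside the literal, each term carries its own CRS and the
# merge loses nothing.
self_contained = {
    (GEOM, "ex2:asWKT", f"<{CRS_RD}> {WKT_RD}"),
    (GEOM, "ex2:asWKT", f"<{CRS_WGS84}> {WKT_WGS84}"),
}
```

The merged graph holds four triples about one subject, and a consumer has
no way to tell the two correct pairings from the two wrong ones.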


>  I was not trying to say that the NeoGeo way is better. For the majority
> of use cases making the coordinate sequence one element is fine. I was just
> mentioning NeoGeo to illustrate that the decision to define a certain basic
> building block can be quite arbitrary.
>
Majority? My experience says the exact opposite, but that is not the point
here.
Anyhow, we should all try to avoid being empiricists :)


>   This is not a correct analogy. Because "POINT(0 0)" has no meaning on
> its own, while "Αθήνα" has some meaning on its own. You can add as much
> information as you want, but each RDF term has to be self-contained. You
> cannot rely on a set of triples to interpret an RDF term. At least this is
> what all formal treatments of RDF do!
>
> The string "POINT(0 0)" does have meaning. You can tell it is a point with
> known coordinates. You just don't know the CRS, so you can not draw the
> point on a map, for instance. Similarly, if you would like to use the word
> "Αθήνα" in a text document things would probably also go wrong because
> vital data are missing.
>

This has nothing to do with drawing something on a map. Can you say if
"POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))"^^geo:wktLiteral contains "POINT(5
5)"^^geo:wktLiteral if you do not know the underlying CRS (ignore that
GeoSPARQL defines a default CRS for this example)? No, you cannot. If you
had two literals like "1"^^xsd:integer and "2"^^xsd:integer, could you say
whether the first is smaller than the second? Sure. But why?

In general, when defining a datatype, you need to properly define the
lexical space, the value space and the lexical-to-value mapping function.
The lexical space is defined by the WKT grammar in this example. However,
in order to map elements of the lexical space to the value space, you need
to be able to interpret the coordinates, meaning you need to know the
underlying CRS. This is what I meant when I was saying that RDF terms have
to be self-contained. Otherwise, you cannot give precise semantics, which
is crucial when you are extending a standard like RDF with new datatypes.
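
As a rough illustration of a lexical-to-value mapping, here is a Python
sketch restricted to POINT literals. It assumes the GeoSPARQL wktLiteral
convention (an optional CRS URI in angle brackets before the WKT text, with
CRS84 as the default when it is absent). The names PointValue and
lexical_to_value are hypothetical and not part of any standard or library.

```python
import re
from typing import NamedTuple


class PointValue(NamedTuple):
    """An element of the value space: coordinates plus the CRS they live in."""
    crs: str
    x: float
    y: float


# Default CRS assumed by GeoSPARQL when the literal carries no CRS URI.
DEFAULT_CRS = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# Lexical space (sketch): optional "<crs-uri> " followed by "POINT(x y)".
_LITERAL = re.compile(
    r"^\s*(?:<(?P<crs>[^>]+)>\s+)?"
    r"POINT\s*\(\s*(?P<x>\S+)\s+(?P<y>[^\s)]+)\s*\)\s*$",
    re.IGNORECASE,
)


def lexical_to_value(lexical: str) -> PointValue:
    """Map a lexical form to a value; the CRS is part of the result."""
    match = _LITERAL.match(lexical)
    if match is None:
        raise ValueError(f"not in the lexical space of this sketch: {lexical!r}")
    crs = match.group("crs") or DEFAULT_CRS
    return PointValue(crs, float(match.group("x")), float(match.group("y")))


a = lexical_to_value("POINT(0 0)")
b = lexical_to_value("<http://www.opengis.net/def/crs/EPSG/0/28992> POINT(0 0)")

# Same coordinate text, different CRS: two different values.
print(a == b)
```

Without the CRS (inside the literal or as a fixed default) the function
could return only bare numbers, and two literals with identical coordinate
text would be indistinguishable even when they denote different locations.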


>  Let's take the example of drawing geographical data on a map. In all
> software that I know of, the CRS is specified for the map as a whole, not
> for individual geometries. Or take the example of storing geographical data
> in a database table. The CRS is usually specified at the column level.
>

Usually you define a CRS per layer when drawing data on a map (e.g., ArcGIS
and QGIS allow you to do so). In PostGIS you are allowed to have different
CRS in the same column. However, if you choose to do so, you cannot index
the column or use this column in a topological function. In any case, you
are describing an application and not a vocabulary, which is not the best
motivation IMHO.

>
>>    1. It should be possible to specify the CRS at the level of a data
>>    set or a collection of geometries.
>>
>>   We should be careful here. One could argue that we should be able to
> define the CRS at the level of a triple, at the level of an rdf:set, at the
> level of a named graph and so on (see for example the relevant research on
> provenance). I think that the simplest way to go is to define it at the
> finest level of granularity which is a triple. This allows all other cases
> to be covered as well. After all, RDF is not known for being laconic :D
>
> So that means the CRS should be a separate URI, right?
>

A CRS should be identified by a URI. I do not believe, however, that this
should be in a separate triple. There could be such information in the
sense that this is metadata about the geometry, but this has nothing to do
with keeping it separate from the coordinates.

>     Anyhow, WKT and GML provide geometry collections like MULTIPOINT,
> MULTILINESTRING, MULTIPOLYGON etc., so you can specify a single CRS for a
> geometry collection.
>
> Yes, but those are fixed collections. I think it should be possible to
> assign a CRS to dynamic collections, like the result set of a query for
> instance. And I think it is important to be able to assign the CRS at the
> level of a dataset (things like void:Dataset or dcat:Dataset).
>

Why should we disallow results that contain different CRS? Why should we
impose more restrictions than necessary?
On the contrary, we should encourage usage of all available information and
not impose any restrictions on this matter, unless the user explicitly
requests so.

>
>>    1. It should be possible to easily select data based on CRS in SPARQL
>>    queries.
>>
>>    GeoSPARQL and stSPARQL already provide a function for doing so.
>
> True, but I consider that a workaround. And it is one thing to have a
> specification, and another one to have it working. A getsrid function would
> need to be implemented by all SPARQL software. A plain URI works right
> away.
>

Sure. But there is a reason why GeoSPARQL and stSPARQL have been defined.
The main point is that you cannot represent and query geometric information
without extending RDF and SPARQL (see the previous comment about
semantics).

>
>>    1. Having multiple specifications of the same CRS for a single
>>    geometry should be possible.
>>    2. Having multiple specifications of the same coordinate sequence for
>>    a single geometry should be possible.
>>
>>   I understand that it is desirable to use multiple serializations that
> use different CRS, but I do not understand what exactly you mean here. Can
> you elaborate on this?
>
> Yes. Perhaps it is easiest to give an example:
>
> ex1:myGeometry
>     ex2:shape "POLYGON((97372 487152,97372 580407,149636 580407,149636
> 487152,97372 487152))"^^ex2:wktLiteral;
>     ex2:shape "Polyshape {97372 487152,97372 580407,149636 580407,149636
> 487152}"^^ex2:SomeOtherFormat;
>
>     ex2:CRS <http://www.opengis.net/def/crs/EPSG/0/28992> ;
>     ex2:CRS <http://www.other.org/srs/12345> ;
> .
>
> Note that this is one geometry, because the CRSs and the coordinate
> strings are equal. It is just their notation that differs.
>
> Data publication like this could be useful because consumer A might want
> WKT while consumer B wants SomeOtherFormat. Likewise, consumer A may like
> using OGC/EPSG specifications of CRS's while consumer B prefers another
> authority.
>

Which triple contains the EPSG:28992 serialization and which one the
srs:12345 one?
Why do you enforce that a spatial feature must have exactly one spatial
extent?
Isn't it possible for a municipality for example to be represented by a
very detailed polygon that can be used for map construction, a bounding box
that can be used for approximate answers and a point that can be used for
drawing a dot on a map?


>
>>    1. There should not be a single authority for specifying CRS's
>>    (especially if we want specifications to last until the sun goes nova).
>>
>>   +1. This is why using just a URI is a good choice!
>
> I am kind of confused here. Are you actually saying that you agree with
> this point and that the CRS should be specified separately?
>

I agree that a CRS should be identified by a URI, but not in another triple
separated from the geometry serialization :) A CRS could be used in
different triples in order to provide metadata about the geometry.

> And in some cases the CRS is irrelevant for interpretation. Perhaps a
> consumer wants to know if the coordinates are known. Or she (I will go
> along with that :-)) would like to know the number of coordinates (to get a
> measure of shape complexity). Or she would like to know the geometry type
> (is it a point or a line or a polygon or a multipolygon?). Or she would
> like to know if the geometry is 2D or 3D....
>

What you are describing is metadata about the geometry. This information
can be asserted using additional triples.


>>    1. It should be possible to select only the CRS or only the
>>    coordinates (for specific use cases).
>>
>>   GeoSPARQL and stSPARQL offer a function for selecting the CRS of a
> geometry, and you can use a simple regex to get just the coordinates. As
> you say, this is for specific use cases, so why reinvent the wheel and
> not use existing standards?
>
> I am all for using existing standards. As far as I know (but I could be
> wrong), WKT does not have a CRS specification. It was only added by
> GeoSPARQL. WKT is widely used, it is probably the best supported geometry
> encoding around. So I am all for keeping WKT (although I am doubtful about
> conflating the coordinate sequence and the geometry type).
>

The OGC-SFA standard has two parts. In the first part it defines the
grammar for representing points, linestrings, polygons etc and a grammar
for representing CRS. For example, the following is the WKT representation
of WGS84:

GEOGCS["WGS 84",
       DATUM["WGS_1984",
             SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],
             AUTHORITY["EPSG","6326"]],
       PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],
       UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],
       AUTHORITY["EPSG","4326"]]

In the second part, the standard defines the SQL realization of the first
part. This is where it defines, for example, that a spatial RDBMS should
identify feature tables, geometry columns, CRS, etc.


>
>>    1. Processing the coordinates requires removing the CRS specification
>>    from the string, which is undesirable extra processing.
>>
>>   On the contrary! When storing a spatial literal, you start by reading
> the CRS, create the appropriate precision model, and then use a WKT parser
> for example for the rest. If you have to keep in memory or secondary
> storage triples until all required parts are gathered things can get
> extremely costly very easily (for example performance will vary according
> to the ordering of the triples within a file!).
>
> Well, that could be the case in some software, but in the software that I
> have experience with the CRS is not specified at the geometry level, but at
> some higher level. First some kind of container must exist, and it has a
> CRS property. Then the coordinate sequences are used to create geometric
> objects.
>

Regardless of the implementation, consider the implication of having to
gather all parts of a geometry that is dispersed among different triples in
order to interpret it.
Either way, we should not define something based on current or future
implementations.
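
To make the cost of gathering dispersed parts concrete, here is a small
Python sketch of a streaming consumer. It assumes each geometry has exactly
one ex2:asWKT triple and one ex2:CRS triple (the hypothetical properties
from this thread), and the function name assemble_split is invented for
illustration. It reports how many incomplete geometries had to be buffered
at the worst moment.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def assemble_split(triples: Iterable[Triple]) -> Tuple[List[Triple], int]:
    """Pair ex2:asWKT and ex2:CRS per subject while streaming.

    Returns the completed (subject, crs, wkt) tuples and the peak number
    of half-seen geometries that were waiting in the buffer.
    """
    pending: Dict[str, Dict[str, str]] = {}
    done: List[Triple] = []
    peak = 0
    for subject, predicate, obj in triples:
        if predicate not in ("ex2:asWKT", "ex2:CRS"):
            continue
        parts = pending.setdefault(subject, {})
        parts[predicate] = obj
        if len(parts) == 2:
            # Both parts seen: only now can the geometry be interpreted.
            done.append((subject, parts["ex2:CRS"], parts["ex2:asWKT"]))
            del pending[subject]
        peak = max(peak, len(pending))
    return done, peak


subjects = ["ex1:g1", "ex1:g2", "ex1:g3"]
crs_uri = "http://www.opengis.net/def/crs/EPSG/0/28992"
wkt = [(s, "ex2:asWKT", "POINT(0 0)") for s in subjects]
crs = [(s, "ex2:CRS", crs_uri) for s in subjects]

# Same triples, two orderings of the file.
interleaved = [t for pair in zip(wkt, crs) for t in pair]
grouped = wkt + crs

geoms_i, peak_i = assemble_split(interleaved)
geoms_g, peak_g = assemble_split(grouped)
print(peak_i, peak_g)
```

Both orderings yield the same geometries, but the buffer grows with the
number of geometries when the file lists all WKT triples before the CRS
triples. A literal that carries its own CRS can be parsed the moment its
triple is read, with no buffer at all.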

>     The most important aspect of having the CRS inside a spatial literal,
> is that it relies on RDF and nothing more thus making it a good choice for
> the linked open data world.
>
> I am sorry, but I don't understand what you wrote. In my mind, having the
> CRS as a separate URI is much more in the spirit of RDF and linked open
> data.
>

I am not talking about the spirit of linked data. I am talking about being
RDF compliant. stSPARQL and GeoSPARQL made the same design choice of
representing geometric information as a new kind of typed literal. To do
so, both introduced new datatypes and defined some functions to operate on
these datatypes. In order to formally define a datatype (for example as we
do in stSPARQL), you have to formally define the lexical space, the value
space and a lexical-to-value mapping function. You cannot define this
datatype without knowing the CRS.


>    What you propose has to be encoded as an OWL 2 ontology with the
> appropriate cardinality restrictions, and then the users are forced to
> adopt this ontology. The linked data story so far (even though I do not
> entirely agree with this :) ) has shown that the only way to achieve
> adoption is by enforcing as few restrictions as possible.
>
> Could you explain why OWL 2 is necessary?
>

You need OWL 2 to formally define a complex class and then say that a
geometry consists of exactly two parts, one that contains the coordinate
sequence and another one that contains the CRS. You cannot make such
statements using RDFS or OWL.


>   Put in a more general perspective: If geographical data are going to
>> exist in the web of Linked Data, it is good to depart from historical
>> constructs if better solutions are on offer. Using a URI to identify a CRS
>> is a very good step in the right direction. But why make it impossible to
>> use that same URI in an RDF triple?
>>
>
>  Linked geospatial data are already here (one way or another) :) For
> example, a few months ago we published more than 100GB of linked geospatial
> data just by publishing two datasets (
> http://datahub.io/organization/teleios). I fully agree with you that we
> should not reinvent the wheel. That's why I am arguing in favor of using
> existing standards for this. So far, I have not been convinced that
> functionality-wise there is something missing from existing standards like
> GeoSPARQL. We have pointed out some small issues here and there (spatial
> aggregates, a transformation function) or some more important issues
> (temporal dimension of spatial data), but I am not convinced that there is
> something fundamentally wrong with it. If we follow the evolution of
> historical constructs, we can see that OGC started by separating
> serializations from CRS (however it defined precisely a mechanism for
> associating CRS and geometries), and then moved on and combined them (GML
> and GeoSPARQL).
>
> GML doesn't really count in this respect. It is not a data format, but a
> way of modelling data, comparable with RDF. By nature, GML incorporates
> everything. As far as I know, it is only GeoSPARQL that glued the CRS to
> the coordinate sequence.
>

WKT and GML are both data exchange formats. So is GeoSPARQL in some sense.
The SQL implementation of OGC-SFA allows you to say some things about
spatial features using the relational data model, while GML and GeoSPARQL
allow you to say more things about them using XML and RDF respectively.
What I meant is that GML defines a specific schema where you cannot have a
geometry serialization without a CRS. In RDF you can do the same only by
holding the complete information inside a single RDF term. Otherwise, you
need an OWL 2 ontology, as I was arguing before.
Received on Monday, 6 January 2014 13:41:48 UTC