ACTION-249: Further work on BP8 (geometry)

Dear Linda, all,

In view of the relevant agenda item of today's call [1], I provide below 
a summary of the discussions we had so far on how to publish geometries 
on the Web. Apologies in advance for the long email.

@All, I kindly ask you to check whether what is reported here is correct. 
Any comments & revisions are more than welcome.


1. Preferred geometry format(s)

BP8 already includes some guidelines, but based on some discussion 
during the last f2f [2,3], it seems that more explicit recommendations 
would be desirable. What follows is a tentative contribution based on 
what I recall from our discussions.

The reference BP here is the general DWBP principle (BP14) of providing 
data in multiple formats: https://www.w3.org/TR/dwbp/#MultipleFormats

Applied to geometries, this should ideally imply providing geometries in 
the most used serialisations. However, this may not always be feasible, 
so it is important to identify one or more preferred geometry 
serialisations. One of the requirements here is that such serialisations 
should preferably be Web-friendly. More importantly, we don't want to be 
prescriptive and prevent people from publishing geometries in their 
preferred serialisations. So, the recommendation should sound like:

"Publish geometries in any serialisation you like, but for re-usability 
it is important that you make them available in format X [, Y, Z, ...]."

Most of the discussions we had on "preferred format(s)" were about two 
possible candidates: WKT and GeoJSON. Both are widely used and 
supported. GeoJSON is the most webby one, but WKT is also supported by 
popular Web libraries (such as OpenLayers and Leaflet). Moreover, WKT is 
also supported by most triple stores - even those not supporting GeoSPARQL.

The main drawback of GeoJSON seems to be that, in its current version, 
it supports only one CRS - namely, CRS84 (i.e., WGS84, with lon/lat 
axis order) - see: 
https://tools.ietf.org/html/rfc7946#section-4
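
To make the axis order point concrete, here is a minimal sketch in 
Python (standard library only) of a GeoJSON Point as RFC 7946 expects 
it: longitude first, then latitude, and no "crs" member. The helper name 
geojson_point and the coordinates are illustrative assumptions, not 
something taken from the discussion.

```python
import json


def geojson_point(lon, lat):
    """Build a GeoJSON Point geometry; a position is [longitude, latitude]."""
    # A range check catches the most obvious lon/lat mix-ups, not all of them.
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("expected lon in [-180, 180] and lat in [-90, 90]")
    return {"type": "Point", "coordinates": [lon, lat]}


# Illustrative coordinates only (lon 8.61, lat 45.81).
point = geojson_point(8.61, 45.81)
print(json.dumps(point))
```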

WKT doesn't have this problem, and has other advantages - it's a very 
compact literal form compared to GeoJSON (as well as other geometry 
serialisations), it's case insensitive, and it has a corresponding 
binary encoding (WKB).

There are however a number of issues:
- WKT is available in a number of flavours - e.g., the original WKT 
format, the extended variant supported in PostGIS (EWKT) [4], the 
GeoSPARQL variant
- The axis order is implemented inconsistently. For instance, in 
PostGIS, by default it's lon/lat, irrespective of the CRS, whereas 
GeoSPARQL requires the use of the axis order specified in the CRS
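
The flavours listed above can be told apart mechanically. What follows 
is only a sketch in Python, under these assumptions: EWKT prefixes the 
geometry with "SRID=<integer>;", a GeoSPARQL WKT literal may start with 
a CRS URI in angle brackets, and the function name split_wkt_flavour is 
hypothetical. It does not validate the WKT itself.

```python
import re


def split_wkt_flavour(text):
    """Return (flavour, crs, wkt) for plain WKT, EWKT, or a GeoSPARQL literal."""
    text = text.strip()
    # GeoSPARQL: optional "<crs-uri>" followed by the WKT string.
    match = re.match(r"^<([^>]+)>\s*(.+)$", text, re.DOTALL)
    if match:
        return ("geosparql", match.group(1), match.group(2))
    # EWKT (PostGIS): "SRID=<integer>;" followed by the WKT string.
    match = re.match(r"^SRID=(\d+);\s*(.+)$", text, re.IGNORECASE | re.DOTALL)
    if match:
        return ("ewkt", match.group(1), match.group(2))
    # Plain WKT carries no CRS. A GeoSPARQL literal without a CRS URI looks
    # identical; GeoSPARQL then assumes CRS84 (lon/lat).
    return ("wkt", None, text)


# The same location; note the swapped coordinates in the last example,
# because EPSG:4326 defines lat/lon axis order.
print(split_wkt_flavour("POINT(8.61 45.81)"))
print(split_wkt_flavour("SRID=4326;POINT(8.61 45.81)"))
print(split_wkt_flavour(
    "<http://www.opengis.net/def/crs/EPSG/0/4326> POINT(45.81 8.61)"))
```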

It has been pointed out that GML does not have the issues above, since 
both CRS and axis order can be explicitly specified. However, my 
understanding of the relevant discussion is that GML is not considered 
webby enough, and it has limited support in Web / LD applications, tools 
and platforms.

Trying to come to a conclusion, this is my personal understanding:

(a) We cannot avoid recommending GeoJSON as (one of) the preferred 
geometry serialisation(s), because of its widespread use and support on 
the Web. But with the caveat that it may not be suitable for all use 
cases, due to the CRS issue.

(b) We need also a geometry serialisation not having the GeoJSON issues. 
Between WKT and GML, the former seems to be definitely more suitable for 
Web and LD applications. But in this case we need to decide which 
variant should be recommended, and the rule about the axis order.


2. How to publish geometries on the Web

This point is of course related to the principle of "publishing 
geometries for Web use", but also to the idea of making geometries a 
"first-class citizen" on the Web.

I think the issue here boils down to whether geometries should be 
published along with the relevant spatial things, or independently.

There was some discussion in the last f2f [2,3] about the two options of 
denoting geometries with blank nodes or URIs. Linda provided an example 
from Ireland, where the rationale about using blank nodes is that the 
data provider would like people to link to their spatial things, and not 
to the geometries.

From this perspective, using URIs for geometries should be based on use 
cases where people would like to link instead to the geometry itself. 
Which, I think, is basically related to the question whether / in which 
cases a geometry is "re-usable".
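
The two options can be compared side by side. The sketch below (Python, 
generating Turtle as plain strings) assumes the GeoSPARQL terms 
geo:hasGeometry, geo:asWKT and geo:wktLiteral purely for illustration; 
the example.org URIs and the function name geometry_turtle are 
hypothetical.

```python
GEO = "http://www.opengis.net/ont/geosparql#"


def geometry_turtle(thing_uri, wkt, geometry_uri=None):
    """Return Turtle linking a spatial thing to its geometry.

    With geometry_uri=None the geometry is a blank node, so others can only
    link to the spatial thing; otherwise the geometry gets its own URI.
    """
    prefix = f"@prefix geo: <{GEO}> .\n"
    literal = f'"{wkt}"^^geo:wktLiteral'
    if geometry_uri is None:
        return prefix + f"<{thing_uri}> geo:hasGeometry [ geo:asWKT {literal} ] .\n"
    return (
        prefix
        + f"<{thing_uri}> geo:hasGeometry <{geometry_uri}> .\n"
        + f"<{geometry_uri}> geo:asWKT {literal} .\n"
    )


# Option 1: blank node geometry.
print(geometry_turtle("http://example.org/thing/1", "POINT(8.61 45.81)"))
# Option 2: geometry with its own, linkable URI.
print(geometry_turtle("http://example.org/thing/1", "POINT(8.61 45.81)",
                      "http://example.org/geometry/1"))
```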

A possible use case is when you need to link to some "authoritative" 
geometry - e.g., an administrative boundary maintained by an 
institutional agency. Using the relevant URI would ensure not only that 
I'm always referring to the official and up-to-date version of the 
geometry, but also that I implicitly provide provenance information.

This is not different from linking to and re-using data maintained by 
external organisations - e.g., as in the work illustrated by Bart in 
Lisbon, where fire depts. re-use cadastral data not by copying them 
locally, but linking to them.

So, IMO, in BP8 we should mention both approaches, clarifying the 
different use cases they are addressing. And we can also mention that 
the way geometries are published in multiple serialisations depends on 
the solution used - e.g., for geometry URIs, HTTP conneg can be used.
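
As a rough idea of what conneg on a geometry URI involves, here is a 
simplified sketch in Python that picks a serialisation from an Accept 
header. It is not a full implementation of HTTP content negotiation 
(media ranges such as "text/*" are ignored). "application/geo+json" is 
the media type from RFC 7946; serving WKT as "text/plain" is an 
assumption, as are the names AVAILABLE, DEFAULT_TYPE and negotiate.

```python
# Hypothetical serialisations of one geometry, keyed by media type.
AVAILABLE = {
    "application/geo+json": '{"type": "Point", "coordinates": [8.61, 45.81]}',
    "text/plain": "POINT(8.61 45.81)",  # assumption: WKT served as text/plain
}
DEFAULT_TYPE = "application/geo+json"


def negotiate(accept_header):
    """Pick the available media type with the highest q-value, or None."""
    best_type, best_q = None, 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type == "*/*":
            media_type = DEFAULT_TYPE
        if media_type in AVAILABLE and q > best_q:
            best_type, best_q = media_type, q
    return best_type  # None would translate into "406 Not Acceptable"


chosen = negotiate("text/plain;q=0.9, application/geo+json;q=0.5")
print(chosen, AVAILABLE[chosen])
```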


3. Geometries in RDF

The main points of discussion seem to have focussed on the following 
topics:

(a) Recommended vocabularies and/or best practices for using them

(b) Which information should be included in the RDF representation of a 
geometry

In general, both relate to Josh's work on the revision of GeoSPARQL. 
However, as far as existing vocabularies are concerned, my understanding 
is that the only consolidated agreement we have is the use of Basic Geo 
for point geometries. For other geometry types, bboxes, centroids, etc. 
we suggest a number of options. The question is whether this is enough, 
or we should instead provide some more specific recommendations. I can 
try to collect a number of examples from the reference vocabularies, if 
this may be helpful.
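
For the point case, a minimal sketch of what the Basic Geo 
representation looks like, generated as Turtle from Python. The 
properties lat and long are from the W3C Basic Geo vocabulary; the 
"wgs84" prefix, the example.org URI and the function name 
basic_geo_point are illustrative assumptions.

```python
WGS84_POS = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def basic_geo_point(thing_uri, lat, lon):
    """Return Turtle describing a point location with W3C Basic Geo."""
    # Basic Geo has separate lat and long properties, so there is no
    # axis order to get wrong, but the CRS is fixed to WGS84.
    return (
        f"@prefix wgs84: <{WGS84_POS}> .\n"
        f'<{thing_uri}> wgs84:lat "{lat}" ;\n'
        f'    wgs84:long "{lon}" .\n'
    )


print(basic_geo_point("http://example.org/thing/1", 45.81, 8.61))
```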

About point (b), there was some discussion during the last f2f [2,3], 
but I'm not sure an agreement was reached. One point that has been quite 
controversial from the very beginning is whether the RDF representation 
should include the CRS separately from the geometry specification (this 
can be very much dependent on how a geometry is modelled). Another issue 
was about linking a geometry to related geometries (which seems to imply 
the use of URIs for geometries).

I think here it would be crucial to have real-world examples as a 
starting point, and possibly suggest how they can be improved.


Thanks, and sorry again for the long mail.

Meet you later

Andrea

----
[1]https://lists.w3.org/Archives/Public/public-sdw-wg/2017Jan/0067.html
[2]https://www.w3.org/2016/12/15-sdw-minutes
[3]https://www.w3.org/2016/12/16-sdw-minutes
[4]http://postgis.net/docs/ST_GeomFromEWKT.html

-- 
Andrea Perego, Ph.D.
Scientific / Technical Project Officer
European Commission DG JRC
Directorate B - Growth and Innovation
Unit B6 - Digital Economy
Via E. Fermi, 2749 - TP 262
21027 Ispra VA, Italy

https://ec.europa.eu/jrc/

----
The views expressed are purely those of the writer and may
not in any circumstances be regarded as stating an official
position of the European Commission.

Received on Wednesday, 18 January 2017 09:21:33 UTC