Re: geosparql:asWkt feedback from NL

From: Clemens Portele <portele@interactive-instruments.de>
Date: Fri, 29 Jul 2016 07:41:49 +0000
To: Frans Knibbe <frans.knibbe@geodan.nl>
CC: "Veer, Rein van (Rein.vanVeer@kadaster.nl)" <rein.vanveer@kadaster.nl>, "Farla, Joost (Joost.Farla@kadaster.nl)" <joost.farla@kadaster.nl>, "Joshua Lieberman" <jlieberman@tumblingwalls.com>, matthew perry <matthew.perry@oracle.com>, Linda van den Brink <l.vandenbrink@geonovum.nl>, SDW WG Public List <public-sdw-wg@w3.org>, "Maria, Pano (Pano.Maria@kadaster.nl)" <pano.maria@kadaster.nl>, "Brattinga, Marco" <marco.brattinga@ordina.nl>
Message-ID: <etPan.579b08c1.7d444149.375@interactive-instruments.de>
Hi Frans,

Yes, if the geometry literal is supposed to be a single string literal from which all the information about the geometry can be derived it makes sense to include the information about the CRS in the string, too. But why is geometry supposed to be a single string literal? Is there agreement it should be in the case of spatial data on the web?

Not necessarily, it is one approach, but not the only one. In general, there are usually at least two options for expressing something that has some kind of structure in a serialization: You can express it as a literal value or use the structuring capabilities of the (model underpinning the) serialization to express the structure. Or a mix of both.

The choice will depend on how the data is supposed to be used.

If I remember the GeoSPARQL discussions correctly, we chose the first option (serializing the object in a single string literal) for three reasons: the value of exposing the structural information of a geometry in RDF, which comes with an extra implementation cost, was unclear; we had existing serializations we wanted to reuse (WKT, GML); and we could specify functions as SPARQL extensions that made it possible to do useful things with the geometries.

As a result, GeoSPARQL 1.0 specifies ogc:geomLiteral which is a placeholder for the geometry literal serialization (currently WKT or GML) in a single literal. A future version of GeoSPARQL may also expose geometry information through RDF properties, or provide additional functions where needed, but I think the literal serialization would continue to be supported, including for backwards compatibility.

I think most other serializations of spatial data on the Web tend towards the other option. For example, Basic Geo exposes lat/lon as RDF properties, GML exposes the geometry structure using XML Schema components, and GeoJSON exposes the geometry structure using JSON components. Note that in these cases the coordinates are not "one literal" either: in GML they are a list of decimals, in GeoJSON an array of numbers.
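
To make the contrast concrete, here is a minimal sketch that turns a single-literal WKT point into the structured form GeoJSON uses. The function name point_wkt_to_geojson is hypothetical, and the sketch assumes a plain two-dimensional POINT without a CRS prefix; it is not a general WKT parser.

```python
import json
import re

# Matches only a simple 2D point such as "POINT(6 50)" (assumption:
# no CRS prefix, no Z/M ordinates, no other geometry types).
_POINT = re.compile(r"\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*", re.IGNORECASE)

def point_wkt_to_geojson(wkt: str) -> dict:
    """Convert a 2D WKT POINT string into a GeoJSON-style Point object."""
    match = _POINT.fullmatch(wkt)
    if match is None:
        raise ValueError(f"not a simple 2D WKT point: {wkt!r}")
    x, y = float(match.group(1)), float(match.group(2))
    # In the literal everything lives in one string; here the geometry
    # type and the coordinate array are separate JSON components.
    return {"type": "Point", "coordinates": [x, y]}

if __name__ == "__main__":
    print(json.dumps(point_wkt_to_geojson("POINT(6 50)")))
```

The literal has to be parsed before any part of it can be used, whereas the structured form exposes type and coordinates directly.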

In my view it is impossible to identify only one approach for the Web regarding what is put in a literal and what is exposed as structure that works in all serializations and all use cases.

Best regards,
Clemens



On 28 July 2016 at 15:47:40, Frans Knibbe (frans.knibbe@geodan.nl) wrote:

Hi Clemens,

LOL, you boomeranged me with my own argument.

Yes, if the geometry literal is supposed to be a single string literal from which all the information about the geometry can be derived it makes sense to include the information about the CRS in the string, too. But why is geometry supposed to be a single string literal? Is there agreement it should be in the case of spatial data on the web?

I think the main question is one of atomicity: what should the basic building blocks of geometry be? We agree (somewhat) on what the basic attributes of geometry are, but the point is which of those parts should always be taken together. One approach would be to have them all together in a single literal, for the sake of simplicity. Another is to view them as separate parts, to be flexible in building different data representations. The neogeo approach goes furthest in that direction, even taking apart the array of coordinates. I think having the coordinates in one literal and the other attributes (CRS, type, dimensionality, ...) as separate parts hits the sweet spot in terms of practicality and versatility.

In (variants of) WKT we see several parts of geometry taken together. That is understandable given the reason why it was invented. But RDF has its own ways of expressing and linking things. It makes sense to me to rethink how we express geometry by taking advantage of what RDF has to offer. There seems to be agreement that it is a good idea to let a CRS reference be a URI that can be resolved into data describing the CRS. But the URI should be made known as a URI in order to be able to take advantage of it. For example, if the CRS is stored as a URI it can be used in URI matching algorithms in triple stores.

Next to that, I can think of cases where a geometry does not have a CRS. An SVG geometry without a CRS is valid, if I am correct. For historical data the CRS could be unknown. By making the CRS mandatory we would exclude certain types of geometric data.

Regards,
Frans



On 27 July 2016 at 14:43, Clemens Portele <portele@interactive-instruments.de> wrote:
Hi Frans,

I think you make a good argument why the current GeoSPARQL approach makes sense - at least from the perspective of having a simple mechanism for representing a geometry in RDF. As you say, "a geometry is characterised by many attributes (coordinates, geometry type, level of detail, CRS, number of dimensions,..)" [*]. Since the asWKT literal is supposed to be a single string literal from which all the information about the geometry can be derived (e.g. for indexing or for conversion to other representations), it makes sense to include the information about the CRS in the string, too.

The use of the language tag is not possible, by the way, as RDF requires that language tags conform to https://tools.ietf.org/html/bcp47#section-2.1 (which our CRS IDs don’t). It would also seem odd to use this mechanism, as the language tag is not a generic mechanism for selecting between different serializations (RDF has no such mechanism), but only covers the language case.
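
The syntactic obstacle can be illustrated with a rough check. This is a minimal sketch that uses the LANGTAG production of the Turtle grammar as a loose stand-in for the BCP 47 well-formedness rules; it is not a full BCP 47 validator, and the function name looks_like_language_tag is hypothetical.

```python
import re

# LANGTAG production from the Turtle grammar (without the leading "@"):
# letters, then optional hyphen-separated alphanumeric subtags.
# Assumption: this approximation is enough to show why CRS identifiers
# written with ":" or "/" cannot be used as language tags.
LANGTAG = re.compile(r"[a-zA-Z]+(-[a-zA-Z0-9]+)*")

def looks_like_language_tag(tag: str) -> bool:
    """Return True if the tag matches the Turtle LANGTAG pattern."""
    return LANGTAG.fullmatch(tag) is not None

if __name__ == "__main__":
    for candidate in ("nl", "en-GB", "EPSG:28992",
                      "http://www.opengis.net/def/crs/EPSG/0/28992"):
        print(candidate, looks_like_language_tag(candidate))
```

Both "EPSG:28992" and the CRS URI fail the check because of the colon and slash characters.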

Best regards,
Clemens

[*] I do not think that 'level of detail' is a characteristic of a geometry. It may be a characteristic of a feature representation, but not of the geometry itself. That is secondary here, though.



On 27 July 2016 at 12:47:19, Frans Knibbe (frans.knibbe@geodan.nl) wrote:

Hi Matt, Josh,

Whether or not it makes sense to have a CRS reference be part of a WKT literal is a recurring point of debate. It has been discussed at length in the Locations and Addresses community group. No clear conclusion was reached, so I would really welcome further discussion.

I am in favour of removing the CRS reference from the WKT string (or any other geometry datatype). Here are some reasons why:

  1.  A CRS reference should be a URI, not part of a string literal.
  2.  A geometry is characterised by many attributes (coordinates, geometry type, level of detail, CRS, number of dimensions,..). To conflate all in a single text string would be inflexible.
  3.  In many cases it makes sense to define a CRS at a higher level, for instance for a dataset/graph, or for a class (e.g. class GeomCRS84).
  4.  It seems to me that GML, GeoJSON and KML are more elaborate schemes than WKT, which is a single text string, a data type. So it makes more sense for GML, GeoJSON and KML to include some sort of CRS reference.
  5.  The original WKT did not have a CRS part (although a CRS can be described in WKT, but that is another matter).

I have to admit it is not an easy topic. It has to do with the nature of geometry: which attributes of a geometry are its intrinsic parts? Can a geometry exist as a naked string of coordinates? I think it can.

I see a parallel with text. Take the word "map". You will probably have interpreted the word, and probably the interpretation was wrong. I meant the Dutch word "map", which means "folder" in English. I should have written "map"@nl, that would have prevented the misunderstanding... This example shows that language is an essential attribute of text strings. Omitting it causes errors. But still it works in practice. We have the freedom to attach languages to text strings, or to infer the language from context. I think it would not hurt to have the same freedom with geometric coordinate strings.

Matt's arguments against are valid, of course. Backward compatibility is a problem, but not insolvable. And I can imagine that a solution for triple store management can be found too, with some creative thought (named graphs for all supported CRSs?). I wonder if modern index management in triple stores always assumes that all relevant data is contained in a single triple. What if you would want to have an index of names of people, or of toponyms? Some form of distinction between types of text strings would be needed. What if a recommendation takes the form of having CRS-typed predicates, something like (<ex:geom1234> <geo:crs84Coordinates> "6 50")?

Greetings,
Frans



On 25 July 2016 at 17:03, matthew perry <matthew.perry@oracle.com> wrote:

Hi Josh,

I would be pretty strongly opposed to removing the encoded CRS reference from wktLiteral.

If you look at other formats like GML and GeoJSON, those geometry literals include encoded CRS information, and it is implicitly encoded in KML because there is only one possible CRS. WKT is really the only major serialization that lacks CRS information, so to me it seems better to add CRS information to WKT so that it is consistent with the other serializations instead of depending on some other property of the geometry that may or may not exist in a particular dataset.

From a triplestore implementer's point of view, creating a spatial index for a GeoSPARQL dataset would be an order of magnitude harder if CRS information is not encoded in the geometry literal itself and you have to resort to looking for other triples to determine the CRS, as these triples may be missing and will be updated over time. Plus, such a change for GeoSPARQL 1.1 would not be backwards compatible with GeoSPARQL 1.0, which would cause a lot of headaches.

Thanks,
Matt

On 7/25/2016 10:30 AM, Joshua Lieberman wrote:
Hi,

I am a bit concerned about proliferating versions of geomLiteral and asWKT properties. It’s a big step already to make a geometry a first-class object with a global identifier and all the management issues that raises. Others have also noted that queries get considerably more complicated if one has to peer into the literals in order to get the right result. The alternative is to treat the asWKT and other coordinate properties as the data properties they are, dependent on the geometry object they are a part of. Then the coordinate system is defined by the crs property of the geometry. It may even be best to remove the CRS reference from the WktLiteral to improve interoperability between that and “regular” WKT that is returned by database functions.

Another consideration is to make it easier for software systems to provide the right CRS and translate between CRSs as needed. One way might be to negotiate CRS as part of the content format for the geometry, as we have proposed for encoding format (e.g. application/ttl; geomLiteral="WKT"; crs="CRS84"). This at least doesn’t proliferate encodings or languages.

It would also, I think, simplify making queries that don’t have to explicitly select a geometry to test spatial relations or filters and allow the responding system to select or transform CRSs as needed to process the query.

Josh

On Jul 25, 2016, at 9:34 AM, Brattinga, Marco <Marco.Brattinga@ordina.nl> wrote:

Hi Matt,

Thanks for pointing us to the geof:getSRID, this is indeed what we need for that particular requirement.

As for the encoding of CRS: we propose to encode the CRS into the language tag, not the datatype. But you could argue that this would proliferate the set of languages…

Marco

From: matthew perry [mailto:matthew.perry@oracle.com]
Sent: Monday 25 July 2016 15:26
To: Linda van den Brink; public-sdw-comments@w3.org
CC: Brattinga, Marco; Veer, Rein van (Rein.vanVeer@kadaster.nl); Farla, Joost (Joost.Farla@kadaster.nl); Maria, Pano (Pano.Maria@kadaster.nl)
Subject: Re: geosparql:asWkt feedback from NL


Hi Linda,

Thanks for forwarding the comments.

One of the downsides of encoding the CRS info into the WKT literal is that you can't directly process the string with existing WKT tools, but it's pretty trivial to read a few bytes and strip the CRS URI off. I would be concerned with a proliferation of different datatypes if we encoded the CRS into the datatype URI. Creating subproperties of ogc:asWKT seems like a good, practical approach though.
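
Stripping the CRS URI off could look roughly like the following. This is a minimal sketch, assuming the wktLiteral layout described in this thread (an optional URI in angle brackets, followed by the WKT text, with CRS84 assumed when the URI is absent). The function name split_wkt_literal is hypothetical and the coordinates are illustrative.

```python
import re
from typing import Tuple

# Default CRS assumed when the literal carries no leading URI.
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# Optional "<uri>" prefix followed by the plain WKT text.
_PREFIXED = re.compile(r"^\s*<([^>]*)>\s*(.*)$", re.DOTALL)

def split_wkt_literal(literal: str) -> Tuple[str, str]:
    """Split a wktLiteral lexical form into (crs_uri, plain_wkt)."""
    match = _PREFIXED.match(literal)
    if match:
        return match.group(1), match.group(2).strip()
    # No prefix: the whole string is plain WKT in the default CRS.
    return CRS84, literal.strip()

if __name__ == "__main__":
    rd = "<http://www.opengis.net/def/crs/EPSG/0/28992> POINT(155000 463000)"
    print(split_wkt_literal(rd))
    print(split_wkt_literal("POINT(6 50)"))
```

The second element of the result is plain WKT that existing WKT tools can process; the first is the CRS URI.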

By the way, GeoSPARQL already defines a function to return the CRS of a WKT literal:

8.7.10 Function: geof:getsrid

geof:getSRID (geom: ogc:geomLiteral): xsd:anyURI

Returns the spatial reference system URI for geom.

Cheers,
Matt


On 7/25/2016 3:22 AM, Linda van den Brink wrote:
Hi all,

From the developers at the Dutch Kadaster I got the email below, detailing some of the problems they have with CRS detection and selection in the current (web) standards. They also suggest some interesting solutions.

(sent to the comments list so they can participate in any discussion)

Linda

From: Brattinga, Marco [mailto:Marco.Brattinga@ordina.nl]
Sent: Saturday 23 July 2016 22:49
To: Linda van den Brink; Veer, Rein van (Rein.vanVeer@kadaster.nl); Farla, Joost (Joost.Farla@kadaster.nl); Maria, Pano (Pano.Maria@kadaster.nl)
CC: Brattinga, Marco (Marco.Brattinga@kadaster.nl)
Subject: RE: geosparql:asWkt challenge in combination with CRSs

Linda,

As you know, at the Dutch Land Registry, we are currently making all our public data available as Linked Open Data. Because most of our data contains a spatial component, we are very interested in the work of the Spatial Data on the Web Working Group.

We would like to raise some questions and have the opportunity to share our concerns and experiences.

The situation:
-          Our data should be available not only as Linked Open Data, but also as JSON-LD and JSON data via REST APIs;
-          Currently, we store our spatial information as WKT strings;
-          Most of the original spatial data is represented in RD (the Dutch CRS, EPSG:28992), and some geospatial experts would really like to use the data in its original CRS;
-          But most “regular” web developers would like to use the data in CRS84;
-          As far as we know, a “regular” WKT string doesn’t contain a reference to the CRS, which has to be figured out from the context;
-          The current geosparql specification specifies that the asWKT object is a WKT string prefixed with a CRS, represented by its URI; if the prefix is absent, CRS84 is assumed.

We use the geosparql specification, so a particular resource will have a property geosparql:hasGeometry, with a reference to a resource of the class geosparql:Geometry, and this latter resource has a geosparql:asWKT property, with a WKT string as the object.

Our challenges:
-          We like the idea of a separate geometry. But we would like to include multiple WKT strings, each with its own CRS, just like you would have an rdfs:label with multiple languages;
-          The current situation means that we would have to encode the CRS in the WKT string, which means that it is not really a WKT string any more (this presents problems if we want to use it for our REST API, whose users don’t understand the encoding of the CRS);
-          Another problem is that, when you get multiple asWKT triples, you have to parse the string if you want to select just one of them. This is not nice (at least we would like to have a function available, just like the lang() function).

At this moment, we’ve solved the problem by introducing a subPropertyOf asWKT, for every CRS:

pdok:asWKT-RD rdfs:subPropertyOf geosparql:asWKT

Every Geometry in our dataset has one geosparql:asWKT with a WKT string without a CRS (meaning that it should be CRS84, which is fine), and a property pdok:asWKT-RD with the semantics that it also shouldn’t contain a CRS, because EPSG:28992 is assumed. It works and is compliant with the standards, but it is not very nice.
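
The effect of this workaround is that the CRS is selected by predicate rather than by parsing the literal. A minimal sketch in plain Python, using tuples in place of a triple store: the subject ex:geom1, the coordinate values and the helper wkt_in_crs are hypothetical, and the two points are illustrative rather than an exact transformation of each other.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
RD = "http://www.opengis.net/def/crs/EPSG/0/28992"

# CRS implied by each predicate (the literal itself carries no CRS).
IMPLIED_CRS = {
    "geosparql:asWKT": CRS84,
    "pdok:asWKT-RD": RD,
}

# Illustrative data: one geometry with a literal per CRS.
triples: List[Triple] = [
    ("ex:geom1", "geosparql:asWKT", "POINT(5.387 52.155)"),
    ("ex:geom1", "pdok:asWKT-RD", "POINT(155000 463000)"),
]

def wkt_in_crs(data: List[Triple], subject: str, crs_uri: str) -> Optional[str]:
    """Return the plain WKT of subject in the requested CRS, if present."""
    for s, p, o in data:
        if s == subject and IMPLIED_CRS.get(p) == crs_uri:
            return o
    return None

if __name__ == "__main__":
    print(wkt_in_crs(triples, "ex:geom1", RD))
```

No string parsing is needed, but every additional CRS requires another predicate and another entry in the mapping.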

What we really would like is:
-          A more elegant way of encoding the CRS. Maybe you could do it just like a language tag, for example: <Geo> geosparql:asWKT "POINT(53.2 5.6)"@EPSG:28992;
-          A function to check for a particular CRS, similar to lang(), for example: crs(?wkt) (which would return a literal or maybe an IRI representing the CRS)

Because most spatial encodings can be converted into each other, an even better approach might be to have a transformation service (toCRS(?wkt,?crs)).

Last but not least: it would be very much appreciated if a user could request a particular CRS, and the response could “tell” what the CRS is. We would like to suggest using headers along the lines of http-accept-crs and crs-content-type, just like an Accept-Language header or a serialization Accept header: having content negotiation available for CRSs as well.
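
Server-side selection for such a header might work along these lines. This is a sketch under stated assumptions only: the header name Accept-Crs, its syntax (comma-separated CRS identifiers with optional q-values, modelled on Accept-Language) and both function names are hypothetical, since no such header is defined by any standard mentioned here.

```python
from typing import Iterable, List, Optional, Tuple

def parse_accept_crs(header: str) -> List[Tuple[str, float]]:
    """Parse a hypothetical Accept-Crs value into (crs, q) pairs, best first."""
    preferences = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        value, _, params = item.partition(";")
        params = params.strip()
        # A missing q parameter means full preference (1.0).
        q = float(params[2:]) if params.startswith("q=") else 1.0
        preferences.append((value.strip(), q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(preferences, key=lambda pref: pref[1], reverse=True)

def choose_crs(header: str, available: Iterable[str]) -> Optional[str]:
    """Pick the most preferred CRS that the server can actually deliver."""
    offered = set(available)
    for crs, q in parse_accept_crs(header):
        if q > 0 and crs in offered:
            return crs
    return None

if __name__ == "__main__":
    print(choose_crs("EPSG:28992;q=0.8, CRS84", ["CRS84", "EPSG:28992"]))
```

The chosen value would then be echoed in the response, for instance in the suggested crs-content-type style header, so that the client knows which CRS it received.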

With regards,
Marco

This e-mail and any attachments are confidential and are solely intended for the addressee. If you are not the intended recipient, please notify the sender and delete and/or destroy this message and any attachments immediately. It is prohibited to copy, to distribute, to disclose or to use this e-mail and any attachments in any other way. Ordina N.V. and/or its group companies do not accept any responsibility nor liability for any damage resulting from the content of and/or the transmission of this message.

Received on Friday, 29 July 2016 07:42:39 UTC
