RE: Content negotiation of spatial linked data

Andrea, all

On Monday, September 26, 2016 1:19 PM, Andrea Perego [mailto:andrea.perego@jrc.ec.europa.eu] wrote:
> My comments inline.

Mine too:

> On 23/09/2016 20:25, Joshua Lieberman wrote:
> > Hi Frans,
> >
> > If only! There is no one standard for a list of numbers. Each language
> > does it a little differently, and each runtime library is full of
> > details for encoding types of numbers for particular hardware / firmware
> > architectures.
> >
> > Beyond this, though, geometry is not just a list of numbers. The numbers
> > are actually parameters for a geometric model of a real world feature.
> > This is true for other aspects of reality, to be sure, but specific or
> > even unique models are also required to connect numbers to each of those
> > aspects (e.g. sensor models!). So there is a great deal of agreement on
> > how to exchange the coordinates of a geometry, but each style of usage
> > encodes the aspects of the model in a different language, paradigm or
> > platform. They may also mangle the agreement, to be fair. GeoJSON is
> > close but not identical to WKT. KML and GML also make changes (e.g.
> > coordinate order). GeoSPARQL has an asWKT property which is not actually
> > WKT but EWKT (including the CRS). This is easier to ingest into PostGIS
> > directly but messes up discovery and managing first class geometry
> > entities. Then there are times when a text string is appropriate, other
> > when binary is better (WKT vs WKB). And by the way, ISO 8601 is not
> > always sufficient to standardize time coordinate representation.
> 
> +1. I see the issue of geometry encoding as analogous to the CRS's one
> (and to the more general one of "providing data in multiple formats").
> So, likewise, we shouldn't prevent people from using their own geometry
> encoding(s). What we could at most do, is to recommend they publish also
> in an encoding (to be decided) facilitating a wide re-use of geometry
> data (i.e., the equivalent, if any, of WGS84/CRS84 for CRSs).

Yes, I too think that this is a case of providing multiple formats. And it should definitely be a recommendation to serve the data in one to-be-specified encoding to promote interoperability.

> Actually, we had to address this issue in the GeoDCAT-AP WG, as I
> explained in an earlier email
> (https://lists.w3.org/Archives/Public/public-sdw-wg/2015Jun/0167.html).
> We were not able to come up with 1 recommended encoding. Eventually, we
> decided for 2 equally recommended, namely, WKT and GML, whereas GeoJSON
> ranked 2nd.
> 
> Not great for interoperability, but I think this gives an idea of the
> difficulty to find an agreement on the encoding that (quoting Ed) "rules
> 'em all".
> 
> A probably more reasonable approach is to recommend different encodings
> based on the use case - e.g.:
> - will your geometry data be used in Web applications? Provide also GeoJSON
> - will your geometry data be used in triple stores? Provide also WKT
> (AFAIK, triple stores have been supporting WKT for quite a while now,
> but not GeoJSON).

But what if we don't know who is using the data? And we cannot just say that if you serve RDF use WKT and if you serve JSON use GeoJSON since (hopefully) much of the JSON will be JSON-LD which is both... Is there a way to serialise GeoJSON in RDF?

> > It would be nice to encourage a strict and correct asWKT serialization,
> > but you know that the horses such as GeoJSON keep leaving the barn, so
> > we’ll have to settle for clear and precise conversions. That leaves the
> > possibility of content negotiation. It seems dangerous to me to
> > negotiate not only the format of a response document but also the
> > formats of components within that document as a separate media-type. So
> > my preference is to stick with something like application/rdf+xml for
> > overall documents and define an accept-extension for coordinate format,
> > e.g. application/rdf+xml; geom=wkt

Yes, but in order to do that we must ensure that all content types support additional parameters, and e.g. RDF/XML (application/rdf+xml) and Turtle (text/turtle) currently don't. I don't know how much work it would be to update all the necessary content type registrations.
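
To make the idea concrete, here is a minimal sketch of how a server might read such a parameter from an Accept header. The "geom" parameter is only the suggestion quoted above and is not registered for any media type; the function names are made up for illustration, and the parsing is simplified (it does not distinguish media type parameters from accept-extensions that follow "q").

```python
def parse_accept(header):
    """Split an Accept header into (media_type, params) pairs, in header order."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        params = {}
        for p in parts[1:]:
            if "=" in p:
                key, value = p.split("=", 1)
                params[key.strip().lower()] = value.strip().strip('"')
        entries.append((media_type, params))
    return entries


def q_value(entry):
    """Return the q-value of a parsed entry (1.0 if absent, 0.0 if malformed)."""
    try:
        return float(entry[1].get("q", "1"))
    except ValueError:
        return 0.0


def requested_geometry_encoding(header, default="wkt"):
    """Return the hypothetical 'geom' parameter of the highest-ranked entry
    that carries one, or the default if no entry does."""
    # sorted() is stable, so entries with equal q keep their header order.
    ranked = sorted(parse_accept(header), key=q_value, reverse=True)
    for _media_type, params in ranked:
        if "geom" in params:
            return params["geom"].lower()
    return default


print(requested_geometry_encoding("application/rdf+xml; geom=wkt"))
```

A client that sends no "geom" parameter simply gets the server default, so existing clients would not be affected.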

> > I don’t think it would be a good idea to count on fetching raw
> > coordinate strings by URL and managing them around the Web. We would be
> > extending things enough to make geometries first-class entities that can
> > be linked to by features, but also have a minimal set of other
> > properties besides the coordinate string that identify the model for
> > which the coordinates are parameters and allow some
> > discovery.
> 
> I see your point, Josh, but, considering the current situation,
> supporting HTTP conneg for geometry encodings (WKT, GeoJSON, GML, KML,
> etc.) would be already a great step, IMO, towards a wider re-use of
> geometry data. And the advantage is that this can be implemented without
> extending the current HTTP header specifications.

I fear that we have to extend the current HTTP header specifications one way or the other. Not all content types allow parameters in the Accept header, and I don't know of another currently implemented way (which is why I propose schema negotiation). I think the best we can do right now is to keep thinking about this and then try to find a solution in Amsterdam in November.

> Of course, this does not solve the issue about the different flavours of
> WKT ("plain" WKT, PostGIS's EWKT, GeoSPARQL WKT) or GML (i.e., CRS
> specified with a URN or an HTTP URI - as in GeoSPARQL), unless we think
> of specific media types for each of them. Or, alternatively, the use of
> the already mentioned proposal for "profile-based" content negotiation,
> where the profile URI denotes the variant defined in a given spec (e.g.,
> GeoSPARQL).

Well, if the geometries are first-class resources they could have their own media types, e.g. text/wkt, text/ewkt or application/gml+xml. Registering those shouldn't be too hard. My proposal of profiles is more biased towards the case where the geometries are embedded in the returned data (as defined in an application profile).
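
As a rough sketch of what that could look like on the server side, assuming a geometry resource is available in a few encodings: the function below picks one based on the Accept header. The media types text/wkt and text/ewkt are the hypothetical ones mentioned above (not registered as far as I know), and choose_media_type is just an illustrative name.

```python
# Encodings the (hypothetical) geometry resource can be served in,
# in the server's own order of preference.
AVAILABLE = ["text/wkt", "text/ewkt", "application/gml+xml"]


def choose_media_type(accept_header, available):
    """Return the first available media type matching the client's most
    preferred media range, or None if nothing acceptable is available."""
    ranked = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        q = 1.0
        for p in parts[1:]:
            if p.lower().startswith("q="):
                try:
                    q = float(p[2:])
                except ValueError:
                    q = 0.0
        # q=0 means "not acceptable", so such ranges are dropped.
        if media_range and q > 0:
            ranked.append((-q, position, media_range))
    # Highest q first; ties are broken by position in the header.
    for _neg_q, _position, media_range in sorted(ranked):
        for candidate in available:
            if media_range in ("*/*", candidate):
                return candidate
            # Type wildcard such as "text/*".
            if media_range.endswith("/*") and candidate.startswith(media_range[:-1]):
                return candidate
    return None


print(choose_media_type("application/gml+xml;q=0.8, text/wkt", AVAILABLE))
```

A None result would correspond to a 406 Not Acceptable response, or to falling back to a default encoding, depending on server policy.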

> About geometry encoding in RDF graphs, instead of choosing the encoding
> to be used, it is probably better to include more than one. E.g.,
> GeoSPARQL has :asWKT and :asGML: so why not using both? Of course,
> there's always the alternative of using geometry URIs instead, as
> Andreas pointed out earlier in this thread
> (https://lists.w3.org/Archives/Public/public-sdw-wg/2016Sep/0188.html).

Particularly if it's a very small geometry (e.g. a point or a bounding box), it feels like overkill to make the geometry an entity of its own; in this case it feels much easier just to have it as part of the returned graph. Including two encodings in the graph is of course a possibility too, but then we should agree on a (small) discrete set of encodings, so that we don't end up bloating the graph with asWKT, asGML and asGeoJSON...
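
For a small geometry like a point, deriving one encoding from another is cheap, which is an argument for keeping the set of embedded encodings small. A minimal sketch, assuming a GeoJSON Point in longitude/latitude order and the GeoSPARQL convention of prefixing the WKT with a CRS URI; the function names and the example coordinates are mine, purely for illustration.

```python
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"


def geojson_point_to_wkt(geometry):
    """Convert a GeoJSON Point object (a dict) to a plain WKT string."""
    if geometry.get("type") != "Point":
        raise ValueError("only Point geometries are handled in this sketch")
    # GeoJSON positions are [longitude, latitude, (optional) elevation].
    x, y = geometry["coordinates"][:2]
    return f"POINT({x} {y})"


def geosparql_wkt_literal(wkt, crs_uri=CRS84):
    """Prefix plain WKT with a CRS URI, GeoSPARQL-style."""
    return f"<{crs_uri}> {wkt}"


point = {"type": "Point", "coordinates": [4.9, 52.37]}
plain = geojson_point_to_wkt(point)
print(plain)
print(geosparql_wkt_literal(plain))
```

Polygons, multi-geometries and other CRSs (with their axis order issues) would need more care than this.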

Best,

Lars

Received on Tuesday, 27 September 2016 12:26:11 UTC