Re: Clarification required: BP6 "use HTTP URIs for spatial things" from Clemens Portele on 2016-08-31 (public-sdw-wg@w3.org from August 2016)

From: Clemens Portele <portele@interactive-instruments.de>
Date: Wed, 31 Aug 2016 07:29:53 +0000
To: Jeremy Tandy <jeremy.tandy@gmail.com>, Bill Roberts <bill@swirrl.com>, Linda van den Brink <l.vandenbrink@geonovum.nl>
CC: SDW WG Public List <public-sdw-wg@w3.org>
Message-ID: <etPan.57c68772.20583d1d.123@interactive-instruments.de>

(Sorry for the delay in responding. I did not see it earlier due to other emails in the thread that followed after it.)

In general, both explicit redirects and reverse proxies are options, and I think which is preferred will depend on the setup. When it comes to OGC web services, in most cases a 303 makes more sense to me: there is no reason to hide the implementation, and OGC-web-service-aware clients will typically have access to the complete capabilities of the service anyhow, including requests that do not have a persistent URI.

Clemens

On 24 August 2016 at 10:22:17, Jeremy Tandy (jeremy.tandy@gmail.com) wrote:

Thanks Clemens. Redirects (as you describe) are an important part of making sure that we have durable URIs that resolve!

Would you recommend an HTTP 303 redirect or a pass-through proxy that obscures the "implementation" URL?

Jeremy
On Wed, 24 Aug 2016 at 09:16, Clemens Portele <portele@interactive-instruments.de> wrote:
I agree, but it seems to me that we have lost a common case where using redirection should still at least be considered by data publishers, i.e. when the URL at which you get a resource representation is likely to change with time.

Note that I am not talking about the Spatial Thing changing (this is covered in item 3 of BP6), but about a change in the URL, e.g. due to a change in the version of the OGC web service standard that is used in the implementation.

An example would be a redirect from a minted URI for a spatial thing to its WFS 2.0.0 GetFeatureById stored query URL, which may change due to organisational or software changes.
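
A minimal sketch of such a redirect, in Python. The endpoint `https://example.org/wfs`, the `/id/building/` path prefix and the function names are assumptions made for illustration only; the query parameters follow the WFS 2.0.0 KVP encoding of the GetFeatureById stored query. The point is that the minted URI stays fixed while only `WFS_ENDPOINT` (or the query construction) needs to change.

```python
from urllib.parse import urlencode

# Hypothetical service endpoint; this is the part that may change
# after organisational or software changes.
WFS_ENDPOINT = "https://example.org/wfs"

# Hypothetical path prefix under which spatial-thing URIs are minted.
ID_PREFIX = "/id/building/"


def wfs_feature_url(feature_id, endpoint=WFS_ENDPOINT):
    """Build a WFS 2.0.0 GetFeatureById stored query URL (KVP encoding)."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "storedquery_id": "urn:ogc:def:query:OGC-WFS::GetFeatureById",
        "id": feature_id,
    }
    return endpoint + "?" + urlencode(params)


def redirect_for(path):
    """Map a minted path to a (status, headers) pair.

    Returns a 303 pointing at the current WFS URL, or 404 if the
    path is not one of the minted identifiers.
    """
    if not path.startswith(ID_PREFIX) or len(path) == len(ID_PREFIX):
        return 404, {}
    feature_id = path[len(ID_PREFIX):]
    return 303, {"Location": wfs_feature_url(feature_id)}
```

A web server or framework handler would call `redirect_for(request_path)` and send the returned status and headers; a reverse proxy would instead fetch the target URL itself and return the body under the minted URI.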

DWBP discusses such redirects in general, but only/mainly for dataset resources, so maybe it is worth at least mentioning this in the SDWBP document?

Best regards,
Clemens

On 24 August 2016 at 09:25:33, Jeremy Tandy (jeremy.tandy@gmail.com) wrote:

Thanks Linda. More clear examples where being "correct" (in terms of avoiding URI collisions by using two distinct URIs) is making things worse because users take the wrong one!

So, as a WG, are we content to recommend this "indirect identification" pattern where thing & info resource identifiers are conflated?

Bill has added some good points about how to avoid the impacts of URI collision: by using the (dataset) metadata to talk about licenses and creators for the information ...
On Wed, 24 Aug 2016 at 07:52, Linda van den Brink <l.vandenbrink@geonovum.nl> wrote:
Experience from the Netherlands: we have the id/doc pattern in our URI strategy, based on the Cool URIs note [8] and the ISA study on persistent identifiers [9].

That said, like Bill, I also notice data users getting confused and generally using the /doc/ URI, as that is the one they can copy from their browser address bar. This is not only casual confusion; it also ends up in published information resources.

You see this, for example, all over CB-NL, a vocabulary for the building sector that contains links to other Dutch standards such as IMGeo, an information model and vocabulary for large-scale topography. E.g. the CB-NL concept of ‘Gebouw’ (Building) [10] links to two IMGeo concepts, ‘Pand’ (building part) and ‘Overig Bouwwerk’ (other construction), using their /doc/ URIs. If you click on Pand (which doesn’t have its own landing page in CB-NL, so I can’t include the link) you will see it includes the /doc/ URI as the identifier of Pand.

This is an example where it occurs in vocabularies, but I also see it happen with identifiers for data instances.

[8]: https://www.w3.org/TR/cooluris/

[9]: https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs_0.pdf

[10]: http://ont.cbnl.org/cb/def/Gebouw

Linda

From: Jeremy Tandy [mailto:jeremy.tandy@gmail.com]
Sent: Tuesday 23 August 2016 20:57
To: Bill Roberts
CC: SDW WG Public List
Subject: Re: Clarification required: BP6 "use HTTP URIs for spatial things"

Thanks Bill. Sounds very coherent ... I hoped for some responses such as this based on practical experience. Jeremy
On Tue, 23 Aug 2016 at 19:41, Bill Roberts <bill@swirrl.com> wrote:
ah Jeremy, you are a brave man to poke the sleeping beast of httpRange-14.

But I'll get my thoughts in early, then I can tune out of the ensuing mail avalanche :-)

When publishing Linked Data about places we (at Swirrl) generally do the id/doc fandango, but to be honest I think data users either don't notice, or they get confused by it. In the applications we are working with (and I acknowledge that others may have different applications and different experiences), it wouldn't cause any problems to have a single URI, the 'id' URI if you like. We just don't find a need to say anything about the /doc/ URI. If we were starting again, I'd probably ditch the /doc/ and the 303 and rely on context and a little bit of documentation to make it clear what we mean.

The place where we find a need to talk about creators and licences and modified dates is in metadata about datasets where a dataset might be a collection of information about a bunch of places - and we treat datasets as an 'information resource'. If someone requests a dataset URI we return a status code of 200 and the dataset metadata as the response. That metadata includes info on where to get all the contents of the dataset if you want that.

By the way, though it's sensible and consistent, I find that the implied and parallel property stuff makes it more rather than less complicated.

Bill

On 23 August 2016 at 17:37, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
All-

Linda has done a great job of consolidating the best practices on the use of identifiers. We have just one [1] now.

Reading through just now, it occurred to me that there's still an open issue about identifier assignment ...

W3C's Architecture of the World Wide Web constraint "URIs identify a single resource" [2] asserts "Assign distinct URIs to distinct resources" in order to avoid URI collisions [2a], which "often imposes a cost in communication due to the effort required to resolve ambiguities". Discussions from earlier years in the UK Gov Linked Data working group (and elsewhere) concluded that the "real world thing" and the "information resource that describes the real world thing" are separate resources. I think this is based on a (purist?) view, when working with RDF, of needing to be totally clear on "what's the subject" of each triple ... the thing or the document. This manifests as URIs with `id` or `doc` included somewhere to distinguish between the resources, and some RDF triples to clarify that the doc resource is talking about the thing resource, etc.
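
To make the id/doc pattern concrete, here is a small Python sketch that derives the document URI from the thing URI and emits the linking triples as N-Triples lines. The `/id/` to `/doc/` substitution rule, the example URIs and the function names are assumptions for illustration; `foaf:primaryTopic` and `foaf:isPrimaryTopicOf` are one commonly used pair of properties for this link, not something this thread prescribes.

```python
from typing import List

FOAF = "http://xmlns.com/foaf/0.1/"


def doc_uri_for(id_uri: str) -> str:
    """Derive the document URI by swapping the first /id/ segment for /doc/.

    The substitution rule is an assumed convention, not a standard.
    """
    if "/id/" not in id_uri:
        raise ValueError("not an /id/ URI: " + id_uri)
    return id_uri.replace("/id/", "/doc/", 1)


def linking_triples(id_uri: str) -> List[str]:
    """Return N-Triples lines stating that the document is about the thing."""
    doc_uri = doc_uri_for(id_uri)
    return [
        f"<{doc_uri}> <{FOAF}primaryTopic> <{id_uri}> .",
        f"<{id_uri}> <{FOAF}isPrimaryTopicOf> <{doc_uri}> .",
    ]


if __name__ == "__main__":
    # Hypothetical spatial-thing URI.
    for line in linking_triples("http://example.org/id/building/123"):
        print(line)
```

With indirect identification, neither the second URI nor these triples would be needed: statements would simply use the single URI and rely on context.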

(dangerously close to "httpRange-14" [3] here ... let's avoid that bear trap)

Jeni Tennison's "URLs in Data Primer" draft TAG note captures this practice in §5.3 "Publishing data" [4]:

```
Publishers can help enable more accurate merging of data from different sites if they support URLs for each entity<https://www.w3.org/TR/urls-in-data/#dfn-entity> they or other sites may wish to describe, separate from the landing pages<https://www.w3.org/TR/urls-in-data/#dfn-landing-page> or records<https://www.w3.org/TR/urls-in-data/#dfn-record> that they publish.
```

Yet Architecture of the World Wide Web §2.2.3 "Indirect identification" [5] notes that:

```
To say that the URI "mailto:nadia@example.com" identifies both an Internet mailbox and Nadia, the person, introduces a URI collision. However, we can use the URI to indirectly identify Nadia. Identifiers are commonly used in this way.
```

This is consistent with what I recall TimBL saying at TPAC-2015 with regard to vCard; come the finish, no one really cares to distinguish between the thing and its associated information resource.

... And in most cases, one can use context to determine whether a statement concerns the thing or the information resource. In those cases where you can't, "URLs in Data Primer" suggests some mechanisms to mitigate such confusion [6][7].

I think that in our SDW WG discussion we have concluded that we _are_ content to use "indirect identification", i.e. that we use URIs that conflate the thing and document resource.

Please can we confirm this? Assuming that indirect identification is "approved" as best practice, then it seems prudent to add a note to the BP document saying "don't worry about distinguishing between thing and resource; indirect identification is fine" (etc.)

Thanks, Jeremy

[1]: http://w3c.github.io/sdw/bp/#globally-unique-ids

[2]: https://www.w3.org/TR/webarch/#pr-uri-collision

[2a]: https://www.w3.org/TR/webarch/#URI-collision

[3]: https://www.w3.org/2001/tag/group/track/issues/14

[4]: https://www.w3.org/TR/urls-in-data/#publishing-data

[5]: https://www.w3.org/TR/webarch/#indirect-identification

[6]: https://www.w3.org/TR/urls-in-data/#documenting-properties

[7]: https://www.w3.org/TR/urls-in-data/#authoring-specifications

Received on Wednesday, 31 August 2016 07:30:32 UTC