Re: Clarification required: BP6 "use HTTP URIs for spatial things" from Jeremy Tandy on 2016-08-24 (public-sdw-wg@w3.org from August 2016)

From: Jeremy Tandy <jeremy.tandy@gmail.com>
Date: Wed, 24 Aug 2016 08:22:17 +0000
To: Clemens Portele <portele@interactive-instruments.de>, Bill Roberts <bill@swirrl.com>, Linda van den Brink <l.vandenbrink@geonovum.nl>
Cc: SDW WG Public List <public-sdw-wg@w3.org>
Message-ID: <CADtUq_3=A6RaLzjihqe-+UZRS5is=ECJdtsgURGEqUaGUtWiDg@mail.gmail.com>
Thanks Clemens. Redirects (as you describe) are an important part of making
sure that we have durable URIs that resolve!

Would you recommend an HTTP 303 redirect or a pass-through proxy that
obscures the "implementation" URL?

Jeremy
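
To make the two options in the question above concrete, here is a minimal
sketch, not a definitive implementation. The endpoint `https://example.org/wfs`,
the `/id/feature/` prefix, the function names and the GML content type are all
assumptions for illustration; the query string follows the WFS 2.0.0
GetFeatureById stored query that Clemens mentions below.

```python
from urllib.parse import urlencode

# Hypothetical service endpoint and URI prefix, chosen for illustration only.
WFS_ENDPOINT = "https://example.org/wfs"
ID_PREFIX = "/id/feature/"


def implementation_url(feature_id):
    """Build the current 'implementation' URL for a feature.

    Assumes a WFS 2.0.0 KVP request using the GetFeatureById stored query.
    """
    query = urlencode({
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "storedquery_id": "urn:ogc:def:query:OGC-WFS::GetFeatureById",
        "id": feature_id,
    })
    return f"{WFS_ENDPOINT}?{query}"


def see_other(path):
    """Option 1: answer the minted URI with a 303 pointing at the service."""
    if not path.startswith(ID_PREFIX):
        return 404, {}, b""
    feature_id = path[len(ID_PREFIX):]
    return 303, {"Location": implementation_url(feature_id)}, b""


def pass_through(path, fetch):
    """Option 2: fetch from the service and answer 200 under the minted URI.

    `fetch` is a caller-supplied function (URL -> bytes), so no network
    access is needed to try the sketch.
    """
    if not path.startswith(ID_PREFIX):
        return 404, {}, b""
    feature_id = path[len(ID_PREFIX):]
    body = fetch(implementation_url(feature_id))
    return 200, {"Content-Type": "application/gml+xml"}, body
```

In both variants only `implementation_url` needs to change when the service
version or software changes. The difference is what the client sees: with the
303 the implementation URL is exposed in the `Location` header (and may get
copied), while the pass-through keeps it hidden behind the minted URI.
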
On Wed, 24 Aug 2016 at 09:16, Clemens Portele <
portele@interactive-instruments.de> wrote:

> I agree, but it seems to me that we have lost a common case where using
> redirection should still at least be considered by data publishers, i.e.
> when the URL at which you get a resource representation is likely to change
> with time.
>
> Note that I am not talking about the Spatial Thing changing - this is
> covered in item 3 of BP6, but a change in the URL, e.g. due to a change in
> the version of the OGC web service standard that is used in the
> implementation.
>
> An example would be a redirect from a minted URI for a spatial thing to
> its WFS 2.0.0 GetFeatureById stored query URL, which may change due to
> organisational or software changes.
>
> DWBP discusses such redirects in general, but only/mainly for dataset
> resources, so maybe it is worth at least mentioning this in the SDWBP
> document?
>
> Best regards,
> Clemens
>
>
> On 24 August 2016 at 09:25:33, Jeremy Tandy (jeremy.tandy@gmail.com)
> wrote:
>
> Thanks Linda. More clear examples where being "correct" (in terms of
> avoiding uri collisions by using two distinct uris) is making things worse
> because users take the wrong one!
>
> So, as a WG, are we content to recommend this "indirect identification"
> pattern where thing & info resource identifiers are conflated?
>
> Bill has added some good points about how to avoid the impacts of URI
> collision: by using the (dataset) metadata to talk about licenses and
> creators for the information ...
> On Wed, 24 Aug 2016 at 07:52, Linda van den Brink <
> l.vandenbrink@geonovum.nl> wrote:
>
>> Experience from the Netherlands: we have the id/doc pattern in our URI
>> strategy, based on the Cool URIs note [8] and the ISA study on persistent
>> identifiers [9].
>>
>>
>>
>> That being said, like Bill I also notice data users getting confused
>> and generally using the /doc/ URI as that is the one they can copy from
>> their browser address bar. This is not only casual confusion but also ends
>> up in published information resources.
>>
>>
>>
>> You see this, for example, all over CB-NL, which is a vocabulary for
>> the building sector and contains links to other Dutch standards such as
>> IMGeo, an information model and vocabulary for large scale topography. E.g.
>> the CB-NL concept of ‘Gebouw’ (Building) [10] links to two IMGeo concepts,
>> ‘Pand’ (building part) and ‘Overig Bouwwerk’ (other construction), using
>> their /doc/ URIs. If you click on Pand (which doesn’t have its own landing
>> page in CB-NL so I can’t include the link) you will see it includes the
>> /doc/ URI as the identifier of Pand.
>>
>>
>>
>> This is an example where it occurs in vocabularies, but I also see it
>> happen with identifiers for data instances.
>>
>>
>>
>> [8]: https://www.w3.org/TR/cooluris/
>>
>> [9]: https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs_0.pdf
>>
>> [10]: http://ont.cbnl.org/cb/def/Gebouw
>>
>>
>>
>> Linda
>>
>>
>>
>> *From:* Jeremy Tandy [mailto:jeremy.tandy@gmail.com]
>> *Sent:* Tuesday 23 August 2016 20:57
>> *To:* Bill Roberts
>> *CC:* SDW WG Public List
>> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial
>> things"
>>
>>
>>
>> Thanks Bill. Sounds very coherent ... I hoped for some responses such as
>> this based on practical experience. Jeremy
>>
>> On Tue, 23 Aug 2016 at 19:41, Bill Roberts <bill@swirrl.com> wrote:
>>
>> ah Jeremy, you are a brave man to poke the sleeping beast of httpRange-14.
>>
>>
>>
>> But I'll get my thoughts in early, then I can tune out of the ensuing
>> mail avalanche :-)
>>
>>
>>
>> When publishing Linked Data about places we (at Swirrl) generally do the
>> id/doc fandango, but to be honest I think data users either don't notice,
>> or they get confused by it.  In the applications we are working with (and I
>> acknowledge that others may have different applications and different
>> experiences), it wouldn't cause any problems to have a single URI, the 'id'
>> URI if you like.  We just don't find a need to say anything about the /doc/
>> URI.  If we were starting again, I'd probably ditch the /doc/ and the 303
>> and rely on context and a little bit of documentation to make it clear what
>> we mean.
>>
>>
>>
>> The place where we find a need to talk about creators and licences and
>> modified dates is in metadata about datasets where a dataset might be a
>> collection of information about a bunch of places - and we treat datasets
>> as an 'information resource'. If someone requests a dataset URI we return a
>> status code of 200 and the dataset metadata as the response.  That metadata
>> includes info on where to get all the contents of the dataset if you want
>> that.
>>
>>
>>
>> By the way, though it's sensible and consistent, I find that the implied
>> and parallel property stuff makes it more rather than less complicated.
>>
>>
>>
>> Bill
>>
>>
>> On 23 August 2016 at 17:37, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>>
>> All-
>>
>>
>>
>> Linda has done a great job of consolidating the best practices around the
>> use of identifiers. We have just one [1] now.
>>
>>
>>
>> Reading through just now, it occurred to me that there's still an open
>> issue about identifier assignment ...
>>
>>
>>
>> W3C's Architecture of the World Wide Web constraint "URIs identify a
>> single resource" [2] asserts "Assign distinct URIs to distinct resources"
>> in order to avoid URI collisions [2a] which "often imposes a cost in
>> communication due to the effort required to resolve ambiguities".
>> Discussions from earlier years in UK Gov Linked Data working group (and
>> elsewhere) concluded that the "real world thing" and "information resource
>> that describes the real world thing" are separate resources. I think this
>> is based on a (purist?) view when working with RDF of needing to be totally
>> clear on "what's the subject" of each triple ... the thing or the document.
>> This manifests as URIs with `id` or `doc` included somewhere to distinguish
>> between the resources and some RDF triples to clarify that the doc resource
>> is talking about the thing resource etc.
>>
>>
>>
>> (dangerously close to "httpRange-14" [3] here ... let's avoid that bear
>> trap)
>>
>>
>>
>> Jeni Tennison's "URLs in Data Primer" draft TAG note captures this
>> practice in §5.3 "Publishing data" [4]:
>>
>>
>>
>> ```
>>
>> Publishers can help enable more accurate merging of data from different
>> sites if they support URLs for each entity
>> <https://www.w3.org/TR/urls-in-data/#dfn-entity> they or other sites may
>> wish to describe, separate from the landing pages
>> <https://www.w3.org/TR/urls-in-data/#dfn-landing-page> or records
>> <https://www.w3.org/TR/urls-in-data/#dfn-record> that they publish.
>>
>> ```
>>
>>
>>
>> Yet Architecture of the World Wide Web §2.2.3 "Indirect identification"
>> [5] notes that:
>>
>>
>>
>> ```
>>
>> To say that the URI "mailto:nadia@example.com" identifies both an
>> Internet mailbox and Nadia, the person, introduces a URI collision.
>> However, we can use the URI to indirectly identify Nadia. Identifiers are
>> commonly used in this way.
>>
>> ```
>>
>>
>>
>> This is consistent with what I recall TimBL saying at TPAC-2015 with
>> regard to vCard: in the end, no one really cares to distinguish
>> between the thing and its associated information resource.
>>
>>
>>
>> ... And in most cases, one can use context to determine whether a
>> statement concerns the thing or the information resource. In those cases
>> where you can't, "URLs in Data Primer" suggests some mechanisms to mitigate
>> such confusion [6][7].
>>
>>
>>
>> I think that in our SDW WG discussion we have concluded that we _are_
>> content to use "indirect identification" - i.e. that we use URIs that
>> conflate the thing and document resource.
>>
>>
>>
>> Please can we confirm this? Assuming that indirect identification is
>> "approved" as best practice, then it seems prudent to add a note to the BP
>> document saying "don't worry about distinguishing between thing and
>> resource; indirect identification is fine" (etc.)
>>
>>
>>
>> Thanks, Jeremy
>>
>>
>>
>> [1]: http://w3c.github.io/sdw/bp/#globally-unique-ids
>>
>> [2]: https://www.w3.org/TR/webarch/#pr-uri-collision
>>
>> [2a]: https://www.w3.org/TR/webarch/#URI-collision
>>
>> [3]: https://www.w3.org/2001/tag/group/track/issues/14
>>
>> [4]: https://www.w3.org/TR/urls-in-data/#publishing-data
>>
>> [5]: https://www.w3.org/TR/webarch/#indirect-identification
>>
>> [6]: https://www.w3.org/TR/urls-in-data/#documenting-properties
>>
>> [7]: https://www.w3.org/TR/urls-in-data/#authoring-specifications
>>
>>
>>
>>
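
As an illustration of the approach Bill describes in the thread above (one URI
for the place, with creator, licence and modified date stated on the dataset
instead), here is a minimal sketch. All `example.org` URIs, including the
linking property `inDataset`, are hypothetical and introduced only for this
example; the `dcterms` properties are from Dublin Core Terms.

```python
# Hypothetical URIs; the example.org namespace is an assumption for illustration.
PLACE = "http://example.org/id/place/42"
DATASET = "http://example.org/data/places"
IN_DATASET = "http://example.org/def/inDataset"  # hypothetical linking property

DCT = "http://purl.org/dc/terms/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

triples = [
    # Statements about the spatial thing, using its single URI.
    (PLACE, RDFS + "label", "Example Place"),
    (PLACE, IN_DATASET, DATASET),
    # Statements about the information go on the dataset, not on the place.
    (DATASET, DCT + "license", "http://example.org/licence"),
    (DATASET, DCT + "creator", "http://example.org/org/publisher"),
    (DATASET, DCT + "modified", "2016-08-24"),
]


def describe(subject):
    """Return the property -> value pairs stated about one subject."""
    return {p: o for s, p, o in triples if s == subject}


def licence_for(thing):
    """Find the licence for a thing by following it to its dataset."""
    dataset = describe(thing).get(IN_DATASET)
    return describe(dataset).get(DCT + "license") if dataset else None
```

With this layout nothing needs to be said about a separate /doc/ URI: a
consumer asking for the licence follows the link from the place to the
dataset, and no licence or creator statement is ever attached to the place
itself.
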
Received on Wednesday, 24 August 2016 08:22:57 UTC