Re: Clarification required: BP6 "use HTTP URIs for spatial things" from Jeremy Tandy on 2016-08-31 (public-sdw-wg@w3.org from August 2016)

From: Jeremy Tandy <jeremy.tandy@gmail.com>
Date: Wed, 31 Aug 2016 07:50:26 +0000
To: Clemens Portele <portele@interactive-instruments.de>, Rob Atkinson <rob@metalinkage.com.au>, Phil Archer <phila@w3.org>, Linda van den Brink <l.vandenbrink@geonovum.nl>, Bill Roberts <bill@swirrl.com>
Cc: SDW WG Public List <public-sdw-wg@w3.org>
Message-ID: <CADtUq_2dT8S-VnMUf5wjDGsQXJuPiOv20Oi5UviwNQFF7RGSqA@mail.gmail.com>

Thanks Rob & Clemens ...

On Wed, 31 Aug 2016 at 08:30, Clemens Portele
<portele@interactive-instruments.de> wrote:

> +1
>
>
> On 30 August 2016 at 10:10:26, Jeremy Tandy (jeremy.tandy@gmail.com)
> wrote:
>
> Hi. It would be good to close this issue out & include our collective
> recommendation in the BP doc working draft.
>
> PROPOSAL: SDW working group recommends use of "indirect identifiers" for
> spatial things
>
> ... I'll start the voting.
>
> +1
>
> Jeremy
>
> (BTW, to make sense of the PROPOSAL you'll need to read the email thread)
>
> On Fri, 26 Aug 2016 at 10:12 Linda van den Brink
> <l.vandenbrink@geonovum.nl> wrote:
>
>> So… do we agree we can recommend indirect identifiers, or do we try to
>> fix the issue with getting the correct identifier as Rob describes?
>>
>>
>>
>> While waiting for this I’ve updated the issue and the text referring to
>> the issue in BP6.
>>
>>
>>
>> *From:* Rob Atkinson [mailto:rob@metalinkage.com.au]
>> *Sent:* Wednesday 24 August 2016 13:56
>> *To:* Jeremy Tandy; Phil Archer; Linda van den Brink; Bill Roberts
>> *Cc:* SDW WG Public List
>> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial
>> things"
>>
>>
>>
>> Hi
>>
>>
>>
>> Agree this is a real concern - people can't be blamed for doing the
>> obvious, if dumb, thing.
>>
>>
>>
>> I think we should take note of best practice in the HTML world - which is
>> often to include a citable link to a resource in the rendered view, or a
>> "share" button or something similar. We can also put fairly explicit
>> annotation in machine-readable code - stating that the resource is about
>> the URI - and even notes saying "when citing this resource use the URI ...".
>>
>>
>>
>> I'd also like to see browsers evolve to offer you the original link or
>> the redirected one when copying and pasting - how hard can it be!
>>
>>
>>
>> Maybe we can get Ed to ask around Google Chrome team for suggestions on
>> how best to handle this :-)
>>
>>
>>
>> Rob
>>
>> On Wed, 24 Aug 2016 at 18:27 Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>>
>> Yes, I think so ... And we should do so if we are recommending "indirect
>> identification".
>>
>> Jeremy
>>
>> On Wed, 24 Aug 2016 at 09:24, Phil Archer <phila@w3.org> wrote:
>>
>> Bill's comments also made me think about some of the classic arguments,
>> such as that a lake doesn't have a last updated date and isn't 435KB
>> big. Both are true; however, that kind of metadata generally comes from
>> the server, i.e. the HTTP layer. That's an oversimplification, but the
>> point is that it is relatively easy to avoid deliberately creating
>> misleading metadata - metadata about the doc rather than the thing it
>> describes - and it's also generally easy to avoid looking for that
>> metadata.
>>
>> Is there scope for some BP advice there?
>>
>> Phil.
>>
>> On 24/08/2016 08:25, Jeremy Tandy wrote:
>> > Thanks Linda. More clear examples where being "correct" (in terms of
>> > avoiding uri collisions by using two distinct uris) is making things worse
>> > because users take the wrong one!
>> >
>> > So, as a WG, are we content to recommend this "indirect identification"
>> > pattern where thing & info resource identifiers are conflated?
>> >
>> > Bill has added some good points about how to avoid impacts of uri
>> > collision - by using the (dataset) metadata to talk about licenses and
>> > creators for the information ...
>> > On Wed, 24 Aug 2016 at 07:52, Linda van den Brink <l.vandenbrink@geonovum.nl>
>> > wrote:
>> >
>> >> Experience from the Netherlands: we have the id/doc pattern in our URI
>> >> strategy, based on the Cool URIs note [8] and the ISA study on persistent
>> >> identifiers [9].
>> >>
>> >>
>> >>
>> >> That being said, same as Bill I also notice data users getting confused
>> >> and generally using the /doc/ URI as that is the one they can copy from
>> >> their browser address bar. This is not only casual confusion but also ends
>> >> up in published information resources.
>> >>
>> >>
>> >>
>> >> You see this, for example, all over the CB-NL which is a vocabulary for
>> >> the building sector and contains links to other Dutch standards such as
>> >> IMGeo, an information model and vocabulary for large scale topography. E.g.
>> >> the CB-NL concept of ‘Gebouw’ (Building) [10] links to two IMGeo concepts
>> >> ‘Pand’ (building part) and ‘Overig Bouwwerk’ (other construction) using
>> >> their /doc/ URIs. If you click on Pand (which doesn’t have its own landing
>> >> page in CB-NL so I can’t include the link) you will see it includes the
>> >> /doc/ URI as the identifier of Pand.
>> >>
>> >>
>> >>
>> >> This is an example where it occurs in vocabularies, but I also see it
>> >> happen with identifiers for data instances.
>> >>
>> >>
>> >>
>> >> [8]: https://www.w3.org/TR/cooluris/
>> >>
>> >> [9]: https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs_0.pdf
>> >>
>> >> [10]: http://ont.cbnl.org/cb/def/Gebouw
>> >>
>> >>
>> >>
>> >> Linda
>> >>
>> >>
>> >>
>> >> *From:* Jeremy Tandy [mailto:jeremy.tandy@gmail.com]
>> >> *Sent:* Tuesday 23 August 2016 20:57
>> >> *To:* Bill Roberts
>> >> *Cc:* SDW WG Public List
>> >> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial
>> >> things"
>> >>
>> >>
>> >>
>> >> Thanks Bill. Sounds very coherent ... I hoped for some responses such as
>> >> this based on practical experience. Jeremy
>> >>
>> >> On Tue, 23 Aug 2016 at 19:41, Bill Roberts <bill@swirrl.com> wrote:
>> >>
>> >> ah Jeremy, you are a brave man to poke the sleeping beast of httpRange-14.
>> >>
>> >>
>> >>
>> >> But I'll get my thoughts in early, then I can tune out of the ensuing mail
>> >> avalanche :-)
>> >>
>> >>
>> >>
>> >> When publishing Linked Data about places we (at Swirrl) generally do the
>> >> id/doc fandango, but to be honest I think data users either don't notice,
>> >> or they get confused by it. In the applications we are working with (and I
>> >> acknowledge that others may have different applications and different
>> >> experiences), it wouldn't cause any problems to have a single URI, the 'id'
>> >> URI if you like. We just don't find a need to say anything about the /doc/
>> >> URI. If we were starting again, I'd probably ditch the /doc/ and the 303
>> >> and rely on context and a little bit of documentation to make it clear what
>> >> we mean.
>> >>
>> >>
>> >>
>> >> The place where we find a need to talk about creators and licences and
>> >> modified dates is in metadata about datasets where a dataset might be a
>> >> collection of information about a bunch of places - and we treat datasets
>> >> as an 'information resource'. If someone requests a dataset URI we return a
>> >> status code of 200 and the dataset metadata as the response. That metadata
>> >> includes info on where to get all the contents of the dataset if you want
>> >> that.
>> >>
>> >>
>> >>
>> >> By the way, though it's sensible and consistent, I find that the implied
>> >> and parallel property stuff makes it more rather than less complicated.
>> >>
>> >>
>> >>
>> >> Bill
>> >>
>> >> On 23 August 2016 at 17:37, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>> >>
>> >> All-
>> >>
>> >>
>> >>
>> >> Linda has done a great job of consolidating the best practices on use of
>> >> identifiers. We have just one [1] now.
>> >>
>> >>
>> >>
>> >> Reading through just now, it occurred to me that there's still an open
>> >> issue about identifier assignment ...
>> >>
>> >>
>> >>
>> >> W3C's Architecture of the World Wide Web constraint "URIs identify a
>> >> single resource" [2] asserts "Assign distinct URIs to distinct resources"
>> >> in order to avoid URI collisions [2a] which "often imposes a cost in
>> >> communication due to the effort required to resolve ambiguities".
>> >> Discussions from earlier years in UK Gov Linked Data working group (and
>> >> elsewhere) concluded that the "real world thing" and "information resource
>> >> that describes the real world thing" are separate resources. I think this
>> >> is based on a (purist?) view when working with RDF of needing to be totally
>> >> clear on "what's the subject" of each triple ... the thing or the document.
>> >> This manifests as URIs with `id` or `doc` included somewhere to distinguish
>> >> between the resources and some RDF triples to clarify that the doc resource
>> >> is talking about the thing resource etc.
>> >>
>> >>
>> >>
>> >> (dangerously close to "httpRange-14" [3] here ... let's avoid that bear
>> >> trap)
>> >>
>> >>
>> >>
>> >> Jeni Tennison's "URLs in Data Primer" draft TAG note captures this
>> >> practice in §5.3 "Publishing data" [4]:
>> >>
>> >>
>> >>
>> >> ```
>> >>
>> >> Publishers can help enable more accurate merging of data from different
>> >> sites if they support URLs for each entity
>> >> <https://www.w3.org/TR/urls-in-data/#dfn-entity> they or other sites may
>> >> wish to describe, separate from the landing pages
>> >> <https://www.w3.org/TR/urls-in-data/#dfn-landing-page> or records
>> >> <https://www.w3.org/TR/urls-in-data/#dfn-record> that they publish.
>> >>
>> >> ```
>> >>
>> >>
>> >>
>> >> Yet Architecture of the World Wide Web §2.2.3 "Indirect identification"
>> >> [5] notes that:
>> >>
>> >>
>> >>
>> >> ```
>> >>
>> >> To say that the URI "mailto:nadia@example.com" identifies both an
>> >> Internet mailbox and Nadia, the person, introduces a URI collision.
>> >> However, we can use the URI to indirectly identify Nadia. Identifiers are
>> >> commonly used in this way.
>> >>
>> >> ```
>> >>
>> >>
>> >>
>> >> This is consistent with what I recall TimBL saying at TPAC-2015 with regard
>> >> to vCard; come the finish, no one really cares to distinguish between the
>> >> thing and its associated information resource.
>> >>
>> >>
>> >>
>> >> ... And in most cases, one can use context to determine whether a
>> >> statement concerns the thing or the information resource. In those cases
>> >> where you can't, "URLs in Data Primer" suggests some mechanisms to mitigate
>> >> such confusion [6][7].
>> >>
>> >>
>> >>
>> >> I think that in our SDW WG discussion we have concluded that we _are_
>> >> content to use "indirect identification" - i.e. that we use URIs that
>> >> conflate the thing and document resource.
>> >>
>> >>
>> >>
>> >> Please can we confirm this? Assuming that indirect identification is
>> >> "approved" as best practice, then it seems prudent to add a note to the BP
>> >> document saying "don't worry about distinguishing between thing and
>> >> resource; indirect identification is fine" (etc.)
>> >>
>> >>
>> >>
>> >> Thanks, Jeremy
>> >>
>> >>
>> >>
>> >> [1]: http://w3c.github.io/sdw/bp/#globally-unique-ids
>> >>
>> >> [2]: https://www.w3.org/TR/webarch/#pr-uri-collision
>> >>
>> >> [2a]: https://www.w3.org/TR/webarch/#URI-collision
>> >>
>> >> [3]: https://www.w3.org/2001/tag/group/track/issues/14
>> >>
>> >> [4]: https://www.w3.org/TR/urls-in-data/#publishing-data
>> >>
>> >> [5]: https://www.w3.org/TR/webarch/#indirect-identification
>> >>
>> >> [6]: https://www.w3.org/TR/urls-in-data/#documenting-properties
>> >>
>> >> [7]: https://www.w3.org/TR/urls-in-data/#authoring-specifications
>> >>
>> >>
>> >>
>> >>
>> >
>>
>> --
>>
>>
>> Phil Archer
>> W3C Data Activity Lead
>> http://www.w3.org/2013/data/
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>>
>>
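
For anyone reading this thread without the background, here is a rough
sketch of the two patterns being compared: the id/doc pattern (thing URI
answers 303 and points at a separate document URI) and indirect
identification (a single URI answered with 200). It is illustrative only,
not how any publisher in the thread actually implements it. The `/id/` and
`/doc/` path layout, the function names and the example.org URIs are all
assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed (hypothetical) layout: /id/<path> names the spatial thing,
# /doc/<path> names the document that describes it.


def resolve_id_doc(path):
    """id/doc pattern: return (status, location) for a requested path.

    The thing URI answers 303 See Other and points at the document URI;
    the document URI itself answers 200.
    """
    if path.startswith("/id/"):
        return 303, "/doc/" + path[len("/id/"):]
    if path.startswith("/doc/"):
        return 200, path
    return 404, None


def resolve_indirect(path):
    """Indirect identification: one URI, answered directly with 200."""
    if path.startswith("/id/"):
        return 200, path
    return 404, None


def doc_to_id(uri):
    """Map a copied /doc/ URI back to the /id/ URI it describes.

    Only valid under the assumed layout above; other URIs pass through.
    """
    parts = urlsplit(uri)
    if parts.path.startswith("/doc/"):
        parts = parts._replace(path="/id/" + parts.path[len("/doc/"):])
    return urlunsplit(parts)


if __name__ == "__main__":
    print(resolve_id_doc("/id/lake/1"))
    print(resolve_indirect("/id/lake/1"))
    print(doc_to_id("http://example.org/doc/lake/1"))
```

With the id/doc pattern the browser address bar ends up showing the /doc/
URI after the redirect, which is the one users then copy. `doc_to_id` shows
the kind of clean-up a consumer would need; with a single URI that step
goes away.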
Received on Wednesday, 31 August 2016 07:51:09 UTC