Re: Clarification required: BP6 "use HTTP URIs for spatial things" from Jeremy Tandy on 2016-08-24 (public-sdw-wg@w3.org from August 2016)

From: Jeremy Tandy <jeremy.tandy@gmail.com>
Date: Wed, 24 Aug 2016 08:27:59 +0000
To: Phil Archer <phila@w3.org>, Linda van den Brink <l.vandenbrink@geonovum.nl>, Bill Roberts <bill@swirrl.com>
Cc: SDW WG Public List <public-sdw-wg@w3.org>
Message-ID: <CADtUq_3MmerfaRXE8GYnt-0U=Y4zuL5tQxzBqqoZDquDFD15yw@mail.gmail.com>
Yes, I think so ... And we should do so if we are recommending "indirect
identification".

Jeremy
On Wed, 24 Aug 2016 at 09:24, Phil Archer <phila@w3.org> wrote:

> Bill's comments also made me think about some of the classic arguments,
> such as that a lake doesn't have a last updated date and isn't 435KB
> big. Those are true; however, that kind of metadata generally comes from
> the server, i.e. the HTTP layer. That's an oversimplification but the
> point is that it is relatively easy to avoid deliberately creating
> misleading metadata - metadata about the doc rather than the thing it
> describes - and it's also generally easy to avoid looking for that
> metadata.
>
> Is there scope for some BP advice there?
>
> Phil.
>
> On 24/08/2016 08:25, Jeremy Tandy wrote:
> > Thanks Linda. More clear examples where being "correct" (in terms of
> > avoiding URI collisions by using two distinct URIs) is making things
> > worse because users take the wrong one!
> >
> > So, as a WG, are we content to recommend this "indirect identification"
> > pattern where thing & info resource identifiers are conflated?
> >
> > Bill has added some good points about how to avoid impacts of URI
> > collision, by using the (dataset) metadata to talk about licenses and
> > creators for the information ...
> > On Wed, 24 Aug 2016 at 07:52, Linda van den Brink
> > <l.vandenbrink@geonovum.nl> wrote:
> >
> >> Experience from the Netherlands: we have the id/doc pattern in our URI
> >> strategy, based on the Cool URIs note [8] and the ISA study on
> >> persistent identifiers [9].
> >>
> >>
> >>
> >> That being said, same as Bill I also notice data users getting confused
> >> and generally using the /doc/ URI as that is the one they can copy from
> >> their browser address bar. This is not only casual confusion but also
> >> ends up in published information resources.
> >>
> >>
> >>
> >> You see this, for example, all over the CB-NL, which is a vocabulary for
> >> the building sector and contains links to other Dutch standards such as
> >> IMGeo, an information model and vocabulary for large scale topography.
> >> E.g. the CB-NL concept of ‘Gebouw’ (Building) [10] links to two IMGeo
> >> concepts, ‘Pand’ (building part) and ‘Overig Bouwwerk’ (other
> >> construction), using their /doc/ URIs. If you click on Pand (which
> >> doesn’t have its own landing page in CB-NL so I can’t include the link)
> >> you will see it includes the /doc/ URI as the identifier of Pand.
> >>
> >>
> >>
> >> This is an example where it occurs in vocabularies, but I also see it
> >> happen with identifiers for data instances.
> >>
> >>
> >>
> >> [8]: https://www.w3.org/TR/cooluris/
> >>
> >> [9]: https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs_0.pdf
> >>
> >> [10]: http://ont.cbnl.org/cb/def/Gebouw
> >>
> >>
> >>
> >> Linda
> >>
> >>
> >>
> >> *From:* Jeremy Tandy [mailto:jeremy.tandy@gmail.com]
> >> *Sent:* Tuesday 23 August 2016 20:57
> >> *To:* Bill Roberts
> >> *Cc:* SDW WG Public List
> >> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial
> >> things"
> >>
> >>
> >>
> >> Thanks Bill. Sounds very coherent ... I hoped for some responses such as
> >> this based on practical experience. Jeremy
> >>
> >> On Tue, 23 Aug 2016 at 19:41, Bill Roberts <bill@swirrl.com> wrote:
> >>
> >> Ah Jeremy, you are a brave man to poke the sleeping beast of
> >> httpRange-14.
> >>
> >>
> >>
> >> But I'll get my thoughts in early, then I can tune out of the ensuing
> >> mail avalanche :-)
> >>
> >>
> >>
> >> When publishing Linked Data about places we (at Swirrl) generally do the
> >> id/doc fandango, but to be honest I think data users either don't
> >> notice, or they get confused by it.  In the applications we are working
> >> with (and I acknowledge that others may have different applications and
> >> different experiences), it wouldn't cause any problems to have a single
> >> URI, the 'id' URI if you like.  We just don't find a need to say
> >> anything about the /doc/ URI.  If we were starting again, I'd probably
> >> ditch the /doc/ and the 303 and rely on context and a little bit of
> >> documentation to make it clear what we mean.
> >>
> >>
> >>
> >> The place where we find a need to talk about creators and licences and
> >> modified dates is in metadata about datasets, where a dataset might be a
> >> collection of information about a bunch of places - and we treat
> >> datasets as an 'information resource'. If someone requests a dataset URI
> >> we return a status code of 200 and the dataset metadata as the response.
> >> That metadata includes info on where to get all the contents of the
> >> dataset if you want that.
> >>
> >>
> >>
> >> By the way, though it's sensible and consistent, I find that the implied
> >> and parallel property stuff makes it more rather than less complicated.
> >>
> >>
> >>
> >> Bill
> >>
> >> On 23 August 2016 at 17:37, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
> >>
> >> All-
> >>
> >>
> >>
> >> Linda has done a great job of consolidating the best practices on use of
> >> identifiers. We have just one [1] now.
> >>
> >>
> >>
> >> Reading through just now, it occurred to me that there's still an open
> >> issue about identifier assignment ...
> >>
> >>
> >>
> >> W3C's Architecture of the World Wide Web constraint "URIs identify a
> >> single resource" [2] asserts "Assign distinct URIs to distinct
> >> resources" in order to avoid URI collisions [2a] which "often imposes a
> >> cost in communication due to the effort required to resolve
> >> ambiguities". Discussions from earlier years in UK Gov Linked Data
> >> working group (and elsewhere) concluded that the "real world thing" and
> >> "information resource that describes the real world thing" are separate
> >> resources. I think this is based on a (purist?) view when working with
> >> RDF of needing to be totally clear on "what's the subject" of each
> >> triple ... the thing or the document. This manifests as URIs with `id`
> >> or `doc` included somewhere to distinguish between the resources and
> >> some RDF triples to clarify that the doc resource is talking about the
> >> thing resource etc.
> >>
> >>
> >>
> >> (dangerously close to "httpRange-14" [3] here ... let's avoid that bear
> >> trap)
> >>
> >>
> >>
> >> Jeni Tennison's "URLs in Data Primer" draft TAG note captures this
> >> practice in §5.3 "Publishing data" [4]:
> >>
> >>
> >>
> >> ```
> >>
> >> Publishers can help enable more accurate merging of data from different
> >> sites if they support URLs for each entity
> >> <https://www.w3.org/TR/urls-in-data/#dfn-entity> they or other sites
> >> may wish to describe, separate from the landing pages
> >> <https://www.w3.org/TR/urls-in-data/#dfn-landing-page> or records
> >> <https://www.w3.org/TR/urls-in-data/#dfn-record> that they publish.
> >>
> >> ```
> >>
> >>
> >>
> >> Yet Architecture of the World Wide Web §2.2.3 "Indirect identification"
> >> [5] notes that:
> >>
> >>
> >>
> >> ```
> >>
> >> To say that the URI "mailto:nadia@example.com" identifies both an
> >> Internet mailbox and Nadia, the person, introduces a URI collision.
> >> However, we can use the URI to indirectly identify Nadia. Identifiers
> >> are commonly used in this way.
> >>
> >> ```
> >>
> >>
> >>
> >> This is consistent with what I recall TimBL saying at TPAC-2015 in
> >> regards to Vcard; come the finish, no one really cares to distinguish
> >> between the thing and its associated information resource.
> >>
> >>
> >>
> >> ... And in most cases, one can use context to determine whether a
> >> statement concerns the thing or the information resource. In those cases
> >> where you can't, "URLs in Data Primer" suggests some mechanisms to
> >> mitigate such confusion [6][7].
> >>
> >>
> >>
> >> I think that in our SDW WG discussion we have concluded that we _are_
> >> content to use "indirect identification" - e.g. that we use URIs that
> >> conflate the thing and document resource.
> >>
> >>
> >>
> >> Please can we confirm this? Assuming that indirect identification is
> >> "approved" as best practice, then it seems prudent to add a note to the
> >> BP document saying "don't worry about distinguishing between thing and
> >> resource; indirect identification is fine" (etc.)
> >>
> >>
> >>
> >> Thanks, Jeremy
> >>
> >>
> >>
> >> [1]: http://w3c.github.io/sdw/bp/#globally-unique-ids
> >>
> >> [2]: https://www.w3.org/TR/webarch/#pr-uri-collision
> >>
> >> [2a]: https://www.w3.org/TR/webarch/#URI-collision
> >>
> >> [3]: https://www.w3.org/2001/tag/group/track/issues/14
> >>
> >> [4]: https://www.w3.org/TR/urls-in-data/#publishing-data
> >>
> >> [5]: https://www.w3.org/TR/webarch/#indirect-identification
> >>
> >> [6]: https://www.w3.org/TR/urls-in-data/#documenting-properties
> >>
> >> [7]: https://www.w3.org/TR/urls-in-data/#authoring-specifications
> >>
> >>
> >>
> >>
> >
>
> --
>
>
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
>
Received on Wednesday, 24 August 2016 08:28:40 UTC