Clarification required: BP6 "use HTTP URIs for spatial things" from Jeremy Tandy on 2016-08-23 (public-sdw-wg@w3.org from August 2016)

From: Jeremy Tandy <jeremy.tandy@gmail.com>
Date: Tue, 23 Aug 2016 15:37:56 +0000
To: SDW WG Public List <public-sdw-wg@w3.org>
Message-ID: <CADtUq_3HpA85T72F-TG8ykCabnpctthV1OctOR7ZFGh2A3L0UA@mail.gmail.com>

All-

Linda has done a great job of consolidating the best practices around use of
identifiers. We have just one [1] now.

Reading through just now, it occurred to me that there's still an open issue
about identifier assignment ...

W3C's Architecture of the World Wide Web constraint "URIs identify a single
resource" [2] asserts "Assign distinct URIs to distinct resources" in order
to avoid URI collisions [2a] which "often imposes a cost in communication
due to the effort required to resolve ambiguities". Discussions from
earlier years in UK Gov Linked Data working group (and elsewhere) concluded
that the "real world thing" and "information resource that describes the
real world thing" are separate resources. I think this is based on a
(purist?) view when working with RDF of needing to be totally clear on
"what's the subject" of each triple ... the thing or the document. This
manifests as URIs with `id` or `doc` included somewhere to distinguish
between the resources and some RDF triples to clarify that the doc resource
is talking about the thing resource etc.
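
As a rough illustration of that `id`/`doc` pattern (a minimal sketch only: the
example.org base URI, the path layout, the helper names and the choice of
foaf:primaryTopic, rdfs:label and dct:modified are all assumptions for
illustration, not taken from any particular publisher):

```python
# Hypothetical base URI and path layout, for illustration only.
BASE = "http://example.org"

# Well-known vocabulary terms; choosing these three is an assumption.
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCT_MODIFIED = "http://purl.org/dc/terms/modified"


def thing_uri(slug):
    """URI for the real-world thing itself."""
    return f"{BASE}/id/{slug}"


def doc_uri(slug):
    """URI for the document that describes the thing."""
    return f"{BASE}/doc/{slug}"


def describe(slug, label, modified):
    """Return triples that keep the thing and its document as separate subjects."""
    thing, doc = thing_uri(slug), doc_uri(slug)
    return [
        (doc, FOAF_PRIMARY_TOPIC, thing),  # the document is about the thing
        (thing, RDFS_LABEL, label),        # a statement about the thing
        (doc, DCT_MODIFIED, modified),     # a statement about the document
    ]


triples = describe("lighthouse/example", "Example Lighthouse", "2016-08-23")
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

Each triple has an unambiguous subject, at the cost of minting and maintaining
two URIs per spatial thing.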

(dangerously close to "httpRange-14" [3] here ... let's avoid that bear
trap)

Jeni Tennison's "URLs in Data Primer" draft TAG note captures this practice
in §5.3 "Publishing data" [4]:

```
Publishers can help enable more accurate merging of data from different
sites if they support URLs for each entity
<https://www.w3.org/TR/urls-in-data/#dfn-entity> they or other sites may
wish to describe, separate from the landing pages
<https://www.w3.org/TR/urls-in-data/#dfn-landing-page> or records
<https://www.w3.org/TR/urls-in-data/#dfn-record> that they publish.
```

Yet Architecture of the World Wide Web §2.2.3 "Indirect identification" [5]
notes that:

```
To say that the URI "mailto:nadia@example.com" identifies both an Internet
mailbox and Nadia, the person, introduces a URI collision. However, we can
use the URI to indirectly identify Nadia. Identifiers are commonly used in
this way.
```

This is consistent with what I recall TimBL saying at TPAC-2015 in regards
to Vcard; come the finish, no one really cares to distinguish between the
thing and its associated information resource.

... And in most cases, one can use context to determine whether a statement
concerns the thing or the information resource. In those cases where you
can't, "URLs in Data Primer" suggests some mechanisms to mitigate such
confusion [6][7].

I think that in our SDW WG discussion we have concluded that we _are_
content to use "indirect identification" - i.e. that we use URIs that
conflate the thing and document resource.
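
To make the conflated approach concrete, here is a minimal sketch, assuming a
hypothetical example.org URI and an assumed lookup table (`APPLIES_TO`) that
records, per property, whether a statement is read as being about the thing or
about its document. It is one possible way to let context do the
disambiguation, not a mechanism prescribed by any of the documents cited here:

```python
# Hypothetical URI: one identifier, no id/doc split.
URI = "http://example.org/lighthouse/example"

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCT_MODIFIED = "http://purl.org/dc/terms/modified"

# Assumed table: which kind of resource each property is read as describing.
APPLIES_TO = {
    RDFS_LABEL: "thing",
    DCT_MODIFIED: "document",
}

# Both statements share the same subject URI.
triples = [
    (URI, RDFS_LABEL, "Example Lighthouse"),
    (URI, DCT_MODIFIED, "2016-08-23"),
]


def interpret(triple):
    """Guess from the property whether a statement concerns the thing or its document."""
    _, prop, _ = triple
    return APPLIES_TO.get(prop, "ambiguous")


for triple in triples:
    print(interpret(triple), triple)
```

Properties missing from the table come back as "ambiguous", which is where
extra documentation would be needed.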

Please can we confirm this? Assuming that indirect identification is
"approved" as best practice, then it seems prudent to add a note to the BP
document saying "don't worry about distinguishing between thing and
document resource; indirect identification is fine" (etc.)

Thanks, Jeremy

[1]: http://w3c.github.io/sdw/bp/#globally-unique-ids
[2]: https://www.w3.org/TR/webarch/#pr-uri-collision
[2a]: https://www.w3.org/TR/webarch/#URI-collision
[3]: https://www.w3.org/2001/tag/group/track/issues/14
[4]: https://www.w3.org/TR/urls-in-data/#publishing-data
[5]: https://www.w3.org/TR/webarch/#indirect-identification
[6]: https://www.w3.org/TR/urls-in-data/#documenting-properties
[7]: https://www.w3.org/TR/urls-in-data/#authoring-specifications

Received on Tuesday, 23 August 2016 15:38:36 UTC