Re: Clarification required: BP6 "use HTTP URIs for spatial things" from Bill Roberts on 2016-08-23 (public-sdw-wg@w3.org from August 2016)

From: Bill Roberts <bill@swirrl.com>
Date: Tue, 23 Aug 2016 20:41:23 +0200
To: Jeremy Tandy <jeremy.tandy@gmail.com>
Cc: SDW WG Public List <public-sdw-wg@w3.org>
Message-ID: <CAMTVsuke=54niNLca96Kv9EOv-g8LvcPJLR0Gp=cV7U5QTSdPQ@mail.gmail.com>

Ah Jeremy, you are a brave man to poke the sleeping beast of httpRange-14.

But I'll get my thoughts in early, then I can tune out of the ensuing mail
avalanche :-)

When publishing Linked Data about places we (at Swirrl) generally do the
id/doc fandango, but to be honest I think data users either don't notice,
or they get confused by it.  In the applications we are working with (and I
acknowledge that others may have different applications and different
experiences), it wouldn't cause any problems to have a single URI, the 'id'
URI if you like.  We just don't find a need to say anything about the /doc/
URI.  If we were starting again, I'd probably ditch the /doc/ and the 303
and rely on context and a little bit of documentation to make it clear what
we mean.

The place where we find a need to talk about creators and licences and
modified dates is in metadata about datasets where a dataset might be a
collection of information about a bunch of places - and we treat datasets
as an 'information resource'. If someone requests a dataset URI we return a
status code of 200 and the dataset metadata as the response.  That metadata
includes info on where to get all the contents of the dataset if you want
that.
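
As a rough illustration of the two behaviours described above (a 303 from
the 'id' URI to the /doc/ URI, and a plain 200 for a dataset URI), here is a
minimal Python sketch. The /id/, /doc/ and /data/ path prefixes, the
function name and the Turtle media type are all assumptions made for the
example; it is not a description of any real implementation.

```python
from typing import Dict, Tuple

# (status code, response headers)
Response = Tuple[int, Dict[str, str]]


def respond(path: str) -> Response:
    """Return the status and headers for a request path.

    Assumed (hypothetical) URI layout:
      /id/...   the real-world thing, e.g. a place
      /doc/...  the document describing that thing
      /data/... a dataset, treated as an information resource
    """
    if path.startswith("/id/"):
        # The thing itself cannot be returned, so redirect with
        # 303 See Other to the document that describes it.
        return 303, {"Location": "/doc/" + path[len("/id/"):]}
    if path.startswith("/doc/"):
        # The document about the thing is served directly.
        return 200, {"Content-Type": "text/turtle"}
    if path.startswith("/data/"):
        # Dataset URI: return the dataset metadata (creator, licence,
        # modified date, where to get the contents) with a 200.
        return 200, {"Content-Type": "text/turtle"}
    return 404, {}
```

Dropping the /doc/ and the 303, as suggested above, would amount to letting
the first branch answer with a 200 as well.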

By the way, though it's sensible and consistent, I find that the implied
and parallel property stuff makes it more rather than less complicated.

Bill

On 23 August 2016 at 17:37, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:

> All-
>
> Linda has done a great job of consolidating the best practices on the use
> of identifiers. We have just one [1] now.
>
> Reading through just now, it occurred to me that there's still an open
> issue about identifier assignment ...
>
> W3C's Architecture of the World Wide Web constraint "URIs identify a
> single resource" [2] asserts "Assign distinct URIs to distinct resources"
> in order to avoid URI collisions [2a] which "often imposes a cost in
> communication due to the effort required to resolve ambiguities".
> Discussions from earlier years in UK Gov Linked Data working group (and
> elsewhere) concluded that the "real world thing" and "information resource
> that describes the real world thing" are separate resources. I think this
> is based on a (purist?) view when working with RDF of needing to be totally
> clear on "what's the subject" of each triple ... the thing or the document.
> This manifests as URIs with `id` or `doc` included somewhere to distinguish
> between the resources, and some RDF triples to clarify that the doc resource
> is talking about the thing resource, etc.
>
> (dangerously close to "httpRange-14" [3] here ... let's avoid that bear
> trap)
>
> Jeni Tennison's "URLs in Data Primer" draft TAG note captures this
> practice in §5.3 "Publishing data" [4]:
>
> ```
> Publishers can help enable more accurate merging of data from different
> sites if they support URLs for each entity
> <https://www.w3.org/TR/urls-in-data/#dfn-entity> they or other sites may
> wish to describe, separate from the landing pages
> <https://www.w3.org/TR/urls-in-data/#dfn-landing-page> or records
> <https://www.w3.org/TR/urls-in-data/#dfn-record> that they publish.
> ```
>
> Yet Architecture of the World Wide Web §2.2.3 "Indirect identification"
> [5] notes that:
>
> ```
> To say that the URI "mailto:nadia@example.com" identifies both an
> Internet mailbox and Nadia, the person, introduces a URI collision.
> However, we can use the URI to indirectly identify Nadia. Identifiers are
> commonly used in this way.
> ```
>
> This is consistent with what I recall TimBL saying at TPAC-2015 with regard
> to vCard; come the finish, no one really cares to distinguish between the
> thing and its associated information resource.
>
> ... And in most cases, one can use context to determine whether a
> statement concerns the thing or the information resource. In those cases
> where you can't, "URLs in Data Primer" suggests some mechanisms to mitigate
> such confusion [6][7].
>
> I think that in our SDW WG discussion we have concluded that we _are_
> content to use "indirect identification" - i.e. that we use URIs that
> conflate the thing and document resource.
>
> Please can we confirm this? Assuming that indirect identification is
> "approved" as best practice, then it seems prudent to add a note to the BP
> document saying "don't worry about distinguishing between the thing and the
> document resource; indirect identification is fine" (etc.)
>
> Thanks, Jeremy
>
> [1]: http://w3c.github.io/sdw/bp/#globally-unique-ids
> [2]: https://www.w3.org/TR/webarch/#pr-uri-collision
> [2a]: https://www.w3.org/TR/webarch/#URI-collision
> [3]: https://www.w3.org/2001/tag/group/track/issues/14
> [4]: https://www.w3.org/TR/urls-in-data/#publishing-data
> [5]: https://www.w3.org/TR/webarch/#indirect-identification
> [6]: https://www.w3.org/TR/urls-in-data/#documenting-properties
> [7]: https://www.w3.org/TR/urls-in-data/#authoring-specifications
>
Received on Tuesday, 23 August 2016 18:41:54 UTC