Re: Clarification required: BP6 "use HTTP URIs for spatial things" from Andrea Perego on 2016-09-05 (public-sdw-wg@w3.org from September 2016)

From: Andrea Perego <andrea.perego@jrc.ec.europa.eu>
Date: Mon, 05 Sep 2016 11:12:23 +0200
To: Linda van den Brink <l.vandenbrink@geonovum.nl>, Jeremy Tandy <jeremy.tandy@gmail.com>, Bill Roberts <bill@swirrl.com>
Cc: SDW WG Public List <public-sdw-wg@w3.org>, Peter Parslow <peter.parslow@ordnancesurvey.co.uk>
Message-id: <964c7927-c645-ed02-586e-e02f001c888d@jrc.ec.europa.eu>
Hello, everyone.

I'm trying to go through all the mail discussions after being unplugged 
for one month - my apologies in advance if my comments are not 
completely aligned with the latest developments.


Just for the sake of completeness wrt the existing guidelines / 
implementations, I think it may be worth summarising the UK work on URI 
sets mentioned by Jeremy at the beginning of this thread - in 
particular, the one concerning "Designing URI Sets for Location" [1]. I 
kindly ask anyone in the WG with a better understanding of this 
specification to correct possible mistakes in what I say below.


About the /id/ & /doc/ dilemma:

Besides /id/ and /doc/, [1] includes a specific URI pattern for "spatial 
objects", namely, /so/. Note that in [1], following INSPIRE, "spatial 
object" = ISO 19100 "(geographic) feature":

http://inspire.ec.europa.eu/glossary/SpatialObject

(On this topic, see also the relevant comment on GH from Peter Parslow 
(cc'ed) [2]).

The examples in [1] refer to Manchester Piccadilly Station, where you have:

1. The URI for the real-world thing:

http://transport.data.gov.uk/id/station/MAN

2. The URI of the description (metadata) of the real-world thing:

http://transport.data.gov.uk/doc/station/MAN

3. The URIs of two spatial objects (features) "abstracting" the 
real-world thing:

http://location.data.gov.uk/so/tn/RailwayStationNode/nwkr/123456

http://location.data.gov.uk/so/tn/RailwayStationArea/nwkr/456789

4. The URIs for the descriptions (metadata) of the spatial objects above:

http://location.data.gov.uk/doc/tn/RailwayStationNode/nwkr/123456

http://location.data.gov.uk/doc/tn/RailwayStationArea/nwkr/456789

5. URIs for different serialisations / renditions are supported for 
metadata and spatial objects (features) - e.g.:

http://transport.data.gov.uk/doc/station/MAN.csv
http://transport.data.gov.uk/doc/station/MAN.html
http://transport.data.gov.uk/doc/station/MAN.json
http://transport.data.gov.uk/doc/station/MAN.rdf
http://transport.data.gov.uk/doc/station/MAN.text
http://transport.data.gov.uk/doc/station/MAN.ttl
http://transport.data.gov.uk/doc/station/MAN.xml

http://location.data.gov.uk/so/tn/RailwayStationNode/nwkr/123456.gml
http://location.data.gov.uk/so/tn/RailwayStationNode/nwkr/123456.ttl
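
As a rough illustration of the pattern in points 1-5 (a sketch, not taken from [1]), the /doc/ URI and the rendition URIs can be derived from an /id/ or /so/ URI by plain string manipulation. The function names are hypothetical, and the sketch assumes that the type marker (id, so, doc) is always the first path segment, as in the examples above.

```python
from urllib.parse import urlsplit, urlunsplit

# Path segments that mark a real-world thing (/id/) or a spatial object (/so/).
# Assumption: the type marker is always the first path segment.
THING_SEGMENTS = {"id", "so"}


def doc_uri(uri: str) -> str:
    """Return the /doc/ URI describing the resource named by an /id/ or /so/ URI."""
    parts = urlsplit(uri)
    segments = parts.path.split("/")  # the leading "/" yields an empty first item
    if len(segments) < 2 or segments[1] not in THING_SEGMENTS:
        raise ValueError(f"not an /id/ or /so/ URI: {uri}")
    segments[1] = "doc"
    return urlunsplit(parts._replace(path="/".join(segments)))


def rendition_uri(uri: str, extension: str) -> str:
    """Append a format extension (e.g. 'ttl') to a URI."""
    return f"{uri}.{extension.lstrip('.')}"


station = "http://transport.data.gov.uk/id/station/MAN"
node = "http://location.data.gov.uk/so/tn/RailwayStationNode/nwkr/123456"

print(doc_uri(station))
print(doc_uri(node))
print(rendition_uri(doc_uri(station), "ttl"))
print(rendition_uri(node, "gml"))
```

Whether the mapping really is this mechanical for every URI set is something [1] would have to confirm; the sketch only mirrors the Manchester Piccadilly examples.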


About the relationships used for linking real-world things, features and 
serialisations / renditions:

(a) The relationship used in the examples to link real-world things to 
the corresponding features is rdfs:seeAlso. On the other hand, the 
relationship between the features and the corresponding real-world thing 
is ":abstracts" (but, AFAIK, this property has no formal definition).

(b) The relationship between the metadata and the described resource 
(real-world thing or feature) is foaf:primaryTopic.

(c) The URIs of the actual serialisations of metadata and features are 
specified in the corresponding metadata records with dct:hasFormat.

An example of how the relationships above are used is provided by the 
sample Turtle code in [1] (page 16).
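
For readers without [1] at hand, the relationships in (a)-(c) can be sketched as plain (subject, predicate, object) tuples. This is one reading of the summary above, not a copy of the Turtle on page 16. The helper names are hypothetical, and because ":abstracts" has no formal definition, the namespace used for it below is an invented placeholder.

```python
# Namespace URIs for the vocabularies named in (a)-(c).
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
# Hypothetical placeholder: ":abstracts" has no formal definition, so this
# namespace is invented purely for illustration.
EX = "http://example.org/vocab#"


def link(thing, feature):
    """Triples for (a): thing to feature via rdfs:seeAlso, and back via :abstracts."""
    return [
        (thing, RDFS + "seeAlso", feature),
        (feature, EX + "abstracts", thing),
    ]


def describe(doc, resource, renditions=()):
    """Triples for (b) and (c): doc is the metadata record about resource."""
    triples = [(doc, FOAF + "primaryTopic", resource)]
    triples += [(doc, DCT + "hasFormat", r) for r in renditions]
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


station = "http://transport.data.gov.uk/id/station/MAN"
station_doc = "http://transport.data.gov.uk/doc/station/MAN"
node = "http://location.data.gov.uk/so/tn/RailwayStationNode/nwkr/123456"
node_doc = "http://location.data.gov.uk/doc/tn/RailwayStationNode/nwkr/123456"

triples = (
    link(station, node)
    + describe(station_doc, station, [station_doc + ".ttl"])
    + describe(node_doc, node, [node + ".gml", node + ".ttl"])
)
print(to_ntriples(triples))
```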

Finally, the use of owl:sameAs is also addressed on pages 6-7 (§34), but 
it's limited to thematic references of real-world things - copy-pasting 
the relevant example:

[[
For example thematic references to Manchester Piccadilly Railway Station 
which is coded “MAN” for customer reservation purposes and “MNCRPIC” for 
timetabling and scheduling purposes give rise to the
following URI:

http://location.data.gov.uk/id/tn/station/crs/MAN
http://location.data.gov.uk/id/tn/station/tiploc/MNCRPIC
]]
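
A minimal sketch of how such thematic references might be tied together, assuming (this is an interpretation, not something the quoted passage states) that the two URIs are meant to be linked to each other with owl:sameAs. The function name is hypothetical.

```python
from itertools import combinations

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(uris):
    """Return one owl:sameAs triple per unordered pair of URIs.

    owl:sameAs is symmetric, so one direction per pair is enough.
    """
    return [(a, OWL_SAME_AS, b) for a, b in combinations(uris, 2)]


thematic = [
    "http://location.data.gov.uk/id/tn/station/crs/MAN",
    "http://location.data.gov.uk/id/tn/station/tiploc/MNCRPIC",
]

for s, p, o in same_as_triples(thematic):
    print(f"<{s}> <{p}> <{o}> .")
```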


With respect to (my understanding of) the discussion in this thread, I 
wonder whether [1] may be a starting point to derive alternative 
(possibly compatible) solutions, with different levels of complexity, on 
how to model (and use URIs for) real-world things, features and their 
renditions.

Cheers,

Andrea

----
[1] https://data.gov.uk/library/designing-uri-sets-for-location
[2] https://github.com/w3c/sdw/issues/206#issuecomment-173215852


On 24/08/2016 08:52, Linda van den Brink wrote:
> Experience from the Netherlands: we have the id/doc pattern in our URI
> strategy, based on the Cool URIs note [8] and the ISA study on
> persistent identifiers [9].
>
>
>
> That being said, same as Bill I also notice data users getting confused
> and generally using the /doc/  URI as that is the one they can copy from
> their browser address bar. This is not only casual confusion but also
> ends up in published information resources.
>
>
>
> You see this, for example, all over the CB-NL which is a vocabulary for
> the building sector and contains links to other Dutch standards such as
> IMGeo, an information model and vocabulary for large scale topography.
> E.g. the CB-NL concept of ‘Gebouw’ (Building) [10]  links to two IMGeo
> concepts ‘Pand’ (building part) and ‘Overig Bouwwerk’ (other
> construction) using their /doc/ URIs. If you click on Pand (which
> doesn’t have its own landing page in CB-NL so I can’t include the link)
> you will see it includes the /doc/  URI as the identifier of Pand.
>
>
>
> This is an example where it occurs in vocabularies, but I also see it
> happen with identifiers for data instances.
>
>
>
> [8]: https://www.w3.org/TR/cooluris/
>
> [9]:
> https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs_0.pdf
>
>
>
> [10]: http://ont.cbnl.org/cb/def/Gebouw
>
>
>
> Linda
>
>
>
> *From:* Jeremy Tandy [mailto:jeremy.tandy@gmail.com]
> *Sent:* Tuesday 23 August 2016 20:57
> *To:* Bill Roberts
> *Cc:* SDW WG Public List
> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial
> things"
>
>
>
> Thanks Bill. Sounds very coherent ... I hoped for some responses such as
> this based on practical experience. Jeremy
>
> On Tue, 23 Aug 2016 at 19:41, Bill Roberts <bill@swirrl.com> wrote:
>
>     ah Jeremy, you are a brave man to poke the sleeping beast of
>     httpRange-14.
>
>
>
>     But I'll get my thoughts in early, then I can tune out of the
>     ensuing mail avalanche :-)
>
>
>
>     When publishing Linked Data about places we (at Swirrl) generally do
>     the id/doc fandango, but to be honest I think data users either
>     don't notice, or they get confused by it.  In the applications we
>     are working with (and I acknowledge that others may have different
>     applications and different experiences), it wouldn't cause any
>     problems to have a single URI, the 'id' URI if you like.  We just
>     don't find a need to say anything about the /doc/ URI.  If we were
>     starting again, I'd probably ditch the /doc/ and the 303 and rely on
>     context and a little bit of documentation to make it clear what we mean.
>
>
>
>     The place where we find a need to talk about creators and licences
>     and modified dates is in metadata about datasets where a dataset
>     might be a collection of information about a bunch of places - and
>     we treat datasets as an 'information resource'. If someone requests
>     a dataset URI we return a status code of 200 and the dataset
>     metadata as the response.  That metadata includes info on where to
>     get all the contents of the dataset if you want that.
>
>
>
>     By the way, though it's sensible and consistent, I find that the
>     implied and parallel property stuff makes it more rather than less
>     complicated.
>
>
>
>     Bill
>
>
>
>     On 23 August 2016 at 17:37, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>
>     All-
>
>
>
>     Linda has done a great job of consolidating the best practices around
>     use of identifiers. We have just one [1] now.
>
>
>
>     Reading through just now, it occurred to me that there's still an
>     open issue about identifier assignment ...
>
>
>
>     W3C's Architecture of the World Wide Web constraint "URIs identify a
>     single resource" [2] asserts "Assign distinct URIs to distinct
>     resources" in order to avoid URI collisions [2a] which "often
>     imposes a cost in communication due to the effort required to
>     resolve ambiguities". Discussions from earlier years in UK Gov
>     Linked Data working group (and elsewhere) concluded that the "real
>     world thing" and "information resource that describes the real world
>     thing" are separate resources. I think this is based on a (purist?)
>     view when working with RDF of needing to be totally clear on "what's
>     the subject" of each triple ... the thing or the document. This
>     manifests as URIs with `id` or `doc` included somewhere to
>     distinguish between the resources and some RDF triples to clarify
>     that the doc resource is talking about the thing resource etc..
>
>
>
>     (dangerously close to "httpRange-14" [3] here ... let's avoid that
>     bear trap)
>
>
>
>     Jeni Tennison's "URLs in Data Primer" draft TAG note captures this
>     practice in §5.3 "Publishing data" [4]:
>
>
>
>     ```
>
>     Publishers can help enable more accurate merging of data from
>     different sites if they support URLs for each entity
>     <https://www.w3.org/TR/urls-in-data/#dfn-entity> they or other sites
>     may wish to describe, separate from the landing pages
>     <https://www.w3.org/TR/urls-in-data/#dfn-landing-page> or records
>     <https://www.w3.org/TR/urls-in-data/#dfn-record> that they publish.
>
>     ```
>
>
>
>     Yet Architecture of the World Wide Web §2.2.3 "Indirect
>     identification" [5] notes that:
>
>
>
>     ```
>
>     To say that the URI "mailto:nadia@example.com" identifies both an
>     Internet mailbox and Nadia, the person, introduces a URI collision.
>     However, we can use the URI to indirectly identify Nadia. Identifiers
>     are commonly used in this way.
>
>     ```
>
>
>
>     This is consistent with what I recall TimBL saying at TPAC-2015 in
>     regards to Vcard; come the finish, no one really cares to
>     distinguish between the thing and its associated information resource.
>
>
>
>     ... And in most cases, one can use context to determine whether a
>     statement concerns the thing or the information resource. In those
>     cases where you can't, "URLs in Data Primer" suggests some
>     mechanisms to mitigate such confusion [6][7].
>
>
>
>     I think that in our SDW WG discussion we have concluded that we
>     _are_ content to use "indirect identification" - e.g. that we use
>     URIs that conflate the thing and document resource.
>
>
>
>     Please can we confirm this? Assuming that indirect identification is
>     "approved" as best practice, then it seems prudent to add a note to
>     the BP document saying "don't worry about distinguishing between
>     thing and resource; indirect identification is fine" (etc.)
>
>
>
>     Thanks, Jeremy
>
>
>
>     [1]: http://w3c.github.io/sdw/bp/#globally-unique-ids
>
>     [2]: https://www.w3.org/TR/webarch/#pr-uri-collision
>
>     [2a]: https://www.w3.org/TR/webarch/#URI-collision
>
>     [3]: https://www.w3.org/2001/tag/group/track/issues/14
>
>     [4]: https://www.w3.org/TR/urls-in-data/#publishing-data
>
>     [5]: https://www.w3.org/TR/webarch/#indirect-identification
>
>     [6]: https://www.w3.org/TR/urls-in-data/#documenting-properties
>
>     [7]: https://www.w3.org/TR/urls-in-data/#authoring-specifications
>
>
>

-- 
Andrea Perego, Ph.D.
Scientific / Technical Project Officer
European Commission DG JRC
Directorate B - Growth and Innovation
Unit B6 - Digital Economy
Via E. Fermi, 2749 - TP 262
21027 Ispra VA, Italy

https://ec.europa.eu/jrc/

----
The views expressed are purely those of the writer and may
not in any circumstances be regarded as stating an official
position of the European Commission.
Received on Monday, 5 September 2016 09:13:16 UTC