- From: Andrea Perego <andrea.perego@jrc.ec.europa.eu>
- Date: Mon, 05 Sep 2016 11:12:23 +0200
- To: Linda van den Brink <l.vandenbrink@geonovum.nl>, Jeremy Tandy <jeremy.tandy@gmail.com>, Bill Roberts <bill@swirrl.com>
- Cc: SDW WG Public List <public-sdw-wg@w3.org>, Peter Parslow <peter.parslow@ordnancesurvey.co.uk>
Hello, everyone.

I'm trying to go through all the mail discussions after being unplugged for one month - my apologies in advance if my comments are not completely aligned with the latest developments.

Just for the sake of completeness with respect to the existing guidelines / implementations, I think it may be worth summarising the UK work on URI sets mentioned by Jeremy at the beginning of this thread - in particular, "Designing URI Sets for Location" [1]. I kindly ask anyone in the WG with a better understanding of this specification to correct any mistakes in what I say below.

About the /id/ & /doc/ dilemma:

Besides /id/ and /doc/, [1] includes a specific URI pattern for "spatial objects", namely /so/. Note that in [1], following INSPIRE, "spatial object" = ISO 19100 "(geographic) feature":

http://inspire.ec.europa.eu/glossary/SpatialObject

(On this topic, see also the relevant comment on GitHub from Peter Parslow (cc'ed) [2].)

The examples in [1] refer to Manchester Piccadilly Station, where you have:

1. The URI for the real-world thing:

   http://transport.data.gov.uk/id/station/MAN

2. The URI of the description (metadata) of the real-world thing:

   http://transport.data.gov.uk/doc/station/MAN

3. The URIs of two spatial objects (features) "abstracting" the real-world thing:

   http://location.data.gov.uk/so/tn/RailwayStationNode/nwkr/123456
   http://location.data.gov.uk/so/tn/RailwayStationArea/nwkr/456789

4. The URIs for the descriptions (metadata) of the spatial objects above:

   http://location.data.gov.uk/doc/tn/RailwayStationNode/nwkr/123456
   http://location.data.gov.uk/doc/tn/RailwayStationArea/nwkr/456789

5. URIs for different serialisations / renditions are supported for metadata and spatial objects (features) - e.g.:

   http://transport.data.gov.uk/doc/station/MAN.csv
   http://transport.data.gov.uk/doc/station/MAN.html
   http://transport.data.gov.uk/doc/station/MAN.json
   http://transport.data.gov.uk/doc/station/MAN.rdf
   http://transport.data.gov.uk/doc/station/MAN.text
   http://transport.data.gov.uk/doc/station/MAN.ttl
   http://transport.data.gov.uk/doc/station/MAN.xml

   http://location.data.gov.uk/so/tn/RailwayStationNode/nwkr/123456.gml
   http://location.data.gov.uk/so/tn/RailwayStationNode/nwkr/123456.ttl

About the relationships used for linking real-world things, features and serialisations / renditions:

(a) The relationship used in the examples to link real-world things to the corresponding features is rdfs:seeAlso. Conversely, the relationship from the features to the corresponding real-world thing is ":abstracts" (but, AFAIK, this property has no formal definition).

(b) The relationship between the metadata and the described resource (real-world thing or feature) is foaf:primaryTopic.

(c) The URIs of the actual serialisations of metadata and features are specified in the corresponding metadata records with dct:hasFormat.

An example of how the relationships above are used is provided by the sample Turtle code in [1] (page 16).

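As a rough illustration of the URI patterns and relationships summarised above, here is a minimal Python sketch. It is not taken from [1]: the helper names (doc_uri, link_thing_and_features, describe) are hypothetical, the /doc/ URI is derived by simple string substitution, and using the /doc/ URI as the subject of dct:hasFormat is only my reading of point (c). The URIs are the sample ones quoted above, and the output is Turtle-like text rather than valid Turtle (no prefixes are declared).

```python
# Illustrative sketch only: helper names are hypothetical and the URIs are
# the sample ones quoted in this mail, not a tested implementation of [1].
from typing import List, Tuple

Triple = Tuple[str, str, str]


def doc_uri(resource_uri: str) -> str:
    """Derive the /doc/ (metadata) URI from an /id/ or /so/ URI."""
    for marker in ("/id/", "/so/"):
        if marker in resource_uri:
            return resource_uri.replace(marker, "/doc/", 1)
    raise ValueError(f"no /id/ or /so/ segment in {resource_uri}")


def link_thing_and_features(thing: str, features: List[str]) -> List[Triple]:
    """Point (a): thing rdfs:seeAlso feature; feature :abstracts thing."""
    triples: List[Triple] = []
    for feature in features:
        triples.append((thing, "rdfs:seeAlso", feature))
        triples.append((feature, ":abstracts", thing))
    return triples


def describe(resource: str, rendition_base: str, extensions: List[str]) -> List[Triple]:
    """Points (b) and (c): metadata record -> described resource and renditions.

    Assumption: the /doc/ URI is the subject of dct:hasFormat, and each
    rendition URI is the given base URI plus a file extension.
    """
    doc = doc_uri(resource)
    triples: List[Triple] = [(doc, "foaf:primaryTopic", resource)]
    for ext in extensions:
        triples.append((doc, "dct:hasFormat", f"{rendition_base}.{ext}"))
    return triples


STATION = "http://transport.data.gov.uk/id/station/MAN"
NODE = "http://location.data.gov.uk/so/tn/RailwayStationNode/nwkr/123456"
AREA = "http://location.data.gov.uk/so/tn/RailwayStationArea/nwkr/456789"

triples = (
    link_thing_and_features(STATION, [NODE, AREA])
    # Metadata renditions hang off the /doc/ URI in the examples above ...
    + describe(STATION, doc_uri(STATION), ["ttl", "json"])
    # ... while feature renditions hang off the /so/ URI.
    + describe(NODE, NODE, ["gml", "ttl"])
)

for s, p, o in triples:
    print(f"<{s}> {p} <{o}> .")
```

The string substitution only works because the sample URIs contain exactly one /id/ or /so/ path segment; a real URI set would presumably need a more careful mapping.
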
Finally, the use of owl:sameAs is also addressed at pages 6-7 (§34), but it's limited to thematic references of real-world things - copy-pasting the relevant example:

[[
For example thematic references to Manchester Piccadilly Railway Station which is coded “MAN” for customer reservation purposes and “MNCRPIC” for timetabling and scheduling purposes give rise to the following URI:

http://location.data.gov.uk/id/tn/station/crs/MAN
http://location.data.gov.uk/id/tn/station/tiploc/MNCRPIC
]]

With respect to (my understanding of) the discussion in this thread, I wonder whether [1] may be a starting point to derive alternative (possibly compatible) solutions, with different levels of complexity, on how to model (and use URIs for) real-world things, features and their renditions.

Cheers,

Andrea

----
[1] https://data.gov.uk/library/designing-uri-sets-for-location
[2] https://github.com/w3c/sdw/issues/206#issuecomment-173215852

On 24/08/2016 08:52, Linda van den Brink wrote:
> Experience from the Netherlands: we have the id/doc pattern in our URI
> strategy, based on the Cool URIs note [8] and the ISA study on
> persistent identifiers [9].
>
> That being said, same as Bill I also notice data users getting confused
> and generally using the /doc/ URI as that is the one they can copy from
> their browser address bar. This is not only casual confusion but also
> ends up in published information resources.
>
> You see this, for example, all over the CB-NL which is a vocabulary for
> the building sector and contains links to other Dutch standards such as
> IMGeo, an information model and vocabulary for large scale topography.
> E.g. the CB-NL concept of ‘Gebouw’ (Building) [10] links to two IMGeo
> concepts ‘Pand’ (building part) and ‘Overig Bouwwerk’ (other
> construction) using their /doc/ URIs. If you click on Pand (which
> doesn’t have its own landing page in CB-NL so I can’t include the link)
> you will see it includes the /doc/ URI as the identifier of Pand.
>
> This is an example where it occurs in vocabularies, but I also see it
> happen with identifiers for data instances.
>
> [8]: https://www.w3.org/TR/cooluris/
> [9]: https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs_0.pdf
> [10]: http://ont.cbnl.org/cb/def/Gebouw
>
> Linda
>
> *From:* Jeremy Tandy [mailto:jeremy.tandy@gmail.com]
> *Sent:* Tuesday 23 August 2016 20:57
> *To:* Bill Roberts
> *CC:* SDW WG Public List
> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial
> things"
>
> Thanks Bill. Sounds very coherent ... I hoped for some responses such as
> this based on practical experience.
>
> Jeremy
>
> On Tue, 23 Aug 2016 at 19:41, Bill Roberts <bill@swirrl.com> wrote:
>
> ah Jeremy, you are a brave man to poke the sleeping beast of
> httpRange-14.
>
> But I'll get my thoughts in early, then I can tune out of the
> ensuing mail avalanche :-)
>
> When publishing Linked Data about places we (at Swirrl) generally do
> the id/doc fandango, but to be honest I think data users either
> don't notice, or they get confused by it. In the applications we
> are working with (and I acknowledge that others may have different
> applications and different experiences), it wouldn't cause any
> problems to have a single URI, the 'id' URI if you like. We just
> don't find a need to say anything about the /doc/ URI. If we were
> starting again, I'd probably ditch the /doc/ and the 303 and rely on
> context and a little bit of documentation to make it clear what we mean.
>
> The place where we find a need to talk about creators and licences
> and modified dates is in metadata about datasets where a dataset
> might be a collection of information about a bunch of places - and
> we treat datasets as an 'information resource'. If someone requests
> a dataset URI we return a status code of 200 and the dataset
> metadata as the response. That metadata includes info on where to
> get all the contents of the dataset if you want that.
>
> By the way, though it's sensible and consistent, I find that the
> implied and parallel property stuff makes it more rather than less
> complicated.
>
> Bill
>
> On 23 August 2016 at 17:37, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>
> All-
>
> Linda has done a great job of consolidating the best practices around
> the use of identifiers. We have just one [1] now.
>
> Reading through just now, it occurred to me that there's still an
> open issue about identifier assignment ...
>
> W3C's Architecture of the World Wide Web constraint "URIs identify a
> single resource" [2] asserts "Assign distinct URIs to distinct
> resources" in order to avoid URI collisions [2a] which "often
> imposes a cost in communication due to the effort required to
> resolve ambiguities". Discussions from earlier years in UK Gov
> Linked Data working group (and elsewhere) concluded that the "real
> world thing" and "information resource that describes the real world
> thing" are separate resources. I think this is based on a (purist?)
> view when working with RDF of needing to be totally clear on "what's
> the subject" of each triple ... the thing or the document. This
> manifests as URIs with `id` or `doc` included somewhere to
> distinguish between the resources and some RDF triples to clarify
> that the doc resource is talking about the thing resource etc.
>
> (dangerously close to "httpRange-14" [3] here ... let's avoid that
> bear trap)
>
> Jeni Tennison's "URLs in Data Primer" draft TAG note captures this
> practice in §5.3 "Publishing data" [4]:
>
> ```
> Publishers can help enable more accurate merging of data from
> different sites if they support URLs for each entity
> <https://www.w3.org/TR/urls-in-data/#dfn-entity> they or other sites
> may wish to describe, separate from the landing pages
> <https://www.w3.org/TR/urls-in-data/#dfn-landing-page> or records
> <https://www.w3.org/TR/urls-in-data/#dfn-record> that they publish.
> ```
>
> Yet Architecture of the World Wide Web §2.2.3 "Indirect
> identification" [5] notes that:
>
> ```
> To say that the URI "mailto:nadia@example.com" identifies both an
> Internet mailbox and Nadia, the person, introduces a URI collision.
> However, we can use the URI to indirectly identify Nadia. Identifiers
> are commonly used in this way.
> ```
>
> This is consistent with what I recall TimBL saying at TPAC-2015 in
> regards to vCard; come the finish, no one really cares to
> distinguish between the thing and its associated information resource.
>
> ... And in most cases, one can use context to determine whether a
> statement concerns the thing or the information resource. In those
> cases where you can't, "URLs in Data Primer" suggests some
> mechanisms to mitigate such confusion [6][7].
>
> I think that in our SDW WG discussion we have concluded that we
> _are_ content to use "indirect identification" - e.g. that we use
> URIs that conflate the thing and document resource.
>
> Please can we confirm this? Assuming that indirect identification is
> "approved" as best practice, then it seems prudent to add a note to
> the BP document saying "don't worry about distinguishing between
> thing and resource; indirect identification is fine" (etc.)
>
> Thanks, Jeremy
>
> [1]: http://w3c.github.io/sdw/bp/#globally-unique-ids
> [2]: https://www.w3.org/TR/webarch/#pr-uri-collision
> [2a]: https://www.w3.org/TR/webarch/#URI-collision
> [3]: https://www.w3.org/2001/tag/group/track/issues/14
> [4]: https://www.w3.org/TR/urls-in-data/#publishing-data
> [5]: https://www.w3.org/TR/webarch/#indirect-identification
> [6]: https://www.w3.org/TR/urls-in-data/#documenting-properties
> [7]: https://www.w3.org/TR/urls-in-data/#authoring-specifications

--
Andrea Perego, Ph.D.
Scientific / Technical Project Officer
European Commission DG JRC
Directorate B - Growth and Innovation
Unit B6 - Digital Economy
Via E. Fermi, 2749 - TP 262
21027 Ispra VA, Italy

https://ec.europa.eu/jrc/

----
The views expressed are purely those of the writer and may not in any circumstances be regarded as stating an official position of the European Commission.
Received on Monday, 5 September 2016 09:13:16 UTC