- From: Jeremy Tandy <jeremy.tandy@gmail.com>
- Date: Wed, 24 Aug 2016 08:27:59 +0000
- To: Phil Archer <phila@w3.org>, Linda van den Brink <l.vandenbrink@geonovum.nl>, Bill Roberts <bill@swirrl.com>
- Cc: SDW WG Public List <public-sdw-wg@w3.org>
- Message-ID: <CADtUq_3MmerfaRXE8GYnt-0U=Y4zuL5tQxzBqqoZDquDFD15yw@mail.gmail.com>
Yes, I think so ... And we should do so if we are recommending "indirect identification".

Jeremy

On Wed, 24 Aug 2016 at 09:24, Phil Archer <phila@w3.org> wrote:

> Bill's comments also made me think about some of the classic arguments,
> such as that a lake doesn't have a last updated date and isn't 435KB
> big. Which are true; however, that kind of metadata generally comes from
> the server, i.e. the HTTP layer. That's an oversimplification but the
> point is that it is relatively easy to avoid deliberately creating
> misleading metadata - metadata about the doc rather than the thing it
> describes - and it's also generally easy to avoid looking for that
> metadata.
>
> Is there scope for some BP advice there?
>
> Phil.
>
> On 24/08/2016 08:25, Jeremy Tandy wrote:
> > Thanks Linda. More clear examples where being "correct" (in terms of
> > avoiding uri collisions by using two distinct uris) is making things
> > worse because users take the wrong one!
> >
> > So, as a WG, are we content to recommend this "indirect identification"
> > pattern where thing & info resource identifiers are conflated?
> >
> > Bill has added some good points about how to avoid impacts of uri
> > collision, by using the (dataset) metadata to talk about licenses and
> > creators for the information ...
> >
> > On Wed, 24 Aug 2016 at 07:52, Linda van den Brink
> > <l.vandenbrink@geonovum.nl> wrote:
> >
> >> Experience from the Netherlands: we have the id/doc pattern in our URI
> >> strategy, based on the Cool URIs note [8] and the ISA study on
> >> persistent identifiers [9].
> >>
> >> That being said, same as Bill I also notice data users getting confused
> >> and generally using the /doc/ URI as that is the one they can copy from
> >> their browser address bar. This is not only casual confusion but also
> >> ends up in published information resources.
> >>
> >> You see this, for example, all over the CB-NL which is a vocabulary for
> >> the building sector and contains links to other Dutch standards such as
> >> IMGeo, an information model and vocabulary for large scale topography.
> >> E.g. the CB-NL concept of ‘Gebouw’ (Building) [10] links to two IMGeo
> >> concepts ‘Pand’ (building part) and ‘Overig Bouwwerk’ (other
> >> construction) using their /doc/ URIs. If you click on Pand (which
> >> doesn’t have its own landing page in CB-NL so I can’t include the link)
> >> you will see it includes the /doc/ URI as the identifier of Pand.
> >>
> >> This is an example where it occurs in vocabularies, but I also see it
> >> happen with identifiers for data instances.
> >>
> >> [8]: https://www.w3.org/TR/cooluris/
> >> [9]: https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs_0.pdf
> >> [10]: http://ont.cbnl.org/cb/def/Gebouw
> >>
> >> Linda
> >>
> >> *From:* Jeremy Tandy [mailto:jeremy.tandy@gmail.com]
> >> *Sent:* Tuesday 23 August 2016 20:57
> >> *To:* Bill Roberts
> >> *CC:* SDW WG Public List
> >> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial
> >> things"
> >>
> >> Thanks Bill. Sounds very coherent ... I hoped for some responses such as
> >> this based on practical experience. Jeremy
> >>
> >> On Tue, 23 Aug 2016 at 19:41, Bill Roberts <bill@swirrl.com> wrote:
> >>
> >> ah Jeremy, you are a brave man to poke the sleeping beast of
> >> httpRange-14.
> >>
> >> But I'll get my thoughts in early, then I can tune out of the ensuing
> >> mail avalanche :-)
> >>
> >> When publishing Linked Data about places we (at Swirrl) generally do the
> >> id/doc fandango, but to be honest I think data users either don't
> >> notice, or they get confused by it.
> >> In the applications we are working with (and I acknowledge that others
> >> may have different applications and different experiences), it wouldn't
> >> cause any problems to have a single URI, the 'id' URI if you like. We
> >> just don't find a need to say anything about the /doc/ URI. If we were
> >> starting again, I'd probably ditch the /doc/ and the 303 and rely on
> >> context and a little bit of documentation to make it clear what we mean.
> >>
> >> The place where we find a need to talk about creators and licences and
> >> modified dates is in metadata about datasets where a dataset might be a
> >> collection of information about a bunch of places - and we treat
> >> datasets as an 'information resource'. If someone requests a dataset URI
> >> we return a status code of 200 and the dataset metadata as the response.
> >> That metadata includes info on where to get all the contents of the
> >> dataset if you want that.
> >>
> >> By the way, though it's sensible and consistent, I find that the implied
> >> and parallel property stuff makes it more rather than less complicated.
> >>
> >> Bill
> >>
> >> On 23 August 2016 at 17:37, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
> >>
> >> All-
> >>
> >> Linda has done a great job of consolidating the best practices around
> >> use of identifiers. We have just one [1] now.
> >>
> >> Reading through just now, it occurred to me that there's still an open
> >> issue about identifier assignment ...
> >>
> >> W3C's Architecture of the World Wide Web constraint "URIs identify a
> >> single resource" [2] asserts "Assign distinct URIs to distinct
> >> resources" in order to avoid URI collisions [2a] which "often imposes a
> >> cost in communication due to the effort required to resolve
> >> ambiguities".
> >> Discussions from earlier years in UK Gov Linked Data working group (and
> >> elsewhere) concluded that the "real world thing" and "information
> >> resource that describes the real world thing" are separate resources. I
> >> think this is based on a (purist?) view when working with RDF of needing
> >> to be totally clear on "what's the subject" of each triple ... the thing
> >> or the document. This manifests as URIs with `id` or `doc` included
> >> somewhere to distinguish between the resources and some RDF triples to
> >> clarify that the doc resource is talking about the thing resource etc.
> >>
> >> (dangerously close to "httpRange-14" [3] here ... let's avoid that bear
> >> trap)
> >>
> >> Jeni Tennison's "URLs in Data Primer" draft TAG note captures this
> >> practice in §5.3 "Publishing data" [4]:
> >>
> >> ```
> >> Publishers can help enable more accurate merging of data from different
> >> sites if they support URLs for each entity
> >> <https://www.w3.org/TR/urls-in-data/#dfn-entity> they or other sites may
> >> wish to describe, separate from the landing pages
> >> <https://www.w3.org/TR/urls-in-data/#dfn-landing-page> or records
> >> <https://www.w3.org/TR/urls-in-data/#dfn-record> that they publish.
> >> ```
> >>
> >> Yet Architecture of the World Wide Web §2.2.3 "Indirect identification"
> >> [5] notes that:
> >>
> >> ```
> >> To say that the URI "mailto:nadia@example.com" identifies both an
> >> Internet mailbox and Nadia, the person, introduces a URI collision.
> >> However, we can use the URI to indirectly identify Nadia. Identifiers
> >> are commonly used in this way.
> >> ```
> >>
> >> This is consistent with what I recall TimBL saying at TPAC-2015 in
> >> regards to Vcard; come the finish, no one really cares to distinguish
> >> between the thing and its associated information resource.
> >>
> >> ...
> >> And in most cases, one can use context to determine whether a
> >> statement concerns the thing or the information resource. In those cases
> >> where you can't, "URLs in Data Primer" suggests some mechanisms to
> >> mitigate such confusion [6][7].
> >>
> >> I think that in our SDW WG discussion we have concluded that we _are_
> >> content to use "indirect identification" - i.e. that we use URIs that
> >> conflate the thing and document resource.
> >>
> >> Please can we confirm this? Assuming that indirect identification is
> >> "approved" as best practice, then it seems prudent to add a note to the
> >> BP document saying "don't worry about distinguishing between thing and
> >> resource; indirect identification is fine" (etc.)
> >>
> >> Thanks, Jeremy
> >>
> >> [1]: http://w3c.github.io/sdw/bp/#globally-unique-ids
> >> [2]: https://www.w3.org/TR/webarch/#pr-uri-collision
> >> [2a]: https://www.w3.org/TR/webarch/#URI-collision
> >> [3]: https://www.w3.org/2001/tag/group/track/issues/14
> >> [4]: https://www.w3.org/TR/urls-in-data/#publishing-data
> >> [5]: https://www.w3.org/TR/webarch/#indirect-identification
> >> [6]: https://www.w3.org/TR/urls-in-data/#documenting-properties
> >> [7]: https://www.w3.org/TR/urls-in-data/#authoring-specifications
> >
>
> --
>
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
>
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
Received on Wednesday, 24 August 2016 08:28:40 UTC