Re: Clarification required: BP6 "use HTTP URIs for spatial things" from Rob Atkinson on 2016-09-05 (public-sdw-wg@w3.org from September 2016)

From: Rob Atkinson <rob@metalinkage.com.au>
Date: Mon, 05 Sep 2016 00:59:32 +0000
To: Simon.Cox@csiro.au, rob@metalinkage.com.au, janowicz@ucsb.edu, frans.knibbe@geodan.nl
Cc: jlieberman@tumblingwalls.com, jeremy.tandy@gmail.com, public-sdw-wg@w3.org
Message-ID: <CACfF9LxzUYd4dRgFQAarzd8DpgZNAz=Hjh_i=A9RMEUT=479KA@mail.gmail.com>
I agree we shouldn't, but that's the sort of thing people do - and we can
provide a BP to avoid it in favour of using indirect URIs, which then
provides the option of declaring equivalence without stating nonsense.
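
To illustrate that point, here is a minimal sketch in plain Python with no RDF library. It assumes a deliberately naive reading of owl:sameAs in which every statement about one subject is copied to the other. The helper names (smush, values) are hypothetical, and the ns: properties are just the placeholders from the quoted example below, not a real vocabulary.

```python
# Illustrative only: naive owl:sameAs "smushing" over plain tuples.
# R1/R2 stand for representation URLs, U1/U2 for indirect identifiers (assumed names).
SAME_AS = "owl:sameAs"


def smush(triples):
    """Copy every statement across owl:sameAs links (subject position only)."""
    triples = set(triples)
    # Collect sameAs pairs and make them symmetric.
    aliases = {(s, o) for s, p, o in triples if p == SAME_AS}
    aliases |= {(o, s) for s, o in aliases}
    changed = True
    while changed:
        changed = False
        for a, b in aliases:
            for s, p, o in list(triples):
                if s == a and p != SAME_AS and (b, p, o) not in triples:
                    triples.add((b, p, o))
                    changed = True
    return triples


def values(triples, subject, prop):
    """Return the sorted objects recorded for (subject, prop)."""
    return sorted(o for s, p, o in triples if s == subject and p == prop)


# Case 1: sameAs asserted directly between two representations.
direct = smush([
    ("R1", "ns:dateEdited", "12/1/2001"),
    ("R2", "ns:dateEdited", "6/6/2006"),
    ("R1", SAME_AS, "R2"),
])
# R1 ends up with two values for a property that is meant to be single-valued.
direct_dates = values(direct, "R1", "ns:dateEdited")

# Case 2: sameAs asserted between the indirect identifiers instead.
indirect = smush([
    ("U1", "ns:hasRepresentation", "R1"),
    ("U2", "ns:hasRepresentation", "R2"),
    ("R1", "ns:dateEdited", "12/1/2001"),
    ("R2", "ns:dateEdited", "6/6/2006"),
    ("U1", SAME_AS, "U2"),
])
# U1 gains both representations, while each representation keeps its own date.
indirect_reps = values(indirect, "U1", "ns:hasRepresentation")
indirect_dates = values(indirect, "R1", "ns:dateEdited")
```

In the first case the edit dates collide on R1. In the second, the equivalence only merges the hasRepresentation links, so metadata about R1 and R2 stays separate. A real OWL reasoner does more than this (for example, it would also substitute in object position), so treat the above as a sketch of the argument only.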


Rob


On Mon, 5 Sep 2016 at 10:04 <Simon.Cox@csiro.au> wrote:

> *Rob* wrote:
>
>
>
> > consider two resources with URIs R1 and R2
> >
> > R1 ns:dateEdited 12/1/2001
> >
> > R2 ns:dateEdited 6/6/2006
> >
> > R1 owl:sameAs R2  then leads to ambiguity regarding the value of the
> > functional property ns:dateEdited
>
>
>
> Where R1 and R2 are representations or descriptions of a (real-world)
> thing, possibly a graph of RDF triples.
>
>
>
> However, in a separate part of the thread, *Jeremy* wrote:
>
>
>
> > few people will care to name the representation / graph at all.
>
>
>
> In other words, the URIs R1 and R2 are usually not treated with much
> respect. So it is unlikely that we would be in the business of making
> sameAs statements about these.
>
>
>
> Simon
>
>
>
>
>
> *From:* Rob Atkinson [mailto:rob@metalinkage.com.au]
> *Sent:* Saturday, 3 September 2016 8:05 AM
> *To:* janowicz@ucsb.edu; Frans Knibbe <frans.knibbe@geodan.nl>
>
>
> *Cc:* Joshua Lieberman <jlieberman@tumblingwalls.com>; Jeremy Tandy <
> jeremy.tandy@gmail.com>; SDW WG Public List <public-sdw-wg@w3.org>
>
> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial
> things"
>
>
>
>
>
> A few things - this is a rich discussion and we have identified several
> parts (which is probably why the original issue was hard to pin down)
>
>
>
> I'm glad we have coaxed one elephant out - the sameAs semantics issue.
> For me this is the litmus test whether a URL can be used as a URI for a
> thing or not.
>
>
>
> (and this is where one of the issues about SIRF Jeremy raised comes in -
> but I don't think we need to worry about a specific approach, rather the
> criteria for whether a URI is a good one for identification purposes.)  I
> think we simply make a strong statement that you don't use a URL as a URI if
> it is not stable and it does not make sense to use owl:sameAs.
>
> This pretty much rules out any direct URL to a single representation:
>
>
>
> consider two resources with URIs R1 and R2
>
>
>
> R1 ns:dateEdited 12/1/2001
>
>
>
> R2 ns:dateEdited 6/6/2006
>
>
>
> R1 owl:sameAs R2  then leads to ambiguity regarding the value of the
> functional property ns:dateEdited
>
>
>
> however
>
>
>
> U1 --303--> R1
>
> U2 --303--> R2
>
>
>
> can (and should be) represented as
>
> U1 ns:hasRepresentation R1
>
> U2 ns:hasRepresentation R2
>
>
>
> U1 owl:sameAs U2
>
> entails
>
> U1 ns:hasRepresentation R1, R2
>
>
>
> which doesn't make any stupid statements about the properties. It also
> allows us to make useful metadata statements about R1, R2 as required.
>
>
>
> Whilst this is a general concern, we see issues of identification
> stability, multiple representations, non-unique naming being significant to
> spatial data and I think we can and should therefore extend the general
> DWBP with an example using spatial representations and provide a more
> concrete best practice.
>
>
>
> On Sat, 3 Sep 2016 at 00:40 Krzysztof Janowicz <janowicz@ucsb.edu> wrote:
>
> I am no expert on the matter, but several sources tell me that if <A>
> <owl:sameAs> <B>, then all statements that can be made about A will also be
> true for B, and vice versa. It seems that the lighthouse example breaks at
> that point. For example, in Jeremy's example one of the lighthouse
> representations has a height of 41 m. It is likely that that statement will
> be false for the representation of the lighthouse as a ruin.
>
>
>
> Can we be sure that if we recommend using owl:sameAs to assert that two
> resources are really the same thing, everyone and everything is aware of
> the logical consequences?
>
>
>
> This is exactly the key point. If A owl:sameAs B then A and B signify the
> same entity and thus every *statement* about A is a statement about B. It
> works well with Jeremy's example. The fact that the ruin no longer is 41 m
> tall is an example of the need for spatiotemporal scoping of predicates, not
> a shortcoming of owl:sameAs. Also, keep in mind that RDF statements have
> nothing to do with facts or truth; they are just sets of statements. This
> is where the power of RDF comes from.
>
> Best,
> Krzysztof
>
>
>
>
>
> On 09/02/2016 02:20 AM, Frans Knibbe wrote:
>
>
>
> On 1 September 2016 at 23:42, Krzysztof Janowicz <janowicz@ucsb.edu>
> wrote:
>
>
> Hi,
>
>
> So as representations, these are not “owl:sameAs”.
>
>
>
> Just for clarification. owl:sameAs is only concerned with the mapping of
> IRIs to (real world) entities and not 'representations' (leaving aside the
> fact that everything is a representation in some sense). I.e., it is about
> 'identity'. To give an extreme example, a URI may refer to the Eddystone
> Lighthouse which may be classified as /Lighthouse/ in some repository.
> Another URI established 50 years from now can still refer to this
> particular (4th) lighthouse and classify it as a /Ruin/. Another 50 years
> into the future, there may be yet another URI that refers to the fact that
> at some stage there was a ruin here of the 4th lighthouse called Eddystone
> while there is nothing physical left of it, and, thus, it is neither
> classified as /Ruin/ nor /Lighthouse/. In fact, we do not even need to
> introduce the concept of "real world" here as we can also establish a
> sameAs relation between two URIs that point to Zeus. Please note that this
> is different from establishing a sameAs link between a particular statue of
> Zeus in a particular museum and Zeus as the god of thunder. Finally, the
> purpose of establishing sameAs links is typically data fusion/conflation
> (no matter whether this is done ad-hoc, manually, or (offline)
> computationally).
>
>
>
> I am no expert on the matter, but several sources tell me that if <A>
> <owl:sameAs> <B>, then all statements that can be made about A will also be
> true for B, and vice versa. It seems that the lighthouse example breaks at
> that point. For example, in Jeremy's example one of the lighthouse
> representations has a height of 41 m. It is likely that that statement will
> be false for the representation of the lighthouse as a ruin.
>
>
>
> Can we be sure that if we recommend using owl:sameAs to assert that two
> resources are really the same thing, everyone and everything is aware of
> the logical consequences?
>
>
>
> Regards,
>
> Frans
>
>
>
>
>
>
> Best,
> Jano
>
>
> On 08/31/2016 06:38 AM, Joshua Lieberman wrote:
>
> Jeremy,
>
>
>
> So as representations, these are not “owl:sameAs”. We assume that as
> feature data, each refers to a real world entity, but we don’t assert that
> this VerticalObstruction is the same individual as this
> MaritimeNavigationAid. We just are suspecting or asserting that the same
> real world thing is being discerned in two different ways. Someone may
> define a lighthouse class as subclassing both, otherwise a slightly
> specialized relation (e.g. sdwgeo:sameRealWorldEntityAs) would be useful
> here.
>
>
>
> Josh
>
>
>
> On Aug 31, 2016, at 8:41 AM, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>
>
>
> > That still leaves a gap in expressing whether two feature data entities
> represent the same real world entity. Perhaps we need a "sameFeatureAs"
> predicate to address this.
>
>
>
> @josh - can we clarify my understanding please?
>
>
>
> In the BP doc §4 "Spatial things, features and geometry" [1] I use a
> lighthouse example, so I'll continue with that ...
>
>
>
> We have one real lighthouse (Eddystone Lighthouse) that is discerned as a
> different Type by different communities: "VerticalObstruction" and
> "MaritimeNavigationAid". In ISO 19100 parlance, these are two distinct
> feature types. The two "Features" might be encoded in GML as follows
> (forgive any errors in my illustrative example):
>
>
>
> <VerticalObstruction gml:id="a">
>
>     <gml:name>Eddystone</gml:name>
>
>     <gml:identifier codeSpace="http://example.com/sar/features/vo/">EDY</gml:identifier>
>
>     <geometry>
>
>         <gml:Point gml:id="a-p1" srsDimension="2" srsName="EPSG:4326">
>
>             <gml:pos>50.184 -4.268</gml:pos>
>
>         </gml:Point>
>
>     </geometry>
>
>     <height uom="m">41</height>
>
> </VerticalObstruction>
>
>
>
> <MaritimeNavigationAid gml:id="b">
>
>     <gml:name>Eddystone Lighthouse</gml:name>
>
>     <gml:identifier codeSpace="http://example.org/maritime/navaid/">2650253</gml:identifier>
>
>     <geo>
>
>         <gml:Point gml:id="b-p1" srsDimension="2" srsName="EPSG:4326">
>
>             <gml:pos>50.2 -4.3</gml:pos>
>
>         </gml:Point>
>
>     </geo>
>
>     <lightCharacteristic>
>
>         ...
>
>     </lightCharacteristic>
>
> </MaritimeNavigationAid>
>
>
>
> So we have two Features (which we collectively have agreed are "spatial
> things"), with identifiers <http://example.com/sar/features/vo/EDY> and
> <http://example.org/maritime/navaid/2650253>. Respectively, the XML
> elements that describe these features are identified as "a" and "b" using
> the @gml:id attribute.
>
>
>
> If we are using "indirect identification" then _both_
> <http://example.com/sar/features/vo/EDY> and
> <http://example.org/maritime/navaid/2650253> are treated as identifiers
> for the _real_ Eddystone Lighthouse; we simply don't care to differentiate
> between the real world thing and the information record. In which case,
> <owl:sameAs>  would seem sufficient? The "height" and "lightCharacteristic"
> properties are both applicable to the real Eddystone Lighthouse. Some
> judgement would be required to decide which point geometry ("geo" or
> "geometry" property) is considered "best".
>
>
>
> The way I think about it, @gml:id is more like the identifier for a named
> graph; a container for a set of properties ...
>
>
>
> Am I missing something???
>
>
>
> Jeremy
>
>
>
>
>
> [1]: http://w3c.github.io/sdw/bp/#spatial-things-features-and-geometry
>
>
>
> On Wed, 31 Aug 2016 at 12:42 Joshua Lieberman <
> jlieberman@tumblingwalls.com> wrote:
>
> If we are asserting that spatial data on the Web is "always" feature data
> that represents a real world entity, then yes, we don't have the general
> Web "is it or isn't it physical" ambiguity and can assume that a feature
> data identifier also and indirectly identifies the feature. That still
> leaves a gap in expressing whether two feature data entities represent the
> same real world entity. Perhaps we need a "sameFeatureAs" predicate to
> address this.
>
>
>
> Josh
>
> Joshua Lieberman, Ph.D.
>
> Principal, Tumbling Walls Consultancy
>
> Tel/Direct: +1 617-431-6431
>
> jlieberman@tumblingwalls.com
>
>
> On Aug 31, 2016, at 07:29, Frans Knibbe <frans.knibbe@geodan.nl> wrote:
>
> Hello,
>
>
>
> As stated before, I don't think the httpRange-14 problem exists in our
> domain of discourse. I think (and hope) that confusion can only occur when
> the things that are described are digital things, or things that can be
> transmitted over a computer network, like web pages or mail boxes. It seems
> to me that spatial things are never that type of thing. Therefore there is
> no reason to take precautions against possible confusion.
>
>
>
> That probably means +1.
>
>
>
> Greetings,
>
> Frans
>
>
>
>
>
>
>
> On 31 August 2016 at 09:50, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>
> Thanks Rob & Clemens ...
>
>
>
> On Wed, 31 Aug 2016 at 08:30, Clemens Portele <
> portele@interactive-instruments.de> wrote:
>
> +1
>
>
>
> On 30 August 2016 at 10:10:26, Jeremy Tandy (jeremy.tandy@gmail.com)
> wrote:
>
> Hi. It would be good to close this issue out & include our collective
> recommendation in the BP doc working draft.
>
>
>
> PROPOSAL: SDW working group recommends use of "indirect identifiers" for
> spatial things
>
>
>
> ... I'll start the voting.
>
>
>
> +1
>
>
>
> Jeremy
>
>
>
> (BTW, to make sense of the PROPOSAL you'll need to read the email thread)
>
>
>
> On Fri, 26 Aug 2016 at 10:12 Linda van den Brink <
> l.vandenbrink@geonovum.nl> wrote:
>
> So… do we agree we can recommend indirect identifiers, or do we try to fix
> the issue with getting the correct identifier as Rob describes?
>
>
>
> While waiting for this I’ve updated the issue and the text referring to
> the issue in BP6.
>
>
>
> *From:* Rob Atkinson [mailto:rob@metalinkage.com.au]
> *Sent:* Wednesday, 24 August 2016 13:56
> *To:* Jeremy Tandy; Phil Archer; Linda van den Brink; Bill Roberts
>
>
> *CC:* SDW WG Public List
>
> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial
> things"
>
>
>
> Hi
>
>
>
> Agree this is a real concern - people can't be blamed for doing the
> obvious, if dumb, thing.
>
>
>
> I think we should take note of best practice in the HTML world - which is
> often to include a citable link to a resource in the rendered view.  Or a
> "share" or something similar. We can also put fairly explicit annotation in
> machine-readable code - stating that the resource is about the URI - and
> even notes saying when citing this resource use the URI....
>
>
>
> I'd also like to see browsers evolve to offer you the original link or the
> redirected when cutting and pasting - how hard can it be!
>
>
>
> Maybe we can get Ed to ask around Google Chrome team for suggestions on
> how best to handle this :-)
>
>
>
> Rob
>
>
>
>
>
>
>
> On Wed, 24 Aug 2016 at 18:27 Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>
> Yes, I think so ... And we should do so if we are recommending "indirect
> identification".
>
> Jeremy
>
> On Wed, 24 Aug 2016 at 09:24, Phil Archer <phila@w3.org> wrote:
>
> Bill's comments also made me think about some of the classic arguments,
> such as that a lake doesn't have a last updated date and isn't 435KB
> big. Those are true; however, that kind of metadata generally comes from
> the server, i.e. the HTTP layer. That's an oversimplification but the
> point is that it is relatively easy to avoid deliberately creating
> misleading metadata - metadata about the doc rather than the thing it
> describes - and it's also generally easy to avoid looking for that
> metadata.
>
> Is there scope for some BP advice there?
>
> Phil.
>
> On 24/08/2016 08:25, Jeremy Tandy wrote:
> > Thanks Linda. More clear examples where being "correct" (in terms of
> > avoiding uri collisions by using two distinct uris) is making things worse
> > because users take the wrong one!
> >
> > So, as a WG, are we content to recommend this "indirect identification"
> > pattern where thing & info resource identifiers are conflated?
> >
> > Bill has added some good points about how to avoid impacts of uri
> > collision- by using the (dataset) metadata to talk about licenses and
> > creators for the information ...
> > On Wed, 24 Aug 2016 at 07:52, Linda van den Brink <l.vandenbrink@geonovum.nl>
> > wrote:
> >
> >> Experience from the Netherlands: we have the id/doc pattern in our URI
> >> strategy, based on the Cool URIs note [8] and the ISA study on persistent
> >> identifiers [9].
> >>
> >>
> >>
> >> That being said, same as Bill I also notice data users getting confused
> >> and generally using the /doc/ URI as that is the one they can copy from
> >> their browser address bar. This is not only casual confusion but also ends
> >> up in published information resources.
> >>
> >>
> >>
> >> You see this, for example, all over the CB-NL which is a vocabulary for
> >> the building sector and contains links to other Dutch standards such as
> >> IMGeo, an information model and vocabulary for large scale topography. E.g.
> >> the CB-NL concept of ‘Gebouw’ (Building) [10] links to two IMGeo concepts
> >> ‘Pand’ (building part) and ‘Overig Bouwwerk’ (other construction) using
> >> their /doc/ URIs. If you click on Pand (which doesn’t have its own landing
> >> page in CB-NL so I can’t include the link) you will see it includes the
> >> /doc/ URI as the identifier of Pand.
> >>
> >>
> >>
> >> This is an example where it occurs in vocabularies, but I also see it
> >> happen with identifiers for data instances.
> >>
> >>
> >>
> >> [8]: https://www.w3.org/TR/cooluris/
> >>
> >> [9]: https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs_0.pdf
> >>
> >> [10]: http://ont.cbnl.org/cb/def/Gebouw
> >>
> >>
> >>
> >> Linda
> >>
> >>
> >>
> >> *From:* Jeremy Tandy [mailto:jeremy.tandy@gmail.com]
> >> *Sent:* Tuesday, 23 August 2016 20:57
> >> *To:* Bill Roberts
> >> *CC:* SDW WG Public List
> >> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial
> >> things"
> >>
> >>
> >>
> >> Thanks Bill. Sounds very coherent ... I hoped for some responses such as
> >> this based on practical experience. Jeremy
> >>
> >> On Tue, 23 Aug 2016 at 19:41, Bill Roberts <bill@swirrl.com> wrote:
> >>
> >> ah Jeremy, you are a brave man to poke the sleeping beast of httpRange-14.
> >>
> >>
> >>
> >> But I'll get my thoughts in early, then I can tune out of the ensuing mail
> >> avalanche :-)
> >>
> >>
> >>
> >> When publishing Linked Data about places we (at Swirrl) generally do the
> >> id/doc fandango, but to be honest I think data users either don't notice,
> >> or they get confused by it.  In the applications we are working with (and I
> >> acknowledge that others may have different applications and different
> >> experiences), it wouldn't cause any problems to have a single URI, the 'id'
> >> URI if you like.  We just don't find a need to say anything about the /doc/
> >> URI.  If we were starting again, I'd probably ditch the /doc/ and the 303
> >> and rely on context and a little bit of documentation to make it clear what
> >> we mean.
> >>
> >>
> >>
> >> The place where we find a need to talk about creators and licences and
> >> modified dates is in metadata about datasets where a dataset might be a
> >> collection of information about a bunch of places - and we treat datasets
> >> as an 'information resource'. If someone requests a dataset URI we return a
> >> status code of 200 and the dataset metadata as the response.  That metadata
> >> includes info on where to get all the contents of the dataset if you want
> >> that.
> >>
> >>
> >>
> >> By the way, though it's sensible and consistent, I find that the implied
> >> and parallel property stuff makes it more rather than less complicated.
> >>
> >>
> >>
> >> Bill
> >>
> >>
> >>
> >>
> >> On 23 August 2016 at 17:37, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
> >>
> >> All-
> >>
> >>
> >>
> >> Linda has done a great job of consolidating the best practices around use of
> >> identifiers. We have just one [1] now.
> >>
> >>
> >>
> >> Reading through just now, it occurred to me that there's still an open
> >> issue about identifier assignment ...
> >>
> >>
> >>
> >> W3C's Architecture of the World Wide Web constraint "URIs identify a
> >> single resource" [2] asserts "Assign distinct URIs to distinct resources"
> >> in order to avoid URI collisions [2a] which "often imposes a cost in
> >> communication due to the effort required to resolve ambiguities".
> >> Discussions from earlier years in UK Gov Linked Data working group (and
> >> elsewhere) concluded that the "real world thing" and "information resource
> >> that describes the real world thing" are separate resources. I think this
> >> is based on a (purist?) view when working with RDF of needing to be totally
> >> clear on "what's the subject" of each triple ... the thing or the document.
> >> This manifests as URIs with `id` or `doc` included somewhere to distinguish
> >> between the resources and some RDF triples to clarify that the doc resource
> >> is talking about the thing resource etc.
> >>
> >>
> >>
> >> (dangerously close to "httpRange-14" [3] here ... let's avoid that bear
> >> trap)
> >>
> >>
> >>
> >> Jeni Tennison's "URLs in Data Primer" draft TAG note captures this
> >> practice in §5.3 "Publishing data" [4]:
> >>
> >>
> >>
> >> ```
> >>
> >> Publishers can help enable more accurate merging of data from different
> >> sites if they support URLs for each entity
> >> <https://www.w3.org/TR/urls-in-data/#dfn-entity> they or other sites may
> >> wish to describe, separate from the landing pages
> >> <https://www.w3.org/TR/urls-in-data/#dfn-landing-page> or records
> >> <https://www.w3.org/TR/urls-in-data/#dfn-record> that they publish.
> >>
> >> ```
> >>
> >>
> >>
> >> Yet Architecture of the World Wide Web §2.2.3 "Indirect identification"
> >> [5] notes that:
> >>
> >>
> >>
> >> ```
> >>
> >> To say that the URI "mailto:nadia@example.com" identifies both an
> >> Internet mailbox and Nadia, the person, introduces a URI collision.
> >> However, we can use the URI to indirectly identify Nadia. Identifiers are
> >> commonly used in this way.
> >>
> >> ```
> >>
> >>
> >>
> >> This is consistent with what I recall TimBL saying at TPAC-2015 in regards
> >> to Vcard; come the finish, no one really cares to distinguish between the
> >> thing and its associated information resource.
> >>
> >>
> >>
> >> ... And in most cases, one can use context to determine whether a
> >> statement concerns the thing or the information resource. In those cases
> >> where you can't, "URLs in Data Primer" suggests some mechanisms to mitigate
> >> such confusion [6][7].
> >>
> >>
> >>
> >> I think that in our SDW WG discussion we have concluded that we _are_
> >> content to use "indirect identification" - e.g. that we use URIs that
> >> conflate the thing and document resource.
> >>
> >>
> >>
> >> Please can we confirm this? Assuming that indirect identification is
> >>
>
>
Received on Monday, 5 September 2016 01:00:32 UTC