- From: Jeremy Tandy <jeremy.tandy@gmail.com>
- Date: Thu, 08 Sep 2016 10:05:19 +0000
- To: Rob Atkinson <rob@metalinkage.com.au>, janowicz@ucsb.edu, Simon.Cox@csiro.au, frans.knibbe@geodan.nl
- Cc: jlieberman@tumblingwalls.com, public-sdw-wg@w3.org
- Message-ID: <CADtUq_05WTWEWWfuRm0P0ZdvpFnB5rznXn53WzxcPB-5Yic1XA@mail.gmail.com>
On yesterday's BP sub-group call (minutes [1]) we had a short discussion
about the conclusion (see previous email in this thread [2]). I think broad
agreement was reached that this was a way forward - but probably requires
folks to read through the summary themselves.

As I am away until TPAC it's unlikely that the conclusion of this thread
will make it into the BP doc itself ...

Jeremy

[1]: https://www.w3.org/2016/09/07-sdwbp-minutes
[2]: https://lists.w3.org/Archives/Public/public-sdw-wg/2016Sep/0096.html

On Mon, 5 Sep 2016 at 17:18 Jeremy Tandy <jeremy.tandy@gmail.com> wrote:

> Hi - thanks to all of you who have contributed to this discussion. It's
> been big, but I think worthwhile.
>
> @andrea - thanks for taking the time to summarise the guidance from the UK
> (from memory, I think you got it spot on). I recall from those discussions
> that INSPIRE's "spatial object" (identified with /so/ pattern) is an
> information object. The examples you use illustrate the difficulty in
> combining ("fusing") information from different sources; do the "Railway
> Station Node"
> <http://location.data.gov.uk/so/tn/RailwayStationNode/nwkr/123456> (with
> properties including point location and number of platforms) and "Railway
> Station Area"
> <http://location.data.gov.uk/so/tn/RailwayStationArea/nwkr/456789> (with
> properties including the area describing the "topographical limits of the
> facilities of a railway station -buildings, railway yards, installations
> and equipment") actually identify Manchester Piccadilly Railway Station
> (<http://transport.data.gov.uk/id/station/MAN>) - or do they identify
> aspects of Manchester Piccadilly Railway Station? This is why
> reconciliation of identifiers is hard - and beyond our scope. @eparsons
> once said "here be dragons" [0]
>
> Based on implementation experience, the /id/ and /doc/ pattern tends to
> cause more confusion than it solves [1A][1B]
>
> ...
>
> I conclude from the entire thread that:
>
> 1) indirect identification for spatial things is preferred in situations
> where people do not need to explicitly refer to the document resource that
> describes the spatial thing (the "representation") ... This means that we
> will only have a single URI in use for a spatial thing and its
> representation - no need for /id/ and /doc/ URI pairs.
>
> @bill had some suggestions (see [1A]) about how to avoid confusion by
> putting the (metadata) properties about the information resource, such as
> "last-update-time", in the dataset description. Here we see evidence of the
> pattern of metadata for an individual item being inherited from a dataset
> description. This relates to the more general requirement about how we
> attach metadata in a consistent way. I mentioned this in the discussion
> about crowd-sourced spatial data [2] ...
>
> 2) where URI collision (caused by using the same URI for spatial thing and
> the representation) is a problem, or the representation is provided via URL
> of some other system (e.g. a WFS endpoint's getFeature request) then the
> representation itself will also be identified using a URI (or URL) ... the
> URI for the spatial thing should be resolved by redirecting the HTTP
> request to the URL of the representation
>
> 3) there is no overarching pattern that can be used in all situations to
> identify the representation (because a representation will often be
> provided by a URL for a service endpoint) - although the /id/ and /doc/
> pattern is widely used
>
> 4) although proliferation of identifiers for spatial things is
> discouraged, it is conceivable that a given spatial thing may be identified
> through multiple URIs (the "non-unique naming problem" - different
> people each use their own identifiers), e.g. U1 and U2, each of which may
> resolve to provide different information about the spatial thing.
> <owl:sameAs> is the appropriate predicate / relationship to link these URIs
> together (@jano refers to this as a "co-reference solution"). If the
> representations associated with U1 and U2 are explicitly identified, e.g.
> U1 has representation R1 and U2 has representation R2, <owl:sameAs> must
> not be used to link the URLs of the representations as these are _not_ the
> same. However, given that { <U1> owl:sameAs <U2> }, then U1 has
> representations R1 _and_ R2 ... and both representations apply to U2
> too. This provides a *basis* for data fusion but will likely require
> further insight to determine which properties are considered valid (ref.
> the previous comments about inaccurate data and "spatiotemporal scoping").
>
> 5) a spatial thing may have multiple representations; content negotiation
> provides a mechanism for a user agent to select a preferred MIME-type,
> charset, encoding and language, but there is a gap in (best) practice about
> how a user agent can request a representation using a specific schema or
> data model (see [3]) ... the Linked Data API, SIRF and "profile" Link
> Header (RFC 6906) provide examples of how this might be achieved
>
> 6) there is a gap in (best) practice about how a server might advertise
> the availability of multiple representations; e.g. based on different data
> sources and/or different data models.
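
A minimal sketch of point 4, in Python, for anyone who wants to see the
entailment spelled out. The identifiers U1, U2, R1, R2 and the predicate
ns:hasRepresentation are the placeholders from Rob's example quoted further
down the thread, not real URIs. This is not an OWL reasoner: as a stated
simplification it only substitutes owl:sameAs-equivalent *subjects*, which is
enough to show that linking the spatial-thing URIs pools their
representations without ever asserting that R1 and R2 are the same.

```python
from collections import defaultdict

# Placeholder identifiers (assumptions, taken from the thread's example):
# U1/U2 identify the spatial thing, R1/R2 identify its representations.
triples = {
    ("U1", "ns:hasRepresentation", "R1"),
    ("U2", "ns:hasRepresentation", "R2"),
    ("U1", "owl:sameAs", "U2"),
}


def same_as_closure(triples):
    """Return the input triples plus those obtained by swapping a subject
    for any identifier linked to it via owl:sameAs (simplified: subjects
    only, objects are left untouched)."""
    parent = {}

    def find(x):
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # owl:sameAs is symmetric and transitive, so merge both ends.
    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)

    # Group identifiers by their equivalence class.
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)

    entailed = set(triples)
    for s, p, o in triples:
        if p == "owl:sameAs":
            continue
        for alias in groups.get(find(s), {s}):
            entailed.add((alias, p, o))
    return entailed


def representations(triples, thing):
    """Collect every representation attached to the given identifier."""
    return {o for s, p, o in triples
            if s == thing and p == "ns:hasRepresentation"}


entailed = same_as_closure(triples)
print(sorted(representations(entailed, "U1")))  # ['R1', 'R2']
print(sorted(representations(entailed, "U2")))  # ['R1', 'R2']
```

Note that nothing here decides which of the pooled properties is "right";
that is the data fusion step, which stays out of scope.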
>
> [0]: https://lists.w3.org/Archives/Public/public-sdw-wg/2015Sep/0058.html
> [1A]: https://lists.w3.org/Archives/Public/public-sdw-wg/2016Aug/0141.html
> [1B]: https://lists.w3.org/Archives/Public/public-sdw-wg/2016Aug/0148.html
> [2]: https://lists.w3.org/Archives/Public/public-sdw-wg/2016Sep/0058.html
> [3]: http://geo4web-testbed.github.io/topic4/#h.n0gkernttzw0
>
> On Mon, 5 Sep 2016 at 08:05 Rob Atkinson <rob@metalinkage.com.au> wrote:
>
>> I am perhaps missing the point here and there are a lot of terms here
>> whose precise meaning may or may not be well understood (conflation,
>> co-resolution, fusion), but isn't owl:sameAs also specifically a statement
>> that some sort of fusion by entailment of shared properties using (some
>> flavour of OWL) is meaningful?
>>
>> The issue here is nevertheless about the rationale for using "indirect
>> URIs" - I still think the example is relevant - and such an example could
>> be used to explain why it is important to do so.
>>
>> It also addresses the issue of whether some canonical property (with
>> appropriate subProperty specialisations) for the relationship between a URI
>> and one or more resources is needed - and whether a BP can be identified
>> for this.
>>
>> Rob
>>
>> On Mon, 5 Sep 2016 at 13:31 Krzysztof Janowicz <janowicz@ucsb.edu> wrote:
>>
>>> Hi,
>>>
>>> *Rob* wrote:
>>>
>>> > consider two resources with URIs R1 and R2
>>> >
>>> > R1 ns:dateEdited 12/1/2001
>>> >
>>> > R2 ns:dateEdited 6/6/2006
>>> >
>>> > R1 owl:sameAs R2 then leads to ambiguity regarding the value of the
>>> > functional property ns:dateEdited
>>>
>>> Note that this is not an owl:sameAs issue. I think it is very important
>>> to distinguish between co-reference resolutions (using owl:sameAs,
>>> skos:closeMatch,...) and data conflation (data fusion). owl:sameAs handles
>>> co-reference resolution.
>>> Data fusion is still an open research issue
>>> (despite tons of work in the DB community). The fact that ns:dateEdited may
>>> be defined as a functional property in some ontology will also have no
>>> effect on the RDF triples as such.
>>>
>>> Best,
>>> Krzysztof
>>>
>>> On 09/04/2016 05:04 PM, Simon.Cox@csiro.au wrote:
>>>
>>> *Rob* wrote:
>>>
>>> > consider two resources with URIs R1 and R2
>>> >
>>> > R1 ns:dateEdited 12/1/2001
>>> >
>>> > R2 ns:dateEdited 6/6/2006
>>> >
>>> > R1 owl:sameAs R2 then leads to ambiguity regarding the value of the
>>> > functional property ns:dateEdited
>>>
>>> Where R1 and R2 are representations or descriptions of a (real-world)
>>> thing, possibly a graph of RDF triples.
>>>
>>> However, in a separate part of the thread, *Jeremy* wrote:
>>>
>>> > few people will care to name the representation / graph at all.
>>>
>>> In other words, the URIs R1 and R2 are usually not treated with much
>>> respect. So it is unlikely that we would be in the business of making
>>> sameAs statements about these.
>>>
>>> Simon
>>>
>>> *From:* Rob Atkinson [mailto:rob@metalinkage.com.au]
>>> *Sent:* Saturday, 3 September 2016 8:05 AM
>>> *To:* janowicz@ucsb.edu; Frans Knibbe <frans.knibbe@geodan.nl>
>>> *Cc:* Joshua Lieberman <jlieberman@tumblingwalls.com>; Jeremy Tandy
>>> <jeremy.tandy@gmail.com>; SDW WG Public List <public-sdw-wg@w3.org>
>>> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial
>>> things"
>>>
>>> A few things - this is a rich discussion and we have identified several
>>> parts (which is probably why the original issue was hard to pin down)
>>>
>>> I'm glad we have coaxed one elephant out - the sameAs semantics issue.
>>> For me this is the litmus test whether a URL can be used as a URI for a
>>> thing or not.
>>>
>>> (and this is where one of the issues about SIRF Jeremy raised comes in -
>>> but I don't think we need to worry about specific approach, rather the
>>> criteria for whether a URI is a good one for identification purposes.) I
>>> think we simply make a strong statement that you don't use a URL as a URI
>>> if it is not stable and it does not make sense to use owl:sameAs.
>>>
>>> This pretty much rules out any direct URL to a single representation:
>>>
>>> consider two resources with URIs R1 and R2
>>>
>>> R1 ns:dateEdited 12/1/2001
>>>
>>> R2 ns:dateEdited 6/6/2006
>>>
>>> R1 owl:sameAs R2 then leads to ambiguity regarding the value of the
>>> functional property ns:dateEdited
>>>
>>> however
>>>
>>> U1 --303--> R1
>>> U2 --303--> R2
>>>
>>> can (and should be) represented as
>>>
>>> U1 ns:hasRepresentation R1
>>> U2 ns:hasRepresentation R2
>>>
>>> U1 owl:sameAs U2
>>>
>>> entails
>>>
>>> U1 ns:hasRepresentation R1, R2
>>>
>>> which doesn't make any stupid statements about the properties. It also
>>> allows us to make useful metadata statements about R1, R2 as required.
>>>
>>> Whilst this is a general concern, we see issues of identification
>>> stability, multiple representations, non-unique naming being significant to
>>> spatial data and I think we can and should therefore extend the general
>>> DWBP with an example using spatial representations and provide a more
>>> concrete best practice.
>>>
>>> On Sat, 3 Sep 2016 at 00:40 Krzysztof Janowicz <janowicz@ucsb.edu>
>>> wrote:
>>>
>>> I am no expert on the matter, but several sources tell me that if <A>
>>> <owl:sameAs> <B>, then all statements that can be made about A will also be
>>> true for B, and vice versa.
>>> It seems that the lighthouse example breaks at
>>> that point. For example, in Jeremy's example one of the lighthouse
>>> representations has a height of 41 m. It is likely that that statement will
>>> be false for the representation of the lighthouse as a ruin.
>>>
>>> Can we be sure that if we recommend using owl:sameAs to assert that two
>>> resources are really the same thing, everyone and everything is aware of
>>> the logical consequences?
>>>
>>> This is exactly the key point. If A owl:sameAs B then A and B signify
>>> the same entity and thus every *statement* about A is a statement about B.
>>> It works well with Jeremy's example. The fact that the ruin no longer is
>>> 41m tall is an example of the need for spatiotemporal scoping of predicates
>>> not a shortcoming of owl:sameAs. Also, keep in mind that RDF statements
>>> have nothing to do with facts or truth; they are just sets of statements.
>>> This is where the power of RDF comes from.
>>>
>>> Best,
>>> Krzysztof
>>>
>>> On 09/02/2016 02:20 AM, Frans Knibbe wrote:
>>>
>>> On 1 September 2016 at 23:42, Krzysztof Janowicz <janowicz@ucsb.edu>
>>> wrote:
>>>
>>> Hi,
>>>
>>> So as representations, these are not “owl:sameAs”.
>>>
>>> Just for clarification. owl:sameAs is only concerned with the mapping of
>>> IRIs to (real world) entities and not 'representations' (leaving aside the
>>> fact that everything is a representation in some sense). I.e., it is about
>>> 'identity'. To give an extreme example, a URI may refer to the Eddystone
>>> Lighthouse which may be classified as /Lighthouse/ in some repository.
>>> Another URI established 50 years from now can still refer to this
>>> particular (4th) lighthouse and classify it as a /Ruin/.
>>> Another 50 years
>>> into the future, there may be yet another URI that refers to the fact that
>>> at some stage there was a ruin here of the 4th lighthouse called Eddystone
>>> while there is nothing physical left of it, and, thus, it is neither
>>> classified as /Ruin/ nor /Lighthouse/. In fact, we do not even need to
>>> introduce the concept of "real world" here as we can also establish a
>>> sameAs relation between two URIs that point to Zeus. Please note that this
>>> is different from establishing a sameAs link between a particular statue of
>>> Zeus in a particular museum and Zeus as the god of thunder. Finally, the
>>> purpose of establishing sameAs links is typically data fusion/conflation
>>> (no matter whether this is done ad-hoc, manually, or (offline)
>>> computationally).
>>>
>>> I am no expert on the matter, but several sources tell me that if <A>
>>> <owl:sameAs> <B>, then all statements that can be made about A will also be
>>> true for B, and vice versa. It seems that the lighthouse example breaks at
>>> that point. For example, in Jeremy's example one of the lighthouse
>>> representations has a height of 41 m. It is likely that that statement will
>>> be false for the representation of the lighthouse as a ruin.
>>>
>>> Can we be sure that if we recommend using owl:sameAs to assert that two
>>> resources are really the same thing, everyone and everything is aware of
>>> the logical consequences?
>>>
>>> Regards,
>>>
>>> Frans
>>>
>>> Best,
>>> Jano
>>>
>>> On 08/31/2016 06:38 AM, Joshua Lieberman wrote:
>>>
>>> Jeremy,
>>>
>>> So as representations, these are not “owl:sameAs”. We assume that as
>>> feature data, each refers to a real world entity, but we don’t assert that
>>> this VerticalObstruction is the same individual as this
>>> MaritimeNavigationAid. We just are suspecting or asserting that the same
>>> real world thing is being discerned in two different ways.
>>> Someone may
>>> define a lighthouse class as subclassing both, otherwise a slightly
>>> specialized relation (e.g. sdwgeo:sameRealWorldEntityAs) would be useful
>>> here.
>>>
>>> Josh
>>>
>>> On Aug 31, 2016, at 8:41 AM, Jeremy Tandy <jeremy.tandy@gmail.com>
>>> wrote:
>>>
>>> > That still leaves a gap in expressing whether two feature data
>>> > entities represent the same real world entity. Perhaps we need a
>>> > "sameFeatureAs" predicate to address this.
>>>
>>> @josh - can we clarify my understanding please?
>>>
>>> In the BP doc §4 "Spatial things, features and geometry" [1] I use a
>>> lighthouse example, so I'll continue with that ...
>>>
>>> We have one real lighthouse (Eddystone Lighthouse) that is discerned as
>>> a different Type by different communities: "VerticalObstruction" and
>>> "MaritimeNavigationAid". In ISO 19100 parlance, these are two distinct
>>> feature types. The two "Features" might be encoded in GML as follows
>>> (forgive any errors in my illustrative example):
>>>
>>> <VerticalObstruction gml:id="a">
>>>   <gml:name>Eddystone</gml:name>
>>>   <gml:identifier codeSpace="http://example.com/sar/features/vo/">EDY</gml:identifier>
>>>   <geometry>
>>>     <gml:Point gml:id="a-p1" srsDimension="2" srsName="EPSG:4326">
>>>       <gml:pos>50.184 -4.268</gml:pos>
>>>     </gml:Point>
>>>   </geometry>
>>>   <height uom="m">41</height>
>>> </VerticalObstruction>
>>>
>>> <MaritimeNavigationAid gml:id="b">
>>>   <gml:name>Eddystone Lighthouse</gml:name>
>>>   <gml:identifier codeSpace="http://example.org/maritime/navaid/">2650253</gml:identifier>
>>>   <geo>
>>>     <gml:Point gml:id="b-p1" srsDimension="2" srsName="EPSG:4326">
>>>       <gml:pos>50.2 -4.3</gml:pos>
>>>     </gml:Point>
>>>   </geo>
>>>   <lightCharacteristic>
>>>     ...
>>>   </lightCharacteristic>
>>> </MaritimeNavigationAid>
>>>
>>> So we have two Features (which we collectively have agreed are "spatial
>>> things"), with identifiers <http://example.com/sar/features/vo/EDY> and
>>> <http://example.org/maritime/navaid/2650253>. Respectively, the XML
>>> elements that describe these features are identified as "a" and "b" using
>>> the @gml:id attribute.
>>>
>>> If we are using "indirect identification" then _both_
>>> <http://example.com/sar/features/vo/EDY> and
>>> <http://example.org/maritime/navaid/2650253> are treated as identifiers
>>> for the _real_ Eddystone Lighthouse; we simply don't care to differentiate
>>> between the real world thing and the information record. In which case,
>>> <owl:sameAs> would seem sufficient? The "height" and "lightCharacteristic"
>>> properties are both applicable to the real Eddystone Lighthouse. Some
>>> judgement would be required to decide which point geometry ("geo" or
>>> "geometry" property) is considered "best".
>>>
>>> The way I think about it, @gml:id is more like the identifier for a
>>> named graph; a container for a set of properties ...
>>>
>>> Am I missing something???
>>>
>>> Jeremy
>>>
>>> [1]: http://w3c.github.io/sdw/bp/#spatial-things-features-and-geometry
>>>
>>> On Wed, 31 Aug 2016 at 12:42 Joshua Lieberman
>>> <jlieberman@tumblingwalls.com> wrote:
>>>
>>> If we are asserting that spatial data on the Web is "always" feature
>>> data that represents a real world entity, then yes, we don't have the
>>> general Web "is it or isn't it physical" ambiguity and can assume that a
>>> feature data identifier also and indirectly identifies the feature. That
>>> still leaves a gap in expressing whether two feature data entities
>>> represent the same real world entity. Perhaps we need a "sameFeatureAs"
>>> predicate to address this.
>>>
>>> Josh
>>>
>>> Joshua Lieberman, Ph.D.
>>> Principal, Tumbling Walls Consultancy
>>> Tel/Direct: +1 617-431-6431
>>> jlieberman@tumblingwalls.com
>>>
>>> On Aug 31, 2016, at 07:29, Frans Knibbe <frans.knibbe@geodan.nl> wrote:
>>>
>>> Hello,
>>>
>>> As stated before, I don't think the httpRange-14 problem exists in our
>>> domain of discourse. I think (and hope) that confusion can only occur when
>>> the things that are described are digital things, or things that can be
>>> transmitted over a computer network, like web pages or mail boxes. It seems
>>> to me that spatial things are never that type of thing. Therefore there is
>>> no reason to take precautions against possible confusion.
>>>
>>> That probably means +1.
>>>
>>> Greetings,
>>>
>>> Frans
>>>
>>> On 31 August 2016 at 09:50, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>>>
>>> Thanks Rob & Clemens ...
>>>
>>> On Wed, 31 Aug 2016 at 08:30, Clemens Portele
>>> <portele@interactive-instruments.de> wrote:
>>>
>>> +1
>>>
>>> On 30 August 2016 at 10:10:26, Jeremy Tandy (jeremy.tandy@gmail.com)
>>> wrote:
>>>
>>> Hi. It would be good to close this issue out & include our collective
>>> recommendation in the BP doc working draft.
>>>
>>> PROPOSAL: SDW working group recommends use of "indirect identifiers" for
>>> spatial things
>>>
>>> ... I'll start the voting.
>>>
>>> +1
>>>
>>> Jeremy
>>>
>>> (BTW, to make sense of the PROPOSAL you'll need to read the email thread)
>>>
>>> On Fri, 26 Aug 2016 at 10:12 Linda van den Brink
>>> <l.vandenbrink@geonovum.nl> wrote:
>>>
>>> So… do we agree we can recommend indirect identifiers, or do we try to
>>> fix the issue with getting the correct identifier as Rob describes?
>>>
>>> While waiting for this I’ve updated the issue and the text referring to
>>> the issue in BP6.
>>>
>>> *From:* Rob Atkinson [mailto:rob@metalinkage.com.au]
>>> *Sent:* Wednesday, 24 August 2016 13:56
>>> *To:* Jeremy Tandy; Phil Archer; Linda van den Brink; Bill Roberts
>>> *Cc:* SDW WG Public List
>>> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial
>>> things"
>>>
>>> Hi
>>>
>>> Agree this is a real concern - people can't be blamed for doing the
>>> obvious, if dumb, thing..
>>>
>>> I think we should take note of best practice in the HTML world - which
>>> is often to include a citable link to a resource in the rendered view. Or
>>> a "share" or something similar. We can also put fairly explicit annotation
>>> in machine-readable code - stating that the resource is about the URI - and
>>> even notes saying when citing this resource use the URI....
>>>
>>> I'd also like to see browsers evolve to offer you the original link or
>>> the redirected when cutting and pasting - how hard can it be!
>>>
>>> Maybe we can get Ed to ask around Google Chrome team for suggestions on
>>> how best to handle this :-)
>>>
>>> Rob
>>>
>>> On Wed, 24 Aug 2016 at 18:27 Jeremy Tandy <jeremy.tandy@gmail.com>
>>> wrote:
>>>
>>> Yes, I think so ... And we should do so if we are recommending "indirect
>>> identification".
>>>
>>> Jeremy
>>>
>>> On Wed, 24 Aug 2016 at 09:24, Phil Archer <phila@w3.org> wrote:
>>>
>>> Bill's comments also made me think about some of the classic arguments,
>>> such as that a lake doesn't have a last updated date and isn't 435KB
>>> big. Which are true, however, that kind of metadata generally comes from
>>> the server, i.e. the HTTP layer. That's an over simplification but the
>>> point is that it is relatively easy to avoid deliberately creating
>>> misleading metadata - metadata about the doc rather than the thing it
>>> describes - and it's also generally easy to avoid looking for that
>>> metadata.
>>>
>>> Is there scope for some BP advice there?
>>>
>>> Phil.
>>>
>>> On 24/08/2016 08:25, Jeremy Tandy wrote:
>>> > Thanks Linda. More clear examples where being "correct" (in terms of
>>> > avoiding uri collisions by using two distinct uris) is making things
>>> > worse because users take the wrong one!
>>> >
>>> > So, as a WG, are we content to recommend this "indirect identification"
>>> > pattern where thing & info resource identifiers are conflated?
>>> >
>>> > Bill has added some good points about how to avoid impacts of uri
>>> > collision- by using the (dataset) metadata to talk about licenses and
>>> > creators for the information ...
>>> > On Wed, 24 Aug 2016 at 07:52, Linda van den Brink
>>> > <l.vandenbrink@geonovum.nl> wrote:
>>> >
>>> >> Experience from the Netherlands: we have the id/doc pattern in our URI
>>> >> strategy, based on the Cool URIs note [8] and the ISA study on
>>> >> persistent identifiers [9].
>>> >>
>>> >> That being said, same as Bill I also notice data users getting
>>> >> confused and generally using the /doc/ URI as that is the one they can
>>>
Received on Thursday, 8 September 2016 10:06:01 UTC