- From: Jeremy Tandy <jeremy.tandy@gmail.com>
- Date: Mon, 05 Sep 2016 16:18:12 +0000
- To: Rob Atkinson <rob@metalinkage.com.au>, janowicz@ucsb.edu, Simon.Cox@csiro.au, frans.knibbe@geodan.nl
- Cc: jlieberman@tumblingwalls.com, public-sdw-wg@w3.org
- Message-ID: <CADtUq_1+3hQhx1bkig74NW7UMRnJD+kRBuSpmSy++epLN2ceLg@mail.gmail.com>
Hi - thanks to all of you who have contributed to this discussion. It's been big, but I think worthwhile.

@andrea - thanks for taking the time to summarise the guidance from the UK (from memory, I think you got it spot on). I recall from those discussions that INSPIRE's "spatial object" (identified with /so/ pattern) is an information object. The examples you use illustrate the difficulty in combining ("fusing") information from different sources; do the "Railway Station Node" <http://location.data.gov.uk/so/tn/RailwayStationNode/nwkr/123456> (with properties including point location and number of platforms) and "Railway Station Area" <http://location.data.gov.uk/so/tn/RailwayStationArea/nwkr/456789> (with properties including the area describing the "topographical limits of the facilities of a railway station - buildings, railway yards, installations and equipment") actually identify Manchester Piccadilly Railway Station (<http://transport.data.gov.uk/id/station/MAN>) - or do they identify aspects of Manchester Piccadilly Railway Station? This is why reconciliation of identifiers is hard - and beyond our scope. @eparsons once said "here be dragons" [0]

Based on implementation experience, the /id/ and /doc/ pattern tends to cause more confusion than it solves [1A][1B] ...

I conclude from the entire thread that:

1) indirect identification for spatial things is preferred in situations where people do not need to explicitly refer to the document resource that describes the spatial thing (the "representation") ... This means that we will only have a single URI in use for a spatial thing and its representation - no need for /id/ and /doc/ URI pairs.

@bill had some suggestions (see [1A]) about how to avoid confusion by putting the (metadata) properties about the information resource, such as "last-update-time", in the dataset description. Here we see evidence of the pattern of metadata for an individual item being inherited from a dataset description.
This relates to the more general requirement about how we attach metadata in a consistent way. I mentioned this in the discussion about crowd-sourced spatial data [2] ...

2) where URI collision (caused by using the same URI for spatial thing and the representation) is a problem, or the representation is provided via URL of some other system (e.g. a WFS endpoint's GetFeature request) then the representation itself will also be identified using a URI (or URL) ... the URI for the spatial thing should be resolved by redirecting the HTTP request to the URL of the representation

3) there is no overarching pattern that can be used in all situations to identify the representation (because a representation will often be provided by a URL for a service endpoint) - although the /id/ and /doc/ pattern is widely used

4) although proliferation of identifiers for spatial things is discouraged, it is conceivable that a given spatial thing may be identified through multiple URIs (the "non-unique naming problem" - different people each use their own identifiers), e.g. U1 and U2, each of which may resolve to provide different information about the spatial thing. <owl:sameAs> is the appropriate predicate / relationship to link these URIs together (@jano refers to this as "co-reference resolution"). If the representations associated with U1 and U2 are explicitly identified, e.g. U1 has representation R1 and U2 has representation R2, <owl:sameAs> must not be used to link the URLs of the representations as these are _not_ the same. However, given that { <U1> owl:sameAs <U2> }, then U1 has representations R1 _and_ R2 ... and both representations also apply to U2. This provides a *basis* for data fusion but will likely require further insight to determine which properties are considered valid (ref. the previous comments about inaccurate data and "spatiotemporal scoping").
5) a spatial thing may have multiple representations; content negotiation provides a mechanism for a user agent to select a preferred MIME-type, charset, encoding and language, but there is a gap in (best) practice about how a user agent can request a representation using a specific schema or data model (see [3]) ... the Linked Data API, SIRF and "profile" Link Header (RFC 6906) provide examples of how this might be achieved

6) there is a gap in (best) practice about how a server might advertise the availability of multiple representations; e.g. based on different data sources and/or different data models.

[0]: https://lists.w3.org/Archives/Public/public-sdw-wg/2015Sep/0058.html
[1A]: https://lists.w3.org/Archives/Public/public-sdw-wg/2016Aug/0141.html
[1B]: https://lists.w3.org/Archives/Public/public-sdw-wg/2016Aug/0148.html
[2]: https://lists.w3.org/Archives/Public/public-sdw-wg/2016Sep/0058.html
[3]: http://geo4web-testbed.github.io/topic4/#h.n0gkernttzw0

On Mon, 5 Sep 2016 at 08:05 Rob Atkinson <rob@metalinkage.com.au> wrote:

> I am perhaps missing the point here and there are a lot of terms here whose precise meaning may or may not be well understood (conflation, co-resolution, fusion), but isn't owl:sameAs also specifically a statement that some sort of fusion by entailment of shared properties using (some flavour of OWL) is meaningful?
>
> The issue here is nevertheless about the rationale for using "indirect URIs" - I still think the example is relevant - and such an example could be used to explain why it is important to do so.
>
> It also addresses the issue of whether some canonical property (with appropriate subProperty specialisations) for the relationship between a URI and one or more resources is needed - and whether a BP can be identified for this.
>
> Rob
>
> On Mon, 5 Sep 2016 at 13:31 Krzysztof Janowicz <janowicz@ucsb.edu> wrote:
>
>> Hi,
>>
>> *Rob* wrote:
>>
>> > consider two resources with URIs R1 and R2
>> >
>> > R1 ns:dateEdited 12/1/2001
>> >
>> > R2 ns:dateEdited 6/6/2006
>> >
>> > R1 owl:sameAs R2 then leads to ambiguity regarding the value of the functional property ns:dateEdited
>>
>> Note that this is not an owl:sameAs issue. I think it is very important to distinguish between co-reference resolution (using owl:sameAs, skos:closeMatch, ...) and data conflation (data fusion). owl:sameAs handles co-reference resolution. Data fusion is still an open research issue (despite tons of work in the DB community). The fact that ns:dateEdited may be defined as a functional property in some ontology will also have no effect on the RDF triples as such.
>>
>> Best,
>> Krzysztof
>>
>> On 09/04/2016 05:04 PM, Simon.Cox@csiro.au wrote:
>>
>> *Rob* wrote:
>>
>> > consider two resources with URIs R1 and R2
>> >
>> > R1 ns:dateEdited 12/1/2001
>> >
>> > R2 ns:dateEdited 6/6/2006
>> >
>> > R1 owl:sameAs R2 then leads to ambiguity regarding the value of the functional property ns:dateEdited
>>
>> Where R1 and R2 are representations or descriptions of a (real-world) thing, possibly a graph of RDF triples.
>>
>> However, in a separate part of the thread, *Jeremy* wrote:
>>
>> > few people will care to name the representation / graph at all.
>>
>> In other words, the URIs R1 and R2 are usually not treated with much respect. So it is unlikely that we would be in the business of making sameAs statements about these.
>>
>> Simon
>>
>> *From:* Rob Atkinson [mailto:rob@metalinkage.com.au]
>> *Sent:* Saturday, 3 September 2016 8:05 AM
>> *To:* janowicz@ucsb.edu; Frans Knibbe <frans.knibbe@geodan.nl>
>> *Cc:* Joshua Lieberman <jlieberman@tumblingwalls.com>; Jeremy Tandy <jeremy.tandy@gmail.com>; SDW WG Public List <public-sdw-wg@w3.org>
>> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial things"
>>
>> A few things - this is a rich discussion and we have identified several parts (which is probably why the original issue was hard to pin down)
>>
>> I'm glad we have coaxed one elephant out - the sameAs semantics issue. For me this is the litmus test whether a URL can be used as a URI for a thing or not.
>>
>> (and this is where one of the issues about SIRF Jeremy raised comes in - but I don't think we need to worry about specific approach, rather the criteria for whether a URI is a good one for identification purposes.) I think we simply make a strong statement that you don't use a URL as a URI if it is not stable and it does not make sense to use owl:sameAs.
>>
>> This pretty much rules out any direct URL to a single representation:
>>
>> consider two resources with URIs R1 and R2
>>
>> R1 ns:dateEdited 12/1/2001
>>
>> R2 ns:dateEdited 6/6/2006
>>
>> R1 owl:sameAs R2 then leads to ambiguity regarding the value of the functional property ns:dateEdited
>>
>> however
>>
>> U1 --303--> R1
>> U2 --303--> R2
>>
>> can (and should be) represented as
>>
>> U1 ns:hasRepresentation R1
>> U2 ns:hasRepresentation R2
>>
>> U1 owl:sameAs U2
>>
>> entails
>>
>> U1 ns:hasRepresentation R1, R2
>>
>> which doesn't make any stupid statements about the properties. It also allows us to make useful metadata statements about R1, R2 as required.
>>
>> Whilst this is a general concern, we see issues of identification stability, multiple representations, non-unique naming being significant to spatial data and I think we can and should therefore extend the general DWBP with an example using spatial representations and provide a more concrete best practice.
>>
>> On Sat, 3 Sep 2016 at 00:40 Krzysztof Janowicz <janowicz@ucsb.edu> wrote:
>>
>> I am no expert on the matter, but several sources tell me that if <A> <owl:sameAs> <B>, then all statements that can be made about A will also be true for B, and vice versa. It seems that the lighthouse example breaks at that point. For example, in Jeremy's example one of the lighthouse representations has a height of 41 m. It is likely that that statement will be false for the representation of the lighthouse as a ruin.
>>
>> Can we be sure that if we recommend using owl:sameAs to assert that two resources are really the same thing, everyone and everything is aware of the logical consequences?
>>
>> This is exactly the key point. If A owl:sameAs B then A and B signify the same entity and thus every *statement* about A is a statement about B. It works well with Jeremy's example. The fact that the ruin no longer is 41m tall is an example of the need for spatiotemporal scoping of predicates, not a shortcoming of owl:sameAs. Also, keep in mind that RDF statements have nothing to do with facts or truth; they are just sets of statements. This is where the power of RDF comes from.
>>
>> Best,
>> Krzysztof
>>
>> On 09/02/2016 02:20 AM, Frans Knibbe wrote:
>>
>> On 1 September 2016 at 23:42, Krzysztof Janowicz <janowicz@ucsb.edu> wrote:
>>
>> Hi,
>>
>> So as representations, these are not “owl:sameAs”.
>>
>> Just for clarification. owl:sameAs is only concerned with the mapping of IRIs to (real world) entities and not 'representations' (leaving aside the fact that everything is a representation in some sense). I.e., it is about 'identity'. To give an extreme example, a URI may refer to the Eddystone Lighthouse which may be classified as /Lighthouse/ in some repository. Another URI established 50 years from now can still refer to this particular (4th) lighthouse and classify it as a /Ruin/. Another 50 years into the future, there may be yet another URI that refers to the fact that at some stage there was a ruin here of the 4th lighthouse called Eddystone while there is nothing physical left of it, and, thus, it is neither classified as /Ruin/ nor /Lighthouse/. In fact, we do not even need to introduce the concept of "real world" here as we can also establish a sameAs relation between two URIs that point to Zeus. Please note that this is different from establishing a sameAs link between a particular statue of Zeus in a particular museum and Zeus as the god of thunder. Finally, the purpose of establishing sameAs links is typically data fusion/conflation (no matter whether this is done ad-hoc, manually, or (offline) computationally).
>>
>> I am no expert on the matter, but several sources tell me that if <A> <owl:sameAs> <B>, then all statements that can be made about A will also be true for B, and vice versa. It seems that the lighthouse example breaks at that point. For example, in Jeremy's example one of the lighthouse representations has a height of 41 m. It is likely that that statement will be false for the representation of the lighthouse as a ruin.
>>
>> Can we be sure that if we recommend using owl:sameAs to assert that two resources are really the same thing, everyone and everything is aware of the logical consequences?
>>
>> Regards,
>>
>> Frans
>>
>> Best,
>> Jano
>>
>> On 08/31/2016 06:38 AM, Joshua Lieberman wrote:
>>
>> Jeremy,
>>
>> So as representations, these are not “owl:sameAs”. We assume that as feature data, each refers to a real world entity, but we don’t assert that this VerticalObstruction is the same individual as this MaritimeNavigationAid. We just are suspecting or asserting that the same real world thing is being discerned in two different ways. Someone may define a lighthouse class as subclassing both; otherwise a slightly specialized relation (e.g. sdwgeo:sameRealWorldEntityAs) would be useful here.
>>
>> Josh
>>
>> On Aug 31, 2016, at 8:41 AM, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>>
>> > That still leaves a gap in expressing whether two feature data entities represent the same real world entity. Perhaps we need a "sameFeatureAs" predicate to address this.
>>
>> @josh - can we clarify my understanding please?
>>
>> In the BP doc §4 "Spatial things, features and geometry" [1] I use a lighthouse example, so I'll continue with that ...
>>
>> We have one real lighthouse (Eddystone Lighthouse) that is discerned as a different Type by different communities: "VerticalObstruction" and "MaritimeNavigationAid". In ISO 19100 parlance, these are two distinct feature types. The two "Features" might be encoded in GML as follows (forgive any errors in my illustrative example):
>>
>> <VerticalObstruction gml:id="a">
>>   <gml:name>Eddystone</gml:name>
>>   <gml:identifier codeSpace="http://example.com/sar/features/vo/">EDY</gml:identifier>
>>   <geometry>
>>     <gml:Point gml:id="a-p1" srsDimension="2" srsName="EPSG:4326">
>>       <gml:pos>50.184 -4.268</gml:pos>
>>     </gml:Point>
>>   </geometry>
>>   <height uom="m">41</height>
>> </VerticalObstruction>
>>
>> <MaritimeNavigationAid gml:id="b">
>>   <gml:name>Eddystone Lighthouse</gml:name>
>>   <gml:identifier codeSpace="http://example.org/maritime/navaid/">2650253</gml:identifier>
>>   <geo>
>>     <gml:Point gml:id="b-p1" srsDimension="2" srsName="EPSG:4326">
>>       <gml:pos>50.2 -4.3</gml:pos>
>>     </gml:Point>
>>   </geo>
>>   <lightCharacteristic>
>>     ...
>>   </lightCharacteristic>
>> </MaritimeNavigationAid>
>>
>> So we have two Features (which we collectively have agreed are "spatial things"), with identifiers <http://example.com/sar/features/vo/EDY> and <http://example.org/maritime/navaid/2650253>. Respectively, the XML elements that describe these features are identified as "a" and "b" using the @gml:id attribute.
>>
>> If we are using "indirect identification" then _both_ <http://example.com/sar/features/vo/EDY> and <http://example.org/maritime/navaid/2650253> are treated as identifiers for the _real_ Eddystone Lighthouse; we simply don't care to differentiate between the real world thing and the information record. In which case, <owl:sameAs> would seem sufficient? The "height" and "lightCharacteristic" properties are both applicable to the real Eddystone Lighthouse. Some judgement would be required to decide which point geometry ("geo" or "geometry" property) is considered "best".
>>
>> The way I think about it, @gml:id is more like the identifier for a named graph; a container for a set of properties ...
>>
>> Am I missing something???
>>
>> Jeremy
>>
>> [1]: http://w3c.github.io/sdw/bp/#spatial-things-features-and-geometry
>>
>> On Wed, 31 Aug 2016 at 12:42 Joshua Lieberman <jlieberman@tumblingwalls.com> wrote:
>>
>> If we are asserting that spatial data on the Web is "always" feature data that represents a real world entity, then yes, we don't have the general Web "is it or isn't it physical" ambiguity and can assume that a feature data identifier also and indirectly identifies the feature. That still leaves a gap in expressing whether two feature data entities represent the same real world entity. Perhaps we need a "sameFeatureAs" predicate to address this.
>>
>> Josh
>>
>> Joshua Lieberman, Ph.D.
>> Principal, Tumbling Walls Consultancy
>> Tel/Direct: +1 617-431-6431
>> jlieberman@tumblingwalls.com
>>
>> On Aug 31, 2016, at 07:29, Frans Knibbe <frans.knibbe@geodan.nl> wrote:
>>
>> Hello,
>>
>> As stated before, I don't think the httpRange-14 problem exists in our domain of discourse. I think (and hope) that confusion can only occur when the things that are described are digital things, or things that can be transmitted over a computer network, like web pages or mail boxes. It seems to me that spatial things are never that type of thing. Therefore there is no reason to take precautions against possible confusion.
>>
>> That probably means +1.
>>
>> Greetings,
>>
>> Frans
>>
>> On 31 August 2016 at 09:50, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>>
>> Thanks Rob & Clemens ...
>>
>> On Wed, 31 Aug 2016 at 08:30, Clemens Portele <portele@interactive-instruments.de> wrote:
>>
>> +1
>>
>> On 30 August 2016 at 10:10:26, Jeremy Tandy (jeremy.tandy@gmail.com) wrote:
>>
>> Hi. It would be good to close this issue out & include our collective recommendation in the BP doc working draft.
>>
>> PROPOSAL: SDW working group recommends use of "indirect identifiers" for spatial things
>>
>> ... I'll start the voting.
>>
>> +1
>>
>> Jeremy
>>
>> (BTW, to make sense of the PROPOSAL you'll need to read the email thread)
>>
>> On Fri, 26 Aug 2016 at 10:12 Linda van den Brink <l.vandenbrink@geonovum.nl> wrote:
>>
>> So… do we agree we can recommend indirect identifiers, or do we try to fix the issue with getting the correct identifier as Rob describes?
>>
>> While waiting for this I’ve updated the issue and the text referring to the issue in BP6.
>>
>> *From:* Rob Atkinson [mailto:rob@metalinkage.com.au]
>> *Sent:* Wednesday, 24 August 2016 13:56
>> *To:* Jeremy Tandy; Phil Archer; Linda van den Brink; Bill Roberts
>> *Cc:* SDW WG Public List
>> *Subject:* Re: Clarification required: BP6 "use HTTP URIs for spatial things"
>>
>> Hi
>>
>> Agree this is a real concern - people can't be blamed for doing the obvious, if dumb, thing.
>>
>> I think we should take note of best practice in the HTML world - which is often to include a citable link to a resource in the rendered view. Or a "share" or something similar. We can also put fairly explicit annotation in machine-readable code - stating that the resource is about the URI - and even notes saying when citing this resource use the URI....
>>
>> I'd also like to see browsers evolve to offer you the original link or the redirected when cutting and pasting - how hard can it be!
>>
>> Maybe we can get Ed to ask around Google Chrome team for suggestions on how best to handle this :-)
>>
>> Rob
>>
>> On Wed, 24 Aug 2016 at 18:27 Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>>
>> Yes, I think so ... And we should do so if we are recommending "indirect identification".
>>
>> Jeremy
>>
>> On Wed, 24 Aug 2016 at 09:24, Phil Archer <phila@w3.org> wrote:
>>
>> Bill's comments also made me think about some of the classic arguments, such as that a lake doesn't have a last updated date and isn't 435KB big. Which are true; however, that kind of metadata generally comes from the server, i.e. the HTTP layer. That's an oversimplification but the point is that it is relatively easy to avoid deliberately creating misleading metadata - metadata about the doc rather than the thing it describes - and it's also generally easy to avoid looking for that metadata.
>>
>> Is there scope for some BP advice there?
>>
>> Phil.
>>
>> On 24/08/2016 08:25, Jeremy Tandy wrote:
>> > Thanks Linda. More clear examples where being "correct" (in terms of avoiding URI collisions by using two distinct URIs) is making things worse because users take the wrong one!
>> >
>> > So, as a WG, are we content to recommend this "indirect identification" pattern where thing & info resource identifiers are conflated?
>> >
>> > Bill has added some good points about how to avoid impacts of URI collision - by using the (dataset) metadata to talk about licenses and creators for the information ...
>> >
>> > On Wed, 24 Aug 2016 at 07:52, Linda van den Brink <l.vandenbrink@geonovum.nl> wrote:
>> >
>> >> Experience from the Netherlands: we have the id/doc pattern in our URI strategy, based on the Cool URIs note [8] and the ISA study on persistent identifiers [9].
>> >>
>> >> That being said, same as Bill I also notice data users getting confused and generally using the /doc/ URI as that is the one they can
Received on Monday, 5 September 2016 16:18:56 UTC
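
Point 4 of the summary at the top of this message (owl:sameAs links the spatial-thing URIs, never the representation URLs, yet each linked URI ends up with the union of the representations) could be sketched roughly as below. This is only an illustration under stated assumptions: U1, U2, R1 and R2 are the placeholders used in the thread, the "has representation" relation stands in for Rob's hypothetical ns:hasRepresentation, and the function names are invented here. It is plain Python, not an OWL reasoner.

```python
from collections import defaultdict

# Hypothetical identifiers, named after the U1/U2/R1/R2 placeholders in the thread.
SAME_AS = [("U1", "U2")]                            # U1 owl:sameAs U2
HAS_REPRESENTATION = [("U1", "R1"), ("U2", "R2")]   # e.g. discovered via 303 redirects


def same_as_classes(pairs):
    """Group identifiers into equivalence classes.

    owl:sameAs is reflexive, symmetric and transitive, so a small
    union-find structure is enough to collect co-referring URIs.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    classes = defaultdict(set)
    for x in list(parent):
        classes[find(x)].add(x)
    return list(classes.values())


def entailed_representations(same_as, has_representation):
    """Return {spatial-thing URI: set of representation URLs}.

    Every URI in a sameAs class receives the union of the representations
    asserted for any member of that class. The representations themselves
    are never merged or equated with each other.
    """
    direct = defaultdict(set)
    for thing, rep in has_representation:
        direct[thing].add(rep)

    result = {thing: set(reps) for thing, reps in direct.items()}
    for cls in same_as_classes(same_as):
        shared = set().union(*(direct.get(u, set()) for u in cls))
        for u in cls:
            result[u] = set(shared)
    return result


if __name__ == "__main__":
    for uri, reps in sorted(entailed_representations(SAME_AS, HAS_REPRESENTATION).items()):
        print(uri, "has representations", sorted(reps))
```

Note that R1 and R2 appear only as values, never as keys: nothing is asserted about the representations being the same, so metadata such as an edit date can still be attached to each one separately. Deciding which properties from R1 and R2 are actually valid for the spatial thing remains the open data-fusion question discussed above.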