- From: Krzysztof Janowicz <janowicz@ucsb.edu>
- Date: Thu, 1 Sep 2016 14:42:46 -0700
- To: Joshua Lieberman <jlieberman@tumblingwalls.com>, Jeremy Tandy <jeremy.tandy@gmail.com>
- Cc: Frans Knibbe <frans.knibbe@geodan.nl>, SDW WG Public List <public-sdw-wg@w3.org>
- Message-ID: <1190e136-6e0e-cb97-3c13-93b3e17411cc@ucsb.edu>
Hi,
> So as representations, these are not “owl:sameAs”.
Just for clarification: owl:sameAs is only concerned with the mapping of
IRIs to (real world) entities and not 'representations' (leaving aside
the fact that everything is a representation in some sense). I.e., it is
about 'identity'. To give an extreme example, a URI may refer to the
Eddystone Lighthouse which may be classified as /Lighthouse/ in some
repository. Another URI established 50 years from now can still refer to
this particular (4th) lighthouse and classify it as a /Ruin/. Another 50
years into the future, there may be yet another URI that refers to the
fact that at some stage there was a ruin here of the 4th lighthouse
called Eddystone while there is nothing physical left of it, and, thus,
it is neither classified as /Ruin/ nor /Lighthouse/. In fact, we do not
even need to introduce the concept of "real world" here as we can also
establish a sameAs relation between two URIs that point to Zeus. Please
note that this is different from establishing a sameAs link between a
particular statue of Zeus in a particular museum and Zeus as the god of
thunder. Finally, the purpose of establishing sameAs links is typically
data fusion/conflation (no matter whether this is done ad-hoc, manually,
or (offline) computationally).
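
To make the identity reading concrete, here is a minimal sketch in Python of what a consumer might do with sameAs links during data fusion. It is only an illustration under stated assumptions, not how any particular reasoner or triple store works: the IRI under example.net/heritage, the two Zeus IRIs, the plain-string predicates, and the helper names same_as_clusters and fuse are all hypothetical. The two Eddystone identifiers are the ones from Jeremy's example. The sketch treats sameAs as an equivalence relation, groups IRIs into identity clusters, and pools the statements made about any member of a cluster.

```python
from collections import defaultdict

# Identifiers taken from the thread's lighthouse example.
EDY_VO = "http://example.com/sar/features/vo/EDY"
EDY_NAV = "http://example.org/maritime/navaid/2650253"
# Hypothetical IRIs, invented for this illustration only.
EDY_FUTURE = "http://example.net/heritage/eddystone-4"
ZEUS = "http://example.net/myth/Zeus"
ZEUS_STATUE = "http://example.net/museum/zeus-statue"

# sameAs links: all three Eddystone IRIs name the same individual.
# The statue and the god are deliberately NOT linked.
LINKS = [(EDY_VO, EDY_NAV), (EDY_NAV, EDY_FUTURE)]

# (subject, predicate, object) statements; predicates are simplified strings.
STATEMENTS = [
    (EDY_VO, "height_m", 41),
    (EDY_NAV, "rdf:type", "Lighthouse"),
    (EDY_FUTURE, "rdf:type", "Ruin"),
    (ZEUS, "rdf:type", "Deity"),
    (ZEUS_STATUE, "rdf:type", "Statue"),
]


def same_as_clusters(links):
    """Group IRIs into identity clusters (sameAs is symmetric and transitive)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for iri in list(parent):
        clusters[find(iri)].add(iri)
    return list(clusters.values())


def fuse(statements, links):
    """Pool statements across each identity cluster under one representative IRI."""
    canonical = {}
    for cluster in same_as_clusters(links):
        representative = min(cluster)  # arbitrary but deterministic choice
        for iri in cluster:
            canonical[iri] = representative

    fused = defaultdict(set)
    for subject, predicate, obj in statements:
        fused[canonical.get(subject, subject)].add((predicate, obj))
    return dict(fused)


if __name__ == "__main__":
    for subject, facts in fuse(STATEMENTS, LINKS).items():
        print(subject, sorted(facts, key=str))
```

After fusion the one Eddystone individual carries both classifications as well as the height, which is exactly the consequence of asserting identity: every statement about one IRI becomes a statement about the others. The statue and the god stay separate because no sameAs link was asserted between them.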
Best,
Jano
On 08/31/2016 06:38 AM, Joshua Lieberman wrote:
> Jeremy,
>
> So as representations, these are not “owl:sameAs”. We assume that as
> feature data, each refers to a real world entity, but we don’t assert
> that this VerticalObstruction is the same individual as this
> MaritimeNavigationAid. We just are suspecting or asserting that the
> same real world thing is being discerned in two different ways.
> Someone may define a lighthouse class as subclassing both, otherwise a
> slightly specialized relation (e.g. sdwgeo:sameRealWorldEntityAs)
> would be useful here.
>
> Josh
>
>> On Aug 31, 2016, at 8:41 AM, Jeremy Tandy <jeremy.tandy@gmail.com
>> <mailto:jeremy.tandy@gmail.com>> wrote:
>>
>> > That still leaves a gap in expressing whether two feature data
>> entities represent the same real world entity. Perhaps we need a
>> "sameFeatureAs" predicate to address this.
>>
>> @josh - can we clarify my understanding please?
>>
>> In the BP doc §4 "Spatial things, features and geometry" [1] I use a
>> lighthouse example, so I'll continue with that ...
>>
>> We have one real lighthouse (Eddystone Lighthouse) that is discerned
>> as a different Type by different communities: "VerticalObstruction"
>> and "MaritimeNavigationAid". In ISO 19100 parlance, these are two
>> distinct feature types. The two "Features" might be encoded in GML as
>> follows (forgive any errors in my illustrative example):
>>
>> <VerticalObstruction gml:id="a">
>> <gml:name>Eddystone</gml:name>
>> <gml:identifier
>> codeSpace="http://example.com/sar/features/vo/">EDY</gml:identifier>
>> <geometry>
>> <gml:Point gml:id="a-p1" srsDimension="2" srsName="EPSG:4326">
>> <gml:pos>50.184 -4.268</gml:pos>
>> </gml:Point>
>> </geometry>
>> <height uom="m">41</height>
>> </VerticalObstruction>
>>
>> <MaritimeNavigationAid gml:id="b">
>> <gml:name>Eddystone Lighthouse</gml:name>
>> <gml:identifier
>> codeSpace="http://example.org/maritime/navaid/">2650253</gml:identifier>
>> <geo>
>> <gml:Point gml:id="b-p1" srsDimension="2" srsName="EPSG:4326">
>> <gml:pos>50.2 -4.3</gml:pos>
>> </gml:Point>
>> </geo>
>> <lightCharacteristic>
>> ...
>> </lightCharacteristic>
>> </MaritimeNavigationAid>
>>
>> So we have two Features (which we collectively have agreed are
>> "spatial things"), with identifiers
>> <http://example.com/sar/features/vo/EDY> and
>> <http://example.org/maritime/navaid/2650253>. Respectively, the XML
>> elements that describe these features are identified as "a" and "b"
>> using the @gml:id attribute.
>>
>> If we are using "indirect identification" then _both_
>> <http://example.com/sar/features/vo/EDY> and
>> <http://example.org/maritime/navaid/2650253> are treated as
>> identifiers for the _real_ Eddystone Lighthouse; we simply don't care
>> to differentiate between the real world thing and the information
>> record. In which case, <owl:sameAs> would seem sufficient? The
>> "height" and "lightCharacteristic" properties are both applicable to
>> the real Eddystone Lighthouse. Some judgement would be required to
>> decide which point geometry ("geo" or "geometry" property) is
>> considered "best".
>>
>> The way I think about it, @gml:id is more like the identifier for a
>> named graph; a container for a set of properties ...
>>
>> Am I missing something???
>>
>> Jeremy
>>
>>
>> [1]: http://w3c.github.io/sdw/bp/#spatial-things-features-and-geometry
>>
>> On Wed, 31 Aug 2016 at 12:42 Joshua Lieberman
>> <jlieberman@tumblingwalls.com <mailto:jlieberman@tumblingwalls.com>>
>> wrote:
>>
>> If we are asserting that spatial data on the Web is "always"
>> feature data that represents a real world entity, then yes, we
>> don't have the general Web "is it or isn't it physical" ambiguity
>> and can assume that a feature data identifier also and indirectly
>> identifies the feature. That still leaves a gap in expressing
>> whether two feature data entities represent the same real world
>> entity. Perhaps we need a "sameFeatureAs" predicate to address this.
>>
>> Josh
>>
>> Joshua Lieberman, Ph.D.
>> Principal, Tumbling Walls Consultancy
>> Tel/Direct: +1 617-431-6431
>> jlieberman@tumblingwalls.com <mailto:jlieberman@tumblingwalls.com>
>>
>> On Aug 31, 2016, at 07:29, Frans Knibbe <frans.knibbe@geodan.nl
>> <mailto:frans.knibbe@geodan.nl>> wrote:
>>
>>> Hello,
>>>
>>> As stated before, I don't think the httpRange-14 problem exists
>>> in our domain of discourse. I think (and hope) that confusion
>>> can only occur when the things that are described are digital
>>> things, or things that can be transmitted over a computer
>>> network, like web pages or mail boxes. It seems to me that
>>> spatial things are never that type of thing. Therefore there is
>>> no reason to take precautions against possible confusion.
>>>
>>> That probably means +1.
>>>
>>> Greetings,
>>> Frans
>>>
>>>
>>> On 31 August 2016 at 09:50, Jeremy Tandy <jeremy.tandy@gmail.com
>>> <mailto:jeremy.tandy@gmail.com>> wrote:
>>>
>>> Thanks Rob & Clemens ...
>>>
>>> On Wed, 31 Aug 2016 at 08:30, Clemens Portele
>>> <portele@interactive-instruments.de
>>> <mailto:portele@interactive-instruments.de>> wrote:
>>>
>>> +1
>>>
>>>
>>> On 30 August 2016 at 10:10:26, Jeremy Tandy
>>> (jeremy.tandy@gmail.com <mailto:jeremy.tandy@gmail.com>)
>>> wrote:
>>>
>>>> Hi. It would be good to close this issue out & include
>>>> our collective recommendation in the BP doc working draft.
>>>>
>>>> PROPOSAL: SDW working group recommends use of "indirect
>>>> identifiers" for spatial things
>>>>
>>>> ... I'll start the voting.
>>>>
>>>> +1
>>>>
>>>> Jeremy
>>>>
>>>> (BTW, to make sense of the PROPOSAL you'll need to read
>>>> the email thread)
>>>>
>>>> On Fri, 26 Aug 2016 at 10:12 Linda van den Brink
>>>> <l.vandenbrink@geonovum.nl
>>>> <mailto:l.vandenbrink@geonovum.nl>> wrote:
>>>>
>>>> So… do we agree we can recommend indirect
>>>> identifiers, or do we try to fix the issue with
>>>> getting the correct identifier as Rob describes?
>>>>
>>>>
>>>> While waiting for this I’ve updated the issue and
>>>> the text referring to the issue in BP6.
>>>>
>>>>
>>>> *Van:* Rob Atkinson [mailto:rob@metalinkage.com.au
>>>> <mailto:rob@metalinkage.com.au>]
>>>> *Verzonden:* woensdag 24 augustus 2016 13:56
>>>> *Aan:* Jeremy Tandy; Phil Archer; Linda van den
>>>> Brink; Bill Roberts
>>>>
>>>>
>>>> *CC:* SDW WG Public List
>>>>
>>>> *Onderwerp:* Re: Clarification required: BP6 "use
>>>> HTTP URIs for spatial things"
>>>>
>>>>
>>>> Hi
>>>>
>>>>
>>>> Agree this is a real concern - people can't be
>>>> blamed for doing the obvious, if dumb, thing..
>>>>
>>>>
>>>> I think we should take note of best practice in the
>>>> HTML world - which is often to include a citable
>>>> link to a resource in the rendered view. Or a
>>>> "share" or something similar. We can also put
>>>> fairly explicit annotation in machine-readable code
>>>> - stating that the resource is about the URI - and
>>>> even notes saying when citing this resource use the
>>>> URI....
>>>>
>>>>
>>>> I'd also like to see browsers evolve to offer you
>>>> the original link or the redirected when cutting
>>>> and pasting - how hard can it be!
>>>>
>>>>
>>>> Maybe we can get Ed to ask around Google Chrome
>>>> team for suggestions on how best to handle this :-)
>>>>
>>>>
>>>> Rob
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, 24 Aug 2016 at 18:27 Jeremy Tandy
>>>> <jeremy.tandy@gmail.com
>>>> <mailto:jeremy.tandy@gmail.com>> wrote:
>>>>
>>>> Yes, I think so ... And we should do so if we
>>>> are recommending "indirect identification".
>>>>
>>>> Jeremy
>>>>
>>>> On Wed, 24 Aug 2016 at 09:24, Phil Archer
>>>> <phila@w3.org <mailto:phila@w3.org>> wrote:
>>>>
>>>> Bill's comments also made me think about
>>>> some of the classic arguments,
>>>> such as that a lake doesn't have a last
>>>> updated date and isn't 435KB
>>>> big. Which are true, however, that kind of
>>>> metadata generally comes from
>>>> the server, i.e. the HTTP layer. That's an
>>>> over simplification but the
>>>> point is that it is relatively easy to
>>>> avoid deliberately creating
>>>> misleading metadata - metadata about the
>>>> doc rather than the thing it
>>>> describes - and it's also generally easy to
>>>> avoid looking for that metadata.
>>>>
>>>> Is there scope for some BP advice there?
>>>>
>>>> Phil.
>>>>
>>>> On 24/08/2016 08:25, Jeremy Tandy wrote:
>>>> > Thanks Linda. More clear examples where
>>>> being "correct" (in terms of
>>>> > avoiding uri collisions by using two
>>>> distinct uris) is making things worse
>>>> > because users take the wrong one!
>>>> >
>>>> > So, as a WG, are we content to recommend
>>>> this "indirect identification"
>>>> > pattern where thing & info resource
>>>> identifiers are conflated?
>>>> >
>>>> > Bill has added some good points about how
>>>> to avoid impacts of uri
>>>> > collision- by using the (dataset)
>>>> metadata to talk about licenses and
>>>> > creators for the information ...
>>>> > On Wed, 24 Aug 2016 at 07:52, Linda van
>>>> den Brink <l.vandenbrink@geonovum.nl
>>>> <mailto:l.vandenbrink@geonovum.nl>>
>>>> > wrote:
>>>> >
>>>> >> Experience from the Netherlands: we have
>>>> the id/doc pattern in our URI
>>>> >> strategy, based on the Cool URIs note
>>>> [8] and the ISA study on persistent
>>>> >> identifiers [9].
>>>> >>
>>>> >>
>>>> >>
>>>> >> That being said, same as Bill I also
>>>> notice data users getting confused
>>>> >> and generally using the /doc/ URI as
>>>> that is the one they can copy from
>>>> >> their browser address bar. This is not
>>>> only casual confusion but also ends
>>>> >> up in published information resources.
>>>> >>
>>>> >>
>>>> >>
>>>> >> You see this, for example, all over the
>>>> CB-NL which is a vocabulary for
>>>> >> the building sector and contains links
>>>> to other Dutch standards such as
>>>> >> IMGeo, an information model and
>>>> vocabulary for large scale topography. E.g.
>>>> >> the CB-NL concept of ‘Gebouw’ (Building)
>>>> [10] links to two IMGeo concepts
>>>> >> ‘Pand’ (building part) and ‘Overig
>>>> Bouwwerk’ (other construction) using
>>>> >> their /doc/ URIs. If you click on Pand
>>>> (which doesn’t have its own landing
>>>> >> page in CB-NL so I can’t include the
>>>> link) you will see it includes the
>>>> >> /doc/ URI as the identifier of Pand.
>>>> >>
>>>> >>
>>>> >>
>>>> >> This is an example where it occurs in
>>>> vocabularies, but I also see it
>>>> >> happen with identifiers for data instances.
>>>> >>
>>>> >>
>>>> >>
>>>> >> [8]: https://www.w3.org/TR/cooluris/
>>>> >>
>>>> >> [9]:
>>>> >>
>>>> https://joinup.ec.europa.eu/sites/default/files/D7.1.3%20-%20Study%20on%20persistent%20URIs_0.pdf
>>>> >> [10]: http://ont.cbnl.org/cb/def/Gebouw
>>>> >>
>>>> >>
>>>> >>
>>>> >> Linda
>>>> >>
>>>> >>
>>>> >>
>>>> >> *Van:* Jeremy Tandy
>>>> [mailto:jeremy.tandy@gmail.com
>>>> <mailto:jeremy.tandy@gmail.com>]
>>>> >> *Verzonden:* dinsdag 23 augustus 2016 20:57
>>>> >> *Aan:* Bill Roberts
>>>> >> *CC:* SDW WG Public List
>>>> >> *Onderwerp:* Re: Clarification required:
>>>> BP6 "use HTTP URIs for spatial
>>>> >> things"
>>>> >>
>>>> >>
>>>> >>
>>>> >> Thanks Bill. Sounds very coherent ... I
>>>> hoped for some responses such as
>>>> >> this based on practical experience. Jeremy
>>>> >>
>>>> >> On Tue, 23 Aug 2016 at 19:41, Bill
>>>> Roberts <bill@swirrl.com
>>>> <mailto:bill@swirrl.com>> wrote:
>>>> >>
>>>> >> ah Jeremy, you are a brave man to poke
>>>> the sleeping beast of httpRange-14.
>>>> >>
>>>> >>
>>>> >>
>>>> >> But I'll get my thoughts in early, then
>>>> I can tune out of the ensuing mail
>>>> >> avalanche :-)
>>>> >>
>>>> >>
>>>> >>
>>>> >> When publishing Linked Data about places
>>>> we (at Swirrl) generally do the
>>>> >> id/doc fandango, but to be honest I
>>>> think data users either don't notice,
>>>> >> or they get confused by it. In the
>>>> applications we are working with (and I
>>>> >> acknowledge that others may have
>>>> different applications and different
>>>> >> experiences), it wouldn't cause any
>>>> problems to have a single URI, the 'id'
>>>> >> URI if you like. We just don't find a
>>>> need to say anything about the /doc/
>>>> >> URI. If we were starting again, I'd
>>>> probably ditch the /doc/ and the 303
>>>> >> and rely on context and a little bit of
>>>> documentation to make it clear what
>>>> >> we mean.
>>>> >>
>>>> >>
>>>> >>
>>>> >> The place where we find a need to talk
>>>> about creators and licences and
>>>> >> modified dates is in metadata about
>>>> datasets where a dataset might be a
>>>> >> collection of information about a bunch
>>>> of places - and we treat datasets
>>>> >> as an 'information resource'. If someone
>>>> requests a dataset URI we return a
>>>> >> status code of 200 and the dataset
>>>> metadata as the response. That metadata
>>>> >> includes info on where to get all the
>>>> contents of the dataset if you want
>>>> >> that.
>>>> >>
>>>> >>
>>>> >>
>>>> >> By the way, though it's sensible and
>>>> consistent, I find that the implied
>>>> >> and parallel property stuff makes it
>>>> more rather than less complicated.
>>>> >>
>>>> >>
>>>> >>
>>>> >> Bill
>>>> >>
>>>> >> On 23 August 2016 at 17:37, Jeremy Tandy
>>>> <jeremy.tandy@gmail.com
>>>> <mailto:jeremy.tandy@gmail.com>> wrote:
>>>> >>
>>>> >> All-
>>>> >>
>>>> >>
>>>> >>
>>>> >> Linda has done a great job of
>>>> consolidating the best practices around use of
>>>> >> identifiers. We have just one [1] now.
>>>> >>
>>>> >>
>>>> >>
>>>> >> Reading through just now, it occurred to
>>>> me that there's still an open
>>>> >> issue about identifier assignment ...
>>>> >>
>>>> >>
>>>> >>
>>>> >> W3C's Architecture of the World Wide Web
>>>> constraint "URIs identify a
>>>> >> single resource" [2] asserts "Assign
>>>> distinct URIs to distinct resources"
>>>> >> in order to avoid URI collisions [2a]
>>>> which "often imposes a cost in
>>>> >> communication due to the effort required
>>>> to resolve ambiguities".
>>>> >> Discussions from earlier years in UK Gov
>>>> Linked Data working group (and
>>>> >> elsewhere) concluded that the "real
>>>> world thing" and "information resource
>>>> >> that describes the real world thing" are
>>>> separate resources. I think this
>>>> >> is based on a (purist?) view when
>>>> working with RDF of needing to be totally
>>>> >> clear on "what's the subject" of each
>>>> triple ... the thing or the document.
>>>> >> This manifests as URIs with `id` or
>>>> `doc` included somewhere to distinguish
>>>> >> between the resources and some RDF
>>>> triples to clarify that the doc resource
>>>> >> is talking about the thing resource etc.
>>>> >>
>>>> >>
>>>> >>
>>>> >> (dangerously close to "httpRange-14" [3]
>>>> here ... let's avoid that bear
>>>> >> trap)
>>>> >>
>>>> >>
>>>> >>
>>>> >> Jeni Tennison's "URLs in Data Primer"
>>>> draft TAG note captures this
>>>> >> practice in §5.3 "Publishing data" [4]:
>>>> >>
>>>> >>
>>>> >>
>>>> >> ```
>>>> >>
>>>> >> Publishers can help enable more accurate
>>>> merging of data from different
>>>> >> sites if they support URLs for each entity
>>>> >>
>>>> <https://www.w3.org/TR/urls-in-data/#dfn-entity>
>>>> they or other sites may
>>>> >> wish to describe, separate from the
>>>> landing pages
>>>> >>
>>>> <https://www.w3.org/TR/urls-in-data/#dfn-landing-page>
>>>> or records
>>>> >>
>>>> <https://www.w3.org/TR/urls-in-data/#dfn-record>
>>>> that they publish.
>>>> >>
>>>> >> ```
>>>> >>
>>>> >>
>>>> >>
>>>> >> Yet Architecture of the World Wide Web
>>>> §2.2.3 "Indirect identification"
>>>> >> [5] notes that:
>>>> >>
>>>> >>
>>>> >>
>>>> >> ```
>>>> >>
>>>> >> To say that the URI "mailto:nadia@example.com" identifies both an
>>>> >> Internet mailbox and Nadia, the person,
>>>> introduces a URI collision.
>>>> >> However, we can use the URI to
>>>> indirectly identify Nadia. Identifiers are
>>>> >> commonly used in this way.
>>>> >>
>>>> >> ```
>>>> >>
>>>> >>
>>>> >>
>>>> >> This is consistent with what I recall
>>>> TimBL saying at TPAC-2015 in regards
>>>> >> to Vcard; come the finish, no one really
>>>> cares to distinguish between the
>>>> >> thing and its associated information
>>>> resource.
>>>> >>
>>>> >>
>>>> >>
>>>> >> ... And in most cases, one can use
>>>> context to determine whether a
>>>> >> statement concerns the thing or the
>>>> information resource. In those cases
>>>> >> where you can't, "URLs in Data Primer"
>>>> suggests some mechanisms to mitigate
>>>> >> such confusion [6][7].
>>>> >>
>>>> >>
>>>> >>
>>>> >> I think that in our SDW WG discussion we
>>>> have concluded that we _are_
>>>> >> content to use "indirect identification"
>>>> - e.g. that we use URIs that
>>>> >> conflate the thing and document resource.
>>>> >>
>>>> >>
>>>> >>
>>>> >> Please can we confirm this? Assuming
>>>> that indirect identification is
>>>> >> "approved" as best practice, then it
>>>> seems prudent to add a note to the BP
>>>> >> document saying "don't worry about
>>>> distinguishing between thing and
>>>> >> resource; indirect identification is
>>>> fine" (etc.)
>>>> >>
>>>> >>
>>>> >>
>>>> >> Thanks, Jeremy
>>>> >>
>>>> >>
>>>> >>
>>>> >> [1]:
>>>> http://w3c.github.io/sdw/bp/#globally-unique-ids
>>>> >>
>>>> >> [2]:
>>>> https://www.w3.org/TR/webarch/#pr-uri-collision
>>>> >>
>>>> >> [2a]:
>>>> https://www.w3.org/TR/webarch/#URI-collision
>>>> >>
>>>> >> [3]:
>>>> https://www.w3.org/2001/tag/group/track/issues/14
>>>> >>
>>>> >> [4]:
>>>> https://www.w3.org/TR/urls-in-data/#publishing-data
>>>> >>
>>>> >> [5]:
>>>> https://www.w3.org/TR/webarch/#indirect-identification
>>>> >>
>>>> >> [6]:
>>>> https://www.w3.org/TR/urls-in-data/#documenting-properties
>>>> >>
>>>> >> [7]:
>>>> https://www.w3.org/TR/urls-in-data/#authoring-specifications
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >
>>>>
>>>> --
>>>>
>>>>
>>>> Phil Archer
>>>> W3C Data Activity Lead
>>>> http://www.w3.org/2013/data/
>>>>
>>>> http://philarcher.org
>>>> +44 (0)7887 767755
>>>> @philarcher1
>>>>
>>>
>
--
Krzysztof Janowicz
Geography Department, University of California, Santa Barbara
4830 Ellison Hall, Santa Barbara, CA 93106-4060
Email: jano@geog.ucsb.edu
Webpage: http://geog.ucsb.edu/~jano/
Semantic Web Journal: http://www.semantic-web-journal.net
Received on Thursday, 1 September 2016 21:43:19 UTC