RE: Question about identifiers from Simon.Cox@csiro.au on 2016-08-21 (public-sdw-wg@w3.org from August 2016)

From: <Simon.Cox@csiro.au>
Date: Sun, 21 Aug 2016 23:40:15 +0000
To: <rob@metalinkage.com.au>, <jlieberman@tumblingwalls.com>, <eparsons@google.com>
CC: <frans.knibbe@geodan.nl>, <public-sdw-wg@w3.org>
Message-ID: <96864e8960764573880d57ee3274ced0@exch1-mel.nexus.csiro.au>
And joining these thoughts together,

- if URIs are assigned by a registration process, and
- if the registrar uses a hierarchical path to manage governance (including maintaining uniqueness),
- then the URI will reflect the governance arrangement at the time of registration.
This might mean that the original identifier for a thing does not reflect some future governance arrangement.
When that happens, there are two options:

(i) keep the original identifier, or

(ii) make a new registration and mark the original identifier ‘superseded’ by the new one.
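To make option (ii) concrete, here is a minimal Python sketch of a register that never reuses identifiers and marks an old entry as superseded instead of deleting it. The class names, the `status` values and the example.org URIs are hypothetical illustrations, not the API of any particular registry product.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RegisterItem:
    """One registered identifier (hypothetical model, not a real registry API)."""
    uri: str
    status: str = "valid"               # "valid" or "superseded"
    superseded_by: Optional[str] = None


@dataclass
class Register:
    items: Dict[str, RegisterItem] = field(default_factory=dict)

    def register(self, uri: str) -> RegisterItem:
        # Identifiers are never reused, so re-registering a URI is an error.
        if uri in self.items:
            raise ValueError(f"{uri} is already registered")
        item = RegisterItem(uri)
        self.items[uri] = item
        return item

    def supersede(self, old_uri: str, new_uri: str) -> RegisterItem:
        # Option (ii): register a new identifier and mark the old one as
        # superseded. The old entry stays in the register.
        old = self.items[old_uri]         # KeyError if old_uri is unknown
        if old.status == "superseded":
            raise ValueError(f"{old_uri} is already superseded")
        new_item = self.register(new_uri)
        old.status = "superseded"
        old.superseded_by = new_uri
        return new_item

    def current(self, uri: str) -> str:
        # Follow the superseded_by links to the identifier now in force.
        while self.items[uri].superseded_by is not None:
            uri = self.items[uri].superseded_by
        return uri


reg = Register()
reg.register("http://example.org/reg/north/42")
reg.supersede("http://example.org/reg/north/42", "http://example.org/reg/east/42")
print(reg.current("http://example.org/reg/north/42"))  # http://example.org/reg/east/42
```

Option (i) corresponds to simply never calling `supersede`; the old identifier then stays valid under the new governance arrangement.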

Simon


From: Rob Atkinson [mailto:rob@metalinkage.com.au]
Sent: Saturday, 20 August 2016 2:32 PM
To: Joshua Lieberman <jlieberman@tumblingwalls.com>; Ed Parsons <eparsons@google.com>
Cc: Frans Knibbe <frans.knibbe@geodan.nl>; SDW WG (public-sdw-wg@w3.org) <public-sdw-wg@w3.org>
Subject: Re: Question about identifiers

IMHO there is a basic principle that neatly resolves this: identifiers are generated by a registration process. That is, if you accept that something is an identifier, you are essentially assuming that its minting process is a registration process, and you are subscribing to that governance. Registry practices are quite well established, and we should point people to them. They include things like not reusing identifiers, version handling, etc.

If a dataset does not conform to the principles of registration, then it is not suitable as a source of concept identifiers. For example, a spatial dataset whose object IDs change with every version may be used as a resource, and its features may have URLs, but such URLs must not be used as URIs; instead, a redirect must be put in place from a more stable identifier set to the resource du jour.
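A minimal sketch of such a redirect, in Python with no web framework: a lookup table from stable identifiers to the current release's URLs, returning an HTTP status and headers. The table, the example.org hosts and the choice of 303 See Other (rather than, say, 302 or 307) are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical table kept by the publisher: stable identifier -> URL of the
# feature in the current dataset release (the "resource du jour").
STABLE_TO_CURRENT: Dict[str, str] = {
    "http://id.example.org/feature/1234": "http://data.example.org/v7/objects/98765",
}


def resolve(stable_uri: str) -> Tuple[int, Dict[str, str]]:
    """Return the HTTP status and headers for a request to a stable URI."""
    target = STABLE_TO_CURRENT.get(stable_uri)
    if target is None:
        return 404, {}
    # 303 See Other: the stable URI identifies the thing, while the target
    # is a document about it in the current release.
    return 303, {"Location": target}


# A new release renumbers its objects; only the table changes, never the stable URI.
STABLE_TO_CURRENT["http://id.example.org/feature/1234"] = "http://data.example.org/v8/objects/11"
print(resolve("http://id.example.org/feature/1234"))
# (303, {'Location': 'http://data.example.org/v8/objects/11'})
```

The design point is that consumers only ever store the stable URI; the volatile object IDs are confined to one table that is rewritten on each release.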

Within a registration paradigm, the URI pattern is a simple registry delegation model: an item lives within a register (its base URI, i.e. the part to the left of the last /). Registers may be nested, in the same way that subregisters may be items in a register. Register URIs should be dereferenceable, returning metadata about the governance process and the type of object in the register.
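The delegation model can be sketched in Python with the standard `urllib.parse` module: each path prefix ending before a `/` is treated as a register, so an item URI yields its chain of enclosing registers. The function name and the example.org URI are hypothetical, and a real registry might not treat every prefix as a register.

```python
from typing import List
from urllib.parse import urlsplit, urlunsplit


def register_chain(uri: str) -> List[str]:
    """Return the URIs of the registers containing `uri`, outermost first.

    Assumes the delegation convention above: every path prefix that ends
    before a '/' names a register, and the last segment names the item.
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    chain = []
    for i in range(1, len(segments)):
        path = "/" + "/".join(segments[:i])
        chain.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return chain


for register in register_chain("http://registry.example.org/def/region/nl/0363"):
    print(register)
# http://registry.example.org/def
# http://registry.example.org/def/region
# http://registry.example.org/def/region/nl
```

Under this model, each URI in the chain would be dereferenced to find who governs that register and what type of item it holds.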

Thus, hierarchies made this way are stable. If governance of the set of items changes, then new identifiers must be minted, and a reference to the old identifiers should be included.

The UK examples conform to this pattern, although they seem to have converged on it rather than started from a registration perspective. WMO practice at http://codes.wmo.int formalises this more explicitly.

Rob


On Sat, 20 Aug 2016 at 00:09 Joshua Lieberman <jlieberman@tumblingwalls.com> wrote:
I’m sorry (or not) to have kicked off this identifier structure debate, but it’s an important one to have. In a way it’s easy just to say that URL identifiers should carry no meaning, for maximum flexibility, but in almost all practice they are used in meaningful ways. I am also part of the specifying minority (though of the URL minting and parsing majority) who feel that this is done anyway and carries undeniable advantages, so let’s work out the ways and means for it.

There are several reasons why it is useful to have agreed URL structures. We should note first of all that the host domain name is an important part of the meaning context and authenticity of a URL. The value of using HTTPS is not just encryption but also having the identity of the URL resolver confirmed by a PKI certificate. Both the domain name hierarchy and the URL path hierarchy can also support meaningful uniqueness of identifiers. They address a problem with UUIDs: it is easy to make too many unique identifiers, rather than too few. The hierarchical structures help a lot with working out which identifiers may actually have been minted to refer to the same things. They also help with determining the authority for making and maintaining identifiers, at both the inter-organizational and intra-organizational levels.
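The UUID point can be illustrated with a small Python comparison, assuming a hypothetical authority `https://id.example.org`: minting with `uuid.uuid4()` produces a new identifier on every call, while a hierarchical path under a known authority is reproducible, so two parties minting for the same place arrive at the same URI.

```python
import uuid

AUTHORITY = "https://id.example.org"  # hypothetical domain that vouches for the identifiers


def mint_random() -> str:
    # A fresh UUID on every call: two parties describing the same thing
    # end up with two unrelated identifiers.
    return f"{AUTHORITY}/thing/{uuid.uuid4()}"


def mint_hierarchical(*path: str) -> str:
    # The identifier is determined by the authority plus a hierarchical
    # path, so minting again for the same place yields the same URI.
    return AUTHORITY + "/" + "/".join(path)


print(mint_random() == mint_random())  # False
print(mint_hierarchical("nl", "utrecht", "centrum") == mint_hierarchical("nl", "utrecht", "centrum"))  # True
```

This reproducibility only holds while the hierarchy itself is stable, which is the trade-off debated elsewhere in this thread.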

On the level of taxonomic meaning, the stability and/or uniqueness of a classification may indeed be questionable, as Frans and others have pointed out. Certainly many taxonomic identifier systems have moved from hierarchical to sequential primary identifiers as classifications have evolved with ongoing research. I’m still in the smaller minority who think that dereferencing every URL to see what its classification might be is more work than necessary. Many classification schemes are quite stable or only slowly changing. I also feel it is acceptable, as needed, to have redirections from URLs representing a current, historical or even alternative classification to the same normative or informative material that an authoritative URL links to.

On the other hand, we worry about the “semantics” of URLs versus other means, but the formal semantics of an entity are expressed as logical relationships to other entities (at least in predicate logic). If a substantial portion of those relationships form a hierarchical structure of like entities, then a hierarchical URL can be a real form of identity, not just a convenience.

So my recommendation is to support a practice of identifiers being structured minimally at least for purposes of authority and uniqueness. I also recommend considering taxonomic meaning for primary or secondary identifiers where the taxonomy is relatively stable and/or an integral part of the definition of the identified entity. I’ll have time next week to contribute something to this effect to the BP.

—Josh


On Aug 19, 2016, at 8:44 AM, Ed Parsons <eparsons@google.com> wrote:

Currently a human-readable pattern would not help in terms of crawling... however, I still maintain my (minority) view that such URI schemes could be useful as a way of expressing current and past geographic hierarchies.

ed


On Fri, 19 Aug 2016 at 13:05 Frans Knibbe <frans.knibbe@geodan.nl> wrote:
On 19 August 2016 at 12:10, Ed Parsons <eparsons@google.com> wrote:
So perhaps best practice is to update the resource at the old URI to point to the new one?

That is a possibility, but it would be messy. Redirection would have to be set up for each individual resource, which means high maintenance costs and a high risk of mistakes. And there would still be a risk of misinterpretation: a human consumer could interpret the first URI encountered without following it to the alternative URI, still ending up with false data.

But what would be the point anyway? If a path in the URI like /{municipality}/{quarter}/{neighbourhood} is meant for human consumption only, it is not that valuable, I think, assuming that most people don't read URIs.

The only reason I can think of to want a hierarchical path in a URI is if web crawlers are known to parse the URI strings themselves (in addition to the data the URI returns). In theory that could lead to improved discoverability of resources. I wonder if that actually happens... Perhaps Ed knows how the Google crawlers behave in that respect? Or would that be sharing trade secrets?

Regards,
Frans



Ed


On Fri, 19 Aug 2016 at 11:03 Frans Knibbe <frans.knibbe@geodan.nl> wrote:
On 19 August 2016 at 11:11, Linda van den Brink <l.vandenbrink@geonovum.nl> wrote:
Yes… it is generally easier to make meaningless IDs persistent. But it is nice to have URIs that are human-readable. In the Dutch URI strategy we do advise having human-readable parts in the URI scheme, but we say that officially these mean nothing, i.e. that it is extremely ill-advised to ascribe any meaning to {concept} *for the machine*. URIs are opaque in a technical sense. Meanwhile, however, they do give hints to human readers.

Then how can you tell humans that they can interpret the URI and tell machines that they should not? Is there a mechanism for doing that?

Greetings,
Frans


From: Ed Parsons [mailto:eparsons@google.com]
Sent: Friday, 19 August 2016 11:02
To: Frans Knibbe; SDW WG (public-sdw-wg@w3.org)
CC: Linda van den Brink; Joshua Lieberman (jlieberman@tumblingwalls.com); Byron Cochrane
Subject: Re: Question about identifiers

While I accept the current view that URI schemes carry no explicit meaning, I do see great value in /{municipality}/{quarter}/{neighbourhood} as a simple way of expressing geographical hierarchy independent of geometry... What's the worst that could happen?

Ed


On Fri, 19 Aug 2016 at 09:30 Frans Knibbe <frans.knibbe@geodan.nl> wrote:
Hi,

A prime requirement of good URI minting is to not put any meaning in the URI, at least no meaning that is somehow intended for consumers. Everything that needs to be said about a resource, like its membership of data collections or its versioning, can be said in the data that is returned when the URI is dereferenced.

URI schemes like /{municipality}/{quarter}/{neighbourhood} could be dangerous, because consumers could inadvertently try to derive meaning from such a URI. The usefulness of such a scheme in URI minting is also doubtful, because administrative structures can change over time, which could complicate URI minting procedures.
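The concern can be shown in a few lines of Python using the standard library's `string.Template`. The template, the codes `m1`, `q1`, `q2`, `n7` and the `naive_quarter` helper are made up for illustration: after a boundary change, the same neighbourhood would be minted under a different URI, and anyone parsing the old URI would read a stale quarter from it.

```python
from string import Template

# Hypothetical URI template following the /{municipality}/{quarter}/{neighbourhood} pattern.
PATTERN = Template("http://example.org/id/$municipality/$quarter/$neighbourhood")


def mint(place: dict) -> str:
    return PATTERN.substitute(place)


def naive_quarter(uri: str) -> str:
    # What a consumer might wrongly infer by reading the path: the quarter
    # is the second-to-last segment.
    return uri.rstrip("/").split("/")[-2]


before = {"municipality": "m1", "quarter": "q1", "neighbourhood": "n7"}
old_uri = mint(before)

# A boundary change moves neighbourhood n7 into quarter q2.
after = dict(before, quarter="q2")
new_uri = mint(after)

print(old_uri)                 # http://example.org/id/m1/q1/n7
print(new_uri)                 # http://example.org/id/m1/q2/n7
print(naive_quarter(old_uri))  # q1  (stale if old_uri is still in circulation)
```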

I do wonder to what extent common web crawlers try to parse URIs and attach meaning to URI parts.

Regards,
Frans



On 18 August 2016 at 22:55, Byron Cochrane <bcochrane@linz.govt.nz> wrote:
Hi,

I generally like the guidance in the URI-Strategy under Hierarchical URIs, but have some reservations about this intelligent-identifier approach.
For metadata access I think it is a good thing. Most metadata for an individual feature will usually reside at the dataset or collection (a better term) level, and this hierarchical approach makes that metadata easy to access.

But this built-in intelligence makes the permanence of the URIs more difficult. For example, administrative boundaries change through mergers and annexations, so a spatial thing that was in one collection is now in another. The URIs for these things then confuse more than help. URI redirects are one way to deal with this, but perhaps tracking these relationships through applied ontologies such as skos:broader and skos:narrower is the better practice?
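As a rough illustration of the skos:broader alternative, the Python sketch below keeps the identifiers opaque and records containment as `skos:broader` statements, so a merger or annexation changes one statement instead of the identifier. It uses plain tuples rather than an RDF library; the example.org identifiers and helper names are hypothetical, and the inverse `skos:narrower` statements are left out for brevity.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

BASE = "http://example.org/id/"  # hypothetical opaque identifiers: no hierarchy in the path
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"

# The hierarchy lives in the data as skos:broader statements, not in the URIs.
triples: Set[Triple] = {
    (BASE + "n7", SKOS_BROADER, BASE + "q1"),
    (BASE + "q1", SKOS_BROADER, BASE + "m1"),
}


def reparent(graph: Set[Triple], child: str, old: str, new: str) -> None:
    """Record a boundary change by swapping one skos:broader statement."""
    graph.discard((child, SKOS_BROADER, old))
    graph.add((child, SKOS_BROADER, new))


def broader(graph: Set[Triple], child: str) -> List[str]:
    return sorted(o for s, p, o in graph if s == child and p == SKOS_BROADER)


def to_ntriples(graph: Set[Triple]) -> str:
    # Serialise as N-Triples, one "<s> <p> <o> ." statement per line.
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))


reparent(triples, BASE + "n7", BASE + "q1", BASE + "q2")
print(broader(triples, BASE + "n7"))  # ['http://example.org/id/q2']
print(to_ntriples(triples))
```

The identifier of n7 is unchanged throughout; only the statement about its parent is replaced.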

No answers from me here, just questions.

Cheers,
Byron

________________________________________
From: Linda van den Brink [l.vandenbrink@geonovum.nl]
Sent: Thursday, August 18, 2016 8:28 PM
To: Joshua Lieberman (jlieberman@tumblingwalls.com)
Cc: SDW WG (public-sdw-wg@w3.org)
Subject: Question about identifiers

Hi Josh,

Coming back to the telecon yesterday:


<joshlieberman> Should identifiers be part of a system for the features of interest?

joshlieberman: making identifiers part of a system, where the features are part of the system?
... for example corresponding to paths in a taxonomy

Linda: no answer right now, will have to think about it

Were you talking about recommending some system for creating HTTP URI identifiers, i.e. some sort of URI strategy or pattern? Specifically, where the features can be organised into some system like a hierarchy, as with administrative regions? There are some examples from Geonovum's testbed at https://github.com/geo4web-testbed/topic3/wiki/URI-Strategy under Hierarchical URIs.

Just trying to understand what you mean… we could add some guidance to the BP about this. I think that would be helpful.

Linda

______________________________________
Geonovum
Linda van den Brink
Geo-standards Adviser

a: Barchman Wuytierslaan 10, 3818 LH Amersfoort
p: PO Box 508, 3800 AM Amersfoort
t: +31 (0)33 46041 00
m: +31 (0)6 1355 57 92
e: l.vandenbrink@geonovum.nl
i: www.geonovum.nl
tw: @brinkwoman


--

Ed Parsons FRGS
Geospatial Technologist, Google

Google Voice +44 (0)20 7881 4501
www.edparsons.com @edparsons
Received on Sunday, 21 August 2016 23:41:12 UTC