RE: [Uri-review] the "ni:" URI scheme soon to "last call" in IETF -- security concern from Larry Masinter on 2012-05-08 (www-archive@w3.org from May 2012)

From: Larry Masinter <masinter@adobe.com>
Date: Tue, 8 May 2012 15:21:25 -0700
To: Jonathan A Rees <rees@mumble.net>
CC: "www-archive@w3.org" <www-archive@w3.org>
Message-ID: <C68CB012D9182D408CED7B884F441D4D194AE4714A@nambxv01a.corp.adobe.com>
I don't disapprove of the term "persistent", any more than I disapprove of the terms "good" or "secure" or "true" or "fact". But when we're designing systems, "persist" is not a binary attribute (as if things were either persistent or not, systems either secure or not, statements either true or false, etc.).

Rather, the architecture of persistence requires careful attention to the failure modes and to the means of mitigating all of the ways in which things can fail to persist. It's impossible to engage in the design discussion if you rule failure cases "out of scope". Sure, there are systems of citations for which failures are rare enough that you might be tempted to do so, but it is the resilience of the system in the face of both accidental and intentional errors, disputes, rewriting of history, takedown notices and the like that determines the qualities of persistence.

LOCKSS (http://en.wikipedia.org/wiki/LOCKSS)
and http://larry.masinter.net/0603-archiving.pdf (and US Patent 7577689)
perhaps will give some additional insight.

Larry
--
http://larry.masinter.net


-----Original Message-----
From: jonathan.rees@gmail.com [mailto:jonathan.rees@gmail.com] On Behalf Of Jonathan A Rees
Sent: Tuesday, May 08, 2012 11:31 AM
To: Larry Masinter
Cc: www-archive@w3.org
Subject: Re: [Uri-review] the "ni:" URI scheme soon to "last call" in IETF -- security concern

It sounds like you are making a plea for people to stop using terms
like "persistent name" since, by the way you define this, they have to
meet a bar that has never been met, and will never be met in the
future, and which nobody expects to be met.

This just means that if you get your way people will switch to other
terminology to express the meaning they want to express.

In academic circles the historical precedent is the scholarly
reference, which certainly doesn't live up to your expectations, yet
has been extremely successful and useful. I think people are just
looking for a way to make such references processable by machine. Is
there a term you would approve for that endeavor?

Central authorities for scholarly references certainly do not exist,
and must not exist. Nobody, not even the author or publisher or any
library or archive, can be given the ability to rewrite history. The
system would not work if this were the case.

So what people depend on, and the only empirical examples of
"persistence" that we have, work very differently from what you
describe. How would you describe machine actionable scholarly
citations?

On Tue, May 8, 2012 at 12:38 PM, Larry Masinter <masinter@adobe.com> wrote:
> I said "credible SLA" and I agree that wasn't good terminology. Let me rewrite it:
>
> "Persistence" and "Permanently" are acts of intention -- a name is "persistent" or "permanent" if there is a credible expectation of future resolution services, where the services are reasonably expected to maintain authority (forever) for telling everyone what a name means or identifies, consistent with the current intended meaning.
>
> And my point is that there often _aren't_ explicit guarantees, because the notion of "persistent names" is dubious.
>
> http://masinter.blogspot.com/2010/03/ozymandias-uri.html
>
> Taking your examples:
>
> - The RFC series
>
> There are enough copies of current RFCs around that I understand how you could imagine the 'meaning' of RFC1024 was independent of any authority. However, (a) IETF might change its official publication format, and possibly even republish old RFCs in new formats. It is under the authority of the IETF to determine whether a new format is the "same". (b) IETF also maintains "BCP" and "STD" series numbers to name documents which have achieved a particular status, and the IETF has the authority to change the designation of BCP70 to a new document. (c) Efforts are underway to reformulate internet governance under UN ITU-T rather than ISOC, ICANN, IETF. Changes in organization might ultimately lead to changes in policy. It isn't impossible to imagine a new system of Internet governance changing the rules, e.g., deciding it was legitimate to "republish" RFCs and correct "errors" in them.
>
> If the rules change to allow RFCs to be updated, then there is a question of whether a statement asserting something about "RFC1024" was about the old edition vs. the new one.
>
>  - The IANA registries
> Similarly for IANA registries, IANA maintains the process by which individuals are allowed (or not allowed) to update an IANA registry entry, depending on review as defined. So the organization "promises" to maintain the registry entry for  "text/html" according to some process rules, and to be the central authority.
>
>  - The DOI system:
>  Similarly.  Does the publisher of a DOI-identified document have the authority to correct errors? Is it subject to take-down orders? (Can you give a DOI for the "anarchist's cookbook" and still expect some resolution?)

Once the article is published (i.e. copies are on deposit and out of
the publisher's control) then it is impossible to correct errors. Of
course people can issue retractions and errata, and a conscientious
resolution service will make the client aware that there has been
additional activity around the DOI, but no, correction is impossible.
You would have to find all copies and update them all, and that is
both technically infeasible and ethically forbidden.

>  - The chemical element symbols:
>
> This does seem to be pretty stable, although perhaps there are disagreements over naming newly discovered elements?

I think there is initial debate but that once a decision is made it's
final. If there are exceptions they do not call the whole system into
question.

>  - The names and orbital parameters of asteroids
>
> (don't know anything about this one)
>
>  - The "binomial" system of biological nomenclature
> Aren't there controversies, updates, mergers? Species are mis-named, decisions are made that two species-names really named the same species, etc.

What I meant was the binding of a species name to its original
publication. Many publications get attached to a species name as the
theories of the boundary of the species change, but that's not what
I'm talking about. The original publication "binding" a name to its
original circumscription has priority and is always the point of
reference in future publications. This system has been in use for 250
years and has worked exceedingly well over a million names. The
community has only screwed up a handful of times and decided to
violate its publication priority rule; usually because somebody wasn't
aware of the prior publication, and the later name became so well
established that an exception was suggested by the custodial
organization. The reliability of the system is remarkable given its
(until recently) total lack of automation.

Whether this scales or would resist a concerted attack is not clear,
but if this isn't a persistent naming system I don't know what is.

> There is an expectation that there will continue to be convergence and friendly scientific discussion about this, but of course, once you get into debates over evolution, there remains the possibility that large communities might resist or disagree with others.
>
>
> That there are librarians who are eager to help people resolve names is great! However, the Orwellian forces of Newspeak remain in all of these, where some librarians in some jurisdictions are forced to resolve names in ways that are pleasing to their governments (as SOPA and take-down notices threaten to do). If uncool URIs MUST change (as we've seen recently in some situations), it threatens persistent naming.

No system of any kind is immune to takedown notices or the actions of
governments (or armies, thieves, etc.). If you make that a condition
then of course no system will live up to it. Keeping multiple copies
in multiple jurisdictions is pretty good insurance and has worked so
far. We do not see anyone arguing about the bindings of scholarly
references to their publications, because little would be gained by an
attack and replication keeps everyone honest.

> Perhaps a better way to put this is: persistence depends on the resilience and persistence of the resolution infrastructure, and what it is that is expected to "persist" is part of the definition of the "meaning".

I don't think this matches the historical precedent. In every case the
resolution infrastructure has emerged after the fact; each library has
built its own card catalog and organized its own stacks. You could
wipe the resolution infrastructure for DOIs, together with Crossref,
IDF, and all the publishers, off the face of the earth and replacement
resolution infrastructure would be rebuilt from the holdings. What
comes first is publication, i.e. replication, dissemination, and
holdings. Indexing and resolution follow. Nothing about the Internet
or the digital age changes this.

People writing references and resolving them come up with ways to
address attacks of the sort you describe, such as bad-citizen
publishers who publish a new "volume 1" (or RFC 2616) years after they
have previously published something with the same name. Usually this
is done using publication dates (which are hard to lie about) and
redundancy such as author + title (which makes accidents much more
unlikely); there are also warning tags such as "new series" that get
attached by indexes. My point is that there is no such thing as
"ownership" or "authority" after publication; and yet the system that
has this property is as close as you're ever going to get to
"persistence" in reality. It seems very strange to me to require of a
persistent naming system something that our only precedents lack.

I can think of plenty of ways to make current persistent reference
systems more machine actionable, inclusive, reliable, etc. I'm sure
you can too. Grousing about people calling things persistent when
they're not, whether true or not, doesn't help much IMO.

Jonathan

> Larry
> --
> http://larry.masinter.net
>
>
>
> -----Original Message-----
> From: jonathan.rees@gmail.com [mailto:jonathan.rees@gmail.com] On Behalf Of Jonathan A Rees
> Sent: Tuesday, May 08, 2012 5:14 AM
> To: Larry Masinter
> Cc: www-archive@w3.org
> Subject: Re: [Uri-review] the "ni:" URI scheme soon to "last call" in IETF -- security concern
>
> [Removing original cc: list Stephen Farrell
> <stephen.farrell@cs.tcd.ie>, David Booth <david@dbooth.org>,
> uri-review <uri-review@ietf.org>, Barry Leiba
> <barryleiba@computer.org>, "draft-farrell-decade-ni@tools.ietf.org"
> <draft-farrell-decade-ni@tools.ietf.org>
> and moving to www-archive to avoid spamming... you can reinstate any
> of these cc's if you feel any of these parties might care.]
>
> In reply to: http://www.ietf.org/mail-archive/web/uri-review/current/msg01584.html
>
> On Mon, May 7, 2012 at 8:51 PM, Larry Masinter <masinter@adobe.com> wrote:
>> Since I'm quoted, I thought I better clarify:
>>
>> "Persistence" and "Permanently" are acts of intention -- a name is "persistent" or "permanent" if there is a credible SLA by some current and future resolution service, where the service promises to be the authority (forever) for telling everyone what a name means or identifies.
>
> Although I've read what you've written on this subject I'm still not
> clear on your theory of "persistence" so I'd like to try to draw you
> out.
>
> Can you give me an example of a persistent name for which such a
> "credible SLA" exists?
>
> SLAs as I understand them are contracts and as such only last a few
> years; they are certainly never "permanent".
>
> When I think of persistent naming, I think of the following examples:
>  - The RFC series
>  - The IANA registries
>  - The DOI system
>  - The chemical element symbols
>  - The names and orbital parameters of asteroids
>  - The "binomial" system of biological nomenclature
> What examples do you have in mind?
>
> You are right that all of these systems involve "acts of intention"
> consisting usually of social acts of publication and dissemination.
> The typical ni: URI will probably not involve any such act (although
> in principle it could). If this is the main thing you're saying then
> we're totally in agreement.
>
> But none of these involve SLAs or even promises on the part of
> institutions providing the persistence. None even relies on the
> existence of a single resolving agency in perpetuity. Most don't
> involve any specific effort on anyone's part to provide persistence
> specifically for that system; persistence just happens because the
> society in general, and its "memory institutions" in particular, wants
> the things that have managed to find their way into these collections
> to persist.
>
> Although no such source guarantees a "resolution service", as a matter
> of fact there have usually been librarians acting in that role, eager
> to help you find these things. In recent years sometimes they have set
> up web servers to help with the job of resolution.
>
> For example, the RFC series does not depend on IETF. There are copies
> of the RFCs stored in Internet archives, so if IETF disappeared one
> day the documents and their resolvability would persist.
>
> Similarly, the persistence of binomial names depends only on getting
> their defining publications into a few research libraries. If any one
> library burned down, the names would persist by virtue of having their
> "meanings" recorded in other libraries.
>
> Similarly for DOIs; the catalog (metadata) has backup copies in memory
> institutions, as do almost all of the identified documents. Like ICZN,
> the IDF is only a facilitator for a system that belongs to the world,
> not an "owner".
>
> But in none of these cases has anyone set up an "SLA" or even made a
> credible promise.
>
>> The "ni:" scheme does not provide a persistent name for anything other than chunks of data.
>
> This seems to contradict what you just said. I would think some ni:
> URIs could be considered "persistent names" even if most aren't. It
> would depend on the particular ni: and on either empirical truth
> (supposing we had 200-year-old ni: URIs) or a bettable story (such as
> storage and cataloguing in some number of "memory institutions"). When
> you contrast "chunks of data" with, say, RFCs, what distinction are
> you drawing - are you referring to the problem of digital media
> obsolescence, which might be frustrated if someone cataloguing ni:
> URIs was not on top of the problem of format upgrades? Or just the
> fact that the RFCs reside in memory institutions and we have no reason
> to expect that any ni: will (although one could)?
>
> Certainly there have been binomial names and DOIs that have not
> persisted. That this is the case does not call the others into doubt.
>
> Whether there is a kind of persistence other than what the memory
> institutions offer (collectively, via replication) is, I think,
> unclear. National archives burn down, etc.
>
> Do you have a different idea of how "persistence" works?
>
> Best
> Jonathan
Received on Tuesday, 8 May 2012 22:22:52 UTC