
Re: Memo on persistent reference - TAG please read before F2F discussion

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Fri, 15 Oct 2010 12:40:24 -0400
Message-ID: <4CB883F8.5060101@arcanedomain.com>
To: Jonathan Rees <jar@creativecommons.org>
CC: www-tag@w3.org
These are comments on my readthrough of [1].  Overall, I find it to be very 
helpful, well written, and a very useful foundation for our work in this 
area.  So, here I'll concentrate on areas where I have quibbles or 
concerns.  All quoted text is from the draft.

> The scenario under discussion is that of a general user or robot with a reference in hand, using a well-known apparatus or method to "chase" the reference and obtain the target document. The reference was not created with that particular user in mind; the reference might be seen by anyone (or anyone in some substantial community), so no special knowledge can be assumed. Of course some knowledge must be assumed, such as how to read the Latin alphabet or ASCII or use a browser; but not special knowledge peculiar to the user or reference.

I think we can do a bit better on this.  In general, the person using the 
reference must know, or at least successfully infer, the specifications 
(written or informal) that apply to resolution of the identifier.  In the 
case of the Web, the TAG has written with some care on this in its finding 
The Self-Describing Web.  That finding points out that everything one 
needs, not just to dereference a URI, but in the case of HTTP, to properly 
interpret information retrieved from the resulting resource, can be found 
directly or indirectly from RFC 3986.

Also, we should acknowledge that in many cases, a priori knowledge of the 
identification mechanism is required to interpret an identifier.  Someone 
finding the identifier:

	http://example.org/somedoc.xml

on the side of a bus could guess with at least moderate reliability 
that RFC 3986 applies, but if someone found the identifier

	somedoc.xml

on a piece of paper on a programmer's desk, then it might be a (relative) 
URI, or it might be a filename, perhaps some other identifier, or perhaps 
not an identifier at all.
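To make the point concrete, here is a rough sketch (my own illustration, not from the memo) of why a priori knowledge matters: per RFC 3986, an absolute URI must carry a scheme component, and the presence of a scheme is what lets a reader infer that URI resolution rules apply at all. A bare name like "somedoc.xml" parses with no scheme, so nothing in the string itself tells you which specification governs it.

```python
# Sketch: checking for an RFC 3986 scheme component, using Python's
# standard urllib.parse. The two test strings are the ones from the
# example above.
from urllib.parse import urlsplit

def looks_like_absolute_uri(s: str) -> bool:
    """True if the string parses with a scheme component (RFC 3986, sec. 3.1)."""
    return bool(urlsplit(s).scheme)

print(looks_like_absolute_uri("http://example.org/somedoc.xml"))  # True
print(looks_like_absolute_uri("somedoc.xml"))                     # False
```

Of course a True result is only evidence, not proof, that RFC 3986 was intended; that is exactly the inference problem described above.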

> I'll define persistence as survival beyond events that you would expect would imply extinction,

(editorial) this seems a slightly odd way to put it.  If I'm smart enough, 
then I know that the events don't imply extinction after all; does that 
mean there's no persistence?  Also, you don't say survival of what.  Might 
it be better to try something along the lines of:

"I'll define persistence as survival if the reference and the information 
needed for its interpretation, for a very long time, typically tens of 
years or centuries [I would have thought millenia?], and in the face of a 
broad range of potential threats (technical failures of systems; 
organizational failures or death of responsible individuals; natural 
disasters; malicious attempts to hijack "ownership" of the reference; etc."

 > The ideal reference is both fast

Really?  For the range of contexts you're talking about?  Would we in all 
cases prefer millisecond access from NVRAM to tablets carved in stone?  I 
would have thought it depended on the use case.

 > Failure modes

Shouldn't there be failure modes for specifications being unclear, lost 
over time, etc?  That seems the common case even with materials written on, 
say, magnetic media from the 1950s and 1960s, where nobody can find or 
properly interpret the specifications used to encode them.  In principle, 
specifications and implementations could be hijacked over time:  e.g. 
someone could rewrite the specs for DNS resolution to insert a government 
agency into the lookup process, or the deployed infrastructure could do 
that, in violation of the (unmodified) specification.  In both cases, 
references are no longer correctly associated with targets over time.

 > Placing bets

I like the list, but I think there might be a fifth characteristic along 
with Ubiquity, etc., and that's "Self-checking".  What I have in mind is 
the possibility of using things like digital signatures to enforce 
end-to-end checking.

To invent an example, let's say I had a normative convention for putting a 
digital signature of the resource into the URI (or other reference) itself. 
  Now, no matter which of the other problems hit us, I know with very high 
confidence when I have or have not successfully found the intended target. 
  That seems a useful axis to explore. Maybe that fits under your "safety 
net", maybe not, but it feels different.
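As a sketch of what I mean (an invented convention, not a proposal for real syntax, and using a simple content digest rather than a full digital signature for brevity): embed a hash of the target in the reference itself, so whoever chases the reference can verify the retrieved bytes end to end, no matter which resolution step misbehaved.

```python
# Hypothetical "self-checking" reference: the "#sha256=..." suffix is an
# invented convention for this sketch. A real scheme would need to say
# how the digest is carried and what it covers.
import hashlib

def make_checked_ref(base_uri: str, content: bytes) -> str:
    """Append a SHA-256 digest of the intended target to the reference."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{base_uri}#sha256={digest}"

def verify_target(ref: str, retrieved: bytes) -> bool:
    """Check retrieved bytes against the digest embedded in the reference."""
    expected = ref.rsplit("#sha256=", 1)[1]
    return hashlib.sha256(retrieved).hexdigest() == expected

doc = b"the intended target document"
ref = make_checked_ref("http://example.org/somedoc.xml", doc)
print(verify_target(ref, doc))                  # True
print(verify_target(ref, b"tampered content"))  # False
```

The point is only the end-to-end property: even if DNS, the server, or the specifications drift, a mismatch is detectable by the person holding the reference.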




Noah


[1] http://www.w3.org/2001/tag/doc/persistent-reference/

On 10/13/2010 11:00 AM, Jonathan Rees wrote:
> http://www.w3.org/2001/tag/doc/persistent-reference/
>
>
> ISSUE-50
> ACTION-444
>
>
Received on Friday, 15 October 2010 18:28:24 GMT
