Re: stuff goes away from Jonathan Rees on 2009-03-16 (www-archive@w3.org from March 2009)

From: Jonathan Rees <jar@creativecommons.org>
Date: Mon, 16 Mar 2009 23:54:00 +0100
To: Larry Masinter <masinter@adobe.com>
Cc: "www-archive@w3.org" <www-archive@w3.org>
Message-Id: <11909823-6F9B-43E8-8342-8FAB4CC7CB04@creativecommons.org>
Everything you say could be true, but I'm not sure what the point is.  
You get what you pay for.

The problem is, someone in the year 2029 encounters a use of some  
identifier in some document or database, and wants to know what it  
refers to. It's a so-called "persistent identifier" to the degree that  
this is likely to succeed. (Identifiers aren't in themselves  
persistent or not; it's the possibility of dereferencing them that  
would be. And applying the label "persistent" is always mere wishful  
thinking; there's no test for it.)

Persistence for some number of years can be arranged via an SLA or  
endowment, and replication is a big help, but as time goes on these  
machinations all lose their oomph.

The web is not the only way to figure out what someone meant by a  
term; any index or table or database that contains the correct  
information will do. So the issue has little to do with anything  
outliving the web. It is merely about whether the (meta)data someone  
needs will exist in a place they can get to, when they need it. The  
problem existed pre-web and was solved through a replicated  
infrastructure (library card catalogs and holdings). If a library  
burned down, you could usually find what you wanted at another library.

(If you think the use of http: syntax for identifiers puts them at a  
disadvantage relative to urn:, I'm not sure why this should be the  
case - the syntax shouldn't matter. (At least not for RDF, which as I  
said should declare independence from the HTTP protocol, while  
maintaining a sort of opportunistic and nonbinding allegiance.)  In  
any case the choice of URI scheme is a minor problem relative to that  
of future accessibility.)

I don't know how to assess your claim; it may be true or not. But it  
seems obvious that someone who wants assertions (whether their own or  
someone else's, it doesn't matter) to be understood at time t knows  
that the terms used in the assertions have to be understandable at  
time t. If they know what they're doing they'll take pains to make  
sure that for each term used either (a) the term belongs to a  
vocabulary that seems quite likely to be alive at time t, or else (b)  
information designed to promote understandability is included in the  
context of the assertion (i.e. in the same file) so that it will be  
carried along with the assertion as it goes through life (akin to  
propagating the full citation along with a DOI, even though in  
principle the DOI by itself is sufficient). Such information could be  
a "definition" or defining properties, location hints (locations of  
copies), and/or other stuff.
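
As a rough illustration of the (a)/(b) test described above, here is a
minimal Python sketch. Everything in it is an assumption made for the
example: the Assertion class, the terms_at_risk function, the two term
URIs, and the "expected alive until" years are hypothetical and are not
taken from any real vocabulary or tool.

```python
from dataclasses import dataclass, field


@dataclass
class Assertion:
    # Terms (URIs) used by the assertion.
    terms: list
    # Option (b): definitions or location hints carried in the same file,
    # keyed by term. Hypothetical structure, for illustration only.
    inline_info: dict = field(default_factory=dict)


def terms_at_risk(assertion, alive_until, t):
    """Return the terms that satisfy neither (a) nor (b) at time t.

    alive_until maps a term to the year its vocabulary is guessed to
    remain resolvable; these are estimates, not guarantees.
    """
    at_risk = []
    for term in assertion.terms:
        likely_alive = alive_until.get(term, 0) >= t   # option (a)
        carried_along = term in assertion.inline_info  # option (b)
        if not (likely_alive or carried_along):
            at_risk.append(term)
    return at_risk


# Hypothetical terms and lifetime guesses.
W3_TERM = "http://w3.org/example#title"
NC_TERM = "http://neurocommons.org/example#gene"
alive_until = {W3_TERM: 2040, NC_TERM: 2015}

bare = Assertion(terms=[W3_TERM, NC_TERM])
annotated = Assertion(
    terms=[W3_TERM, NC_TERM],
    inline_info={NC_TERM: "definition and locations of copies"},
)

print(terms_at_risk(bare, alive_until, 2029))
print(terms_at_risk(annotated, alive_until, 2029))
```

Under these made-up numbers the bare assertion has one term at risk in
2029, while the annotated one has none, because the missing information
travels with it.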

I try to stay away from the "semantic web" movement because it seems  
to not care about this problem - the implicit assumption is that all  
assertions are ephemeral. Coming up with credible URIs was the first  
problem I hit when I started doing RDF, and after three years I'm only  
now making a little headway on it.

Coincidentally, today I had a couple of conversations about the need  
for open replicable metadata, as a way to make identifier systems more  
credible, trusted, and likely to persist. (I'm at the International  
Repositories Workshop in Amsterdam.)

By "credible commitments" I meant things like the cool-URIs site  
policy for w3.org. Because of this, and a bet that w3.org will  
outlive neurocommons.org, I prefer URIs beginning http://w3.org/ to  
those beginning http://neurocommons.org/ (other things being equal).  
And I figure that by the time ICANN goes sour or w3.org folds, there  
will be alternative resolution methods, of the sort that is encouraged  
by URNs (and maybe handles?) and ought to be encouraged for http: as  
well.

Jonathan

On Mar 16, 2009, at 2:45 AM, Larry Masinter wrote:

> I'm still stuck on the lifetimes of URIs vs. lifetimes
> of statements, in engineering the semantic web:
>
> "... you might be able to
> make some plausible predictions or credible commitments.."
>
> Stuff goes away. Mean time between site failure might be less
> than 10 years. Companies change their names, merge, split,
> go out of business, stop doing the business that caused them
> to bring up the web site. Students graduate. Non-profit
> organizations change brands. Web technology itself is
> only 20 years old, 20 years from now. Sure, maybe some will
> still be around, but on the average, no one has the
> foundation or insurance policy to guarantee that a
> URI will still be around to respond "200-" to anything
> for the expected lifetime of the assertion being made.
>
> Many industries and applications have a requirement that
> the statements made and inferences about them need to last
> much longer than 20 years: government documents, descriptions
> of building plans, life insurance policies.
>
> Anyone who wants to make a "semantic web" statement which
> needs to have meaning beyond the guaranteed lifetime of the
> web sites used to form their "ontology" cannot link the
> meaning of those statements to the future 200-response
> expectation of the referenced web site. The expected
> lifetime of any particular piece of web content is much
> less than the needed lifetime of the validity of semantics
> and understanding of semantic intent.
>
> I think it is more natural to assume that there are
> *no* stable URIs in the long run: every URI has a
> lifetime, we wish every one to have as long a life
> as possible, but every single URI will, at some point
> in the future, evaporate. Consider:
>
> at any instant, there are:
> * People who want to make semantic web assertions P
> * assertions that those people want to make
>   A(p) for p in P
> * for each assertion, their desired lifetime
>  (how long each person wants to make sure the
>  assertion is interpretable)
>    D(a) for a in A(p) for p in P
> * terms needed in those assertions
>    T(a) for a in A(p) for p in P
> * URIs under the control of those people
>  which are appropriate
>    U(t) for t in T(a) for a in A(p) for p in P
> * expected lifetime of those URIs
>    E(u) for u in U(t) for t in T(a) for
>   a in A(p) for p in P.
>
>
> CLAIM:
>
> Most people don't have the ability to make
> assertions for which the URIs they use have
> an expected lifetime longer than the desired
> lifetime of all of the assertions they want
> to make.
>
> For a large percentage of p in P,
> there is some assertion a in A(p)
> and some needed term t in T(a)
> such that the desired lifetime
> of the assertion, D(a), exceeds
> the maximum expected lifetime of
> all resources available to p.
>
>
> Larry
> --
> http://larry.masinter.net
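
The CLAIM in the quoted message can be read as a simple check over the
sets D, T, U, and E defined there. The following Python sketch is one
possible reading, not a definitive formalization: the function name
has_uncoverable_assertion, the dictionary layout, and the sample data
are all assumptions made for illustration.

```python
def has_uncoverable_assertion(desired, terms, uris, expected):
    """True if, for one person p, some assertion outlives its best URI.

    desired:  D, maps assertion -> desired lifetime in years
    terms:    T, maps assertion -> list of terms it needs
    uris:     U, maps term -> list of URIs p controls for that term
    expected: E, maps URI -> expected lifetime in years
    """
    for assertion, wanted_years in desired.items():
        for term in terms[assertion]:
            # Longest expected lifetime among the URIs available for
            # this term; 0 if p has no suitable URI at all.
            best = max((expected[u] for u in uris.get(term, [])), default=0)
            if wanted_years > best:
                return True
    return False


# Hypothetical data: a deed meant to stay interpretable for 50 years,
# using one term whose only available URI is expected to last 10.
desired = {"deed": 50}
terms = {"deed": ["owner"]}
uris = {"owner": ["http://example.org/owner"]}
expected = {"http://example.org/owner": 10}

print(has_uncoverable_assertion(desired, terms, uris, expected))
```

The claim is then that this check comes out true for a large
percentage of the people in P.
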
Received on Monday, 16 March 2009 22:54:43 UTC