RE: "tdb" and "duri" URI schemes... from Larry Masinter on 2011-02-02 (www-tag@w3.org from February 2011)

From: Larry Masinter <masinter@adobe.com>
Date: Wed, 2 Feb 2011 14:25:08 -0800
To: Jonathan Rees <jar@creativecommons.org>
CC: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <C68CB012D9182D408CED7B884F441D4D058EEE5A9C@nambxv01a.corp.adobe.com>
> Somewhere you need to include a warning that two clients can observe
> completely different content for the same resource, at exactly the
> same time.

6.5

> It is not the URI that is "of a particular date".  Rather it is
> (according to your URI/resource theory articulated in RFC 3986) the
> binding of the URI to a resource and the condition (state, whatever)
> of the resource to which the URI is bound.  I think you should say
> "indicating a resource as of a particular date" since that can be read
> as covering both bases.

I used "identify" since I don't think "indicate" means anything
different.


>                 This allows explicit reference to the "time of
>     retrieval", similar to the way in which bibliographic references
>     containing URIs are used.

> I would say "are written", not "are used".

"are often written"

> While many people will know what you're talking about, some of your
> audience won't.  You need to either remove the "similar to" or expand
> on it.

The text now includes a reference to the Chicago Manual of Style.

>      The second scheme, 'tdb' ( standing for "Thing Described By"),
>     provides a way of minting URIs for anything that can
>     be described,

> Please don't propagate this "anything that can be described" meme.
> It's a silly and meaningless distinction.  Just say "anything" or "any
> resource".

"for something by the means of identifying a description..."


>                  with the ability to fix the description to a given date
>     or time.

> I would not put the time-binding feature directly into tdb:, but
> rather use an orthogonal design that uses the time-binding ability of
> the subject URI.  That is, write tdb:duri:...  and put the time in the
> DURI.  Then, if the resource is already dated (via URI scheme, or some
> other "persistent" method) there's no need to repeat the time in the
> "tdb:".  This orthogonality also simplifies the specification.


You are not alone in asking that I remove the time from "tdb" and
suggest people use tdb:duri:<date>:<uri>  if they want a date.

I instead wanted to make the date in tdb optional, for the rare
cases where a date is not required.

I will note that if "tdb" involves two dates (date of description resource,
date of interpreting description), that tdb:duri:<date>: only fixes the former
and not the latter. Instead, you'd need   duri:<date>:tdb:.
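
To make the nestings concrete, here is a minimal Python sketch. It
assumes the informal <scheme>:<date>:<uri> shape used in this thread
(the draft's actual syntax section is not reproduced in this message),
applies no extra encoding to the embedded URI, and uses hypothetical
helper names (duri, tdb) that are not part of any library.

```python
from typing import Optional


def duri(date: str, uri: str) -> str:
    """Wrap `uri` in a 'duri', assuming the informal duri:<date>:<uri> shape."""
    return f"duri:{date}:{uri}"


def tdb(uri: str, date: Optional[str] = None) -> str:
    """Wrap `uri` in a 'tdb'; the date is optional, as proposed above."""
    return f"tdb:{date}:{uri}" if date else f"tdb:{uri}"


description = "http://www.ietf.org"

# Current draft style: the date lives in the 'tdb' itself.
dated_tdb = tdb(description, date="2008")      # tdb:2008:http://www.ietf.org

# Orthogonal style: only the description resource is dated.
tdb_of_duri = tdb(duri("2008", description))   # tdb:duri:2008:http://www.ietf.org

# Reverse nesting: the 'tdb' reference as a whole is dated.
duri_of_tdb = duri("2008", tdb(description))   # duri:2008:tdb:http://www.ietf.org
```

The three results differ only in where the date sits, which is the
whole of the disagreement: the first keeps one date inside 'tdb', the
other two delegate dating to 'duri' at different levels.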

But I think it's a false economy to leave the date out, since most
of the use cases for tdb that I can think of need a timestamp, because
the lifetime of the resource described has nothing in common with the
lifetime of the description.

The goal of "tdb" is to provide a permanent URI for 'anything', by
means of using a date along with a URI for a (likely temporary)
description.

Anyway, I'm resisting. I admit that leaving the date out of tdb
is feasible, because one could combine 'tdb' and 'duri', but I think
of the two choices, I like the one I have since it matches better
what I think the common use cases would be.


>              The 'tdb' URI scheme may reduce the need to define
>     new URN namespaces merely for the purpose of creating stable
>     identifiers for concepts or abstractions: it provides a ready means
>     for identifying "non-information resources" by semantic indirection
>     -- a way of creating a URI for anything.

> Have you checked this hypothesis out with anyone who has created or is
> thinking of creating a URN namespace?

I was thinking of RFC 1737 ("Requirements for Uniform Resource Names")
Section 2 http://tools.ietf.org/html/rfc1737.txt#section-2. I've also
been involved in creating several URN name spaces.

> Remember that URN namespaces
> give something that duri: and tdb: don't, which is a registry (stored
> in the RFC corpus) of namespace definitions maintained by IETF.  As
> this is the primary value proposition of URNs, I think it unlikely
> that a URN customer would forego it.

It is one of many things the URN registry offers.

>     Many people have wondered how to create globally unique and
>     persistent identifiers.  There are a number of URI schemes and URN
>     namespaces already registered.  However, an absolute guarantee of
>     both uniqueness and persistence is very difficult.

> Nay, impossible.

data: comes pretty close; I won't belabor the point.


>     In some cases, the guarantee of persistence comes through a promise
>     of good management practice, such as is encouraged in "Cool URLs
>     don't change" [COOL].  However, relying on promise of good management
>     practice is not the same as having a design that guarantees
>     reliability independent of actual administrative practice.

> Adhering to the design would itself be an "administrative practice".
> This is a difference of degree, not of kind.  Your point is a good one
> but it needs to be said in a way that does not oversell it.

I just took out the last sentence.

     A primary design goal for URIs is that they are intended to mean the
     same thing, no matter in what context they appear: a "Uniform" way to
     Identify a Resource.  However, even when URIs have Uniform meaning
     from the point of view of the source of the reference, they don't
     guarantee stability over time.  Despite best efforts and intentions,
     identifying information can change in unpredictable ways: domain
     names can disappear or be reassigned, name assigning organizations
     can change structure, responsibility, disappear, merge, or change in
     unpredictable ways.

> Again, the point is good, but it could be expressed better.  You don't
> define "uniform meaning" or "stability" or "identifying information".
> I don't know what you mean by "from the point of view of the source"
> -- are you referring to Humpty Dumpty's view that a word means
> precisely what he wants it to mean?

> I think that again you are trying to simplify by sidestepping the 3986
> factoring of URI -> representation into URI -> resource ->
> representation.  Simplification is good, but in this case it will just
> amplify the current confusion around this point.  Maybe there is a way
> to replace 3986 with a simpler theory and eliminate the intermediary
> "resource", but this document is not the place to do it.


I worked on the wording here, but I'm not sure I solved the problem
to your satisfaction. This is, after all, introductory text. Is there
a problem here?


> The way to go with duri: is to say that the duri: "identifies" a
> resource that is the condition (or state) of the subject resource at
> the given time.  duri:T:X is this resource: "What the resource
> identified by 'X' at time T was like at time T".  The dual "at time T"
> is needed because there are two time-varying mappings at play, one
> from URI to resource and another from resource to "representation" (or
> other time-dependent properties, for non-"information resources").

I didn't do this; I think duri:<timeT>:<uri> only fixes <uri> => <resource>
and that <resource> => <representation> at time <timeT> is only fixed
insofar as it was unambiguous at <timeT>.

>     There is a significant dependence in the interpretation of many URNs
>     with the concept of "naming authority".  The authority is presumably
>     some individual or organization both to insure uniqueness of
>     assignment and also to help with understanding the meaning of the
>     link between the name and the named.

> You are doing a disservice to URNs by talking of name assignment
> authority and name resolution service (not authority) in the same
> breath.  These are orthogonal functions. At the birth of a namespace
> the functions are often carried out by the same organization, and this
> is why people tend to conflate the two ideas.  But the distinction is
> essential, I think, to the nature of URNs, so it should be celebrated,
> not glossed.

I understand they're separable services but disagree they are orthogonal.
If I have two agencies which hate each other, "Name Assignment Service
Green" and "Name Resolution Service Red", and everyone goes to Green
to get a name, and everyone goes to Red to find out what the name means,
well, Red has control, and can just ignore anything Green assigns.
In any case I did separate out the functions.

>   However, authorities, whether individuals or organizations, have a

>  1.2.  URIs for abstractions

I renamed this.

     One might use a URI such as "mailto:" email address to identify a
     person,

> You cannot use "mailto:" to identify a person without contradicting
> the "mailto:" URI scheme registration.  Please don't suggest this.

"One might wish to use ..."


======================================================
If we have time at the next TAG F2F to talk about duri/tdb, I'll say
that I did go through the rest of this message and addressed most
of the issues you've raised, but perhaps we could re-review to see
if I left anything out.
========================================================



     or a "http:" URI to identify an abstract comment.  However,
     this leaves the question of how one might identify, within the same
     context, both the system mailbox and the person to which it is
     assigned, or the web page at a http URI and the concept it describes.

Not "concept" - again web pages can describe anything, not just
concepts (whatever those are!).  How about "and what it describes" or
"the entity it describes" or "the thing it describes" or "the resource
it describes".

     The 'tdb' URI scheme allows ready assignment of URIs for abstractions
     that are distinguished from the media content that describes them.

Not "abstractions".  What you are saying is that the "media content"
describes something - not necessarily an abstraction - and the 'tdb'
scheme allows ready assignment of a URI to whatever it describes
(which is not the media content except in pathological situations).

     The goal, then, of the 'tdb' URI scheme is to provide a mechanism
     which is, at the same time:

        permanent: The identity of the resource identified is not subject
        to reinterpretation over time.

Well, anything written by anyone is subject to reinterpretation,
that's just the way it goes.  But I guess your intent is clear.

Why not just say that the URI is not subject to reinterpretation over
time, and skip the identity/identified bit?  The interpretation *is*
what's identified, right?

        explicitly bound: The mechanism by which the identified resource
        can be determined is explicitly included in the URI.

I don't get this.  If an http: URI yields representations, then the
*server* knows what the resource is, but it's unlikely any client does
- all they know is a couple of representations that they happened to
get; and if the server has been offline for ten years, how can
*anyone* determine what the resource is?

I think you mean to be talking about objectivity: what's meant by the
DURI, or how it's to be interpreted, is explicit in the DURI.

        useful for non-networked items: Allows identification of resources
        outside the network: people, organizations, abstract concepts.

        no administration: The mechanism does not depend on reliable
        administrative processes of authorities for either assignment or
        interpretation.

other than adherence to this RFC, that is.


  2.  Syntax

...

  3.  Semantics

  3.1.  'duri' Semantics

     It is traditional in convention references and citations in printed

conventional

     works to include the date of publication; this practice serves the
     important purpose that the context of the naming can be determined.

Since the context can't necessarily be determined, how about
"important purpose of determining the context of the naming".

Although determining anything about the context other than the time,
if that's possible at all, would require investigative work.

     The meaning of a 'duri' URI is "the resource that was identified by
     the <encoded-URI> (after hex decoding) at the date(time) given".

If one resource can have one representation at one time, and a
different representation at a future time, as is permitted by 3986,
then this is not good enough for your purposes.  You also want to say
that it's the resource as it was at that time.  See above.

     For example, "duri:2001:http://www.ietf.org" is a persistent
     identifier to "http://www.ietf.org" as of 2001.  A 'duri' URI may not
     be a resource locator in a practical sense: the time of location has
     not yet arrived or has passed.

"may not" => "is not necessarily"

Not sure what you mean by "time of location".  I would say something
like "the binding and/or condition may not yet..."


  3.2.  'tdb' Semantics

     The 'tdb' URI scheme is intended to be useful for describing
     entities, concepts, abstractions, and other items which may not
     themselves be network accessible resources, but have been at some
     point described by network accessible resources.

Re "describing" please don't introduce yet another near miss for
"identify" (term of art) - we already have "name" and "designate".
Also you're introducing "item" as a new near-synonym for "resource" or
"thing".

     A 'tdb' URI is intended to be used where the <encoded-URI> identifies
     a 'document' (something a person could read, peruse, understand) or a
     fragment thereof, where the document describes some thing or concept.

Concepts are not things?  That suggests there are *other* things that
aren't things, too...  How about just "describes some thing" or
"described something" or "describes some resource".

     The 'tdb' URI itself then identifies the subject of that document.
     It is common practice to give a reference for a concept by including
     a pointer to a document, segment, phrase that defines the concept;
     'tdb' attempts to capture this practice in URI space.

What is the relation between concepts and resources?
What if the document has more than one subject / defines more than one
concept?

     For example, one might use "tdb:2008:http://www.ietf.org" as a
     persistent identifier for the Internet Engineering Task Force, as
     described by the "http://www.ietf.org" in 2008.

I prefer tdb:duri:2008:http://www.ietf.org


     The 'tdb' URI scheme differs from other URI or URN methods for
     identifying abstractions because the designation of what is actually
     identified by the 'tdb' doesn't depend on knowing the intention of
     the "assigner" of the identifier.  Unlike "tag", "info", "cid", "mid"
     or related schemes, the identification is not dependent on the
     context of use.  The 'tdb' URI scheme can be thought of as giving a



  Masinter                 Expires April 25, 2011                 [Page 7]

  Internet-Draft      The 'tdb' and 'duri' URI schemes        October 2010


     way to invoke a level of semantic indirection to URI resolution.

     While one could imagine using 'tdb' without a date, it would leave
     the possibility that a reference that is unambiguous at one time
     might become ambiguous at some other time.  There are two ways that
     the date is useful for 'tdb' URIs: it fixes the time of access of the
     resource, for variable descriptions, and it fixes the time of
     interpretation, for descriptions whose meaning (in natural language)
     might vary.

Why is it that tdb: fixes the time of interpretation, but duri:
doesn't?

If I know the date of a document's publication, I will do my best to
interpret the document in the way that it would have been interpreted
around that time.

Whether this needs to be specified by the URI scheme, or is just
common sense, is not clear.

  3.3.  Timestamp Semantics

     It is traditional in convention references and citations in printed

conventional

     works to include the date of publication; this practice serves the
     important purpose that the context of the naming can be determined.

     While one could imagine using 'tdb' without a timestamp, it would
     leave the possibility that a reference that is unambiguous at one
     time might become ambiguous at some other time.  There are two ways
     that the date is useful for 'tdb': it fixes the time of access of the
     resource, for variable descriptions, and it fixes the time of
     interpretation, for descriptions whose meaning (in natural language)
     might vary.  While normally, in a literary work in natural language
     which makes a reference to another work, both the reference itself
     and the work referenced are dated, e.g., a footnote in an article
     written in 1967 might talk about a "private communication" which
     itself had a date.  The difference between a URI and a conventional
     literary reference is the desire to be able to extract the URI from
     its context and still retain its meaning.

     The meaning of a timestamp is the interval specified by the
     granularity of the time range indicated, in the UTC time zone, as
     described in [RFC3339].  If necessary, timestamps can include times
     and even fractional times, so that a generator of 'duri' or 'tdb'
     URIs can be arbitrarily precise.

     If there is any ambiguity of the resource within the range of time
     indicated (for example, if the timestamp consists only of a year, and
     the resource changes over the course of the year), then the resource
     state as of the very last instant of the range indicated should be
     used.

     Timestamps are allowed to be specified with as much precision as
     needed.  This keeps most 'duri' and 'tdb' URIs relatively short.
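
As a rough illustration of the interval rule above, the following
Python sketch maps a timestamp to the UTC interval it covers.  It
assumes compact all-digit timestamps of the forms YYYY, YYYYMM and
YYYYMMDD, as seen in the examples in this thread; the draft's real
timestamp syntax is in the Syntax section, which is not reproduced
here, and the function name timestamp_interval is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def timestamp_interval(ts: str) -> Tuple[datetime, datetime]:
    """Return (start, end) in UTC for a compact timestamp; `end` is exclusive.

    Assumption: only YYYY, YYYYMM and YYYYMMDD are handled here.
    """
    if not (ts.isascii() and ts.isdigit()) or len(ts) not in (4, 6, 8):
        raise ValueError(f"unsupported timestamp: {ts!r}")
    year = int(ts[:4])
    month = int(ts[4:6]) if len(ts) >= 6 else 1
    day = int(ts[6:8]) if len(ts) == 8 else 1
    start = datetime(year, month, day, tzinfo=timezone.utc)
    if len(ts) == 4:
        end = start.replace(year=year + 1)          # whole year
    elif len(ts) == 6:
        # Whole month; December rolls over into January of the next year.
        end = start.replace(year=year + (month == 12), month=month % 12 + 1)
    else:
        end = start + timedelta(days=1)             # whole day
    return start, end


start, end = timestamp_interval("2001")
print(start.isoformat(), end.isoformat())
```

Under this reading, the "very last instant of the range" is the
moment just before `end`, so "2001" refers to the resource state as
2001 turns into 2002 in UTC.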







  4.  Use as a Locator

     A 'duri' URI is not directly useful as a resource locator, since many
     resources vary their content over time.

     A 'tdb' URI is not a resource locator in a practical sense, since it
     explicitly requires human interpretation.  However, it allows one to
     know that a resource was described at some point in time; whether the
     description is still available, or whether that description is still
     meaningful, is not guaranteed.

The resource needn't have been described or even observed at the
indicated time.  All you know is what you've already said above, that
the reference or description relates to the condition of a resource at
a particular time, where the resource in question was the one that at
that time was "identified" (or "described") by the subject URI.

     ...

     One might consider using 'tdb' with a "data" URI to designate
     concepts that can be described uniquely briefly inline.  For example,

          tdb:2001:data:,The%20US%20president

     names the concept described by the (text/plain) string "The US
     president" at the very last instant of 2001.

The president is a concept?

                                                   Of course, this
     practice is only useful if the referent of the data is (or was at the
     time) completely unique.  Since "data" does not contain a way to
     designate content-language,

(hey, this is a bug, how about if we fix it?)

                                 the string in question would have to not
     be ambiguous as to its language.  In the case of 'data', there is no
     assigning authority at all; the interpretation of the 'tdb' depends on
     the interpreting community.
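
For illustration only, the "data" example above can be generated
mechanically.  This Python sketch percent-encodes a short phrase and
wraps it as tdb:<date>:data:,<text>; the helper name tdb_for_phrase
is hypothetical, and the sketch assumes the plain form of a "data"
URI with no media type or language parameters.

```python
from urllib.parse import quote, unquote


def tdb_for_phrase(date: str, phrase: str) -> str:
    """Build tdb:<date>:data:,<percent-encoded phrase> (hypothetical helper)."""
    return f"tdb:{date}:data:,{quote(phrase, safe='')}"


uri = tdb_for_phrase("2001", "The US president")
print(uri)                                  # tdb:2001:data:,The%20US%20president
print(unquote(uri.split("data:,", 1)[1]))   # The US president
```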

Although your RFC doesn't explicitly say what resources data: URIs
"identify" or "locate", it seems pretty clear from all the examples
that they identify things that are pretty close to strings.  Therefore
RFC 2397 *is* the assigning authority.  Interpreting the data: URI as
a string is straightforward; interpreting that string as something
else has nothing to do with the data: URI scheme, so the comment about
"no assigning authority" is confused.

     Using 'tdb' or 'duri' with an embedded 'urn:' might not seem to be
     too useful,

tdb, very useful, why wouldn't it seem so?

                 but it might be useful where the assignment of names in a
     URN namespace are not, in practice, permanent, or that one might want
     to refer to the assignment as of a given date.  In this case, it is
     possible to use a "urn" within a 'duri', e.g.,

           duri:2000:urn:ietf:std:50

     might be used to refer to "the document that the IETF considered to
     be STD 50, as of the last instant of 2000".

     For 'tdb', many URIs identify resources which do not clearly describe
     anything at all.  The "home page" for an organization isn't nearly as
     good a resource to use to describe an organization as the
     organization's "about" page.

Depends on what the home page says, and what the about page says.

                                   But it is up to the minter of the 'tdb'
     URI to choose wisely.

  6.2.  Useful timestamps

     Timestamps far in the future are suspect, because the future content
     of a description resource cannot usually be reliably predicted.
     Timestamps which precede the availability of the description resource
     should not be used either.  For example, using a http URI with a
     timestamp before the description resource is also not recommended.

     However, although these practices are not recommended, there is no
     assurance that they haven't been used; a 'tdb' URI by
     itself does not constitute an assertion that the description resource
     was available or assigned at the date specified.

     Note that the use of the "very last instant" allows for the
     conventional bibliographic convention that a work published in 2009
     can use "2009" as the date string, to refer to the work in the year
     of publication.



  6.3.  Free assignment

     Because of the many possible schemes that can be used in the
     <encoded-URI> portion, there should be no difficulty in almost any
     computational process being able to assign 'duri' or 'tdb' URIs at
     will.  Of course, it is necessary for there to be some resource which
     is available at some point in time, and to have a clock which is
     accurate to the granularity of the frequency of assignment.

  6.4.  Resolution

     There are no direct resolution servers or processes for 'duri' or
     'tdb' URIs.  However, a 'duri' URI might be "resolvable" in the sense
     that a resource that was accessed at a point in time might have the
     result of that access cached or archived in an Internet archive
     service.  See, for example, the "Internet Archive" project [archive].
     And a 'tdb' URI is "resolvable" in the sense that the description
     resource can be accessed and interpreted.

     Clients without access to an Internet archive service might take the
     decoded <encoded-URI> of a 'duri' and attempt resolution of *that*
     identifier.  This will give an approximation whose reliability
     depends on what has happened in the time since the date
     indicated.
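
The two resolution strategies described in this section might be
sketched as follows in Python.  Several things here are assumptions
rather than anything the draft specifies: the informal
duri:<timestamp>:<encoded-URI> shape with a colon-free timestamp, the
use of percent-decoding for the "hex decoding" step, the Wayback
Machine URL pattern as one possible archive service, and the helper
names split_duri and resolution_candidates.

```python
from typing import List, Tuple
from urllib.parse import unquote


def split_duri(uri: str) -> Tuple[str, str]:
    """Split a 'duri' into (timestamp, decoded target URI)."""
    scheme, sep, rest = uri.partition(":")
    if scheme.lower() != "duri" or not sep:
        raise ValueError(f"not a 'duri' URI: {uri!r}")
    timestamp, sep, encoded = rest.partition(":")
    if not timestamp or not sep or not encoded:
        raise ValueError(f"malformed 'duri' URI: {uri!r}")
    return timestamp, unquote(encoded)


def resolution_candidates(uri: str) -> List[str]:
    """List URLs a client might try, most faithful to the date first."""
    timestamp, target = split_duri(uri)
    candidates = [target]  # fallback: resolve the decoded URI as it is today
    if timestamp.isdigit():
        # Assumed archive lookup; only attempted for all-digit timestamps.
        candidates.insert(0, f"https://web.archive.org/web/{timestamp}/{target}")
    return candidates


print(resolution_candidates("duri:2001:http://www.ietf.org"))
```

Neither candidate is authoritative: the archived copy depends on what
the archive happened to capture, and the live URI only approximates
the resource as it was at the indicated date.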

  6.5.  Why Names with Semantics?

     There are a number of URI and URN schemes that create otherwise
     unbound "names", where the scheme only provides for uniqueness, with
     some other agent or process or context providing the authority to
     interpret the meaning of the identifier at some point in the future.
     'duri' and 'tdb' is different, in that it is the agreement between
are
     the describer (the agent creating the URI) and the receiver of the
     URI (the agent interpreting the URI) to agree upon the semantics
     without any reference to any third party.

  6.6.  Avoiding MetaData

     One might consider the timestamp in a 'duri' or 'tdb' URI to be just
     one piece of additional metadata about the URI, and consider adding
     other pieces of metadata as annotation.

     However, the use of the timestamp is intended primarily as a
     mechanism of accomplishing uniqueness over time.  No other bit of
     metadata or description readily fills that purpose.  Further, the
     date is not descriptive (an assertion about the URI) but merely
     refining.




  6.7.  Avoiding 'duri' and 'tdb'

     Many applications of URIs already provide a context of timestamp.
     For example, one could imagine a hypertext system where the URIs
     contained within a document were intended to refer to the resources
     as of the date of the enclosing document.  This would be a reasonable
     interpretation of URIs within an Internet archive system, for
     example.

     Some applications of URIs already implicitly use the level of
     interpretive indirection that is explicit with 'tdb'.  For example,
     within an ontology language definition, the URIs used for abstract
     concepts, individuals and so forth are generally considered the
     "thing described by" the URI.

     In addition, the 'application/rdf+xml' Media Type [RFC3870] uses the
     fragment identifier resolution as an explicit way of identifying
     abstract concepts that are described by an RDF document.

"Abstract concept" is too limiting as RDF is used for concrete
non-concepts as well.

  6.8.  'tdb' and levels of indirection

     The 'tdb' scheme introduces a level of semantic indirection.  The
     puzzles and confusions about use and mention, name and reference, and
     levels of indirection have been puzzling and amusing for quite a
     while.

        "It's long," said the Knight, "but it's very, very beautiful.
        Everybody that hears me sing it--either it brings tears into their
        eyes, or else--"
        "Or else what?" said Alice, for the Knight had made a sudden
        pause.
        "Or else it doesn't, you know.  The name of the song is called
        'Haddock's Eyes.'"
        "Oh, that's the name of the song, is it?"  Alice said, trying to
        feel interested.
        "No, you don't understand," the knight said, looking a little
        vexed.  "That's what the name is called.  The name really is 'The
        Aged Aged Man.'"
        "Then I ought to have said 'That's what the song is called'?"
        Alice corrected herself.
        "No, you oughtn't: that's quite another thing!  The song is called
        'Ways and Means': but that's only what it's called, you know!"
        "Well, what is the song, then?" said Alice, who was by this time
        completely bewildered.
        "I was coming to that," the Knight said.  "The song really is
        'A-sitting On A Gate': and the tune's my own invention."  [LOOK]

I don't see what this section contributes.  As an alternative I would
suggest giving a reference to any useful exposition of the difference
between an utterance and the meaning of an utterance, such as

duri:20101102:http://en.wikipedia.org/wiki/Use%E2%80%93mention_distinction
Received on Wednesday, 2 February 2011 22:26:02 UTC