RE: "tdb" and "duri" URI schemes... from Larry Masinter on 2010-11-04 (www-tag@w3.org from November 2010)

From: Larry Masinter <masinter@adobe.com>
Date: Thu, 4 Nov 2010 11:03:27 -0700
To: Jonathan Rees <jar@creativecommons.org>
CC: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <C68CB012D9182D408CED7B884F441D4D0476C1567C@nambxv01a.corp.adobe.com>

I'm working on a new version and went ahead and attached it.
(I don't think I can post to IETF internet-drafts until Monday.)


Anyway, here are responses to Jonathan. Items marked ***
are things I haven't addressed yet.


Based on feedback and an attempt at simplicity and to avoid
the need for double-escaping, I've been convinced to remove
the possibility of using fragment identifiers with "tdb".

If you want to use 'tdb', you need a single resource
to describe the thing you want a URI for.




> I hope that you will be coordinating with others who are
> working on similar issues.

Do you think adding references to other work, and possibly
contrasting the differences, is sufficient? What other
work do you think it's most important to coordinate with?

I added references for tag:, info:, cid: and mid:. Are there
others?

> Somewhere you need to include a warning that two clients can observe
> completely different content for the same resource, at exactly the
> same time.  Time is not adequate to "identify" anything about what
> anyone actually observed since it potentially depends on all the
> details of the observation (IP address, cookies, which physical server
> responded, etc.).  All a DURI really says is that someone observed
> something at the given URI at a certain time - and this has to be
> taken on trust.

Added (using your words)

>     This document defines two URI schemes.  The first, 'duri' (standing
>     for "dated URI"), allows indicating a URI as of a particular date
>     (and time).

> It is not the URI that is "of a particular date".  Rather it is
> (according to your URI/resource theory articulated in RFC 3986) the
> binding of the URI to a resource and the condition (state, whatever)
> of the resource to which the URI is bound.  I think you should say
> "indicating a resource as of a particular date" since that can be read
> as covering both bases.

I changed "indicating" to "identifying" since that's the I in URI.

>                 This allows explicit reference to the "time of
>     retrieval", similar to the way in which bibliographic references
>     containing URIs are used.

> I would say "are written", not "are used".

yes


> While many people will know what you're talking about, some of your
> audience won't.  You need to either remove the "similar to" or expand
> on it.

I added a reference.

>     The second scheme, 'tdb' (standing for "Thing Described By"),
>     provides a way of minting URIs for anything that can
>     be described,

> Please don't propagate this "anything that can be described" meme.
> It's a silly and meaningless distinction.  Just say "anything" or "any
> resource".

Personally, one of the things that I find myself dwelling on is the
infinite order  -- there are uncountably many "things", but only a
countable number of "descriptions".  At least, for me, the distinction
isn't silly or meaningless. In addition, the notion of "identity" is
associated with the description rather than the thing-described, which
seems to me to be an important distinction; I'd rather elaborate these
ideas than leave them unstated.

That is, "things" don't really form a set, in the sense of having
a clear equality relationship. We talked about this before and I don't
think I convinced you, but perhaps you'll have more sympathy for
my continuing to talk about "anything that can be described" vs
"anything".

>                  with the ability to fix the description to a given date
>     or time.

> I would not put the time-binding feature directly into tdb:, but
> rather use an orthogonal design that uses the time-binding ability of
> the subject URI.

I did consider that, but felt uncomfortable not binding the date of
interpretation. Of course, almost all descriptions are really ambiguous
even at a single instant, but it's certain that meanings vary over time,
and tdb:duri: only fixes the URI->representation mapping.

> That is, write tdb:duri:...  and put the time in the
> DURI.  Then, if the resource is already dated (via URI scheme, or some
> other "persistent" method) there's no need to repeat the time in the
> tdb:.  This orthogonality also simplifies the specification.

It might simplify things, but I'm uncomfortable with the simplification.
And in any case, as a "thought experiment", I'd rather have the
complexity.

>              The 'tdb' URI scheme may reduce the need to define
>     new URN namespaces merely for the purpose of creating stable
>     identifiers for concepts or abstractions: it provides a ready means
>     for identifying "non-information resources" by semantic indirection
>    -- a way of creating a URI for anything.

> Have you checked this hypothesis out with anyone who has created or is
> thinking of creating a URN namespace?  Remember that URN namespaces
> give something that duri: and tdb: don't, which is a registry (stored
> in the RFC corpus) of namespace definitions maintained by IETF.  As
> this is the primary value proposition of URNs, I think it unlikely
> that a URN customer would forego it.

I removed the discussion of reducing URN namespaces.

>    Many people have wondered how to create globally unique and
>    persistent identifiers.  There are a number of URI schemes and URN
>    namespaces already registered.  However, an absolute guarantee of
>    both uniqueness and persistence is very difficult.

> Nay, impossible.

I reworded this.


>     In some cases, the guarantee of persistence comes through a promise
>     of good management practice, such as is encouraged in "Cool URLs
>     don't change" [COOL].  However, relying on promise of good management
>     practice is not the same as having a design that guarantees
>      reliability independent of actual administrative practice.

> Adhering to the design would itself be an "administrative practice".
> This is a difference of degree, not of kind.  Your point is a good one
> but it needs to be said in a way that does not oversell it.

I reworded this.

>    A primary design goal for URIs is that they are intended to mean the
>    same thing, no matter in what context they appear: a "Uniform" way to
>    Identify a Resource.  However, even when URIs have Uniform meaning
>    from the point of view of the source of the reference, they don't
>    guarantee stability over time.  Despite best efforts and intentions,
>    identifying information can change in unpredictable ways: domain
>    names can disappear or be reassigned, name assigning organizations
>    can change structure, responsibility, disappear, merge, or change in
>    unpredictable ways.

> Again, the point is good, but it could be expressed better.  You don't
> define "uniform meaning" or "stability" or "identifying information".

This text is just intended to be explanatory, relating duri / tdb
to terms that the reader might already understand. Surely those
terms are not so obscure as to require definition?

> I don't know what you mean by "from the point of view of the source"
> -- are you referring to Humpty Dumpty's view that a word means
> precisely what he wants it to mean?

I removed "from the point of view of the source"

> I think that again you are trying to simplify by sidestepping the 3986
> factoring of URI -> representation into URI -> resource ->
> representation.  Simplification is good, but in this case it will just
> amplify the current confusion around this point.  Maybe there is a way
> to replace 3986 with a simpler theory and eliminate the intermediary
> "resource", but this document is not the place to do it.

> (I'm not suggesting you use the term "representation", or not.)

> The way to go with duri: is to say that the duri: "identifies" a
> resource that is the condition (or state) of the subject resource at
> the given time.  duri:T:X is this resource: "What the resource
> identified by 'X' at time T was like at time T".  The dual "at time T"
> is needed because there are two time-varying mappings at play, one
> from URI to resource and another from resource to "representation" (or
> other time-dependent properties, for non-"information resources").

I'm not sure I buy that "the resource identified by 'X' at time T" is
time-varying. (Or perhaps I should quote Alice and the Caterpillar:
http://www.xs4all.nl/~4david/alice3.html )
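
To make the "dual at time T" reading quoted above concrete, here is a
minimal Python sketch. It only illustrates Jonathan's description and is
not taken from the draft: the URI, the resource names, the years, and the
helper names (latest_at, duri_condition) are all invented for the example.

```python
from bisect import bisect_right

# Hypothetical history, invented purely for illustration.
# Mapping 1: URI -> [(year, resource)], sorted by year (the URI-to-resource binding).
BINDINGS = {
    "http://example.org/home": [(2000, "org-A site"), (2005, "org-B site")],
}
# Mapping 2: resource -> [(year, state)], sorted by year (the resource's condition).
STATES = {
    "org-A site": [(2000, "A: launch page"), (2003, "A: redesign")],
    "org-B site": [(2005, "B: launch page"), (2009, "B: redesign")],
}


def latest_at(history, year):
    """Return the value in effect at `year`, or None if nothing existed yet."""
    years = [y for y, _ in history]
    i = bisect_right(years, year)
    return history[i - 1][1] if i else None


def duri_condition(year, uri):
    """Model duri:<year>:<uri>: the state, at `year`, of the resource
    that `uri` was bound to at `year` (both lookups use the same time)."""
    resource = latest_at(BINDINGS.get(uri, []), year)
    if resource is None:
        return None
    return latest_at(STATES[resource], year)
```

With this toy data, asking about 2004 goes through the first binding and
then that resource's 2004 state; asking about 2006 goes through a different
resource, because the binding itself changed in 2005. If the URI-to-resource
mapping is held fixed, only the second lookup depends on the time.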

>     There is a significant dependence in the interpretation of many URNs
>     with the concept of "naming authority".  The authority is presumably
>     some individual or organization both to insure uniqueness of
>     assignment and also to help with understanding the meaning of the
>     link between the name and the named.

> You are doing a disservice to URNs by talking of name assignment
> authority and name resolution service (not authority) in the same
> breath.

I think I split this.

> These are orthogonal functions.  At the birth of a namespace
> the functions are often carried out by the same organization, and this
> is why people tend to conflate the two ideas.  But the distinction is
> essential, I think, to the nature of URNs, so it should be celebrated,
> not glossed.

acknowledged, but I didn't go as far as celebrating.

>     However, authorities, whether individuals or organizations, have a

> That should be "resolution services", not "authorities".

I rewrote this, but it applies to both naming services (authorities)
and resolution services (authority), because the naming service
needs to ensure that it doesn't allocate the same name for
something else later.

>   1.2.  URIs for abstractions

> "Abstractions" is a terrible ontological category.  What's an example
> of something that's not an abstraction?  I think the distinction
> you're looking for is - in the words of the passage you quote below -
> "resources not accessible via the Internet".  Just say that instead.

I was hoping to be evocative, not prescriptive, "people, place or thing
or abstract". I did reword this section, though; let me know if
what I wrote is better.

>     One might use a URI such as "mailto:" email address to identify a
>     person,

> You cannot use "mailto:" to identify a person without contradicting
> the "mailto:" URI scheme registration.  Please don't suggest this.

I put this as "one might want to use ..."

>     or a "http:" URI to identify an abstract comment.  However,
>     this leaves the question of how one might identify, within the same
>     context, both the system mailbox and the person to which it is
>     assigned, or the web page at a http URI and the concept it describes.

> Not "concept" - again web pages can describe anything, not just
> concepts (whatever those are!).  How about "and what it describes" or
> "the entity it describes" or "the thing it describes" or "the resource
> it describes".

I worked on this; I hope you like it.

>     The 'tdb' URI scheme allows ready assignment of URIs for abstractions
>     that are distinguished from the media content that describes them.

> Not "abstractions".  What you are saying is that the "media content"
> describes something - not necessarily an abstraction - and the 'tdb'
> scheme allows ready assignment of a URI to whatever it describes
> (which is not the media content except in pathological situations).

Not sure if I got this, but I tried.

>     The goal, then, of the 'tdb' URI scheme is to provide a mechanism
>     which is, at the same time:

>       permanent: The identity of the resource identified is not subject
>       to reinterpretation over time.

> Well, anything written by anyone is subject to reinterpretation,
> that's just the way it goes.  But I guess your intent is clear.

> Why not just say that the URI is not subject to reinterpretation over
> time, and skip the identity/identified bit?  The interpretation *is*
> what's identified, right?

ok

>       explicitly bound: The mechanism by which the identified resource
>       can be determined is explicitly included in the URI.

> I don't get this.  If an http: URI yields representations, then the
> *server* knows what the resource is, but it's unlikely any client does
> - all they know is a couple of representations that they happened to
> get; and if the server has been offline for ten years, how can
> *anyone* determine what the resource is?

> I think you mean to be talking about objectivity: what's meant by the
> DURI, or how it's to be interpreted, is explicit in the DURI.

OK

>       useful for non-networked items: Allows identification of resources
>       outside the network: people, organizations, abstract concepts.

>       no administration: The mechanism does not depend on reliable
>       administrative processes of authorities for either assignment or
>       interpretation.

> other than adherence to this RFC, that is.

I just changed it to say "interpretation".

>     It is traditional in convention references and citations in printed
> conventional

OK

>    works to include the date of publication; this practice serves the
>    important purpose that the context of the naming can be determined.

> Since the context can't necessarily be determined, how about
> "important purpose of determining the context of the naming".

> Although determining anything about the context other than the time,
> if that's possible at all, would require investigative work.

I rewrote a lot of this.

>      The meaning of a 'duri' URI is "the resource that was identified by
>     the <encoded-URI> (after hex decoding) at the date(time) given".

> If one resource can have one representation at one time, and a
> different representation at a future time, as is permitted by 3986,
> then this is not good enough for your purposes.  You also want to say
> that it's the resource as it was at that time.  See above.

>    For example, "duri:2001:http://www.ietf.org" is a persistent
>     identifier to "http://www.ietf.org" as of 2001.  A 'duri' URI may not
>     be a resource locator in a practical sense: the time of location has
>     not yet arrived or has passed.

> "may not" => "is not necessarily"

worked on these

> Not sure what you mean by "time of location".  I would say something
> like "the binding and/or condition may not yet..."

rewrote
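
For readers who want to see the shape of the examples in this thread, here
is a rough Python sketch that splits a 'duri' or 'tdb' URI into scheme,
date, and embedded URI. The date pattern is an assumption based only on the
forms that appear in this message (2001, 20101102, 2010-11-02); the draft's
actual grammar may differ, and no hex decoding is attempted. The function
name split_dated_uri is made up for the example.

```python
import re

# Assumed shape: scheme ":" date ":" embedded-URI, with the date limited to
# YYYY, YYYYMM, YYYYMMDD or the hyphenated equivalents. Not the draft's ABNF.
_DATED = re.compile(r"(duri|tdb):(\d{4}(?:-?\d{2}(?:-?\d{2})?)?):(.+)")


def split_dated_uri(uri):
    """Return (scheme, date, embedded_uri) or raise ValueError."""
    m = _DATED.fullmatch(uri)
    if m is None:
        raise ValueError(f"not a dated URI of the assumed form: {uri!r}")
    return m.group(1), m.group(2), m.group(3)


scheme, date, embedded = split_dated_uri("duri:2001:http://www.ietf.org")
```

The first colon after the date ends the date field, so colons inside the
embedded URI (as in "http:") are left alone. A nested form such as
"tdb:duri:2008:..." has no date directly after "tdb:" and is rejected by
this sketch.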

>   3.2.  'tdb' Semantics
>     The 'tdb' URI scheme is intended to be useful for describing
>     entities, concepts, abstractions, and other items which may not
>    themselves be network accessible resources, but have been at some
>     point described by network accessible resources.

> Re "describing" please don't introduce yet another near miss for
> "identify" (term of art) - we already have "name" and "designate".
> Also you're introducing "item" as a new near-synonym for "resource" or
> "thing".

I got rid of 'item'. I think 'describing' is part of what you need
to make a 'tdb' URI. But rewrote.

>     A 'tdb' URI is intended to be used where the <encoded-URI> identifies
>     a 'document' (something a person could read, peruse, understand) or a
>     fragment thereof, where the document describes some thing or concept.

> Concepts are not things?  That suggests there are *other* things that
> aren't things, too...  How about just "describes some thing" or
> "described something" or "describes some resource".

I rewrote this enough that I can't find any words that match what you
wanted me to rewrite, so I hope I got this.

>     The 'tdb' URI itself then identifies the subject of that document.
>     It is common practice to give a reference for a concept by including
>    a pointer to a document, segment, phrase that defines the concept;
>    'tdb' attempts to capture this practice in URI space.

> What is the relation between concepts and resources?
> What if the document has more than one subject / defines more than one
> concept?

It's not a good idea. I will put in something about this. ***


>     For example, one might use "tdb:2008:http://www.ietf.org" as a
>    persistent identifier for the Internet Engineering Task Force, as
>     described by the "http://www.ietf.org" in 2008.

> I prefer tdb:duri:2008:http://www.ietf.org


> Why is it that tdb: fixes the time of interpretation, but duri:
> doesn't?

because 'tdb' involves interpretation (of a description) and
'duri' doesn't.
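
The two spellings being compared here differ only in where the date lives.
A small Python sketch, with helper names (duri, tdb_dated, tdb_of) invented
for the example, and with the undated "tdb:" form standing for Jonathan's
proposal rather than anything in the draft:

```python
def duri(date, uri):
    """Dated URI: the date fixes the URI binding/state."""
    return f"duri:{date}:{uri}"


def tdb_dated(date, uri):
    """The draft's form: the date is part of the 'tdb' URI itself."""
    return f"tdb:{date}:{uri}"


def tdb_of(subject_uri):
    """Jonathan's proposed orthogonal form (an assumption about its syntax):
    'tdb' carries no date and relies on the subject URI for any dating."""
    return f"tdb:{subject_uri}"


draft_form = tdb_dated("2008", "http://www.ietf.org")
nested_form = tdb_of(duri("2008", "http://www.ietf.org"))
```

Here draft_form is "tdb:2008:http://www.ietf.org" and nested_form is
"tdb:duri:2008:http://www.ietf.org". In the nested form the date belongs to
the inner 'duri' only, which is why it fixes the URI-to-representation
mapping but says nothing by itself about the date of interpretation.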


> If I know the date of a document's publication, I will do my best to
> interpret the document in the way that it would have been interpreted
> around that time.

> Whether this needs to be specified by the URI scheme, or is just
> common sense, is not clear.

Well, that's the reason why there is only one date in "tdb" rather
than two. *** I suppose I should say something about this too.


>     A 'tdb' URI is not a resource locator in a practical sense, since it
>     explicitly requires human interpretation.  However, it allows one to
>     know that a resource was described at some point in time; whether the
>    description is still available, or whether that description is still
>     meaningful, is not guaranteed.

> The resource needn't have been described or even observed at the
> indicated time.  All you know is what you've already said above, that
> the reference or description relates to the condition of a resource at
> a particular time, where the resource in question was the one that at
> that time was "identified" (or "described") by the subject URI.

Not sure I fixed this, but I worked on it.

> The president is a concept?

changed terminology


>                                                  Of course, this
>     practice is only useful if the referent of the data is (or was at the
>     time) completely unique.  Since "data" does not contain a way to
>     designate content-language,

> (hey, this is a bug, how about if we fix it?)

no, no interest in changing data:. You can use data: with text/html
though or application/xml with an xml:lang in it.


>                                the string in question would have to not
>     be ambiguous as to its language.  In the case of 'data', there is no
>     assigning authority at all; the interpretation of the 'tdb' depend on
>     the interpreting community.

> Although your RFC doesn't explicitly say what resources data: URIs
> "identify" or "locate", it seems pretty clear from all the examples
> that they identify things that are pretty close to strings.

Well, this is actually wrong, and I need to fix it. *** I'm just
using data: with text/plain, but it can be any MIME type.
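
As a rough illustration of that point, here is a Python sketch that builds
non-base64 data: URIs in the RFC 2397 form "data:<mediatype>,<data>", once
as text/plain and once as application/xml carrying an xml:lang attribute.
The helper name data_uri, the example string, and the <title> element are
invented for the example; they are not from the draft.

```python
from urllib.parse import quote, unquote


def data_uri(text, mediatype="text/plain"):
    """Build a percent-encoded (non-base64) data: URI for `text`."""
    return f"data:{mediatype},{quote(text, safe='')}"


# Plain text: nothing in the URI says which language the string is in.
plain = data_uri("President of the United States")

# XML: the language tag travels inside the payload via xml:lang.
xml_source = '<title xml:lang="en">President of the United States</title>'
tagged = data_uri(xml_source, "application/xml")

# Round trip: the part after the first comma decodes back to the payload.
decoded = unquote(tagged.split(",", 1)[1])
```

The text/plain form gives "data:text/plain,President%20of%20the%20United%20States";
the application/xml form is longer but keeps the language indication with
the data, without any change to the data: scheme.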

>  Therefore
> RFC 2397 *is* the assigning authority.

I don't like that terminology; I think it's better to think of
an authority as a person or organization. The RFC doesn't 'assign'
anything, it might define something.

>   Interpreting the data: URI as
> a string is straightforward; interpreting that string as something
> else has nothing to do with the data: URI scheme, so the comment about
> "no assigning authority" is confused.

OK, I think I see this ... ***

>     Using 'tdb' or 'duri' with an embedded 'urn:' might not seem to be
>     too useful,

> tdb, very useful, why wouldn't it seem so?

Yes, I changed this to just say "duri".


>     For 'tdb', many URIs identify resources which do not clearly describe
>     anything at all.  The "home page" for an organization isn't nearly as
>     good a resource to use to describe an organization as the
>     organization's "about" page.

> Depends on what the home page says, and what the about page says.

I edited this.

>     In addition, the 'application/rdf+xml' Media Type [RFC3870] uses the
>    fragment identifier resolution as an explicit way of identifying
>     abstract concepts that are described by an RDF document.

> "Abstract concept" is too limiting as RDF is used for concrete
> non-concepts as well.

fixed

> I don't see what this section contributes.  As an alternative I would
> suggest giving a reference to any useful exposition of the difference
> between an utterance and the meaning of an utterance, such as

> duri:20101102:http://en.wikipedia.org/wiki/Use%E2%80%93mention_distinction


this is now
  duri:2010-11-02:http://en.wikipedia.org/wiki/Use%E2%80%93mention_distinction


Should I add this as a reference? I really don't want to take out
Alice.

Attachments

text/plain attachment: duri.txt
Received on Thursday, 4 November 2010 18:04:11 UTC