Re: mid and cid URLs

Keith Moore (moore@cs.utk.edu)
Wed, 22 Nov 1995 02:02:44 -0500


Message-Id: <199511220702.CAA02772@wilma.cs.utk.edu>
From: Keith Moore <moore@cs.utk.edu>
To: asg@severn.wash.inmet.com (Al Gilman)
Cc: moore@cs.utk.edu (Keith Moore), uri@bunyip.com, elevinso@Accurate.COM,
Subject: Re: mid and cid URLs 
In-Reply-To: Your message of "Tue, 21 Nov 1995 17:17:48 EST."
             <9511212217.AA07499@severn.wash.inmet.com> 
Date: Wed, 22 Nov 1995 02:02:44 -0500

> To follow up on what Keith Moore said ...
>   
>   
>   I've been thinking of "cid", "message/external-body; access-type=cid"
>   (and by association, "mid") as *URL* schemes, especially (for the
>   former two) within the context of multipart/related.  Such a scheme is
>   useful even without the URN infrastructure.
>   
> For all the reasons that Ned has been teaching me, a "cid" does
> not represent a reliable _location_ reference.  It is a name used
> in searching for the cited object.  Even inside the same MIME
> multipart.

Sorry, I was being fuzzy with my wording.  I'll explain what I meant
by URN vs URL in the quote above.  

I can implement multipart/related and access-type=content-id (or cid
URLs) within multipart/related today, using only infrastructure that
is already in place.  In this sense content-id is like a URL; it
requires no services that are not already available.
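
As a rough illustration of why nothing new is needed here, the sketch
below resolves a cid reference by scanning the body parts of the same
message.  It is only a sketch in Python using the standard email
package; the sample message, the example.invalid addresses, and the
resolve_cid helper are made up for illustration, and it ignores any
escaping that a cid URL syntax might require.

```python
from email import message_from_string

# Hypothetical two-part multipart/related message, for illustration only.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/related; boundary="b1"

--b1
Content-Type: text/html
Content-ID: <root@example.invalid>

<img src="cid:logo@example.invalid">
--b1
Content-Type: text/plain
Content-ID: <logo@example.invalid>

pretend image bytes
--b1--
"""


def resolve_cid(msg, cid_url):
    """Return the body part whose Content-ID matches a cid: URL, or None."""
    if not cid_url.lower().startswith("cid:"):
        raise ValueError("not a cid URL")
    # Content-ID header values are wrapped in angle brackets.
    wanted = "<" + cid_url[4:] + ">"
    for part in msg.walk():
        if part.get("Content-ID", "").strip() == wanted:
            return part
    return None


msg = message_from_string(RAW)
part = resolve_cid(msg, "cid:logo@example.invalid")
```

The lookup never leaves the message in hand, which is the sense in
which no additional services are required.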

On the other hand, I can't provide the ability for arbitrary MIME
parts to reliably reference arbitrary objects by name, without some
infrastructure for keeping track of the locations of those objects and
mapping names to locations, like that anticipated for URNs.


> The Message-ID is in use today as an object name in the
> "In-reply-to" header in RFC 822 mail that knows nothing about
> MIME.  And it is now used by Hypermail at random _receiving_
> sites that have nothing to do with the sending location to
> reconstruct threads of dialog.
> 
> You don't have to define and provide a retrieval service for object
> naming on widely-distributed objects like RFC 822 mail and News to
> be a useful construct.

Yes, message-id is useful, and I didn't mean to imply otherwise.  But
being able to thread articles together by message-id (especially
articles which were all posted to the same mailing list) isn't the
same thing as being able to refer to arbitrary files by name within
messages and expecting mail readers (or web clients, whatever) to be
able to access those files.

If people are satisfied with the ability to follow message threads for
messages posted to a single list and which are all available from one
repository, we don't need to define any more infrastructure to
accomplish this.  The extra services are only needed when we want to
be able to reference a much wider collection of documents by name, to
have those references be usable for an extended period of time, etc.
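
To make the single-repository case concrete, here is a minimal sketch
in Python of threading by message-id, assuming every message is
already on hand as raw text.  The build_threads helper is hypothetical.
It pulls the first angle-bracketed id out of In-Reply-To, since that
header often carries free text around the id.

```python
import re
from collections import defaultdict
from email import message_from_string

# Matches one angle-bracketed identifier such as <id@host>.
ANGLE_ID = re.compile(r"<[^<>]+>")


def first_id(value):
    """Return the first <...> identifier in a header value, or None."""
    match = ANGLE_ID.search(value or "")
    return match.group(0) if match else None


def build_threads(raw_messages):
    """Map each parent Message-ID to the Message-IDs of its direct replies."""
    replies = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        mid = first_id(msg.get("Message-ID"))
        parent = first_id(msg.get("In-Reply-To"))
        if mid and parent:
            replies[parent].append(mid)
    return dict(replies)
```

This is just a dictionary built from headers already present in each
message.  It says nothing about where a message outside the
repository might be found.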


How did we get here?  From my vague recollection:

+ some people saw the need for intra-message references in MIME and
  proposed a mechanism for it using content-ids and message/external-body
  (thus involving minimal changes to MIME)

+ someone else saw the need for a URL message/external-body access-type
  (thus bringing about greater integration between email and the web)

+ someone noticed that if content-id URLs were defined, they could fit
  into the URL access-type (similarly for message-ids)

+ other people noticed that some subset of {message-ids, content-ids, and 
  article-ids from netnews} look the same and suggested that they all
  be handled by the same syntax or mechanism.

+ eventually people started talking about how to make these look like
  traditional URLs, or how to make URNs look like message-ids.

+ others saw that these things are really a kind of URN, and 
  proposed them as a URN scheme

It might not seem so, but there's a large gap in implementation
difficulty between the first view of the world and the last ones.

I'll now informally state Keith's Design Principle of URN Interoperability:

	You should be able to type any kind of "URN" into any blank
	labeled "URN" and get a reasonable result.

This implies that while you may have separate and disjoint URN spaces
for the purpose of assignment, you should not have to remember the
difference between the characteristics of a "cid" URN and another kind
of URN, and you should not have to know that you can type a "cid" URN
into this blank labeled URN, but not into some other blank labeled
URN.  If URNs don't have this characteristic, they're not a big
improvement over URLs...they will still pretty much be tied to a
protocol for resolution purposes (even if that protocol is 
"grep Message-ID:{string} ~/Mail/folder/*").  In practice, message-ids or 
URLs would do just as well for this purpose.
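
For what it's worth, the "grep" style of resolution mentioned above
amounts to something like the following Python sketch.  It assumes one
message per file in a local folder; the find_by_message_id helper is
hypothetical and only illustrates how closely such a lookup is tied to
one particular store.

```python
import email
from pathlib import Path


def find_by_message_id(folder, message_id):
    """Return paths of files in folder whose Message-ID header equals message_id."""
    wanted = message_id.strip()
    hits = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            msg = email.message_from_binary_file(fh)
        # str() guards against header values that are not plain strings.
        if str(msg.get("Message-ID", "")).strip() == wanted:
            hits.append(path)
    return hits
```

It works only for whoever already has the folder, which is the sense
in which such a name stays bound to its resolution protocol.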

So to put it another way:

If we just want unique names, we've already got them -- they're called
message-ids and content-ids.  They're pretty much sufficient as-is for
references between different messages in the same thread and different
body parts in the same message, respectively.

If we want references in messages that can refer to arbitrary other
files, we can use URLs (now) or URNs (eventually) to do this.

But ... just because a content-id or a message-id has some of the
characteristics of a URN, doesn't mean we can derive much additional
benefit from calling it a URN.  The extra benefit from URNs over
message-ids will be from a resolution infrastructure, not from a
unified syntax.

Keith