Re: mid and cid URLs

Al Gilman (asg@severn.wash.inmet.com)
Wed, 22 Nov 1995 10:23:14 -0500 (EST)


From: asg@severn.wash.inmet.com (Al Gilman)
Message-Id: <9511221523.AA13116@severn.wash.inmet.com>
Subject: Re: mid and cid URLs
To: moore@cs.utk.edu (Keith Moore)
Date: Wed, 22 Nov 1995 10:23:14 -0500 (EST)
Cc: ietf-types@uninett.no, uri@bunyip.com
In-Reply-To: <199511220702.CAA02772@wilma.cs.utk.edu> from "Keith Moore" at Nov 22, 95 02:02:44 am

To follow up on what Keith Moore said ...
  
  > To follow up on what Keith Moore said ...
  >   
  >   
  >   I've been thinking of "cid", "message/external-body; access-type=cid"
  >   (and by association, "mid") as *URL* schemes, especially (for the
  >   former two) within the context of multipart/related.  Such a scheme is
  >   useful even without the URN infrastructure.
  >   
  > For all the reasons that Ned has been teaching me, a "cid" does
  > not represent a reliable _location_ reference.  It is a name used
  > in searching for the cited object.  Even inside the same MIME
  > multipart.
  
  Sorry, I was being fuzzy with my wording.  I'll explain what I meant
  by URN vs URL in the quote above.  
  
  I can implement multipart/related and access-type=content-id (or cid
  URLs) within multipart/related today, with no additional
  infrastructure that's not already in place.  In this sense content-id
  is like a URL; it requires no services that are not already available.
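
As a rough illustration of the "no additional infrastructure" point, resolving a cid reference inside a single multipart/related message can be sketched with nothing but a MIME parser. This is a minimal sketch, not anyone's actual implementation: it assumes a cid URL is written as "cid:" followed by the Content-ID without its angle brackets, and the function name resolve_cid, the sample message, and the example.invalid identifiers are all made up for illustration.

```python
from email import message_from_string
from urllib.parse import unquote


def resolve_cid(message, cid_url):
    """Return the body part whose Content-ID matches cid_url, or None.

    Hypothetical helper: assumes "cid:" + Content-ID without angle brackets.
    """
    if not cid_url.lower().startswith("cid:"):
        raise ValueError("not a cid URL")
    wanted = unquote(cid_url[4:])
    # walk() visits the message itself and every nested body part.
    for part in message.walk():
        content_id = part.get("Content-ID")
        if content_id and content_id.strip().strip("<>") == wanted:
            return part
    return None


# Made-up sample: an HTML part that refers to a sibling part by Content-ID.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/related; boundary="b1"

--b1
Content-Type: text/html

<img src="cid:logo.1@example.invalid">
--b1
Content-Type: text/plain
Content-ID: <logo.1@example.invalid>

stand-in for image data
--b1--
"""

msg = message_from_string(SAMPLE)
part = resolve_cid(msg, "cid:logo.1@example.invalid")
missing = resolve_cid(msg, "cid:nowhere@example.invalid")
```

The lookup never leaves the message in hand, which is why it needs no name-to-location service; a reference to a part that is not in the same message simply comes back as None.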
  
Ok, I understand what you mean in terms of incumbent capability, but
this is not the URL/URN distinction.  Roy already clarified that.

  On the other hand, I can't provide the ability for arbitrary MIME
  parts to reliably reference arbitrary objects by name, without some
  infrastructure for keeping track of the locations of those objects and
  mapping names to locations, like that anticipated for URNs.
  
  > The Message-ID is in use today as an object name in the
  > "In-reply-to" header in RFC 822 mail that knows nothing about
  > MIME.  And it is now used by Hypermail at random _receiving_
  > sites that have nothing to do with the sending location to
  > reconstruct threads of dialog.
  > 
  > You don't have to define and provide a retrieval service for object
  > naming on widely-distributed objects like RFC 822 mail and News to
  > be a useful construct.
  
  Yes, message-id is useful, and I didn't mean to imply otherwise.  But
  being able to thread articles together by message-id (especially
  articles which were all posted to the same mailing list) isn't the
  same thing as being able to refer to arbitrary files by name within
  messages and expecting mail readers (or web clients, whatever) to be
  able to access those files.
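
For concreteness, the kind of threading being discussed can be sketched as a simple match of In-Reply-To against Message-ID. This is only a sketch under assumptions: it is not how Hypermail itself works, the function name build_threads is hypothetical, and the two sample messages carry just the Message-Id and In-Reply-To values from this exchange, with placeholder bodies.

```python
from collections import defaultdict
from email import message_from_string


def build_threads(raw_messages):
    """Map each parent Message-ID to the Message-IDs that reply to it."""
    replies = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = (msg.get("Message-ID") or "").strip()
        parent = (msg.get("In-Reply-To") or "").strip()
        # In-Reply-To often carries trailing commentary; keep only <...>.
        start = parent.find("<")
        end = parent.find(">", start)
        parent = parent[start:end + 1] if start != -1 and end != -1 else ""
        if msg_id and parent:
            replies[parent].append(msg_id)
    return dict(replies)


RAW = [
    "Message-Id: <199511220702.CAA02772@wilma.cs.utk.edu>\n\nbody\n",
    "Message-Id: <9511221523.AA13116@severn.wash.inmet.com>\n"
    "In-Reply-To: <199511220702.CAA02772@wilma.cs.utk.edu>"
    ' from "Keith Moore" at Nov 22, 95 02:02:44 am\n\nbody\n',
]

threads = build_threads(RAW)
```

Nothing here fetches anything: the IDs are used purely as matching keys over messages the receiving site already holds.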
  
You are ignoring incumbent capability in the host file system.  I
don't know a web browser that doesn't support the file: access
method.  And I do know of HTML users who use this method
exclusively on their home PC which has no mail or HTTP while
drafting Web Page bundles before uploading them to the server.

The tools you want to strike a deal with as helpers already know how
to deal with sets of files in the local file system.  It is the least
common denominator.  
  
If the MIME tool will just get on with it and do file= disposition right,
the helpers are already set up to deal with the product.  The version
that involves extra work is the Content-ID scheme.

  If people are satisfied with the ability to follow message threads for
  messages posted to a single list and which are all available from one
  repository, we don't need to define any more infrastructure to
  accomplish this.  The extra services are only needed when we want to
  be able to reference a much wider collection of documents by name, to
  have those references be usable for an extended period of time, etc.
  
  How did we get here?  From my vague recollection:
  
  + some people saw the need for intra-message references in MIME and
    proposed a mechanism for it using content-ids and message/external-body
    (thus involving minimal changes to MIME)

Someone else saw the need for inter-object references across archives
and messages and created the URI language.
  
  + someone else saw the need for a URL message/external-body access-type,
    (thus bringing about greater integration between email and the web)
  
  + someone noticed that if content-id URLs were defined, they could fit
    into the URL access-type (similarly for message-ids)
  
  + other people noticed that some subset of {message-ids, content-ids, and 
    article-ids from netnews} look the same and suggested that they all
    be handled by the same syntax or mechanism.

This was a reduction to the Message-ID case of the concept of uniform
identifiers for information resources, to be used across a wide variety
of Internet archive and message modes.
  
  + eventually people started talking about how to make these look like
    traditional URLs, or how to make URNs look like message-ids.

On the URI side, this was the realization that there were resource identifiers
in the mail and MIME usage that hadn't been integrated into the Uniform
syntax by dealing with file: ftp: mailto: Gopher: and http: in the initial
burst.
  
  + others saw that these things are really a kind of URN, and 
    proposed them as a URN scheme
  
  It might not seem so, but there's a large gap in implementation
  difficulty between the first view of the world and the last ones.
  
  I'll now informally state Keith's Design Principle of URN Interoperability:
  
  	You should be able to type any kind of "URN" into any blank
  	labeled "URN" and get a reasonable result.
  
You can actually broaden that to URI.

  This implies that while you may have separate and disjoint URN spaces
  for the purpose of assignment, you should not have to remember the
  difference between the characteristics of a "cid" URN and another kind
  of URN, and you should not have to know that you can type a "cid" URN
  into this blank labeled URN, but not into some other blank labeled
  URN.  If URNs don't have this characteristic, they're not a big
  improvement over URLs...they will still pretty much be tied to a
  protocol for resolution purposes (even if that protocol is 
  "grep Message-ID:{string} ~/Mail/folder/*").  In practice, message-ids or 
  URLs would do just as well for this purpose.
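
The grep-style "resolution protocol" mentioned above can be written out as a short sketch. It assumes one message per file in the folder (as the ~/Mail/folder/* glob suggests); the function name find_by_message_id and the temporary folder contents are hypothetical, and the example.invalid ID is made up.

```python
import tempfile
from email import message_from_binary_file
from pathlib import Path


def find_by_message_id(folder, message_id):
    """Return paths of files in folder whose Message-ID header equals message_id."""
    hits = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            msg = message_from_binary_file(handle)
        # Header lookup is case-insensitive, so "Message-Id:" matches too.
        if (msg.get("Message-ID") or "").strip() == message_id:
            hits.append(path)
    return hits


# Demo against a throwaway folder holding two one-message files.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "1").write_text(
        "Message-Id: <9511221523.AA13116@severn.wash.inmet.com>\n\nbody\n"
    )
    (Path(folder) / "2").write_text(
        "Message-Id: <other@example.invalid>\n\nbody\n"
    )
    found = [
        p.name
        for p in find_by_message_id(
            folder, "<9511221523.AA13116@severn.wash.inmet.com>"
        )
    ]
```

The search only works against a folder the reader already has, which is the sense in which such a name stays tied to one particular resolution method.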
  
  So to put it another way:
  
  If we just want unique names, we've already got them -- they're called
  message-ids and content-ids.  They're pretty much sufficient as-is for
  references between different messages in the same thread and different
  body parts in the same message, respectively.

YES.
  
  If we want references in messages that can refer to arbitrary other
  files, you can use URLs (now) or URNs (eventually) to do this.
  
YES.

  But ... just because a content-id or a message-id has some of the
  characteristics of a URN, doesn't mean we can derive much additional
  benefit from calling it a URN.  The extra benefit from URNs over
  message-ids will be from a resolution infrastructure, not from a
  unified syntax.
  
At the level of the core <foo2i56hr4g@node..path> unique ID,
whether you call it URx or URy is immaterial.

I was raising the distinction because of the kinds of needs I see,
for example for FAQs.  There I want more than a bare ID in the citation.
These are information objects that are widely read and widely served.
Listing retrieval options is appropriate in addition to declaring the
unique key to match.

I think that when one considers decorating the unique ID with
tips as to alternatives of access methods -- newsgroups, archives
-- that the realization that this ID is not really a URL helps
us to accept the different flavor of the composite citation with
its attributes attached.

Al