Re: mid and cid URLs

Ned Freed (NED@innosoft.com)
Tue, 21 Nov 1995 12:03:24 -0800 (PST)


Date: Tue, 21 Nov 1995 12:03:24 -0800 (PST)
From: Ned Freed <NED@innosoft.com>
Subject: Re: mid and cid URLs
In-Reply-To: "Your message dated Tue, 21 Nov 1995 12:19:33 -0500 (EST)"
To: asg@severn.wash.inmet.com
Cc: elevinso@Accurate.COM, ietf-types@cs.utk.edu, uri@bunyip.com
Message-Id: <01HXWLFACL0G9BWNBV@INNOSOFT.COM>

> I need another MIME lesson.

No problem ;-)

> As I look at the draft proposed schemes for mid and cid URLs, a couple
> of thoughts come up:

> 1. By construction, these two nominal schemes are one scheme and we
> should only use one name for them.  MID or MIDCID are possibles.

While it's certainly possible to do this, I don't see why you'd want to.
Message-IDs and Content-IDs are distinct entities. A given part of a message
can have neither, one, or both of them.

There is also the question of scope. I see support of message-ids as a
cross-message sort of thing, preferably implemented as an index encompassing
the entire mailbox. (Preference would be given to whatever message is
"current", of course.) Content-ids, on the other hand, are largely intended to
be used within a single message. It therefore seems logical to give some
indication of scope in the scheme identifier.
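
As a rough illustration of the scope distinction, here is a minimal sketch in
Python using the standard email package. The helper names (make_message,
index_by_message_id, find_content_id) and the example.com identifiers are
made up for this sketch; they come from no draft or product. Message-ids are
resolved through an index over the whole mailbox, while a content-id is only
looked for inside one message.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def make_message(message_id, cid, text):
    """Build a one-part multipart message (hypothetical test data)."""
    msg = MIMEMultipart("mixed")
    msg["Message-ID"] = message_id
    part = MIMEText(text)
    part["Content-ID"] = cid
    msg.attach(part)
    return msg


def index_by_message_id(mailbox):
    """Mailbox-wide scope: map every Message-ID to its message."""
    return {m["Message-ID"]: m for m in mailbox if m["Message-ID"]}


def find_content_id(message, cid):
    """Message-local scope: search only the parts of one message."""
    for part in message.walk():
        if part["Content-ID"] == cid:
            return part
    return None


mailbox = [
    make_message("<m1@example.com>", "<fig1@example.com>", "figure one"),
    make_message("<m2@example.com>", "<fig2@example.com>", "figure two"),
]

index = index_by_message_id(mailbox)
second = index["<m2@example.com>"]                       # found via the mailbox index
local_hit = find_content_id(second, "<fig2@example.com>")
local_miss = find_content_id(second, "<fig1@example.com>")  # lives in another message
```

Here local_miss is None because the search never leaves the one message,
which is the narrower scope a separate cid scheme would signal.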

I guess what I'm asking is what advantage you see in collapsing the schemes into
one. If there is a big one I guess I wouldn't mind making such a change.

> 2. A more URL-traditional syntax would be something like

> 	"mid:" //host-net-path/message-unique

> where the RFC Message-ID object was <message-unique@host-net-path>.

Ed is the expert on this one...

> 3. The structure of a MIME multipart defines a well-ordered hierarchical
> space where at each level there is a linear sequence of parts.  We
> could index into this part tree with part numbers.  MIME message/partial
> usage establishes the precedent that parts number from 1 and the
> generic Internet URL syntax sets the precedent that paths punctuate with
> '/'.  If we simply extrapolate from these two boundary conditions, we
> get a part URL more or less like this:

> 	midcidurl ::= "mid:" //host-net-path/message-unique part-number
	
> 	part-number ::= *( / decimal-integer )

> where the interpretation of the decimal integers is as follows:

> For each level of hierarchy (defined separator)
> 	0 refers to the unPart before the first opening-separator
> 	1 refers to the Part after the first opening-separator
> 	... 			; there are N parts and N opening-separators
> 	n > N refers to the unPart after the closing-separator

> 	opening-separator ::= "--" separator-key
> 	closing-separator ::= "--" separator-key "--"

> 4. To retrieve an object by its Content-ID, the usage

> 	cidurl ::= "mid:" //host-net-path/message-unique?part-designation

> 	part-designation ::= part-unique [@host-path-if-different]
> where
> 	Content-ID == <part-unique@(host-net-path | host-path-if-different)>

> 				; and I have not addressed encoding problems

> is more consistent with general Internet URL usage than
> introducing the Content-ID with '#'.  In particular, for the
> general pattern of URLs, the #fragment clause makes no difference
> in the object that is served, only its state as presented.

> Since the retrieval of a part by Content-ID only needs to get the
> content of the part, the ? syntax which supports searching and can
> affect the scope of the object retrieved is more consistent.

> This usage would establish, as a rule under the mid: scheme, that
> searching defaults to matching part-designation as constructed
> above.

The problem with all this is that this basic assumption is flawed:

> We could index into this part tree with part numbers.

This assumes that from the time the message is composed to the time it
is received and put in a message store nothing happens that could perturb
the part structure. Examples that contradict this abound. Some of them:

(1) Security services may operate after the message is composed (in fact
    they have to) and may add additional encapsulation layers. Working around
    this leads to a hideous interaction where the agent constructing the
    core message has to know in advance what structure the security service
    is going to add. In addition, it presupposes that the agent constructing
    the message knows whether or not the security wrapper will last
    throughout the life of the message. There is no way it can know this, since
    a receiver might elect to strip wrappers before storing or might elect
    not to, and might change its mind from time to time.

(2) Transit through non-MIME systems may not preserve message structure. The
    most blatant examples of this are most of the LAN email systems, which
    can only handle a series of unadulterated parts.

    Now, one can argue that content-ids won't be preserved either. In many
    cases this is true, but examples do exist of systems that are capable
    of preserving content-id information without preserving part structure.

(3) MIME-MIME conversion gateways are becoming increasingly common. Such
    facilities preserve structure as a whole, but may elect to add parts,
    delete parts, and turn what used to be a single part into a multipart
    structure (usually multipart/alternative). 

    We provide one of these in our products, as a matter of fact. The
    current version preserves structure unconditionally, and would that it
    could have stayed that way. Unfortunately the advent of various systems
    that produce weird structures or bogus parts has forced us to implement
    structural manipulation primitives, and I'm sure other vendors will find
    themselves forced to do similar things.

(4) An unfortunate reality of some present-day X.400 and MIME agents is an
    inability to handle nested messages. (At least two popular MIME agents
    and several popular X.400 systems have this problem. How the X.400
    systems managed to pass the conformance tests they claim to have passed
    and not support this is beyond me, but that's another story for another
    day.) This leads to situations where agents have no choice but to flatten
    out unnecessary message levels.

    This is one I'm especially aware of because not only do we deal with the
    broken agent, we also handle another agent that sends absolutely everything
    using nested message structures. Go figure.

    This can be accommodated by using nested-message-relative numbering (which
    you want anyway in order to allow forwarding without scanning the entire
    message and mucking with its content), but it's still a problem.

(5) Forwarding of messages by user agents in some cases disrupts the part
    structure. This can be intentional, when for example a user deletes a
    part from a forwarded message, or it can be unintentional, where agents
    simply don't handle forwarding properly.

I can dig up more examples, but I think the point is clear -- messages are
malleable things, and numbering schemes simply don't work very well with them.
Labels, on the other hand, do.
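
The difference can be sketched in a few lines of Python with the standard
email package. This is only an illustration under assumptions: the functions
part_by_path, part_by_cid, build_message and wrap are invented for the
sketch, and wrap merely stands in for any agent (a security service, a
gateway) that adds an encapsulation layer after composition.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def part_by_path(msg, path):
    """Follow 1-based part numbers down the multipart tree."""
    node = msg
    for number in path:
        if not node.is_multipart():
            return None
        children = node.get_payload()
        if not 1 <= number <= len(children):
            return None
        node = children[number - 1]
    return node


def part_by_cid(msg, cid):
    """Return the first part whose Content-ID header equals cid."""
    for part in msg.walk():
        if part.get("Content-ID") == cid:
            return part
    return None


def build_message():
    """Compose a two-part message; the second part carries a label."""
    root = MIMEMultipart("mixed")
    note = MIMEText("the note itself")
    note["Content-ID"] = "<note.1@example.com>"
    root.attach(MIMEText("see the attached note"))
    root.attach(note)
    return root


def wrap(msg):
    """Simulate a later agent adding an encapsulation layer."""
    outer = MIMEMultipart("mixed")
    outer.attach(MIMEText("wrapper added in transit"))
    outer.attach(msg)
    return outer


original = build_message()
wrapped = wrap(original)

by_number_before = part_by_path(original, [2])   # the note
by_number_after = part_by_path(wrapped, [2])     # now the whole inner multipart
by_label_before = part_by_cid(original, "<note.1@example.com>")
by_label_after = part_by_cid(wrapped, "<note.1@example.com>")  # still the note
```

After wrapping, part number 2 no longer names the note, while the content-id
lookup returns the same part as before.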

BTW, this discussion is pretty similar to the arguments for using line or byte
counts instead of boundary markers for multipart structures. We elected to
use a labelling approach there because of the malleability of messages, and
the same logic applies here as well.

				Ned