Re: A content-id URL scheme from Roy T. Fielding on 1995-02-09 (uri@w3.org from February 1995)

From: Roy T. Fielding <fielding@avron.ICS.UCI.EDU>
Date: Wed, 08 Feb 1995 16:00:52 -0800
To: Ed Levinson <elevinso@Accurate.COM>
Cc: uri@bunyip.com
Message-Id: <9502081601.aa28974@paris.ics.uci.edu>

Ed,

> The problem with explicitly dealing with escape characters in the
> syntax is that it unnecessarily mixes 822/1521/1522 and 1738 syntaxes.
> Addr-spec is an 822 item and when I go to 1522 and 1738 I see
> different escape mechanisms.

That is exactly why the full syntax description is necessary.  The "cid"
URL (and any other URL) has a specific allowed syntax, and a single escaping
mechanism, which differs from the syntax of those other specs.  That is
why we cannot use their existing BNF productions.

> In my view the cidurl could appear in 822/1521 message headers and
> messages.  I'd like the spec to be usable in both environments.  Has
> the issue of the different escape mechanisms (1521: =hh, 1738: %hh)
> already been dealt with?  What was the result?

There is only one escape mechanism for URLs: %hh.  The content of a URL
may include characters that have special significance to the URL's
destination application, but that has no effect on the URL itself.
None of the 822/1521 escape mechanisms are necessary for URLs, since
the allowed URL syntax is more restrictive than that of Internet mail headers.

It's important to differentiate between the syntax of the URL and its
semantics.  The URL syntax provides a mapping to an addr-spec, not the
addr-spec itself.  Thus, you cannot use a cidurl where an addr-spec is
expected, and you cannot use an addr-spec where a URL is expected, even
though the semantic content is identical.
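To make the mapping concrete, here is a rough Python sketch of going from
an addr-spec to a "cid" URL and back using only %hh escapes.  The function
names are made up for illustration, and two simplifications are assumed:
the last "@" in the addr-spec is taken as the separator, and every octet
outside the literal characters of RFC 1738's uchar is escaped (more than
the grammar strictly requires, but always safe).

```python
import string
from urllib.parse import unquote

# Literal (unescaped) characters of RFC 1738 "uchar":
# alpha | digit | safe ($-_.+) | extra (!*'(),)
_UCHAR_LITERALS = set(string.ascii_letters + string.digits + "$-_.+!*'(),")


def _escape(part):
    """Percent-escape every octet that is not a literal uchar character."""
    return "".join(
        chr(b) if chr(b) in _UCHAR_LITERALS else f"%{b:02X}"
        for b in part.encode("utf-8")
    )


def addr_spec_to_cid_url(addr_spec):
    """Map an addr-spec to a cid URL (hypothetical helper).

    Assumes the last "@" separates local-part from domain.
    """
    local, sep, domain = addr_spec.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("expected local-part@domain")
    return f"cid:{_escape(local)}@{_escape(domain)}"


def cid_url_to_addr_spec(url):
    """Map a cid URL back to the addr-spec it stands for (hypothetical helper)."""
    scheme, sep, rest = url.partition(":")
    if scheme != "cid" or not sep:
        raise ValueError("not a cid URL")
    # Any "@" inside the parts was escaped as %40, so the only literal "@"
    # in the URL is the separator.
    local, sep, domain = rest.partition("@")
    if not sep or not local or not domain:
        raise ValueError("expected local-part@domain")
    return f"{unquote(local)}@{unquote(domain)}"
```

Note that the two forms are not interchangeable: an addr-spec such as
"a b"@example.com (with its 822 quotes) is not a legal URL until it has
been through the first function, and the resulting URL is not a legal
addr-spec until it has been through the second.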

> I'd prefer to say something like
> 
> 	cid-spec := addr-spec ; special characters must be escaped
> 
> which finesses the problem.

This would mean that the implementor must go through the same mechanism
I did -- expand the 20+ productions in rfc822 that underlie "addr-spec"
and replace all allowed "special characters" with escapes.  Since implementors
rarely bother to do this (and even when they do, they tend to do it wrong),
I think it is better for us to just spell it out.  Furthermore, since our
requirements for a cidurl are just for a superset of the allowed syntax,
we can vastly simplify the above syntax to just the few productions below.
Although this may appear more complex to the reader, it is vastly simpler
for the implementor (the intended audience of the specification).

Hmmm, rather than restate it, I'll make the BNF as general as possible:

   ----------------------------------------------------------------------
   midurl       = "mid" ":" encoded-addr           ; RFC 822  Message-ID
   cidurl       = "cid" ":" encoded-addr           ; RFC 1521 Content-ID

   encoded-addr =  local-part "@" domain-part      ; globally unique

   local-part   =  1*mchar
   domain-part  =  1*mchar

   mchar        =  uchar | ";" | "/" | "?" | ":" | "&" | "="
   uchar        =  <as defined in RFC 1738>
   ----------------------------------------------------------------------
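As a rough check that the grammar above is easy to implement, it can be
transcribed almost directly into a regular expression.  The sketch below
is only an illustration: the function name is made up, uchar is expanded
from RFC 1738 (unreserved characters plus %hh escapes), and the scheme is
matched in lower case only, exactly as the BNF literals are written.

```python
import re

# uchar per RFC 1738 = unreserved | escape
#   unreserved = alpha | digit | "$-_.+" | "!*'(),"
#   escape     = "%" hex hex
_UNRESERVED = r"A-Za-z0-9$\-_.+!*'(),"

# mchar = uchar | ";" | "/" | "?" | ":" | "&" | "="
_MCHAR = rf"(?:[{_UNRESERVED};/?:&=]|%[0-9A-Fa-f]{{2}})"

# encoded-addr = local-part "@" domain-part, each being 1*mchar
_ID_URL_RE = re.compile(rf"(mid|cid):({_MCHAR}+)@({_MCHAR}+)")


def parse_id_url(url):
    """Return (scheme, local_part, domain_part), or None if url does not
    match the midurl/cidurl grammar.  Parts are returned still escaped."""
    m = _ID_URL_RE.fullmatch(url)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)
```

Because "@" is not an mchar, a literal "@" can appear only once, as the
separator; any "@" inside the local-part has to be written as %40.  For
example, parse_id_url("cid:a@b@c") is rejected, while
parse_id_url("cid:a%40b@c") is accepted.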

As an aside, I'll have to remember to ask Dave Crocker some time as to why
the addr-spec syntax (normally used for the To:, cc:, etc.) was reused for
Message-IDs.  It seems a bit silly to allow all the weird forms of 
Internet addresses within what should be a simple identifier, but I guess
we are stuck with it now.


......Roy Fielding   ICS Grad Student, University of California, Irvine  USA
                                     <fielding@ics.uci.edu>
                     <URL:http://www.ics.uci.edu/dir/grad/Software/fielding>
Received on Wednesday, 8 February 1995 19:23:28 UTC