Re: Global tags [was: HTTP Protocol Extentions] from Larry Masinter on 1997-09-22 (www-push@w3.org from July to September 1997)

From: Larry Masinter <masinter@parc.xerox.com>
Date: Mon, 22 Sep 1997 14:01:46 PDT
To: Arthur van Hoff <avh@marimba.com>
CC: Push Workshop <www-push@w3.org>, DRP Mailing List <drp@marimba.com>
Message-ID: <3426DCBA.C92482F9@parc.xerox.com>

Part of the problem in this discussion is that you continue to
use the phrase "Content Identifier" in ways that are contrary
to the common usage in other Internet protocols. I'm trying to
be careful, but it's hard if we're not speaking the same language.

In the following message, a "Content Identifier" is intended
to refer to what is described in section 7 of RFC 2045 (the
replacement for RFC 1521), entitled "Content-ID Header Field",
a world unique identifier for content, noted as being similar
to the "Message-ID" header; the Content-ID header also appears
in RFC 1848, RFC 1872, etc.

> I'm still not sure that this makes a lot of sense. An MD5 content
> identifier would look something like this:
> 
>    cid:md5:PEFjWBDv/sd9alS9BYuX0w==@md5.w3.org
> 

You don't need "cid:md5:" at all as a prefix if you're using
"Content Identifier" in the standard meaning, as what occurs
in an email message or HTTP "Content-ID" header.

> We would probably have to escape the / and + character in the
> base 64 encoding.

No, there's no need to escape either one of those, either in
Content-ID headers, or in URLs which are constructed from them
using the method outlined in RFC 2111.
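
As a rough illustration of the mapping referred to above, here is a
minimal Python sketch that turns a Content-ID header value into a
"cid:" URL and back. The function names are hypothetical, and the
sketch assumes, as argued above, that "/" and "+" pass through
unescaped; it only undoes %-escapes on the way back.

```python
from urllib.parse import unquote


def content_id_to_cid_url(header_value: str) -> str:
    """Map '<local@domain>' to 'cid:local@domain' (no escaping applied)."""
    value = header_value.strip()
    # Drop the enclosing angle brackets used in the header form.
    if value.startswith("<") and value.endswith(">"):
        value = value[1:-1]
    return "cid:" + value


def cid_url_to_content_id(url: str) -> str:
    """Map 'cid:local@domain' back to the '<local@domain>' header form."""
    if not url.lower().startswith("cid:"):
        raise ValueError("not a cid URL")
    # Remove the scheme prefix, decode any %-escapes, re-add the brackets.
    return "<" + unquote(url[4:]) + ">"


header = "<PEFjWBDv/sd9alS9BYuX0w==@md5.w3.org>"
url = content_id_to_cid_url(header)
```

Here the header value is the example from the quoted message with
the "cid:md5:" prefix dropped; converting the resulting URL back
yields the original header value.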

> I don't see why this is better than:
> 
>    urn:md5:PEFjWBDv/sd9alS9BYuX0w==

There are two separate roles for protocol elements in the protocols
we are designing. One role is for an "identifier": a short string
used to represent some other content, where the identifier is guaranteed
by the origin to be unique. The second is for a "verifier": a short
string used to verify that some content actually matches what the
identifier is intended to identify. In most Internet protocols, these
roles are separated: the former is satisfied by "Content-ID" and the
latter by things such as "Content-MD5". In the protocol you're
designing, you want to use the same string for both "identifier" and
"verifier", but -- at the same time -- also want to have multiple
verifiers (e.g., both md5 and sha).

Rather than invent an entirely new kind of identifier, it is better,
I think, to just extend the existing identification mechanism
("Content-ID") to have a subset of identifiers which can also act as
verifiers: Content-IDs from a well-known domain, where the left-hand
side (the part before the "@") is the MD5 of the content.
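
To make the idea concrete, here is a minimal Python sketch of a
Content-ID that doubles as a verifier. It assumes the well-known
domain is "md5.w3.org" (taken from the quoted example) and that the
left-hand side is the base64 encoding of the MD5 digest, as in
Content-MD5. The function names are hypothetical.

```python
import base64
import hashlib

# Assumption: the well-known domain from the quoted example.
WELL_KNOWN_DOMAIN = "md5.w3.org"


def md5_base64(body: bytes) -> str:
    """Base64 of the MD5 digest, the same form Content-MD5 uses."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def md5_content_id(body: bytes, domain: str = WELL_KNOWN_DOMAIN) -> str:
    """Build a Content-ID whose left-hand side is the MD5 of the body."""
    return f"<{md5_base64(body)}@{domain}>"


def verify(content_id: str, body: bytes, domain: str = WELL_KNOWN_DOMAIN) -> bool:
    """Check a body against a Content-ID from the well-known domain."""
    local, _, dom = content_id.strip().strip("<>").rpartition("@")
    if dom.lower() != domain:
        # Identifiers from other domains carry no verifier semantics.
        raise ValueError("Content-ID is not from the verifier domain")
    return local == md5_base64(body)


cid = md5_content_id(b"hello")
```

An ordinary Content-ID from any other domain remains a plain
identifier; only those under the well-known domain are treated as
verifiers, which is why verify() rejects other domains rather than
returning False.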

In other situations, you might want to keep the role of identifier and
verifier separate, not even have a verifier, require that the verifier
be signed, allow some other kind of matching other than byte-for-byte
equivalence, etc.

> What does the "@md5.w3.org" add?

The ability to use DNS registration rather than IANA registration for
the authority of uniqueness and the space of mapping from identifier
to verifier.

> Have fun,
> 
>         Arthur van Hoff

I'm trying.

Larry
-- 
http://www.parc.xerox.com/masinter

Received on Monday, 22 September 1997 17:02:16 UTC