Re: Hashed URIs from Paul Hoffman / IMC on 2002-11-26 (uri@w3.org from November 2002)

From: Paul Hoffman / IMC <phoffman@imc.org>
Date: Tue, 26 Nov 2002 15:12:55 -0800
To: "Clive D.W. Feather" <clive@demon.net>
Cc: discuss@apps.ietf.org, uri@w3.org
Message-Id: <p05200f2eba09ae71e116@[165.227.249.18]>

At 6:43 PM +0000 11/26/02, Clive D.W. Feather wrote:
>Paul Hoffman / IMC said:
>>>     draft-feather-hashed-uri-03.txt
>
>>  1) There is absolutely no need to have two different hash algorithms.
>>  For this purpose, both are equivalent. Pick one.
>
>I understood that having the ability to "drop-in" an algorithm was a good
>idea. That way, if there turns out to be a problem with SHA-1 then people
>can switch to alternatives without needing to redesign the entire scheme.

For a security protocol, some people think it is a good idea to have 
a fall-back; many others don't because of lack of interoperability 
and confusion. Regardless of that, what you are describing is not a 
security protocol.

>I've made use of SHA-1 a SHOULD in the next draft.

Sorry, I believe that is still not good enough. Pick one.

>  > 2) Appendix A will lead to lack of interoperability. You haven't
>>  shown why canonicalizing (that is, changing) the pre-hashed URI has
>>  any advantage.
>
>The real-world problem here is that many URIs actual lead to the same
>resource.

So? They are still different URIs. If that is your reasoning, you 
should be hashing what is returned from the URI, not the URI itself.

>  For example, domain names in URLs are usually case-insensitive.
>Therefore the need to canonicalise the URL.

Appendix A has examples of mapping ".html" to ".htm". That is *very* 
different than talking about case-insensitive host name parts. It is 
perfectly believable that <http://example.com/a.htm> will lead to a 
different object than <http://example.com/a.html>.

>However, it isn't a well-defined situation. Hence two variations - one that
>tends towards false positives and one that tends towards false negatives.
>Which is suitable depends on the context in which you're using this.

This simply re-affirms what I said before about lack of interoperability.

>  > 3) Which leads to the biggest question: where are the real-world uses
>>  for this? Which actual content providers have said "I don't want to
>>  reveal the real URL for the thing you are comparing"? If those folks
>>  exist, they could easily answer the first two objections. Quite
>>  frankly, this seems like a cute and well thought-out idea searching
>>  for a problem.
>
>This work was originally done for ICRA <http://www.icra.org>. The ICRAsafe
>filtering software available from them - among other things - blocks access
>to URLs that are on "black lists". Organisations want to be able to publish
>such lists without letting people use them to find "unsuitable" (in the
>opinion of the person producing the list) material.

This has nothing to do with using them on the Internet, which is what 
we standardize in the IETF. You are making these only for 
publication, not for end-user use. This doesn't sound compelling to 
me.

At this point, I would have to say to the AD's original request: this 
probably should not be published as a standard or as an Informational 
RFC.

--Paul Hoffman, Director
--Internet Mail Consortium

Received on Tuesday, 26 November 2002 19:30:10 UTC