- From: Graham Klyne <GK@ninebynine.org>
- Date: Mon, 25 Nov 2002 17:15:08 +0000
- To: Paul Hoffman / IMC <phoffman@imc.org>
- Cc: "Clive D.W. Feather" <clive@demon.net>, discuss@apps.ietf.org, uri@w3.org
Responding to [4]:

At 08:49 AM 11/25/02 -0800, Paul Hoffman / IMC wrote:
>3) Which leads to the biggest question: where are the real-world uses for
>this? Which actual content providers have said "I don't want to reveal the
>real URL for the thing you are comparing"? If those folks exist, they
>could easily answer the first two objections. Quite frankly, this seems
>like a cute and well thought-out idea searching for a problem.

Hi Paul,

There is one possible use-case of which I'm aware. A group of folks are experimenting with RDF metadata in a project called RDFWeb [1]. This is currently a research community project that has attracted some wider interest (e.g. [2]). One of the concerns is that email addresses in the form of mailto: URLs, used as one way of identifying people, are too easy a target for spammers. The idea of using hashed URIs has surfaced [3] as a way to maintain their identifying properties while protecting the corresponding email address.

I generally agree with the sentiment of the other points you make. SHA-1 seems like a good choice for the hashing function.

...

Further comments, concerning [5]:

- this registers a new URI scheme, which IIRC requires a standards-track RFC.

- what does this URI actually identify? The same thing as the hashed URI identifies, or the hashed URI itself? I think it's the latter, so this could reasonably be presented as a URN namespace [7]. This would call for an RFC, but not necessarily standards track.

- I think some attempt at canonicalization may be useful (e.g. per section 4.3), but I think that appendix A attempts too much, and may result in some unreasonable failures (especially changing the file extensions). When using hashed URIs, I think one should presume that the URI has been obtained directly or indirectly from a definitive source before hashing. In the use-case of which I'm aware, it is reasonable to assume that the URI has been obtained authoritatively, or that common variations may be stored. Further, an application is not prevented from trying some simple heuristics to obtain a match.

- I'm intrigued as to what purpose the +frag, +query flags serve. Also, in including fragment identifiers as part of the hashed URI, users should be aware that some uses of URIs require that any fragment be stripped before the URI is used (e.g. for retrieving a web resource; see [6]). I guess similar considerations apply to the query part with relation to the resource on the target server. So maybe I do see some point, if only as a diagnostic for mismatches.

- I suggest that the case of alpha values in identifiers be normalized (i.e. always a-z or A-Z), or the URI be declared case sensitive, to simplify hash value comparison. (Er, that is, if identifiers are really required. Per Paul's suggestion, this might be deleted. At least, start out with just one.)

#g
--

[1] http://www.rdfweb.org/
[2] http://www-106.ibm.com/developerworks/xml/library/x-foaf.html
[3] http://www.w3.org/2001/12/rubyrdf/util/foafwhite/intro.html
[4] http://lists.w3.org/Archives/Public/uri/2002Nov/0024.html
[5] http://www.ietf.org/internet-drafts/draft-feather-hashed-uri-03.txt
[6] http://www.w3.org/DesignIssues/Fragment.html
[7] http://www.ietf.org/rfc/rfc2611.txt

-------------------
Graham Klyne <GK@NineByNine.org>
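
For illustration only, the mailto: use-case and the case-normalization point above might be sketched as follows. This is a rough sketch under stated assumptions, not the encoding defined in the draft [5]: the function names (hash_uri, same_hash, matches) and the example addresses are hypothetical, the URI is assumed to be hashed as UTF-8 bytes with no canonicalization, and the fragment-stripping retry is just one possible "simple heuristic".

```python
import hashlib
from urllib.parse import urldefrag


def hash_uri(uri: str) -> str:
    """SHA-1 of the URI exactly as given (assumed UTF-8), as lowercase hex."""
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def same_hash(a: str, b: str) -> bool:
    """Compare two hex digests, normalizing letter case first."""
    return a.strip().lower() == b.strip().lower()


def matches(candidate_uri: str, published_hash: str) -> bool:
    """Check a candidate URI against a published hash.

    Tries the URI as given, then (as one simple heuristic) with any
    fragment identifier stripped.
    """
    if same_hash(hash_uri(candidate_uri), published_hash):
        return True
    stripped = urldefrag(candidate_uri).url
    return stripped != candidate_uri and same_hash(hash_uri(stripped), published_hash)


# Hypothetical example: publish only the hash of a mailto: URI.
published = hash_uri("mailto:someone@example.org").upper()  # publisher used A-F

print(matches("mailto:someone@example.org", published))  # same address
print(matches("mailto:other@example.org", published))    # different address
```

Someone who already knows the address can confirm the match, while the published value does not itself expose the address to a harvester.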
Received on Monday, 25 November 2002 13:16:47 UTC