- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Wed, 14 May 1997 18:12:23 +0200 (MET DST)
- To: Koen Holtman <koen@win.tue.nl>
- Cc: Larry Masinter <masinter@parc.xerox.com>, http-wg@cuckoo.hpl.hp.com
On Mon, 12 May 1997, Koen Holtman wrote:

> Larry Masinter:
>
> >I suggest avoiding any of the complexities of URI comparison
> >and make feature tag comparison be exact, octet-by-octet.
>
> Yes, that would simplify things without losing any of the power.
> I'll put it in the next version. I do assume that with
> octet-by-octet, you mean after interpretation of % escapes. I think
> it is a good idea to allow these, especially in the tag values.
>
> >Given the enormous flamage around UTF8-URLs, I think
> >you might be in trouble unless you specify very carefully
> >exactly which subset of URIs you're actually going to allow.
>
> I'm taking the PEP approach of allowing *any* URI. Do you expect this
> to cause flamage?

I don't expect this to cause flamage, but it can very well be expected
that a *limitation* of values will create flamage. This probably won't
happen soon, because it usually takes some time for people to use new
technology in localized contexts. Then, after they use it localized, it
takes some more time for people to realize that the various localizing
solutions don't really fit together well. Thinking ahead will pay off!

The problem with the %-escapes is indeed a problem of generic URI
comparison. But discussion in the URN group has shown that it is not
related to internationalization. Non-ASCII characters in URIs use the
upper half of the 8-bit range, and for this range, %HH is a pure
transfer encoding.

If headers are strictly 7-bit, then it will always be %HH for these
cases (then you only have to worry about %ab vs. %AB,...). If headers
can contain 8-bit (the warnings already can, and this is the direction
I think we should move in), then we can specify 8-bit always for
transfer, and don't need %HH for these cases, and can indeed compare on
octets. If we allow both (that's what usually happens in practice, even
if it's not in the spec), then we can just normalize on one or the
other.
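
To make the "normalize on one or the other" option concrete, here is a
minimal sketch in Python. It assumes we normalize towards raw 8-bit
octets for the upper half of the range and leave escapes in the ASCII
range alone apart from the case of their hex digits. The names
normalize_tag and tags_equal are made up for this illustration and do
not come from any draft.

```python
import re

# Matches one %HH escape in a byte string.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def normalize_tag(tag: bytes) -> bytes:
    """Normalize %HH escapes so tags can be compared octet-by-octet.

    Assumption for this sketch: octets >= 0x80 are carried raw, and
    escapes below 0x80 are kept as escapes with uppercase hex digits.
    """
    def fix(match: "re.Match[bytes]") -> bytes:
        hex_digits = match.group(1)
        value = int(hex_digits.decode("ascii"), 16)
        if value >= 0x80:
            # Upper half of the 8-bit range: %HH is only a transfer
            # encoding, so replace it with the raw octet.
            return bytes([value])
        # ASCII range: keep the escape, but make %2f and %2F identical.
        return b"%" + hex_digits.upper()

    return _ESCAPE.sub(fix, tag)


def tags_equal(a: bytes, b: bytes) -> bool:
    """Compare two tags octet-by-octet after normalization."""
    return normalize_tag(a) == normalize_tag(b)
```

With this, tags_equal(b"D%c3%BCrst", b"D\xc3\xbcrst") holds, because
both spellings carry the same two upper-half octets, and %2f and %2F
compare equal, while b"a%2Fb" and b"a/b" stay different. Normalizing in
the other direction (always %HH for octets >= 0x80) would work equally
well for strictly 7-bit headers.
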
The problem with %HH is the syntactically significant ASCII characters,
for which %HH is a true escaping rather than a transport mechanism.
This doesn't affect internationalization.

And of course, in accordance with the URN syntax, the URL process
draft, and other work in this area, the mapping/encoding from
characters to octets should be UTF-8.

Regards,
Martin.
Received on Wednesday, 14 May 1997 09:16:51 UTC