- From: Martin Duerst <duerst@w3.org>
- Date: Mon, 17 Feb 2003 18:31:40 -0500
- To: "Williams, Stuart" <skw@hplb.hpl.hp.com>, "'Tim Bray'" <tbray@textuality.com>
- Cc: WWW-Tag <www-tag@w3.org>
At 17:38 03/01/06 +0000, Williams, Stuart wrote:

>2) On the topic of %-escape encoding, which I continue to find confusing
>despite the opening sentence in RFC 2396 section 2.1.
>
>RFC 2396 appears to delegate the 'URI Character -> octet' mapping to the URI
>scheme definition. The 4th Paragraph of Sec 2.1 begins:
>
>   "A URI scheme may define a mapping from URI
>    characters to octets; whether this is done
>    depends on the scheme."

I have recently bumped into this text, too. I have asked for
clarification on the URI list:
http://lists.w3.org/Archives/Public/uri/2003Jan/0025.html

In particular, I wrote:

>>>>
As far as I understand, %hh is always usable, and I don't know
about any schemes that define explicitly that this can be used.
It may have been that this paragraph was written to take into
account schemes such as data:, where an additional mechanism
for encoding octets (base64) is used. My understanding is that
even in a data: URI, I should still be able to replace "A" by
"%41", and it should still resolve to the same data.
>>>>

I would really like to see an example where escape differences
of non-reserved characters return different results (as opposed
to compare differently). I'm not aware of any.

Unless some major case turns up, I think it would be very
beneficial if the TAG would nail down the principle that
for purposes of resolution/retrieval, 'a' and '%61', and so on,
have to return the same thing. This would definitely also
be very helpful for IRIs.

>Then, regarding the second mapping RFC 2396 speaks of 'octets -> original
>characters': "A charset defines this mapping." RFC2396 states "However,
>there is currently no provision within the generic URI syntax to accomplish
>this identification." It then offers possible options including delegation
>of charset default and/or selection mechanism to URI scheme definition.
>
>The URI Scheme registration template RFC2717 includes a field for "character
>encoding consideration". However, on a quick scan of the scheme
>registrations referenced from http://www.iana.org/assignments/uri-schemes I
>couldn't find any that offered any "character encoding consideration" :-)

RFC 2192 does (look for 9. Multinational Considerations).
It is probably not the only one.

>However, I think that there is an upside. Even if the first URI character ->
>octet mapping is scheme dependent, I think that one can be confident that
>for all %xx, for http://example.com/%xx and http://example.com/%xx, the
>octet sequences arising from the first mapping will be identical because the
>same scheme is in use. It's less clear that the second mapping, the charset
>which maps octets to original characters, is going to be the same in all
>contexts (like some of the forms examples)... however, in a given context...
>http://example.com/%xx will be equivalent to itself (surely!).

Yes indeed. For the equivalences discussed in Tim Bray's document,
this 'second mapping', as you call it, is irrelevant.

Regards, Martin.
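
P.S. As a rough illustration of the principle that 'a' and '%61' should
resolve to the same thing, here is a minimal sketch in Python. It is not
taken from any spec or existing library: the names UNRESERVED,
normalize_unreserved and same_for_retrieval are made up for this example,
and the character set is the "unreserved" set as I read RFC 2396
section 2.3. It only unescapes %hh sequences that stand for unreserved
characters and leaves every other escape in place (uppercasing its hex
digits), so reserved characters such as "/" are never changed.

```python
import re

# Unreserved characters as listed in RFC 2396 section 2.3
# (alphanumerics plus the "mark" characters). Assumption: this
# sketch follows RFC 2396, not any later revision.
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "-_.!~*'()"
)

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_unreserved(uri: str) -> str:
    """Replace %hh by the literal character when it is unreserved.

    Escapes of any other octet are kept, with hex digits uppercased.
    """
    def repl(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch
        return "%" + match.group(1).upper()

    return _ESCAPE.sub(repl, uri)


def same_for_retrieval(a: str, b: str) -> bool:
    """True if two URIs differ only in escaping of unreserved characters."""
    return normalize_unreserved(a) == normalize_unreserved(b)


if __name__ == "__main__":
    print(normalize_unreserved("http://example.com/%61"))
    print(same_for_retrieval("data:,A", "data:,%41"))
    print(same_for_retrieval("http://example.com/a%2Fb",
                             "http://example.com/a/b"))
```

Under this sketch, "data:,A" and "data:,%41" compare as equal, while
"a%2Fb" and "a/b" do not, because "/" is reserved.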
Received on Monday, 17 February 2003 19:55:34 UTC