- From: Dan Connolly <connolly@w3.org>
- Date: Thu, 27 May 1999 17:41:59 -0500
- To: Larry Masinter <masinter@parc.xerox.com>
- CC: "Martin J. Duerst" <duerst@w3.org>, ietf-url@imc.org, uri@Bunyip.Com
Larry Masinter wrote:
>
> URL character escaping normally should only be done at the
> time the URL is constructed from its component pieces, and
> normally should only be undone (unescaped) when the URL
> is decomposed into its internal pieces.

True.

> Your description
> of the process of either applying or removing %XX escaping
> seems to be based on having the escapes applied or removed
> when the URL is removed from or embedded in some context
> such as XML.

only when it's removed

> In general, you cannot change an arbitrary
> %XX into the character the XX byte sequence represents in
> ASCII without some risk of changing the meaning of the URL,

true.

> and so you should not recommend this process at all.

The excerpt below doesn't mention unescaping. Only how to take
an XML attribute value and turn it into a URL in the case that
it's not already a URL (because it has non-URL characters).

It's probably worth warning folks that the inverse operation
is not licensed, but that doesn't mean the operation itself
is a problem.

>
> Larry
> --
> http://www.parc.xerox.com/masinter
>
> > The second case basically works by saying that if in these formats
> > (e.g. HTML), an URI contains a non-ASCII character, this character
> > is converted to a byte sequence using UTF-8 and then %-encoded to
> > produce a legal URI.
>
> I think "works" is ambitious. It "works" because most
> HTTP servers are forgiving about this kind of transliteration
> and most URLs are HTTP.

It "works" in the case that, for example, a user copies a filename
from a desktop filebrowser into an XML document
	href="xyz__"
where __ is some non-URL character. Meanwhile, the HTTP server,
when it exports the xyz__ file, uses the same convention:
UTF-8 encoding, %XX escaped.
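
As a rough illustration of the one-way conversion discussed above (attribute value to URI, never the reverse), here is a minimal Python sketch. The function name attribute_value_to_uri is hypothetical, and the sketch assumes that only non-ASCII characters need treatment; existing %XX sequences and all ASCII characters are passed through untouched, so a value that is already a URI is unchanged.

```python
def attribute_value_to_uri(value: str) -> str:
    """Hypothetical helper: %-escape the UTF-8 bytes of non-ASCII characters.

    ASCII characters, including any '%XX' sequences already present,
    are copied through unchanged. Nothing is ever unescaped.
    """
    pieces = []
    for ch in value:
        if ord(ch) < 128:
            # Already a plain ASCII character: leave it alone.
            pieces.append(ch)
        else:
            # Convert to UTF-8 bytes, then write each byte as %XX.
            pieces.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(pieces)


print(attribute_value_to_uri("xyz\u00e9"))  # xyz%C3%A9
```

A value with no non-ASCII characters comes back as-is, which matches the "only in the case that it's not already a URL" condition.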
That doesn't mean the HTTP server should grab xyz%XX%XX off the
TCP socket and unescape it; it means the HTTP server should
(do something equivalent to) enumerate each file in the directory
and escape it, and compare the resulting URI path to xyz%XX%XX.

It's a bit of a kludge; the cleaner thing to do would be to say
"don't put things other than URIs in those XML attribute values."
But we haven't had any luck doing that. And this "kludge" just
so happens to be consistent with the existing specs (though
subtly) and consistent with a fair amount of actual practice
(or at least so I gather from Martin; I haven't seen the evidence
first hand).

And it provides a global convention for interoperability between
HTTP servers exporting filesystems that use iso-latin-1 to encode
filenames and those that export filesystems that use shift-jis
or UCS-2.

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
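
The server-side matching described in this message (escape each exported filename and compare, rather than unescape the request) might be sketched as follows in Python. The function name match_exported_name and its parameters are hypothetical; the sketch assumes the server knows the filesystem's filename encoding, that the request is a single path segment, and that hex digits in the request are uppercase. It uses urllib.parse.quote, which encodes to UTF-8 by default.

```python
from urllib.parse import quote


def match_exported_name(raw_names, fs_encoding, requested_segment):
    """Hypothetical helper: find the stored filename whose escaped form
    equals the requested path segment. The request is never unescaped.

    raw_names: filenames as stored on disk (bytes)
    fs_encoding: the filesystem's filename encoding, e.g. "latin-1"
    """
    for raw in raw_names:
        try:
            name = raw.decode(fs_encoding)
        except UnicodeDecodeError:
            continue  # not decodable in the assumed encoding: skip it
        # Export convention: UTF-8 encode, then %XX escape.
        if quote(name, safe="") == requested_segment:
            return raw
    return None


latin1_names = ["xyz\u00e9".encode("latin-1")]
print(match_exported_name(latin1_names, "latin-1", "xyz%C3%A9"))
```

Because every name passes through the same UTF-8 step before comparison, a server exporting a latin-1 filesystem and one exporting a shift-jis filesystem produce the same URI path for the same filename.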
Received on Thursday, 27 May 1999 18:45:38 UTC