Re: Special characters in URIs

Larry Masinter (masinter@parc.xerox.com)
Fri, 28 May 1999 15:17:16 PDT


From: "Larry Masinter" <masinter@parc.xerox.com>
To: "Dan Connolly" <connolly@w3.org>
Cc: "Martin J. Duerst" <duerst@w3.org>, <ietf-url@imc.org>, <uri@Bunyip.Com>
Date: Fri, 28 May 1999 15:17:16 PDT
Message-ID: <002601bea957$d82e1460$79d3000d@copper.parc.xerox.com>
In-Reply-To: <374DCA37.1EDF77BE@w3.org>
Subject: RE: Special characters in URIs

(I'm hoping that uri@bunyip.com will migrate to uri@w3.org,
although I've not gotten an acknowledgement. I suppose people
should look for news at http://www.ics.uci.edu/pub/ietf/uri )

> It "works" in the case that, for example, a user copies
> a filename from a desktop filebrowser into an XML document
> 	href="xyz__"
> where __ is some non-URL character.

This works for me if you say that what's in the XML document
attribute isn't really a "URI" but rather something else.
For example, we could use the "IURI" draft to define what
appears in XML, and note that in order to turn it into a URI,
it needs to be escaped. I don't have a problem with that.
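
As a rough illustration of that escaping step (a sketch only, not text from
any draft): encode the string as UTF-8 and %XX-escape every byte that is not
already legal in a URI. The function name escape_to_uri and the exact set of
characters left untouched are assumptions made for this example.

```python
from urllib.parse import quote

# Hypothetical helper: turn the string found in an XML attribute into a URI.
# Assumption: characters already legal in URIs (including "%", so existing
# escapes are not escaped twice) pass through unchanged.
URI_LEGAL = "/:?#@!$&'()*+,;=%"

def escape_to_uri(reference: str) -> str:
    # quote() encodes the string as UTF-8 and writes each byte outside the
    # safe set (plus letters, digits, and "_.-~") as %XX.
    return quote(reference, safe=URI_LEGAL, encoding="utf-8")

print(escape_to_uri("xyz\u00e9"))   # xyz%C3%A9
```

A string that is already a plain ASCII URI comes back unchanged, so the
step is harmless when applied to attribute values that were URIs to begin
with.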

> Meanwhile, the HTTP server, when it exports the xyz__ file,
> uses the same convention: UTF-8 encoding, %XX escaped.
> 
> That doesn't mean the HTTP server should grab xyz%XX%XX off
> the tcp socket and unescape it; it means the HTTP server
> should (do something equivalent to) enumerate each file
> in the directory and escape it, and compare the resulting URI path
> to xyz%XX%XX.

Right.
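
A minimal sketch of the matching Dan describes, assuming the server already
has the directory's filenames as Unicode strings. The name find_file is made
up for this example, and a real server would do something equivalent rather
than literally this loop.

```python
from typing import Iterable, Optional
from urllib.parse import quote

def find_file(requested_segment: str, filenames: Iterable[str]) -> Optional[str]:
    """Return the filename whose escaped form equals the requested segment."""
    for name in filenames:
        # Escape the stored name (UTF-8, then %XX) instead of unescaping
        # what arrived on the socket.
        if quote(name, safe="", encoding="utf-8") == requested_segment:
            return name
    return None

print(find_file("xyz%C3%A9", ["abc", "xyz\u00e9"]))
```

The comparison here is an exact string match, so a request that spells the
escapes with lowercase hex digits would not match; handling that is left
out of the sketch.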

> It's a bit of a kludge; the cleaner thing to do would
> be to say "don't put things other than URIs in those
> XML attribute values." But we haven't had any luck doing that.
> And this "kludge" just so happens to be consistent with
> the existing specs (though subtly) and consistent with
> a fair amount of actual practice (or at least so I
> gather from Martin; I haven't seen the evidence 1st hand).

This works for me too; I'd just like to get this into the
specs.
 
> And it provides a global convention for interoperability
> between HTTP servers exporting filesystems that use
> iso-latin-1 to encode filenames and those that
> export filesystems that use shift-jis or UCS-2.

I'm not sure how that works (the shift-jis part), and I wonder
if this deserves a fuller explanation.
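
One possible reading, sketched under the assumption that the server knows
which encoding its filesystem uses: decode the raw filename bytes with that
encoding, then re-encode as UTF-8 and escape. The name export_name is
hypothetical, and this is a guess at the intent rather than anything the
specs say.

```python
from urllib.parse import quote

def export_name(raw_name: bytes, filesystem_encoding: str) -> str:
    # Assumption: the server is told the filesystem's encoding out of band.
    text = raw_name.decode(filesystem_encoding)
    # Same convention for every server: UTF-8 bytes, %XX escaped.
    return quote(text, safe="", encoding="utf-8")

# The same two-character name stored by a shift-jis filesystem and by a
# UCS-2 one (utf-16-le stands in for UCS-2 here).
name = "\u65e5\u672c"
print(export_name(name.encode("shift_jis"), "shift_jis"))
print(export_name(name.encode("utf-16-le"), "utf-16-le"))
```

Under this reading both servers export the same escaped path, which would
be the interoperability being claimed.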

The URL internationalization draft has been sitting around
for a long time; maybe it's time to move it forward now?

> Dan Connolly, W3C
> http://www.w3.org/People/Connolly/
>