Re: Special characters in URIs

Larry Masinter
Fri, 28 May 1999 15:17:16 PDT

From: "Larry Masinter" <>
To: "Dan Connolly" <>
Cc: "Martin J. Duerst" <>, <>, <uri@Bunyip.Com>
Date: Fri, 28 May 1999 15:17:16 PDT
Message-ID: <002601bea957$d82e1460$>
In-Reply-To: <>
Subject: RE: Special characters in URIs

(I'm hoping that will migrate to,
although I've not gotten an acknowledgement. I suppose people
should look for news at )

> It "works" in the case that, for example, a user copies
> a filename from a desktop filebrowser into an XML document
> 	href="xyz__"
> where __ is some non-URL character.

This works for me if you say that what's in the XML document
attribute isn't really a "URI" but rather something else.
For example, we could use the "IURI" draft to define what
appears in XML, and note that in order to turn it into a URI,
it needs to be escaped. I don't have a problem with that.
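As a rough illustration of the "escape it to get a URI" step, here is a minimal sketch, assuming the conversion is "encode as UTF-8, then %XX-escape every byte outside the URI character set". The allowed set roughly follows the RFC 2396 unreserved and reserved characters, plus "#" and "%" so that fragments and existing escapes survive; that choice is an assumption for illustration, not something taken from the IURI draft. The name iuri_to_uri is hypothetical.

```python
# Hypothetical helper: turn an IURI-style string (possibly containing
# non-ASCII characters) into a URI by UTF-8 encoding it and %XX-escaping
# every byte outside an assumed set of permitted URI characters.

# RFC 2396 "unreserved" characters: alphanumerics plus the "mark" set.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!~*'()"
)
# RFC 2396 "reserved" characters, plus '#' (fragment delimiter), passed
# through unchanged so existing URI structure is preserved.
RESERVED = frozenset(b";/?:@&=+$,#")


def iuri_to_uri(iuri: str) -> str:
    out = []
    for byte in iuri.encode("utf-8"):
        # Keep '%' as-is so already-escaped sequences are not double-escaped.
        if byte in UNRESERVED or byte in RESERVED or byte == ord("%"):
            out.append(chr(byte))
        else:
            out.append("%%%02X" % byte)
    return "".join(out)


print(iuri_to_uri("xyz\u00e9"))     # xyz%C3%A9
print(iuri_to_uri("dir/a b.html"))  # dir/a%20b.html
```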

> Meanwhile, the HTTP server, when it exports the xyz__ file,
> uses the same convention: UTF-8 encoding, %XX escaped.
> That doesn't mean the HTTP server should grab xyz%XX%XX off
> the tcp socket and unescape it; it means the HTTP server
> should (do something equivalent to) enumerate each file
> in the directory and escape it, and compare the resulting URI path
> to xyz%XX%XX.
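The server-side strategy described above (escape each directory entry and compare, rather than unescape the request) might be sketched like this. The function names are hypothetical, and the escaping rule is Python's urllib.parse.quote with safe="" (UTF-8, uppercase hex, everything but letters, digits and "_.-~" escaped), which stands in for whatever exact convention client and server agree on. Because the comparison is a plain string match, both sides must escape the same characters with the same hex case; a real server would probably need to normalize that.

```python
import os
from urllib.parse import quote


def escape_segment(name: str) -> str:
    # quote() encodes str as UTF-8 by default; safe="" also escapes "/".
    return quote(name, safe="")


def find_entry(entries, requested_segment):
    """Return the entry whose escaped name equals requested_segment, or None."""
    for name in entries:
        if escape_segment(name) == requested_segment:
            return name
    return None


def find_in_directory(directory, requested_segment):
    # Enumerate the real directory and match escaped names against the
    # still-escaped path segment taken off the wire.
    return find_entry(os.listdir(directory), requested_segment)
```

For example, find_entry(["readme.txt", "xyz\u00e9"], "xyz%C3%A9") returns "xyz\u00e9" without ever unescaping the request.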


> It's a bit of a kludge; the cleaner thing to do would
> be to say "don't put things other than URIs in those
> XML attribute values." But we haven't had any luck doing that.
> And this "kludge" just so happens to be consistent with
> the existing specs (though subtly) and consistent with
> a fair amount of actual practice (or at least so I
> gather from Martin; I haven't seen the evidence first-hand).

This works for me too; I'd just like to get this into the

> And it provides a global convention for interoperability
> between HTTP servers exporting filesystems that use
> iso-latin-1 to encode filenames and those that
> export filesystems that use shift-jis or UCS-2.

I'm not sure how that works (the shift-jis part), and I wonder
if this deserves a fuller explanation.
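One possible reading of the Shift_JIS part, offered only as a sketch: each server decodes its on-disk filename bytes with its own native charset, then applies the shared UTF-8 plus %XX convention, so the same name yields the same URI whatever the filesystem encoding. The function name export_name is hypothetical, the encodings are chosen for illustration, and UCS-2 is approximated by UTF-16LE (identical for BMP characters).

```python
from urllib.parse import quote


def export_name(raw: bytes, fs_encoding: str) -> str:
    # Decode the on-disk bytes using the filesystem's own charset...
    name = raw.decode(fs_encoding)
    # ...then re-encode as UTF-8 and %XX-escape (quote() defaults to UTF-8).
    return quote(name, safe="")


cafe = "caf\u00e9"             # "café"
nihon = "\u65e5\u672c.txt"     # "日本.txt"

# An iso-latin-1 server and a UCS-2 server agree on "café":
print(export_name(cafe.encode("latin-1"), "latin-1"))      # caf%C3%A9
print(export_name(cafe.encode("utf-16-le"), "utf-16-le"))  # caf%C3%A9

# A Shift_JIS server and a UCS-2 server agree on "日本.txt":
print(export_name(nihon.encode("shift_jis"), "shift_jis"))  # %E6%97%A5%E6%9C%AC.txt
print(export_name(nihon.encode("utf-16-le"), "utf-16-le"))  # %E6%97%A5%E6%9C%AC.txt
```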

The URL internationalization draft has been sitting around
for a long time; maybe it's time to move it forward now?

> Dan Connolly, W3C