Message-ID: <3367BA32.firstname.lastname@example.org>
Date: Wed, 30 Apr 1997 14:31:30 PDT
From: Larry Masinter <email@example.com>
To: Francois Yergeau <firstname.lastname@example.org>
CC: email@example.com
Subject: Re: Using UTF-8 for non-ASCII Characters in URLs

Francois,

I suggested:

><A HREF="this-is-the-URL">this-is-what-the-user-sees</A>
>
>The URL in the 'this-is-the-URL' part should use hex-encoded-UTF8,
>no matter what the user sees.

and you responded:

"That would break with current practice. Please see
<http://www.alis.com/~yergeau/url-00.html>, section 4 for a
discussion of this issue."

However, I'm not aware of any current practice that does what
section 4 suggests, namely:

"This shows the path to be followed with non-ASCII URLs embedded
in a text file: simply encode the characters of the URL in the same
way as the other characters of the document, i.e. using the CCS of
the document. If a character in the URL is not part of the repertoire
of this CCS, use URL-encoding of the UTF-8 representation to preserve
that character's identity."

You would require a different transcoding mechanism for the URL
and for the rest of the document. Normally, transcoding a Unicode
document in HTML into ISO-8859-1 requires converting characters
outside of 0-255 into numeric character references; however, you
are suggesting turning URLs into hex-encoded UTF-8 instead. Right?

Could you clarify what current practice would "break"?
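
To make the contrast above concrete, here is a rough Python sketch of the three treatments under discussion. It is an illustration only, not code from either proposal: the function names, the example URL, and the choice of ISO-8859-1 as the document's CCS are all assumptions, and `url_section4_style` is just one reading of the section 4 passage quoted above.

```python
from urllib.parse import quote

# Characters left untouched by quote(); everything non-ASCII is encoded.
URL_DELIMS = "/:?#[]@!$&'()*+,;=%~"

def url_to_hex_utf8(url_text: str) -> str:
    """Hex-encode (percent-encode) the UTF-8 bytes of every non-ASCII character."""
    return quote(url_text, safe=URL_DELIMS, encoding="utf-8")

def text_to_latin1_html(text: str) -> str:
    """Transcode body text to ISO-8859-1, falling back to numeric character references."""
    raw = text.encode("iso-8859-1", errors="xmlcharrefreplace")
    return raw.decode("iso-8859-1")

def url_section4_style(url_text: str, charset: str = "iso-8859-1") -> str:
    """One reading of section 4 (hypothetical helper): keep characters the
    document's CCS can represent, hex-encode the UTF-8 of the rest."""
    out = []
    for ch in url_text:
        try:
            ch.encode(charset)          # representable in the document's CCS
            out.append(ch)
        except UnicodeEncodeError:      # not in the CCS: hex-encoded UTF-8
            out.append(quote(ch, safe="", encoding="utf-8"))
    return "".join(out)

url = "http://example.org/caf\u00e9/\u65e5"   # hypothetical URL: e-acute plus one CJK character
print(url_to_hex_utf8(url))        # always hex-encoded UTF-8
print(url_section4_style(url))     # e-acute kept raw, CJK character hex-encoded
print(text_to_latin1_html(url))    # ordinary text rule: CJK character becomes &#26085;
```

Under these assumptions, the same out-of-repertoire character ends up as `%E6%97%A5` inside the URL but as `&#26085;` in the surrounding text, which is the "different transcoding mechanism" asked about above.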