Message-ID: <3369AC9E.281F@parc.xerox.com>
Date: Fri, 2 May 1997 01:58:06 PDT
From: Larry Masinter <email@example.com>
To: "Martin J. Duerst" <firstname.lastname@example.org>
CC: URI mailing list <email@example.com>
Subject: Re: Using UTF-8 for non-ASCII Characters in URLs

This is a great start at dealing with the issues that would otherwise cause great confusion.

Other issues:

The bidi issues for RTL languages in conjunction with normal punctuation used in and around identifiers. (Will the identifiers present themselves 'correctly' without these characters in all cases?)

Using UCS in identifiers that are normally "case insensitive" in ASCII, and the issues that raises, e.g., similar upper-case forms, the role of accents, and equivalence.

I think "white space" or spacing characters in general need to be addressed.

You need to decide whether you're doing canonicalization/normalization or just equivalence. Equivalence is probably easier to define, and less politically sensitive, even though not as useful.
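
To make the last distinction concrete, here is a minimal sketch in Python. It is an illustration only, not something proposed in this thread: it assumes Unicode normalization form NFC and Python's `str.casefold()` as stand-ins for whatever rules would actually be chosen, and the function names `canonicalize` and `equivalent` are hypothetical. Canonicalization returns one preferred spelling of an identifier; equivalence only answers whether two spellings should be treated as the same.

```python
import unicodedata


def canonicalize(identifier: str) -> str:
    """Return a single canonical spelling (assumption: NFC is the chosen form)."""
    return unicodedata.normalize("NFC", identifier)


def equivalent(a: str, b: str) -> bool:
    """Report whether two identifiers match, without fixing a canonical spelling.

    Assumption: matching ignores case and composed/decomposed accent forms.
    """
    def key(s: str) -> str:
        # Normalize, fold case, then normalize again because folding
        # can produce sequences that are no longer in NFC.
        folded = unicodedata.normalize("NFC", s).casefold()
        return unicodedata.normalize("NFC", folded)

    return key(a) == key(b)


# "e" followed by a combining acute accent, versus the precomposed letter.
decomposed = "re\u0301sume\u0301"
precomposed = "r\u00e9sum\u00e9"

same_spelling = canonicalize(decomposed) == precomposed
same_identifier = equivalent("R\u00c9SUM\u00c9", decomposed)
```

The equivalence test never has to say which spelling is the "right" one, which is part of why it is the less contentious choice. The cost is that every comparison must apply the rule, whereas canonicalized identifiers can be compared byte for byte.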