Date: Tue, 15 Apr 1997 17:12:09 -0700 (PDT)
From: Chris Newman <Chris.Newman@innosoft.com>
Subject: Re: revised "generic syntax" internet draft
In-Reply-To: <firstname.lastname@example.org>
To: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
Cc: IETF URI list <email@example.com>
Message-Id: <Pine.SOL.3.95.970415164238.22015Vfirstname.lastname@example.org>

On Tue, 15 Apr 1997, Roy T. Fielding wrote:
> >(3) whatever localized character set is in use
> >
> >(3) Never works, because it doesn't interoperate. It results in a bunch
> >of islands which can't communicate, except via US-ASCII.
>
> But that is what Martin said he wanted -- the ability of an author to
> decide what readership is most important. Why is it that it is okay
> to localize the address, but not to localize the charset?

I can't speak for Martin. But if I understand what you're saying, my
response is that people want to use their own language in URLs and will
do so whatever the standard says. If we define a standard way for them to
include their national characters so that those characters won't be
misinterpreted by the recipient, then we've achieved interoperability.
That's the goal of protocol design.

> >(5) Works fine, and has potential to be easier to support than (4).
>
> Excuse me, but it doesn't work at all unless all systems use the same
> charset for encoding URLs. Since that is not the case today, we would
> have to scrap all existing servers and browsers in order for (5) to work.
> In other words, it is not an acceptable solution to those of us who
> have to implement the specified protocol.

I don't think any of the programs which display URLs try to interpret hex
encoded %80 - %FF. So no URL display programs will break.

Now if there's a URL entry program which permits non-ASCII characters and
maps them to %80 - %FF using local conventions, that program will break.
But that program is also already in violation of the current
specification (which restricts URLs to US-ASCII).
Therefore the only software which is forced to upgrade by this change is
software which already violates the standard. If anything, that's an
argument to make this change.

So the transition plan is simple:

(A) URL entry programs (which currently are restricted to US-ASCII by the
    specification) are upgraded so they map non-ASCII characters to hex
    encoded UTF-8.
(B) URL display programs are upgraded so they map hex encoded UTF-8 to
    the correct display characters.
(C) URL display programs which aren't upgraded just show hex encoded
    UTF-8, as they do today.

> (3) does move toward (5). It even becomes (5) when people are using UTF-8.

(4) can move towards (5), but (3) can't. With unlabelled character sets
you just get interoperability problems. Look at it this way: if fred and
sam are using a localized character set, thingbats, and fred tries to
transition to UTF-8, all of a sudden fred and sam are completely unable to
communicate and each sees garbage at the other end. A transition is only
achievable if the character set is labelled.

Any time a spec either implicitly or explicitly says X is implementation
defined, it is promoting a non-interoperable solution. The URL spec
currently leaves the interpretation of %80 - %FF as implementation defined.
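For illustration, here is a minimal Python sketch of steps (A) and (B) and of the unlabelled-charset failure. It assumes Python's standard urllib.parse.quote/unquote as the %-escaping routines (a modern stand-in, not anything from this thread); the function names encode_for_url and decode_for_display are hypothetical, and the fallback of showing the raw escapes is an assumed stand-in for the un-upgraded behaviour in step (C).

```python
from urllib.parse import quote, unquote


def encode_for_url(text: str) -> str:
    """Step (A): turn user-entered text into a US-ASCII URL path by
    %-escaping the UTF-8 octets of every non-ASCII character."""
    return quote(text, safe="/", encoding="utf-8")


def decode_for_display(url_path: str) -> str:
    """Step (B): map %-escaped UTF-8 back to characters for display.
    Assumption: if the octets are not valid UTF-8 (e.g. they came from a
    local, unlabelled charset), show the escapes unchanged, as an
    un-upgraded display program would in step (C)."""
    try:
        return unquote(url_path, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return url_path


if __name__ == "__main__":
    path = "/café"
    utf8_url = encode_for_url(path)
    print(utf8_url)                      # /caf%C3%A9
    print(decode_for_display(utf8_url))  # /café

    # The unlabelled-charset problem: the same text entered under a local
    # Latin-1 convention yields different octets ...
    latin1_url = quote(path, safe="/", encoding="latin-1")
    print(latin1_url)                    # /caf%E9
    # ... and a peer that assumes the wrong charset sees garbage.
    print(unquote(utf8_url, encoding="latin-1"))  # /cafÃ©
    # Latin-1 octets are not valid UTF-8, so the escapes are shown as-is.
    print(decode_for_display(latin1_url))         # /caf%E9
```

Pure US-ASCII paths pass through both functions unchanged, which is why display programs that never interpret %80 - %FF are unaffected.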