Date: Mon, 14 Apr 1997 21:09:16 +0200 (MET DST)
From: "Martin J. Duerst" <email@example.com>
To: John C Klensin <firstname.lastname@example.org>
Cc: email@example.com, Francois Yergeau <firstname.lastname@example.org>
Subject: Re: UTF-8 URL for testing
In-Reply-To: <SIMEON.9704121139.H@tp7.Jck.com>
Message-Id: <Pine.SUN.3.96.970414204206.245I-100000@enoshima>

On Sat, 12 Apr 1997, John C Klensin wrote:

> On Fri, 11 Apr 1997 16:29:57 -0700 (PDT) Larry Masinter
> <email@example.com> wrote:
>
> > Just because a problem is important doesn't
> > mean that we should recommend something that has not yet
> > been demonstrated to actually solve the problem.
> >...
>
> Dan and Francois,
>
> While I'm very anxious to see a real solution that
> addresses the underlying issues here, I'm forced to agree
> with Larry. We don't "make" things happen by standardize
> untested ideas and arguments, however logical, that
> things are easy to do don't move the discussion forward
> much.

Thanks for admitting that there is some logic behind what we have been proposing.

> I don't think that timing of standards are much of
> the issue here. It is just that we have a large installed
> base and I'd prefer to see a demonstration that it works
> well, that it won't cause significant problems with
> existing (unmodified) clients, servers, or users, etc.

I very much appreciate your concern. However, I have great difficulty imagining what might actually go wrong. For example, as long as we stay with %HH escapes, nothing can possibly go wrong, can it? And if something did, it wouldn't be UTF-8 that was to blame, but the implementation that didn't handle %HH correctly.

If we start to remove the %HH escapes and replace them with raw 8-bit octets, more things can go wrong. But they are exactly the same things that can already go wrong today when this is done with a legacy encoding. They are mainly related to the fact that transcoding preserves character identity, whereas URLs assume octet identity.
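To make the character-identity versus octet-identity point concrete, here is a minimal Python sketch. It is not Francois's conversion software; the helper name escape and the sample path segment "café" are hypothetical. It escapes the same character once via Latin-1 and once via UTF-8: both results are plain ASCII %HH strings, and both decode back to the same character, yet the underlying octets differ.

```python
from urllib.parse import quote, unquote_to_bytes


def escape(text: str, charset: str) -> str:
    """Hypothetical helper: percent-encode text as %HH escapes, using the
    given charset to turn characters into octets."""
    return quote(text, safe="", encoding=charset)


segment = "caf\u00e9"  # hypothetical path segment: "café"

latin1_url = escape(segment, "latin-1")  # legacy-encoded form
utf8_url = escape(segment, "utf-8")      # UTF-8 form

# Both forms are pure ASCII, so %HH escapes pass unchanged through existing software.
assert latin1_url.isascii() and utf8_url.isascii()

# Decoded with their own charsets, both yield the same characters...
assert unquote_to_bytes(latin1_url).decode("latin-1") == segment
assert unquote_to_bytes(utf8_url).decode("utf-8") == segment

# ...but the octets differ, so software comparing URLs octet by octet sees two URLs.
print(latin1_url)  # caf%E9
print(utf8_url)    # caf%C3%A9
print(unquote_to_bytes(latin1_url) == unquote_to_bytes(utf8_url))  # False
```

A transcoding proxy or editor that turns the Latin-1 form into the UTF-8 form has preserved the character but changed the octets, which is exactly the mismatch described above.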
The recommendation for UTF-8 will eventually remove these problems, but during a transition period they will show up more strongly.

The above applies as long as we don't look at exactly which characters are encoded. If we do, we get problems similar to the 0O0O0O problems (digit zero versus capital letter O) with ASCII. Again, nothing really new.

When asked for implementations, I immediately made two URLs with UTF-8 encoded characters. Francois made a few more and included them in a web page. They are here for anybody to test. We have tested the browsers we have at hand.

When asked to write some software to convert URLs to UTF-8, Francois also wrote such software. Everybody can use it and test it.

If you have any ideas about what else should be tested, and how, please tell the list. Everybody knows that it is hard to test one's own software or ideas; it's much easier for other people to spot problems.

Many thanks for your help,

Martin.