- From: Martin J. Duerst <mduerst@ifi.unizh.ch>
- Date: Sat, 8 Mar 1997 16:02:58 +0100 (MET)
- To: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
- Cc: Rich Salz <rsalz@osf.org>, uri@bunyip.com
On Fri, 7 Mar 1997, Roy T. Fielding wrote:

> >I don't know if you can just rule out filesystems just like that.
> >I can imagine networked filesystems that span hosts that would have,
> >or need to have, the locale stored at the mountpoint.
>
> I am sure it is possible on some file systems to determine the charset.
> It just isn't possible on all of the file systems for which you can
> use an Apache server, nor is it possible for us to distribute code
> that maps from any possible filesystem charset into UTF-8 and back
> again,

Apache is a great server, and it is improved constantly by a large group of people. You are one of the main contributors. Apache also has an API and can be extended in many different ways. In an earlier version, Apache had no support for language or charset negotiation; Dirk van Gulik explained at a recent workshop how that works now. It is quite possible that, with the increasing use of UTF-8 in Accept-Charset by browsers, future versions of Apache will include some functionality or hooks for conversion of document character encodings; that would be a very valuable addition. At a later stage, similar functionality might be added for file names/URLs.

It is also quite possible that somebody doing a port to some specific file system will include some conversion code. For example, for an NT port, it would be trivial to add Unicode <-> UTF-8 conversion code. For a system running in Western Europe, such as the ones Dan has described, it would be even easier to write Latin-1 <-> UTF-8 code (no table needed!). For Unix systems that use Unicode as their wchar_t type, with support for the appropriate locale, it would also be possible to implement things easily by just using mbtowc and such. On other systems, users might simply start to use a UTF-8 locale, avoiding any implementation problems.
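To illustrate the "no table needed" remark about Latin-1 <-> UTF-8, here is a minimal sketch in Python. It is not code from Apache or from any port mentioned above; the function names latin1_to_utf8 and utf8_to_latin1 are made up for this example. It relies only on the fact that Latin-1 octet values coincide with Unicode code points U+0000 to U+00FF, so every non-ASCII octet becomes a two-octet UTF-8 sequence obtained by bit shifting.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Convert ISO-8859-1 octets to UTF-8 using only bit operations."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)                  # ASCII passes through unchanged
        else:
            out.append(0xC0 | (b >> 6))    # lead octet: 0xC2 or 0xC3
            out.append(0x80 | (b & 0x3F))  # continuation octet: 10xxxxxx
    return bytes(out)


def utf8_to_latin1(data: bytes) -> bytes:
    """Convert UTF-8 back to ISO-8859-1; reject anything above U+00FF."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(b)
            i += 1
        elif (b in (0xC2, 0xC3) and i + 1 < len(data)
              and (data[i + 1] & 0xC0) == 0x80):
            # Two-octet sequence encoding U+0080..U+00FF.
            out.append(((b & 0x03) << 6) | (data[i + 1] & 0x3F))
            i += 2
        else:
            raise ValueError("octet at offset %d is not representable in Latin-1" % i)
    return bytes(out)
```

For example, latin1_to_utf8(b"caf\xe9") gives b"caf\xc3\xa9", and utf8_to_latin1 reverses it. A server hook could apply the second function to an incoming UTF-8 path before looking up a Latin-1 file name, but that wiring is an assumption and is left out here.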
To summarize: while it is very clear that a single solution that works everywhere is still far away, there are many solutions that can be made to work very quickly for a particular system or locale. What this shows most clearly is that the current approach of various locales creates deployment problems that will be reduced when a single character <-> octet conversion is used not only on the wire, but also locally.

> nor is it desirable for us to build a server that does it in
> the first place because, as I said in a message a while back, I don't
> think it is a good idea for http URLs to contain (or be displayed)
> as anything other than ASCII characters, regardless of the locale.

If you put in a hook that allows somebody to plug in his/her own code and see how it works, that would be great. All the rest will come with time.

As for the desirability, I have read your earlier message carefully. I have answered it and explained in detail why, for URLs used mainly locally, the overhead imposed on 99.9...% of the users if they are forced to use ASCII only may not be worth the savings produced for the 0.0...% of accidental users who can't use the local script. If, in light of this and related arguments, you still think that URLs should contain (and be displayed in!) ASCII and only ASCII, I (and certainly others on this list) look forward to reading your arguments.

Regards,
Martin.
Received on Saturday, 8 March 1997 10:02:47 UTC