Re: I18N Concensus - Generic Syntax Document

Dan Oscarsson (Dan.Oscarsson@trab.se)
Fri, 7 Mar 1997 16:28:54 +0100 (MET)


Date: Fri, 7 Mar 1997 16:28:54 +0100 (MET)
From: Dan Oscarsson <Dan.Oscarsson@trab.se>
Message-Id: <199703071528.QAA02041@valinor.malmo.trab.se>
To: mduerst@ifi.unizh.ch, fielding@kiwi.ICS.UCI.EDU
Subject: Re: I18N Concensus - Generic Syntax Document
Cc: uri@bunyip.com

> >+ It is recommended that UTF-8 [RFC 2044] be used to represent characters
> >+ with octets in URLs, wherever possible.
> >
> >+ For schemes where no single character->octet encoding is specified,
> >+ a gradual transition to UTF-8 can be made by servers make resources
> >+ available with UTF-8 names on their own, on a per-server or a
> >+ per-resource basis. Schemes and mechanisms that use a well-
> >+ defined character->octet encoding which is however not UTF-8 should
> >+ define the mapping between this encoding and UTF-8, because generic
> >+ URL software is unlikely to be aware of and to be able to handle
> >+ such specific conventions.
> 
> Here is where you lose me.  I have no desire to add a UTF-8 character
> mapping table to our server.  An HTTP server doesn't need one -- its URLs are
> either composed by computation (in which case knowing the charset is not
> possible) or by derivation from the filesystem (in which case it will use
> whatever charset the filesystem uses, and in any case has no way of
> determining whether or not that charset is UTF-8).  The server doesn't care
> and should not care.  It is therefore inappropriate to suggest that it should
> add such a table when doing so would only bloat the server and slow-down
> the URL<->resource mapping process.
> 

Well, you could say that I have no desire to add a UTF-8 mapping to
my server either, but I can see it as the only good solution.
My filesystem uses ISO 8859-1, URLs derived from the filesystem contain
ISO 8859-1, ISO 8859-1 based URLs sent through HTTP are fine (no 7-bit
encodings), my HTML documents contain URLs referencing other documents
in my filesystem, and of course they too contain ISO 8859-1.
So the easiest thing for me would be to just use ISO 8859-1! No change
in my server!
But as I understand that not everybody is satisfied with ISO 8859-1, and
that some use another character set on their filesystem, I am willing to
do the extra work of adding an ISO 8859-1 to UTF-8 mapping to my HTTP
server, so that we can all communicate and talk the same "language".
By the way, if URLs are composed by computation or from the filesystem,
the HTTP server will have to be configured with the character set that
is used locally, so it knows from/to what it will map UTF-8.
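
To make the mapping concrete, here is a minimal sketch of what such a
server-side conversion could look like. It is only an illustration under
my own assumptions: the names LOCAL_CHARSET, url_path_to_local_name and
local_name_to_url_path are hypothetical, and a real server would also
have to decide what to do with octets that are not valid UTF-8.

```python
from urllib.parse import quote, unquote_to_bytes

# Assumption: the administrator configures the charset used by the
# local filesystem; ISO 8859-1 is just the example from this mail.
LOCAL_CHARSET = "iso-8859-1"

def url_path_to_local_name(url_path: str) -> bytes:
    """Map a %-encoded UTF-8 URL path to local filesystem octets."""
    utf8_octets = unquote_to_bytes(url_path)   # undo the %-encoding
    characters = utf8_octets.decode("utf-8")   # octets -> characters
    return characters.encode(LOCAL_CHARSET)    # characters -> local octets

def local_name_to_url_path(name: bytes) -> str:
    """Map local filesystem octets to a %-encoded UTF-8 URL path."""
    characters = name.decode(LOCAL_CHARSET)
    # quote() encodes non-ASCII characters as UTF-8 by default.
    return quote(characters, safe="/")
```

The only per-server knowledge needed is the single LOCAL_CHARSET value;
everything on the wire stays UTF-8.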

Roy, the server would not need to care about characters IF it were the
case that URLs only represented ASCII characters. But if they do, I
cannot use the web to present my documents. Then we could just as well
define a URL to contain ONLY digits and have the HTTP servers map the
number to a file in the filesystem.
On your server, though, Roy, you do not have to add a UTF-8 mapping if
you only use ASCII, as in that case you will never need to represent
non-ASCII characters in the URL.


If we should not go for UTF-8, tell me how to make my URLs work.
My filesystem contains ISO 8859-1 encoded filenames.
My HTML documents are encoded using ISO 8859-1. No sane person sits
and looks up the hex encoding for each non-ASCII character in a filename
and encodes them with %-encodings in an HTML file where everything else
uses ISO 8859-1. It is even difficult to explain to the normal user
what ASCII is and why our normal characters cannot be used in URLs.
If we use only ASCII characters in our filenames they lose their
meaning, so that is not a solution. And if I send a mail containing a
URL to somebody in Germany, where they can understand our non-ASCII
characters, I want to use our non-ASCII characters in the URL, as that
is easier to remember (%-encodings are very difficult to remember).
If we used UTF-8 for transport, the person in Germany could follow
my URL even if they do not use ISO 8859-1 at their site.
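
As a rough illustration of why the transport encoding matters, the
sketch below %-encodes the same name once as ISO 8859-1 and once as
UTF-8, and then reads the ISO 8859-1 form back at a site that assumes a
different local charset. The filename and the choice of ISO 8859-5 for
the receiving site are made-up examples, not anything from the draft.

```python
from urllib.parse import quote, unquote

name = "räksmörgås.html"  # hypothetical filename with non-ASCII letters

# The same characters give different octets, and so different
# %-encodings, depending on the character->octet encoding chosen.
as_latin1 = quote(name, encoding="iso-8859-1")
as_utf8 = quote(name, encoding="utf-8")

# A receiver that guesses the wrong charset for the ISO 8859-1 form
# ends up with other characters than the sender wrote.
misread = unquote(as_latin1, encoding="iso-8859-5")

# With UTF-8 agreed for transport, no guessing is needed.
recovered = unquote(as_utf8, encoding="utf-8")

print(as_latin1)
print(as_utf8)
print(misread == name, recovered == name)
```

The point is only that the octets alone do not say which characters
were meant unless both ends agree on one encoding.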


    Dan