Re: html, http, urls and internationalisation

Larry Masinter writes a number of things I agree with, including that
URLs are described in (abstract) characters, independent of encoding.
He then comments further on my initial mail:

> > I would propose that URLs be written in the charset of the 
> > document that references the url,
> This is exactly the situation. URLs are sequences of characters, can
> be written in newspapers or on business cards (which, not being
> computer encodings, don't have a 'charset'). For those situations
> where URLs are embedded in other documents, that embedding should use
> the charset of the containing document. The repertoire of characters
> allowed within URLs is intentionally restricted to allow such
> embedding in almost all contexts.
> >				possibly enhanced with
> > the extensions that we make to get further characters, 
> > for example &a-ring; or &#xxxx; 
> this is the part that's impossible. You might imagine doing such a
> thing, but it doesn't work if you then try to use URLs for the purpose
> for which they are functional.
> Some folks want to deal with the variability of how particular
> implementations of HTTP or FTP might use sequences of octets to
> represent characters, and, in particular, the characters that appear
> before the local user behind the HTTP or FTP server. So, if you have an
> FTP or HTTP server that serves out files in your file server, and your
> file server uses Big5 or Unicode for the representation of file names,
> you have to choose an encoding of Big5 or Unicode as octets in order
> to deal with the FTP or HTTP protocols. It would be useful to
> standardize that encoding, because there are new HTTP implementations
> being delivered all the time, and even new FTP implementations.
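To make Larry's point concrete, here is a minimal sketch of what a
standardized octet encoding buys. It assumes, for illustration only,
that the agreed encoding is UTF-8 with %XX escapes (the choice RFC 3987
later made for IRIs), and that a "Unicode" file system stores names as
UTF-16LE. The function names are hypothetical. The same file name
yields the same URL whether the server stores it in Big5 or in Unicode,
and the result uses only the restricted ASCII repertoire, so it can be
embedded in a document of almost any charset.

```python
from urllib.parse import quote, unquote


def filename_to_url_path(raw_name: bytes, fs_charset: str) -> str:
    """Map a file name stored in fs_charset to a URL path segment.

    Assumption: the standardized encoding is UTF-8, with every octet
    outside the unreserved set (letters, digits, "-._~") %-escaped.
    """
    name = raw_name.decode(fs_charset)  # server octets -> characters
    return quote(name, safe="", encoding="utf-8")  # characters -> %XX


def url_path_to_filename(segment: str, fs_charset: str) -> bytes:
    """Reverse mapping: URL path segment -> file-system octets."""
    name = unquote(segment, encoding="utf-8", errors="strict")
    # Raises UnicodeEncodeError if fs_charset cannot represent the name.
    return name.encode(fs_charset)


if __name__ == "__main__":
    big5_name = "中文.txt".encode("big5")
    utf16_name = "中文.txt".encode("utf-16-le")
    print(filename_to_url_path(big5_name, "big5"))    # %E4%B8%AD%E6%96%87.txt
    print(filename_to_url_path(utf16_name, "utf-16-le"))  # %E4%B8%AD%E6%96%87.txt
```

Because both servers agree on the octet encoding, a client needs no
knowledge of how either server stores its file names.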

I do not see why I would need to use the same encoding as the server,
provided the server has adequate charset translation software.
Such software could be made a requirement if we allowed extended
charsets beyond ASCII in URLs. And that is nicer than requiring URLs
always to be written in some UCS encoding, say UTF-8.
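The server-side translation proposed above can be sketched as follows.
This is an illustration under a loud assumption: the server is somehow
told which charset the client used to write the URL (no such mechanism
is specified here; `client_charset` is a hypothetical parameter). Given
that, the server undoes the %XX escapes, decodes the octets in the
client's charset, and re-encodes the characters in its own file-system
charset.

```python
from urllib.parse import unquote_to_bytes


def translate_request_path(path: str, client_charset: str,
                           server_charset: str) -> bytes:
    """Translate a request path from the client's charset to the server's.

    Hypothetical: client_charset must be conveyed to the server by some
    out-of-band means; this sketch simply takes it as a parameter.
    """
    raw = unquote_to_bytes(path)       # undo %XX escapes -> client octets
    text = raw.decode(client_charset)  # client octets -> characters
    # Raises UnicodeEncodeError if server_charset lacks a character.
    return text.encode(server_charset)  # characters -> server octets


if __name__ == "__main__":
    # A Latin-1 client and a UTF-8 client name the same file.
    print(translate_request_path("/sm%F6rg%E5s", "latin-1", "utf-8"))
    print(translate_request_path("/sm%C3%B6rg%C3%A5s", "utf-8", "utf-8"))
```

Both calls produce the same server octets, b"/sm\xc3\xb6rg\xc3\xa5s";
the design works only as far as the client's charset is reliably known.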


Received on Sunday, 28 January 1996 14:02:05 UTC