Re: Recent ERB Work on URL addressing from Martin Bryan on 1997-03-17 (w3c-sgml-wg@w3.org from March 1997)

From: Martin Bryan <mtbryan@sgml.u-net.com>
Date: Mon, 17 Mar 1997 10:56:45 +0000
To: W3C-SGML-WG@w3.org
Message-Id: <1.5.4.32.19970317105645.006bd330@mail.u-net.com>

At 08:48 14/3/97 -0800, Tim Bray wrote:

>This leads to the #1 problem that makes the ERB unhappy about the 
>pure-URL idea: internationalization.  We have gone to great lengths
>to allow the use of any sane Unicode encoding in data and markup -
>and yet the rules as to what can be in a URL are very restrictive;
>URL-encoded UTF-8 is going to be massively non-human readable.  On
>the other hand, it may be the case that browsers de facto do the
>right thing with the part after the '#' - for sure this doesn't get
>sent out over the network, so why can't it be internationalized.  So
>maybe we could just assert that the URL-encoding is not required 
>after the '#'.  We have some action items to check out what the specs
>say, what people think they mean, and what the de facto behavior of
>popular browsers is.

I wonder what constitutes sane (or insane) Unicode encoding :-)

Apparently at last week's Unicode conference in Mainz it was agreed that URLs
would be allowed to contain any 10646 character providing that those outside
the first 95 characters were replaced by %number representations. (I also
heard that the %number concept would be extended to allow hexadecimal
numbers, but I find this somewhat hard to believe without the invention of a
number delimiter code or a fixed length requirement for both numbers and hex
values.)
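
To make the delimiter concern concrete, here is a minimal Python sketch. It
assumes the fixed-length form is today's two-hex-digits-per-UTF-8-byte escape
(as produced by the standard library's urllib.parse.quote), and it compares
that against a purely hypothetical variable-length decimal escape,
escape_variable, which is my own illustration and not any scheme actually
proposed at the conference.

```python
from urllib.parse import quote, unquote


def escape_fixed(text):
    # Assumption: UTF-8 bytes, each written as '%' plus exactly two hex digits.
    # The fixed width is what tells a decoder where each escape ends.
    return quote(text, safe="")


def escape_variable(text):
    # Hypothetical scheme: '%' plus the decimal code point, with no delimiter
    # and no fixed length. Printable ASCII (the "first 95 characters") is
    # passed through unchanged, except '%' itself.
    out = []
    for ch in text:
        if 0x20 <= ord(ch) < 0x7F and ch != "%":
            out.append(ch)
        else:
            out.append("%" + str(ord(ch)))
    return "".join(out)


# Fixed-length escapes round-trip unambiguously.
print(escape_fixed("\u00e9"))                # %C3%A9
print(unquote(escape_fixed("\u00e91")))      # e-acute followed by '1'

# Variable-length escapes collide: U+00E9 (decimal 233) followed by the
# digit '1' looks the same as the single character U+091B (decimal 2331).
print(escape_variable("\u00e91"))            # %2331
print(escape_variable("\u091b"))             # %2331
```

Under these assumptions the two variable-length results are identical, so a
decoder cannot tell them apart without a terminator or a fixed digit count.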

Presumably the same rules will apply to fragment names as to domain and file
names.
----
Martin Bryan, The SGML Centre, Churchdown, Glos. GL3 2PU, UK 
Phone/Fax: +44 1452 714029   WWW home page: http://www.sgml.u-net.com/

Received on Monday, 17 March 1997 05:58:56 UTC