Re: Recent ERB Work on URL addressing

Quoth Tim:

> This leads to the #1 problem that makes the ERB unhappy about the 
> pure-URL idea: internationalization.  [...]
This is an issue for URLs in general.  When the HTML/HTTP/URI groups
agree on something, we can follow it in a future XML.

> URL-encoded UTF-8 is going to be massively non-human readable.  On
> the other hand, it may be the case that browsers de facto do the
> right thing with the part after the '#' - for sure this doesn't get
> sent out over the network, so why can't it be internationalized.

The browser is responsible for URL-encoding strings passed over HTTP
as part of GET requests.
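A rough sketch of that step, for illustration only: the helper name `build_get_url` and the example URL are made up, and the code assumes UTF-8 as the character encoding, which is exactly the part the HTML/HTTP/URI groups haven't agreed on yet.

```python
from urllib.parse import urlencode

def build_get_url(base, params):
    # Hypothetical helper: percent-encode form fields into a query string,
    # roughly as a browser does before sending a GET request.
    # urlencode() encodes str values as UTF-8 by default (an assumption;
    # real browsers may use the page's charset instead) and turns spaces into '+'.
    return base + "?" + urlencode(params)

url = build_get_url("http://example.org/search", {"q": "café au lait"})
print(url)  # http://example.org/search?q=caf%C3%A9+au+lait
```

The result is pure ASCII and safe to put on the wire, but you can see Tim's point: even one accented letter becomes a six-character escape.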

Some old and broken HTML browsers didn't URL-encode things they found
in HTML, or that the user entered.  I note that Netscape 4 does do automatic
URL encoding, and I think 3 does also.  MSIE quotes spaces and lets 8-bit
characters pass through, but turns \ into / everywhere (!)

It would be a good idea to mention in the XML link spec that URLs must
be URL-encoded by software that uses them in contexts that require such
encoding, and that such software may be expected to undo (reverse) the
encoding before displaying the results to users.

> Summary: the ERB is leaning *very* strongly to asserting that all
> locators are to be URLs - and will almost certainly go this way, if
> we aren't thereby throwing away our nice clean international 
> interoperability.


Note that this way XML will be no less international than the WWW...
where there are strong incentives for i18n.

Perhaps someone (Misha?  You there?) could comment on RFC 2070 and 
that working group's discussion on URLs.