- From: <lee@sq.com>
- Date: Fri, 14 Mar 97 21:51:16 EST
- To: W3C-SGML-WG@w3.org
Quoth Tim:
> This leads to the #1 problem that makes the ERB unhappy about the
> pure-URL idea: internationalization.
[...]

This is an issue for URLs in general. When the HTML/HTTP/URI groups
agree on something, we can follow it in a future XML.

> URL-encoded UTF-8 is going to be massively non-human readable. On
> the other hand, it may be the case that browsers de facto do the
> right thing with the part after the '#' - for sure this doesn't get
> sent out over the network, so why can't it be internationalized.

The browser is responsible for URL-encoding strings passed over HTTP
as part of GET requests. Some old and broken HTML browsers didn't
URL-encode things they found in HTML, or that the user entered.
I note that Netscape 4 does do automatic URL encoding, and I think 3
does also. MSIE quotes spaces but allows 8-bit characters to pass
through, and turns \ into / everywhere (!)

It would be a good idea to mention in the XML link spec that URLs
must be URL-encoded by software that uses them in contexts that
require such encoding, and that such software may be expected to
undo/reverse such encodings before displaying the results to users.

> Summary: the ERB is leaning *very* strongly to asserting that all
> locators are to be URLs - and will almost certainly go this way, if
> we aren't thereby throwing away our nice clean international
> interoperability.

Good!

Note that this way XML will be no less international than the WWW...
where there are strong incentives for i18n.

Perhaps someone (Misha? You there?) could comment on RFC 2070 and
that working group's discussion on URLs.

Lee
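
P.S. As a rough illustration of the encode-for-transport, decode-for-display idea above, here is a minimal sketch in Python. It assumes the locator text is percent-encoded as UTF-8 bytes, which is the encoding Tim mentions but not something any spec quoted here mandates. The helper names encode_for_wire and decode_for_display and the sample string are made up for the example.

```python
from urllib.parse import quote, unquote


def encode_for_wire(text: str) -> str:
    """Percent-encode text as UTF-8 bytes (hypothetical helper).

    safe="" means every reserved or non-ASCII character is escaped,
    which suits a single path segment or fragment, not a whole URL.
    """
    return quote(text, safe="", encoding="utf-8")


def decode_for_display(encoded: str) -> str:
    """Undo the percent-encoding so a user sees readable text."""
    return unquote(encoded, encoding="utf-8")


fragment = "résumé"                    # sample non-ASCII locator part
encoded = encode_for_wire(fragment)
print(encoded)                         # r%C3%A9sum%C3%A9
print(decode_for_display(encoded))     # résumé
```

The encoded form is what Tim calls non-human-readable; the point is that only software needs to see it, while the user is shown the decoded string.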
Received on Friday, 14 March 1997 21:51:13 UTC