Re: Using unicode or MBCS characters in forms
> From: Larry Masinter <firstname.lastname@example.org>
> Date: Fri, 21 Jun 1996 22:34:27 PDT
> Now, again, you might say that some people could type in one way and
> other people could type in another way, and that's fine, but now the
> URLs are no longer UNIFORM: some people see one URL and other people
> see another URL. That's OK, too, but you must be explicit about that,
> that you're defining Non-Uniform Resource Locators.
This purported uniformity requirement bugged me, so I did a little
search of the RFCs, looking for a precise expression somewhere. I
didn't find any, but I did find that RFC 1738, by Berners-Lee,
Masinter & McCahill, does allow for Non-Uniform RLs by Larry's
current definition of "uniform". This Proposed Standard states that
any character in the scheme-specific part of a URL, except for
reserved characters, may be encoded using the well-known %XX
notation. This means that, for instance:
are one and the same URL by current standards and practice. This has
not broken the WWW yet, so I don't think that allowing it for
non-ASCII characters would break it. After all,
http://some.other.dom/~Franc,ois [where c, stands for c-cedilla]
isn't any more non-uniform than the above.
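
To make the %XX equivalence concrete, here is a minimal Python sketch. The
two URLs in it are hypothetical illustrations, not the ones from the
original message, and the comparison is deliberately simplified: it decodes
every escape, whereas a real comparison would have to leave escaped
reserved characters alone. The c-cedilla case assumes ISO 8859-1 as the
character encoding, which is only one possible choice.

```python
from urllib.parse import quote, unquote


def same_url(a, b):
    """Return True if two URLs match once their %XX escapes are decoded.

    Simplified on purpose: escaped reserved characters are decoded too,
    which a strict comparison should not do.
    """
    return unquote(a) == unquote(b)


# Hypothetical example pair: %63 is the escape for the ASCII letter "c".
plain = "http://some.other.dom/Francois"
escaped = "http://some.other.dom/Fran%63ois"
print(same_url(plain, escaped))

# Assumption: the c-cedilla (U+00E7) is encoded as ISO 8859-1 before escaping.
cedilla_escaped = quote("Fran\u00e7ois", encoding="latin-1")
print(cedilla_escaped)
```

Under that assumption the non-ASCII form and its escaped form are related
in the same way as the ASCII pair, which is the point being argued.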
Having scanned a few RFCs related to URIs, URLs, etc., my feeling is
that the uniformity requirement is that UR* be uniform enough to be
parsable without external information, once one has a UR* at hand.
This is embodied by the <scheme>://<path>... structure, which is not
challenged by i18n.
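
A small sketch of what "parsable without external information" could mean
in practice, using Python's standard urlsplit. The helper name and the
sample URLs are assumptions made for illustration; the only point is that
the scheme and the rest of the URL can be separated from the string alone,
whether or not the path contains a non-ASCII character.

```python
from urllib.parse import urlsplit


def scheme_and_rest(url):
    """Split a URL into (scheme, host, path) using only the string itself."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


# Hypothetical URLs: one with an escaped c-cedilla, one with the raw character.
escaped_form = scheme_and_rest("http://some.other.dom/Fran%E7ois")
raw_form = scheme_and_rest("http://some.other.dom/Fran\u00e7ois")
print(escaped_form)
print(raw_form)
```

Both forms yield the same scheme and host; only the path component differs
in how the character is written.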
The kind of extreme uniformity called for by Larry does not exist and
is not required for proper interoperability, so I hope this puts this
bogus argument against i18n to rest.
Francois Yergeau <email@example.com>
Alis Technologies Inc., Montreal
Tel : +1 (514) 747-2547
Fax : +1 (514) 747-2561