Re: Using unicode or MBCS characters in forms

> From:          Larry Masinter <masinter@parc.xerox.com>
> Date:          Fri, 21 Jun 1996 22:34:27 PDT
>
> Now, again, you might say that some people could type in one way and
> other people could type in another way, and that's fine, but now the
> URLs are no longer UNIFORM: some people see one URL and other people
> see another URL. That's OK, too, but you must be explicit about that,
> that you're defining Non-Uniform Resource Locators.

This purported uniformity requirement bugged me, so I did a little
search of the RFCs, looking for a precise statement of it somewhere.
I didn't find any, but I did find that RFC 1738, by Berners-Lee,
Masinter & McCahill, does allow for Non-Uniform RLs by Larry's
current definition of "uniform".  This Proposed Standard states that
any character in the scheme-specific part of a URL, except for
reserved characters, may be encoded using the well-known %XX
notation.  This means that, for instance:

  http://some.dom/~Larry

and

  http://some.dom/~%4Carry

are one and the same URL by current standards and practice.  This has 
not broken the WWW yet, so I don't think that allowing it for 
non-ASCII characters would break it.  After all,

  http://some.other.dom/~Franc,ois    [where c, stands for c-cedilla]
  http://some.other.dom/~Fran%E7ois

isn't any more non-uniform than the above.
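
To make the equivalence concrete, here is a rough Python sketch (mine,
not anything taken from RFC 1738).  The helper name same_url is
hypothetical, and I am assuming that %E7 is meant as an ISO 8859-1
octet; the comparison is deliberately naive and ignores the special
treatment of reserved characters.

```python
from urllib.parse import unquote


def same_url(a: str, b: str, charset: str = "latin-1") -> bool:
    """Compare two URLs after undoing their %XX escapes.

    Assumption: escaped octets are interpreted in `charset`.
    Naive on purpose: escaped reserved characters are not special-cased.
    """
    return unquote(a, encoding=charset) == unquote(b, encoding=charset)


# Build the %XX form of "L" from its octet value instead of hard-coding it.
escaped_l = f"%{ord('L'):02X}"

plain_ascii = "http://some.dom/~Larry"
escaped_ascii = "http://some.dom/~" + escaped_l + "arry"

plain_latin1 = "http://some.other.dom/~Fran\u00e7ois"  # c-cedilla typed directly
escaped_latin1 = "http://some.other.dom/~Fran%E7ois"   # c-cedilla as an escaped octet

print(same_url(plain_ascii, escaped_ascii))
print(same_url(plain_latin1, escaped_latin1))
```

Both pairs compare equal once the escapes are undone, which is all
the argument above relies on.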

Having scanned a few RFCs related to URIs, URLs, etc., my feeling is
that the uniformity requirement is that UR* be uniform enough to be
parsable without external information, once one has a UR* at hand.
This is embodied by the <scheme>://<path>... structure, which is not
challenged by i18n.
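
As an illustration only, the following Python sketch splits the two
spellings of the c-cedilla URL with the standard library's urlsplit.
It assumes nothing beyond the generic syntax: no character-set
information is supplied, and the variable names are mine.

```python
from urllib.parse import urlsplit

# Same URL written with an escaped octet and with the raw character.
escaped = urlsplit("http://some.other.dom/~Fran%E7ois")
raw = urlsplit("http://some.other.dom/~Fran\u00e7ois")

# The scheme and host are recovered identically in both cases;
# only the spelling of the path differs.
for parts in (escaped, raw):
    print(parts.scheme, parts.netloc, parts.path)
```

The splitter never needs to know which character set the path uses,
which is the kind of uniformity I mean.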

The kind of extreme uniformity called for by Larry does not exist and 
is not required for proper interoperability, so I hope this puts this 
bogus argument against i18n to rest.
-- 
Francois Yergeau <yergeau@alis.com>
Alis Technologies Inc., Montreal
Tel : +1 (514) 747-2547
Fax : +1 (514) 747-2561

Received on Sunday, 23 June 1996 22:07:17 UTC