- From: Randy Bush <randy@psg.com>
- Date: Thu, 02 Mar 2000 09:24:24 -0800
- To: a.irvine@bfs.phone.com
- Cc: Larry Masinter <LM@att.com>, Benedict Wee Tee Wei <benewee@ida.gov.sg>, "Rogers, Paul" <progers@vignette.com>, uri@w3.org, idn@ops.ietf.org, duerst@w3.org
only list subscribers may post to the list

randy

> Message-ID: <38BE79DC.84AE1247@corp.phone.com>
> X-Mailer: Mozilla 4.5 [en] (Win95; I)
> X-Accept-Language: en-GB,fr,eo
> MIME-Version: 1.0
> References: <NDBBKEBDLFENBJCGFOIJMEICCEAA.LM@att.com>
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 7bit
> Date: Thu, 02 Mar 2000 14:25:33 +0000
> From: Aaron Irvine <airvine@corp.phone.com>
> Reply-To: a.irvine@bfs.phone.com
> To: Larry Masinter <LM@att.com>
> CC: Benedict Wee Tee Wei <benewee@ida.gov.sg>,
>     "Rogers, Paul" <progers@vignette.com>, uri@w3.org,
>     idn@ops.ietf.org, duerst@w3.org
> Subject: Re: IURI questions
>
> > > > * hex-encoded characters in URLs. I just tried surfing to
> > > > www.%79%61%68%6f%6f.com, and on IE5, it takes me to www.yahoo.com, but
> > > > Netscape Navigator 4.6 can't find the server.
> >
> > It's interesting that it works! The question is whether it should.
> >
> > Larry
> > --
> > http://larry.masinter.net
>
> Hi all,
>
> Yes, I believe it should work.
>
> I think:
> that human-visible escaped Unicode (typed into browsers, heard in radio
> adverts, etc., and maybe in hrefs too) should be consistent with URI path
> escaped Unicode (i.e. %hh-escaped UTF-8),
> and that URI authorities like www.%79%61%68%6f%6f.com [works in IE5] and
> schemes like k%C3%A1va [RFC 2324] are IMHO the correct way to _present URIs_
> to end users;
> however, within the net we have to _encode URIs_:
>   scheme      = alpha *( alpha | digit | "+" | "-" | "." )        ;[RFC 2396]
>   domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum  ;[RFC 2396]
>   labels of at most 63 'septets' each, DNS names of at most 255 'septets',
>   possibly a desire not to change (immediately) the DNS infrastructure,
> and I also note:
>   hyphen-hyphen and hyphen-hyphen-hyphen are allowed but rarely (never?)
>   used in practice, hence free for our use...
>
> So at the very top of the stack, use %hh-escaped UTF-8. But deeper, somehow
> utilise the hyphen to encode characters above ASCII.
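The split between the presentation form and the on-the-wire form can be checked with a short Python sketch. It assumes the RFC 2396 `scheme` and `domainlabel` productions quoted above translate directly into the regular expressions below (with alpha/alphanum meaning ASCII only); the helper names `present` and `is_wire_label` are hypothetical. It shows that the IE5 behaviour amounts to percent-decoding the host, and that a %hh-escaped presentation form such as %C5%93uf is not itself a legal DNS label or scheme.

```python
import re
from urllib.parse import quote, unquote

# RFC 2396 productions quoted in the message, rewritten as regexes
# (assumption: alpha/alphanum mean ASCII letters and digits only).
# scheme      = alpha *( alpha | digit | "+" | "-" | "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")
# domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
DOMAINLABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9\-]*[A-Za-z0-9])?")

MAX_LABEL = 63  # DNS limit per label


def present(label: str) -> str:
    """Hypothetical helper: %hh-escape the UTF-8 bytes of a label for display."""
    # safe="" escapes everything except ASCII letters, digits and "_.-~".
    return quote(label, safe="", encoding="utf-8")


def is_wire_label(label: str) -> bool:
    """Hypothetical helper: could this string go into DNS unchanged?"""
    return len(label) <= MAX_LABEL and DOMAINLABEL.fullmatch(label) is not None


if __name__ == "__main__":
    print(unquote("www.%79%61%68%6f%6f.com"))   # www.yahoo.com
    print(present("\u0153uf"))                  # %C5%93uf
    print(present("k\u00e1va"))                 # k%C3%A1va
    print(is_wire_label("\u0153uf"))            # False: raw non-ASCII
    print(is_wire_label("%C5%93uf"))            # False: '%' is not allowed
    print(SCHEME.fullmatch("k%C3%A1va"))        # None: not a legal scheme
```

Only the hyphen is left as a spare character inside `domainlabel`, which is why the deeper encoding has to be built from letters, digits and hyphens.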
> One possibility I suggest here could be:
>   * triple-hyphened UTF-5 when a scheme/username/domainlabel contains one or
>     more characters above Latin Extended-B
>   * double-hyphened UTF-8 otherwise
> where:
>   * triple-hyphened UTF-5 means: convert to UTF-5, then insert "---" after
>     the first letter
>   * double-hyphened UTF-8 means: convert each %XY to "X--Y"
>   * and note that a bare (trailing) hyphen never occurs in these
>   * if, in the unlikely event, the original contains -- (or ---), then this
>     is encoded as "----2" (or "----3")
>
> Examples:
>
>   nihongo.jp
>   M---5E5M72COA9E.jp (in triple-hyphened UTF-5; note that translation is
>   done on a per-label basis)
>
>   www.{alpha=\u3B1}{beta=\u3B2}.gr
>   www.J---B1JB2.gr
>
>   {oe=\u0153}uf.fr
>   For universal typing: %C5%93uf.fr
>   For the network itself: C--59--3uf.fr (rather than H---53N5M6.fr)
>
>   feli{^c=\u0109}ulo
>   For universal typing: feli%C4%89ulo (or even %66%65%6C%69%C4%89%75%6C%6F,
>   which is also allowed)
>   For the network itself: feliC--48--9ulo (rather than the longer
>   M---6M5MCM9H09N5MCMF)
>
>   ridanta-feli{^c=\u0109}ulo@{oe=\u0153}uf.fr
>   ridanta-feliC--48--9ulo@C--59--3uf.fr
>
> (BTW, will toplabel ever need Unicode? If .store, .web, etc. arrive, then yes.)
> (BTW, rather than these two methods, could we just use double-hyphened UTF-5,
> or would this not be compact enough for Latin languages?)
>
> Comments welcome, please. Regards,
> Aaron Irvine
> (Belfast, Northern Ireland)
> --
>
> -----------------------------------------------------
> Aaron Irvine
> mailto:airvine@corp.phone.com
> -----------------------------------------------------
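The two encodings proposed in the message can be sketched in Python and checked against its worked examples. This is a reconstruction, not a specification: the UTF-5 digit mapping (hex digits of each code point, leading zeros dropped, first digit shifted into G..V) is inferred from the examples, the function names are hypothetical, ASCII-only labels are assumed to pass through unchanged, and the "----2"/"----3" escape for labels that already contain hyphen runs is left out.

```python
LATIN_EXT_B_END = 0x024F  # last code point of the Latin Extended-B block


def utf5(text: str) -> str:
    """UTF-5 as the examples use it: each code point is written as its hex
    digits without leading zeros, and the first digit is shifted into G..V."""
    out = []
    for ch in text:
        digits = format(ord(ch), "X")               # U+65E5 -> "65E5"
        lead = chr(ord("G") + int(digits[0], 16))   # 0..F -> G..V
        out.append(lead + digits[1:])
    return "".join(out)


def double_hyphen_utf8(label: str) -> str:
    """Write each non-ASCII UTF-8 byte XY (i.e. each %XY) as X--Y."""
    out = []
    for ch in label:
        if ord(ch) < 0x80:
            out.append(ch)                          # ASCII is kept as-is
            continue
        for byte in ch.encode("utf-8"):
            hx = format(byte, "02X")
            out.append(hx[0] + "--" + hx[1])
    return "".join(out)


def encode_label(label: str) -> str:
    """Pick an encoding for one label, following the two proposed rules."""
    if label.isascii():
        return label                                # assumed: nothing to encode
    if any(ord(ch) > LATIN_EXT_B_END for ch in label):
        u = utf5(label)
        return u[0] + "---" + u[1:]                 # triple-hyphened UTF-5
    return double_hyphen_utf8(label)                # double-hyphened UTF-8


def encode_name(name: str) -> str:
    """Encode a dotted name label by label."""
    return ".".join(encode_label(part) for part in name.split("."))


if __name__ == "__main__":
    print(encode_name("\u65e5\u672c\u8a9e.jp"))     # M---5E5M72COA9E.jp
    print(encode_name("www.\u03b1\u03b2.gr"))       # www.J---B1JB2.gr
    print(encode_name("\u0153uf.fr"))               # C--59--3uf.fr
    print(encode_label("feli\u0109ulo"))            # feliC--48--9ulo
```

The size trade-off behind the two rules shows up directly: double-hyphened UTF-8 spends four characters per non-ASCII byte but leaves ASCII untouched (feliC--48--9ulo, 15 characters), whereas UTF-5 rewrites every character, ASCII included (M---6M5MCM9H09N5MCMF, 20 characters).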
Received on Thursday, 2 March 2000 12:24:47 UTC