
Re: IDNA and IRI document way forward

From: Charles Lindsey <chl@clerew.man.ac.uk>
Date: Wed, 29 Jul 2009 10:47:40 +0100
To: URI <uri@w3.org>
Message-ID: <op.uxtphqxm6hl8nm@clerew.man.ac.uk>
On Wed, 29 Jul 2009 05:52:58 +0100, Larry Masinter <masinter@adobe.com>  
wrote:

> A note about the direction for URI and IRI:
>
> There are other uses of domain names in URIs currently; for example,  
> cid: (content-ID) strings often contain domain names.  I'm not sure but  
> it may be reasonable to *not* allow IRI forms, e.g., require that all  
> URIs not using scheme://host/path syntax not allow hex-encoded octets  
> above %7F, for example.

That doesn't seem right. A domain name (usually after an '@') in a  
Content-ID or a Message-ID is just a convention (in fact any meaningless  
string would do, provided it is believed to be unique). In particular, it  
is NEVER required to submit that supposed domain name to a DNS query. What  
IS required is that the cid or mid should always compare equal to other  
copies of itself, however it or those copies may have been mangled during  
transmission. Hence encoding it in punycode is never likely to be helpful,  
but encoding it in hex, even for octets greater than %7F, should always be  
harmless and easily reversible. This situation could well arise in the  
case of the news URI scheme, for example.
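
As a rough sketch of the point about hex encoding (Python; the helper names normalize_id and same_id and the example identifier are illustrative assumptions, not taken from any specification), percent-encoding the non-ASCII octets as UTF-8 can be undone exactly, so the comparison can be made on the decoded form:

```python
from urllib.parse import quote, unquote


def normalize_id(value: str) -> str:
    """Percent-decode an identifier (hypothetical helper, assumes UTF-8 octets)."""
    return unquote(value, encoding="utf-8", errors="strict")


def same_id(a: str, b: str) -> bool:
    """Compare two copies of a cid/mid after undoing any hex encoding."""
    return normalize_id(a) == normalize_id(b)


# Made-up identifier whose right-hand side looks like a non-ASCII domain name.
original = "12345@b\u00fccher.example"

# Hex-encode every octet above %7F; '@' is kept literal here by choice.
encoded = quote(original, safe="@")

print(encoded)
print(same_id(original, encoded))
```

No DNS lookup or punycode conversion is involved at any step; the right-hand side is treated as an opaque string. The sketch ignores the case of an identifier that already contains a literal '%'.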

Fortunately, current I18N efforts such as EAI have chosen to retain a  
strict ASCII syntax for cids and mids, but that might not remain true  
indefinitely.

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131                          Web: http://www.cs.man.ac.uk/~chl
Email: chl@clerew.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
Received on Wednesday, 29 July 2009 09:48:19 GMT
