UseSTD3ASCIIRules in IRI resolution; host syntax in RFC 3986 and RFC 3987

Gentlemen,

excuse the direct spam, please; for archival purposes, I'm also  
copying the public-iri list on this note.

   http://lists.w3.org/Archives/Public/public-iri/

It seems like RFC 3987 is mandating the use of the UseSTD3ASCIIRules  
flag when using ToASCII to map an IRI into a URI; see:

   http://tools.ietf.org/html/rfc3987#section-3.1
> Replace the ireg-name part of the IRI by the part converted using  
> the ToASCII operation specified in section 4.1 of [RFC3490] on  
> each dot-separated label, and by using U+002E (FULL STOP) as a  
> label separator, with the flag UseSTD3ASCIIRules set to TRUE, and  
> with the flag AllowUnassigned set to FALSE for creating IRIs and set  
> to TRUE otherwise.
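
To make the effect of the flag concrete, here is a minimal Python sketch of the per-label conversion quoted above. It assumes the standard library's encodings.idna module, whose ToASCII is documented as treating UseSTD3ASCIIRules as false; the helper names violates_std3 and host_to_ascii are hypothetical, the STD3 check is my own reading of step 3 of ToASCII in RFC 3490, and AllowUnassigned is not modeled at all.

```python
from encodings import idna  # stdlib RFC 3490 helpers; ToASCII here never applies STD3 rules


def violates_std3(label: str) -> bool:
    """Approximate RFC 3490 ToASCII step 3: non-LDH ASCII, or a leading/trailing hyphen."""
    non_ldh = any(
        ord(ch) < 0x80 and not (ch.isalnum() or ch == "-") for ch in label
    )
    return non_ldh or label.startswith("-") or label.endswith("-")


def host_to_ascii(host: str, use_std3_ascii_rules: bool) -> str:
    """Apply ToASCII to each dot-separated label and rejoin with U+002E."""
    converted = []
    for label in host.split("."):
        # Step 3 runs on the nameprepped label, before Punycode encoding.
        if use_std3_ascii_rules and violates_std3(idna.nameprep(label)):
            raise ValueError(f"label {label!r} fails the STD3 ASCII rules")
        converted.append(idna.ToASCII(label).decode("ascii"))
    return ".".join(converted)


host = "_test0_α.does-not-exist.org"
# Without the flag, the underscores pass through into the encoded label.
print(host_to_ascii(host, use_std3_ascii_rules=False))
try:
    host_to_ascii(host, use_std3_ascii_rules=True)
except ValueError as exc:
    print("rejected:", exc)
```

With the flag cleared the first label is simply Punycode-encoded, underscores included; with the flag set the same label is rejected before encoding.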


A quick search in various archives suggests that the genesis of the  
UseSTD3ASCIIRules flag in 3987 relates to what is now section 3.2.2 of  
RFC 3986 (at the time, section 3.2.2 of RFC 2396):

   http://www.imc.org/idn/mail-archive/msg07277.html
   http://tools.ietf.org/html/rfc3986#section-3.2.2

Interestingly, following through on the references from there  
effectively brings us to the name production in appendix B of RFC 952,

   http://tools.ietf.org/html/rfc952

-- which in turn forbids the double hyphen in a name. That's, of  
course, a really fine restriction on registrations, but it could  
(absurdly) be read to prohibit use of an A-label in URI references.   
That's certainly not a useful conclusion; the formal syntax in RFC  
3986 is actually vague enough to permit them.
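
As a rough illustration of that absurd reading, the sketch below applies a hypothetical "strict" label check (letters, digits and hyphens only, a letter first, no trailing hyphen, and no doubled hyphen) to an A-label produced by the standard library's encodings.idna.ToASCII. The function strict_name_ok is an assumption of mine, modeled on the reading described above; it is not a transcription of the RFC 952 grammar.

```python
from encodings import idna


def strict_name_ok(label: str) -> bool:
    """Hypothetical strict rule: LDH only, letter first, no trailing or doubled hyphen."""
    return (
        label.isascii()
        and label[:1].isalpha()
        and label[-1:].isalnum()
        and all(ch.isalnum() or ch == "-" for ch in label)
        and "--" not in label
    )


# Any A-label carries the "xn--" ACE prefix, which itself contains "--".
a_label = idna.ToASCII("α").decode("ascii")

print(a_label.startswith("xn--"))        # True
print(strict_name_ok("does-not-exist"))  # True
print(strict_name_ok(a_label))           # False
```

Every A-label fails such a check on its prefix alone, which is why the strict reading cannot be what is intended for URI references.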

All this just proves that strict spec lawyering on permissible strings  
doesn't get us to a useful place here, and that there's probably a  
need to distinguish registration guidelines from what's permissible in  
a URI (or IRI) reference.


With that, I'm left to wonder what the UseSTD3ASCIIRules restriction  
in RFC 3987 is meant to achieve.  My suspicion would be that we'd  
actually *not* want to set UseSTD3ASCIIRules when converting from an  
IRI reference to a URI reference, mostly in order to be conservative  
about unnecessary restrictions that might bite us later.


Finally, empirics: I've done some quick tests with the browsers that I  
run here (Firefox 3.1 beta3, Opera 9.64, Safari 4 beta [5528.16]).    
All of these were able to dereference a hyperlink to  
http://_test0_α.does-not-exist.org/, i.e., they do not actually set  
UseSTD3ASCIIRules.


Thoughts?

Thanks,
--
Thomas Roessler, W3C  <tlr@w3.org>

Received on Thursday, 19 March 2009 15:27:52 UTC