- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Tue, 16 Jun 2009 18:21:56 +0900
- To: Mark Nottingham <mnot@mnot.net>
- CC: public-ietf-w3c <public-ietf-w3c@w3.org>, "public-iri@w3.org" <public-iri@w3.org>
Hello everybody,

Thanks to Mark for bringing out the minutes so quickly (for this call, when compared to the average in the past).

On 2009/06/12 13:01, Mark Nottingham wrote:
> - DRAFT -
>
> W3C/IETF call
>
> 14 May 2009
> URI/IRI coordination
>
> John: it's been escalated to the IAB but no schedule yet.

Good to know. The sooner we know if the IAB has any concerns, what these concerns are, who has them, and how we can talk to them, the better.

> IDNA bis is
> in a state of confusion but hoping we're making progress.
> ... one of the key questions is what kind of domain names are permitted
> in IRI and URI.

Currently, URIs (RFC 3986) permit ASCII-only domain names and IDNs encoded in UTF-8 and percent-escaped. There is a backwards-compatibility warning for the latter. You can test whether that is implemented in browsers at
http://www.w3.org/2004/04/uri-rel-test.html#reg-percent

Opera works. Firefox does something strange: it shows the decoded IDN in the address field, but then says Address Not Found in the page area, giving the percent-encoded version. Definitely confusing for users. IE7 works only for http://www.w%33.org, which isn't really an IDN. Safari gives a Network Error (dns_unresolved_hostname), but shows the IDN, unescaped, in the page area, and the xn-- version in the address bar. Pressing enter in the address bar resolves the page, and changes the address bar to the real IDN.

> ... percent encoding in utf-8 in URI would be an issue for IRI as well.

And is covered in RFC 3987. The IRI spec has the following to say re.
the backwards-compatibility issue (apologies for the length):

   Systems accepting IRIs MAY convert the ireg-name component of an IRI as follows (before step 2 above) for schemes known to use domain names in ireg-name, if the scheme definition does not allow percent-encoding for ireg-name:

   Replace the ireg-name part of the IRI by the part converted using the ToASCII operation specified in section 4.1 of [RFC3490] on each dot-separated label, and by using U+002E (FULL STOP) as a label separator, with the flag UseSTD3ASCIIRules set to TRUE, and with the flag AllowUnassigned set to FALSE for creating IRIs and set to TRUE otherwise.

   The ToASCII operation may fail, but this would mean that the IRI cannot be resolved. This conversion SHOULD be used when the goal is to maximize interoperability with legacy URI resolvers. For example, the IRI "http://résumé.example.org" may be converted to "http://xn--rsum-bpad.example.org" instead of "http://r%C3%A9sum%C3%A9.example.org".

   An IRI with a scheme that is known to use domain names in ireg-name, but where the scheme definition does not allow percent-encoding for ireg-name, meets scheme-specific restrictions if either the straightforward conversion or the conversion using the ToASCII operation on ireg-name result in an URI that meets the scheme-specific restrictions. Such an IRI resolves to the URI obtained after converting the IRI and uses the ToASCII operation on ireg-name. Implementations do not have to do this conversion as long as they produce the same result.

and further down:

   Note: In practice, whether the general mapping (steps 1 and 2) or the ToASCII operation of [RFC3490] is used for ireg-name will not be noticed if mapping from IRI to URI and resolution is tightly integrated (e.g., carried out in the same user agent). But conversion using [RFC3490] may be able to better deal with backwards compatibility issues in case mapping and resolution are separated, as in the case of using an HTTP proxy.
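(As a rough illustration of the two host conversions in the RFC 3987 text quoted so far, here is a minimal Python sketch. It is not taken from either spec: the function names are hypothetical, and Python's built-in "idna" codec is used as a stand-in for the RFC 3490 ToASCII operation. That codec does not expose the UseSTD3ASCIIRules and AllowUnassigned flags, so it only approximates the quoted procedure.)

```python
from urllib.parse import quote


def host_percent_encoded(host: str) -> str:
    """General IRI-to-URI mapping: UTF-8 encode, then percent-escape.

    ASCII letters, digits and '.', '-', '_', '~' are left untouched.
    """
    return quote(host, safe="")


def host_to_ascii(host: str) -> str:
    """Approximate ToASCII on each dot-separated label.

    Assumption: the stdlib "idna" codec (RFC 3490 based) stands in for
    ToASCII; it offers no control over the flags named in the quoted text.
    Raises UnicodeError if a label cannot be converted.
    """
    return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    host = "résumé.example.org"
    print("http://" + host_percent_encoded(host))
    print("http://" + host_to_ascii(host))
```

For the host in the quoted example, the first function should give the percent-encoded form and the second the xn-- form, which is the form a legacy resolver or an HTTP proxy is more likely to handle.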
   Note: Internationalized Domain Names may be contained in parts of an IRI other than the ireg-name part. It is the responsibility of scheme-specific implementations (if the Internationalized Domain Name is part of the scheme syntax) or of server-side implementations (if the Internationalized Domain Name is part of 'iquery') to apply the necessary conversions at the appropriate point. Example: Trying to validate the Web page at http://résumé.example.org would lead to an IRI of http://validator.w3.org/check?uri=http%3A%2F%2Frésumé.example.org, which would convert to a URI of http://validator.w3.org/check?uri=http%3A%2F%2Fr%C3%A9sum%C3%A9.example.org. The server side implementation would be responsible for making the necessary conversions to be able to retrieve the Web page.

> ... some coordination with html5 might be useful to be in sync.

Yes. It would be good if interested people subscribed to public-iri@w3.org.

> mnot: this is a serious issue indeed and the html5 has probably
> different ideas.
> ... would it be useful to collect the issues somewhere?
>
> John: yes.
>
> mnot: if you send me what you have, I'm willing to help. will also get
> some input from Thomas as well.
>
> <scribe> ACTION: John to send IRI issues to Mark
>
> <scribe> ACTION: Mark to put IRI issues in wiki

What wiki is that? There is an issues list for the update of the iri spec at http://www.w3.org/International/iri-edit/#Issues.

> <JcK> plh: That is really URI/IRI issues in both case -- the URIs are
> the much harder problem in some ways.

It would definitely be good to get to know more about what people on the call thought the issues were. It is extremely difficult to guess from the minutes (a general problem with minutes).

> In some respects, the _only_ IRI
> problem is how much they can be treated as protocol elements and what
> that means

Here is what I wrote on the question of protocol elements vs.
user interface elements a couple of weeks ago in a widely circulated private thread, slightly adapted:

Some people think that anything including non-ASCII characters is bound to fail sooner or later, and therefore don't want to have them in any protocol. Some people think that the more places IRIs get accepted, the better. I'm personally somewhat tending to the latter view, but it is very clear that there are some protocols (and formats) where IRIs are very appropriate, and others where they are not.

Also, I think that the terms 'protocol element' and 'user interface element' are valuable, but because there are many protocols, and many user interfaces, it's not the only distinction that counts.

To mention some specific examples, IRIs (or some variant thereof) are used in HTML, in places that would probably rather be called 'protocol elements' than 'user interface elements', with the caveat that HTML isn't really a protocol but a format. A strict 'user interface element only' view would prohibit that, but given that IRIs currently work on most browsers, that people use them, that people think that they can copy something from an address/location field to an href attribute in an HTML document, and so on, it seems that allowing IRIs in HTML 'protocol elements' isn't overly harmful. Similar considerations, with less or different baggage, apply e.g. to XML and Atom.

There may be analogies with other IETF work. For IDNs, you can see punycode as the protocol element version, and call actual IDNs 'user interface elements', and that view is certainly correct from a DNS perspective. But a higher layer protocol might easily use IDNs directly. A good example here would be EAI (email address internationalization), where indeed IDNs are used directly as right hand sides.

I think such a pragmatic view (make sure we know where IRIs are allowed, and where not; maybe give some guidelines for protocol designers on how to decide) will prevail in the end.

Regards, Martin.

> [NEW] ACTION: John to send IRI issues to Mark [recorded in
> http://www.w3.org/2009/05/14-ietf-minutes.html#action06]
> [NEW] ACTION: Mark to put IRI issues in wiki [recorded in
> http://www.w3.org/2009/05/14-ietf-minutes.html#action07]

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Received on Tuesday, 16 June 2009 09:23:05 UTC