Re: Minutes for 14 May 2009 from Martin J. Dürst on 2009-06-16 (public-iri@w3.org from June 2009)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Tue, 16 Jun 2009 18:21:56 +0900
To: Mark Nottingham <mnot@mnot.net>
CC: public-ietf-w3c <public-ietf-w3c@w3.org>, "public-iri@w3.org" <public-iri@w3.org>
Message-ID: <4A376434.4000109@it.aoyama.ac.jp>

Hello everybody,

Thanks to Mark for bringing out the minutes so quickly (for this call, 
when compared to the average in the past).

On 2009/06/12 13:01, Mark Nottingham wrote:
>   - DRAFT -
>
> W3C/IETF call
>
> 14 May 2009

> URI/IRI coordination
>
> John: it's been escalated to the IAB but no schedule yet.

Good to know. The sooner we know if the IAB has any concerns, what these 
concerns are, who has them, and how we can talk to them, the better.

> IDNA bis is
> in a state of confusion but hoping we're making progress.
> ... one of the key questions is what kind of domain names are permitted
> in IRI and URI.

Currently, URIs (RFC 3986) permit ASCII-only domain names and IDNs 
encoded in UTF-8 and percent-escaped. There is a backwards-compatibility 
warning for the latter. You can test whether that is implemented in 
browsers at
http://www.w3.org/2004/04/uri-rel-test.html#reg-percent

Opera works. Firefox does something strange: it shows the decoded IDN in 
the address field, but then says Address Not Found in the page area, 
giving the percent-encoded version. Definitely confusing for users.
IE7 works only for http://www.w%33.org, which isn't really an IDN.
Safari gives a Network Error (dns_unresolved_hostname), but shows the 
IDN, unescaped, in the page area, and the xn-- version in the address 
bar. Pressing enter in the address bar resolves the page, and changes 
the address bar to the real IDN.
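
As a rough illustration of what a resolver has to do with such a 
percent-escaped host, here is a minimal Python sketch. It assumes the 
escaped bytes are UTF-8 and uses Python's built-in "idna" codec, which 
implements the RFC 3490 ToASCII operation; the function names are made 
up for this example and do not describe any browser's actual code.

```python
from urllib.parse import unquote


def decode_reg_name(host: str) -> str:
    """Undo percent-escaping in a URI host, reading the bytes as UTF-8."""
    return unquote(host, encoding="utf-8", errors="strict")


def to_dns_form(host: str) -> str:
    """Decode the host, then apply ToASCII label by label.

    Assumption: Python's built-in 'idna' codec (RFC 3490) stands in for
    whatever IDNA implementation a real resolver would use.
    """
    return decode_reg_name(host).encode("idna").decode("ascii")


# The plain-ASCII case that IE7 handles: %33 is just the digit "3".
print(to_dns_form("www.w%33.org"))
# A percent-escaped UTF-8 IDN, using the example name from RFC 3987.
print(to_dns_form("r%C3%A9sum%C3%A9.example.org"))
```

The first call only needs percent-decoding; the second also needs the 
ToASCII step before a DNS lookup can succeed.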


> ... percent encoding in utf-8 in URI would be an issue for IRI as well.

And is covered in RFC 3987. The IRI spec has the following to say re. 
the backwards-compatibility issue (apologies for the length):

    Systems accepting IRIs MAY convert the ireg-name component of an IRI
    as follows (before step 2 above) for schemes known to use domain
    names in ireg-name, if the scheme definition does not allow
    percent-encoding for ireg-name:

    Replace the ireg-name part of the IRI by the part converted using the
    ToASCII operation specified in section 4.1 of [RFC3490] on each
    dot-separated label, and by using U+002E (FULL STOP) as a label
    separator, with the flag UseSTD3ASCIIRules set to TRUE, and with the
    flag AllowUnassigned set to FALSE for creating IRIs and set to TRUE
    otherwise.

    The ToASCII operation may fail, but this would mean that the IRI
    cannot be resolved.  This conversion SHOULD be used when the goal is
    to maximize interoperability with legacy URI resolvers.  For example,
    the IRI

    "http://r&#xE9;sum&#xE9;.example.org"

    may be converted to

    "http://xn--rsum-bpad.example.org"

    instead of

    "http://r%C3%A9sum%C3%A9.example.org".

    An IRI with a scheme that is known to use domain names in ireg-name,
    but where the scheme definition does not allow percent-encoding for
    ireg-name, meets scheme-specific restrictions if either the
    straightforward conversion or the conversion using the ToASCII
    operation on ireg-name result in an URI that meets the scheme-
    specific restrictions.

    Such an IRI resolves to the URI obtained after converting the IRI and
    uses the ToASCII operation on ireg-name.  Implementations do not have
    to do this conversion as long as they produce the same result.

and later down:

    Note: In practice, whether the general mapping (steps 1 and 2) or the
       ToASCII operation of [RFC3490] is used for ireg-name will not be
       noticed if mapping from IRI to URI and resolution is tightly
       integrated (e.g., carried out in the same user agent).  But
       conversion using [RFC3490] may be able to better deal with
       backwards compatibility issues in case mapping and resolution are
       separated, as in the case of using an HTTP proxy.

    Note: Internationalized Domain Names may be contained in parts of an
       IRI other than the ireg-name part.  It is the responsibility of
       scheme-specific implementations (if the Internationalized Domain
       Name is part of the scheme syntax) or of server-side
       implementations (if the Internationalized Domain Name is part of
       'iquery') to apply the necessary conversions at the appropriate
       point.  Example: Trying to validate the Web page at
       http://r&#xE9;sum&#xE9;.example.org would lead to an IRI of
       http://validator.w3.org/check?uri=http%3A%2F%2Fr&#xE9;sum&#xE9;.
       example.org, which would convert to a URI of
       http://validator.w3.org/check?uri=http%3A%2F%2Fr%C3%A9sum%C3%A9.
       example.org.  The server side implementation would be responsible
       for making the necessary conversions to be able to retrieve the
       Web page.
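
The two conversions described in the quoted text can be sketched in 
Python as follows. This is only an illustration under some assumptions: 
the input is already a Unicode string (so step 1 of the general mapping 
is skipped), the authority has no userinfo or port, and Python's 
built-in "idna" codec stands in for ToASCII (it does not let the caller 
set UseSTD3ASCIIRules or AllowUnassigned). The function names are 
hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_non_ascii(text: str) -> str:
    """Percent-encode every non-ASCII character as UTF-8 (general mapping)."""
    return "".join(c if ord(c) < 128 else quote(c, safe="") for c in text)


def to_uri_generic(iri: str) -> str:
    """Straightforward conversion: percent-encode everywhere, host included."""
    return escape_non_ascii(iri)


def to_uri_toascii(iri: str) -> str:
    """Apply ToASCII to the host first, then the general mapping to the rest.

    Assumption: the netloc is a bare host name (no userinfo, no port).
    """
    parts = urlsplit(iri)
    host = parts.netloc.encode("idna").decode("ascii")
    rebuilt = urlunsplit(
        (parts.scheme, host, parts.path, parts.query, parts.fragment)
    )
    return escape_non_ascii(rebuilt)


# "\u00e9" is the precomposed e-acute used in the RFC 3987 example.
iri = "http://r\u00e9sum\u00e9.example.org"
print(to_uri_generic(iri))
print(to_uri_toascii(iri))

# An IDN inside the query is not an ireg-name, so it is only percent-encoded.
check = "http://validator.w3.org/check?uri=http%3A%2F%2Fr\u00e9sum\u00e9.example.org"
print(to_uri_toascii(check))
```

For the validator example both functions give the same URI, which is 
the point of the second note: turning the embedded name into something 
resolvable is left to the server side.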

> ... some coordination with html5 might be useful to be in sync.

Yes. It would be good if interested people subscribed to public-iri@w3.org.

> mnot: this is a serious issue indeed and the html5 has probably
> different ideas.
> ... would it be useful to collect the issues somewhere?
>
> John: yes.
>
> mnot: if you send me what you have, I'm willing to help. will also get
> some input from Thomas as well.
>
> <scribe> ACTION: John to send IRI issues to Mark
>
> <scribe> ACTION: Mark to put IRI issues in wiki

What wiki is that? There is an issues list for the update of the IRI 
spec at http://www.w3.org/International/iri-edit/#Issues.


> <JcK> plh: That is really URI/IRI issues in both case -- the URIs are
> the much harder problem in some ways.

It would definitely be good to get to know more about what people on the 
call thought the issues were. It is extremely difficult to guess from 
the minutes (a general problem with minutes).


> In some respects, the _only_ IRI
> problem is how much they can be treated as protocol elements and what
> that means

Here is what I wrote on the question of protocol elements vs. user 
interface elements a couple of weeks ago in a widely circulated private 
thread, slightly adapted:

Some people think that anything including non-ASCII characters is bound 
to fail sooner or later, and therefore don't want to have them in any 
protocol. Some people think that the more places IRIs get accepted, the 
better. I'm personally somewhat tending to the latter view, but it is 
very clear that there are some protocols (and formats) where IRIs are 
very appropriate, and others where they are not. Also, I think that the 
terms 'protocol element' and 'user interface element' are valuable, but 
because there are many protocols, and many user interfaces, it's not the 
only distinction that counts.

To mention some specific examples, IRIs (or some variant thereof) are 
used in HTML, in places that would probably rather be called 'protocol 
elements' than 'user interface elements', with the caveat that HTML 
isn't really a protocol but a format.

A strict 'user interface element only' view would prohibit that. But 
IRIs currently work on most browsers, people use them, and people think 
that they can copy something from an address/location field to an href 
attribute in an HTML document, and so on. All this seems to suggest that 
allowing IRIs in HTML 'protocol elements' isn't overly harmful. Similar 
considerations, with less or different baggage, apply e.g. to XML and 
Atom.

There may be analogies with other IETF work. For IDNs, you can see 
punycode as the protocol element version, and call actual IDNs 'user 
interface elements', and that view is certainly correct from a DNS 
perspective. But a higher layer protocol might easily use IDNs directly.
A good example here would be EAI (email address internationalization), 
where indeed IDNs are used directly as right hand sides.
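
To make the two views concrete, here is a small Python sketch that 
derives both forms of the right hand side of an address. The address is 
a made-up example, the helper name is hypothetical, and Python's 
built-in "idna" codec (RFC 3490) is assumed as the conversion.

```python
def domain_forms(address: str) -> tuple[str, str]:
    """Return (DNS-level form, user-facing form) of an address's domain."""
    _local, _, domain = address.rpartition("@")
    # ToASCII: the punycode form that the DNS itself sees.
    ascii_form = domain.encode("idna").decode("ascii")
    # ToUnicode: the form a user, or an EAI-style protocol, would carry.
    unicode_form = ascii_form.encode("ascii").decode("idna")
    return ascii_form, unicode_form


# Hypothetical address; "\u00e9" is the precomposed e-acute.
print(domain_forms("user@r\u00e9sum\u00e9.example.org"))
```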

I think such a pragmatic view (make sure we know where IRIs are allowed, 
and where not; maybe give some guidelines for protocol designers on how 
to decide) will prevail in the end.

Regards,    Martin.


> [NEW] ACTION: John to send IRI issues to Mark [recorded in
> http://www.w3.org/2009/05/14-ietf-minutes.html#action06]

> [NEW] ACTION: Mark to put IRI issues in wiki [recorded in
> http://www.w3.org/2009/05/14-ietf-minutes.html#action07]

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Received on Tuesday, 16 June 2009 09:23:04 UTC