Re: Agreement on IRI "processing spec" moving to W3C from Martin J. Dürst on 2011-11-21 (public-iri@w3.org from November 2011)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Mon, 21 Nov 2011 16:35:45 +0900
To: John C Klensin <john-ietf@jck.com>
CC: Chris Weber <chris@lookout.net>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Message-ID: <4EC9FF51.20806@it.aoyama.ac.jp>

On 2011/11/21 0:29, John C Klensin wrote:

> --On Saturday, November 19, 2011 22:11 -0800 Chris Weber
> <chris@lookout.net>  wrote:
>
>> During IETF 82 an announcement was made that the IRI
>> "processing spec" would move to the W3C for creation as a
>> self-contained document.  See
>> <http://trac.tools.ietf.org/wg/iri/>  for the minutes.
>>
>> Are IRI WG members in agreement on this decision?
>
> For whatever it is worth, I object to this decision.

[I'm not trying to take a stand on this one way or another at the moment.]

> It became clear in the WG meeting (not that it wasn't before)
> that one of the problems with IRIs and "the specification" is
> that it becomes more and more complicated as we try to deal with
> different (sometimes conflicting) objectives and assumptions
> about what the specs are about.   Drawing a few examples from
> the meeting discussion,...

Just trying to give some explanations:

>  (i)  Are IRIs protocol elements or presentation/display
>  elements or halfway in between?

They can be both, depending on the circumstances. The current draft and 
issues list have some confusion there (that you rightly pointed out 
during the meeting), but that's mostly editorial in nature and should be 
fixed easily.

>                                       Does a real
>  presentation element need to be localized sufficiently
>  that it is not confusing to users who haven't been
>  carefully educated about subtle rules? Do we try to
>  answer that question with separate specs?

I think the answer to this is mostly that IRIs are in many ways very 
similar to URIs. For URIs (and limiting users to those familiar with the 
basic Latin alphabet), there's a wide range of opinions from "a user 
should never ever see it" to "use words that convey what the URI is 
standing for so that people know what the URI is about". I have seen 
conflicting opinions uttered on this by one and the same person in a 
very short timespan, on many occasions and by many different people.

For additional help, I'm just copying slide 5 of my presentation for the 
meeting, which I skipped over very quickly because I had already shown 
the same slide at past meetings:

Please Don't Forget
• URIs/IRIs are a META-syntax
• Many pieces with different requirements get thrown together
• URIs/IRIs can be:
   – Absolute, complete from scheme to fragment id
   – Relative, just one or a few pieces
   – User-oriented (short, memorable)
   – Back-end (long, complicated)

As the Web has shown, there's a huge benefit from having a unifying hub 
syntax for URIs/IRIs. The tradeoff is that there's also some confusion, 
but that's still a tradeoff worth making.


>  (ii) Should the domain part of a conversion to URI form
>  be required to be converted to A-labels? Note that, in
>  addition to the "will it work on lookup" question asked
>  several times during the meeting discussion, there is an
>  issue about whether it will convert, not just because of
>  the mapping question (to require that IRIs use only
>  unmapped or post-mapping forms, RFC 5895 with or without
>  some profile, UTR 46 with or without some profile, or
>  something else), there is the matter that RFC 5891/5892
>  prohibits some characters and character sequences that
>  are allowed by some private namespaces (and probably
>  vice versa).

In general, I predict that we will end up with a very strong 
recommendation (SHOULD) for producing and keeping domain name parts as 
U-Labels in IRIs, so that the question of mapping is eliminated as much 
as possible. And then we will have some advice re. mapping, but that 
will only be advice.

Also, my guess is that the registry systems besides DNS don't have as 
strong restrictions as IDNA 2003/2008. I seriously doubt that Microsoft 
went to as great lengths to define what's okay and what not in their 
system (but I'd gladly be convinced of the contrary).
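
As a rough illustration of the U-label/A-label distinction and of why mapping matters, here is a minimal Python sketch. It assumes Python's standard-library "idna" codec, which implements IDNA 2003 (RFC 3490 with nameprep) and not IDNA 2008, so it silently applies the older mapping rules. The helper names to_a_label and to_u_label and the example label are made up for illustration.

```python
def to_a_label(u_label: str) -> str:
    """Convert one label to its ASCII (xn--) form.

    Assumption: uses the stdlib "idna" codec, i.e. IDNA 2003 rules.
    """
    return u_label.encode("idna").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Convert an xn-- label back to its Unicode form (same codec)."""
    return a_label.encode("ascii").decode("idna")


print(to_a_label("bücher"))         # xn--bcher-kva
print(to_u_label("xn--bcher-kva"))  # bücher

# The IDNA 2003 codec maps (here: lowercases) before encoding, so an
# unmapped input and its mapped form end up as the same A-label.
print(to_a_label("BÜCHER") == to_a_label("bücher"))  # True
```

If IRIs carry only already-mapped U-labels, as in the SHOULD predicted above, the last step has nothing left to do and the choice of mapping matters much less.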


>  (iii) Or should the domain part be required to be
>  converted only to %-encoding-of-UTF-8 form, noting that
>  those encodings freeze a particular normalization (or
>  lack thereof) and _that_ can cause lookups to fail in
>  environments that make different assumptions.

The %-encoding form makes a particular form more visible, but doesn't 
really freeze it. It may be claimed to be frozen in the same way as when 
it's in electronic form in UTF-8 (or in UTF-16 or whatever).

And we currently require neither punycode nor %-encoding, because each 
of them has some advantages and some disadvantages.
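
The point that %-encoding makes a normalization form visible, without fixing it any more than the underlying UTF-8 does, can be sketched as follows. This uses only the Python standard library (unicodedata.normalize, urllib.parse.quote and unquote). The helper names percent_encode and same_label are hypothetical, and the single-character label is an arbitrary example.

```python
import unicodedata
from urllib.parse import quote, unquote


def percent_encode(label: str, form: str) -> str:
    """Normalize to the given Unicode form, then %-encode the UTF-8 bytes."""
    return quote(unicodedata.normalize(form, label))


def same_label(a: str, b: str) -> bool:
    """Compare two %-encoded labels after decoding and NFC-normalizing both."""
    def norm(s: str) -> str:
        return unicodedata.normalize("NFC", unquote(s))
    return norm(a) == norm(b)


nfc = percent_encode("\u00e9", "NFC")  # precomposed e-acute
nfd = percent_encode("\u00e9", "NFD")  # "e" plus combining acute U+0301

print(nfc)                   # %C3%A9
print(nfd)                   # e%CC%81
print(nfc == nfd)            # False: a byte-wise comparison fails
print(same_label(nfc, nfd))  # True: the form can still be changed later
```

A consumer that compares the encoded strings byte for byte sees two different names, which is the lookup failure described in (iii); a consumer that decodes and normalizes first does not.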


> And, of course, none of those decisions really affect domain
> names embedded in URI tails because they can be detected to be
> domain names only heuristically.

I forgot to make this point at the meeting, but it is another reason why 
%-encoding for domain name parts is often a good thing.
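
A small sketch of that asymmetry, under stated assumptions: the function iri_to_uri and the example IRI are invented for illustration, the host conversion uses Python's IDNA 2003 codec, and userinfo and port are ignored for brevity. It is not the conversion algorithm from the draft.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str, host_as_a_label: bool) -> str:
    """Convert an IRI to a URI, choosing how the host part is encoded.

    Simplified: drops userinfo/port and uses the stdlib IDNA 2003 codec.
    """
    parts = urlsplit(iri)
    host = parts.hostname or ""
    if host_as_a_label:
        host = host.encode("idna").decode("ascii")  # punycode (xn--) form
    else:
        host = quote(host)                          # %-encoded UTF-8 form
    # Everything after the authority can only be %-encoded: a generic
    # converter has no reliable way to spot a domain name in there.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))


iri = "http://bücher.example/go?to=bücher.example"
print(iri_to_uri(iri, host_as_a_label=True))
# http://xn--bcher-kva.example/go?to=b%C3%BCcher.example
print(iri_to_uri(iri, host_as_a_label=False))
# http://b%C3%BCcher.example/go?to=b%C3%BCcher.example
```

With A-labels in the host, the same domain name appears in two different encodings within one URI; with %-encoding, the host and the tail use the same form.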

Regards,   Martin.


> While I normally believe in modularizing problems and breaking
> things up into separate work items, the apparent level of the
> unresolved problems, and problems still being discovered, in
> this area create a serious risk that we are modularizing without
> a sufficient understanding of the issues to define really clear
> boundaries among modules.  By splitting the issues up into
> separate documents, separate working groups, and separate
> organizations, we vastly increase the odds of coming up with an
> overall collection of specifications that don't quite fit
> together, have serious gaps, or are otherwise inherently
> non-interoperable.  I therefore believe that some fundamental
> reconsideration is needed, not just additional ways to try to
> slice and dice the problem.
>
>      john
Received on Monday, 21 November 2011 07:36:19 UTC