Re: Standardizing on IDNA 2003 in the URL Standard

From: John C Klensin <klensin@jck.com>
Date: Thu, 16 Jan 2014 12:24:57 -0500
To: "PUBLIC-IRI@W3.ORG" <public-iri@w3.org>, uri@w3.org
cc: IDNA update work <idna-update@alvestrand.no>, "www-tag.w3.org" <www-tag@w3.org>
Message-ID: <11891647C5FD408358D18BBB@JcK-HP8200.jck.com>
Hi.

With the understanding that I'm not really saying anything that
Mark, Andrew, and a few others haven't said but that a different
perspective may be worthwhile...

(1) If only because there are other protocols and actors in this
drama than web browsers, this continuing discussion leads us in
the direction of having four "standards":

(i) IDNA2008, plus or minus
	application-instance-specific or platform-specific use
	of RFC 5895.
(ii) IDNA2003
(iii) IDNA2008 + the mapping (as distinct from
	compatibility) part of UTR46 
(iv) IDNA2003 + Unspecified adaptations for Unicode
	versions later than 3.2 + UTR 46

Given that there are non-web i18n applications -- notably the
now-deploying email specs and the work on various
security-related and other specs in PRECIS -- simply having four
"standards" is not going to be popular with users who will
certainly be astonished when what they see as "the same thing"
behaves differently in different contexts.  IMO, the only thing
that has saved us from an explosion about that so far is that
the significantly different behaviors among the above are mostly
edge cases.

The important difference between case (iv) and the others is
that, as others have pointed out, case (iv) is not one case and
no one actually knows what it means.  Yet, as I
understand it, that is precisely what Anne is proposing to
specify.  In terms of a standard, that comes pretty close to
"Unicode 3.2 is standardized and we hope that no properties of
it will change; for characters included in later versions of
Unicode, do what you like".  I can't think of anything kind to
say about that.

As to the first three, I remain concerned that there are a few
characters that are PVALID (or CONTEXTJ) under IDNA2008 that
UTS46 essentially prohibits using in any separate and distinct
ways.  There is no doubt in my mind that the maximally
conservative path is precisely that prohibition, preferably
enforced by registry rules that prevent separate registration of
both the IDNA2008-permitted character and whatever it would be
mapped to under IDNA2008 or UTS46.  But those who decide to go
with that plan need to recognize two things, for better or worse:

(i) There are hundreds of thousands, if not millions, of
separately-administered and controlled registries in the DNS.
If the criterion for getting rid of mappings that preempt the
use of the relevant IDNA2008-permitted characters becomes "all
DNS registries prohibit independent registration of both them
and the characters that formerly mapped to them" (or even "proof
that most registries prohibit..."), then anyone who believes that
point is different from "never" is deluding themselves.  Worse,
each succeeding year in which web page authors believe that they
can and should depend on the mappings being present makes
discontinuing those mappings (ever) in browsers less possible.

(ii) Some people feel very strongly about the independent
availability of those characters and, regardless of what "we"
might believe, do not see confusion or conflicts within the
context of their languages (or, e.g., "their" new gTLDs).  We
also know that disagreements about how a particular language is
represented in Unicode have led, in a few places, to very
serious discussions of legislative or judicial action against
the Unicode Consortium or banning the use of Unicode in those
areas.  Fortunately for those of us who favor open international
standards, those efforts have never gone anywhere.  But,
especially where there are conflicting standards, I see a real
possibility of some government taking the position that a
browser that de facto prohibits characters that they think
necessary and that are allowed by one of the standards is
anti-competitive and/or insulting to the national culture.  If
the country or region involved were in any way economically or
culturally significant, I'd assume that browser vendors --
especially those whose existence depends on either market share
in relevant areas or on the perception that they are "good guys"
that leads to contributions -- would rapidly discover a need
either to be compatible with the standard that supported the
relevant national characters or to go to the considerable
expense and aggravation of creating a one-off implementation
that would accommodate the national demands. 
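
To make the mapping difference concrete: U+00DF (LATIN SMALL
LETTER SHARP S) is one character that IDNA2003 maps away and
that IDNA2008 treats as PVALID.  A minimal sketch in Python,
assuming its built-in "idna" codec (an IDNA2003/nameprep
implementation) as a stand-in for case (ii), and raw Punycode of
the unmapped label as an approximation of what a no-mapping
IDNA2008 lookup would produce.  The variable names are
illustrative only.

```python
# Illustrative only: compare an IDNA2003-style lookup with an
# unmapped, IDNA2008-style encoding of the same label.
label = "stra\u00dfe"  # "strasse" spelled with U+00DF (sharp s)

# Python's built-in "idna" codec applies nameprep (IDNA2003),
# which maps U+00DF to "ss" before encoding.
idna2003_form = label.encode("idna")

# Assumption: with no mapping step, an IDNA2008 lookup keeps
# U+00DF and Punycode-encodes the label as it stands.
idna2008_form = b"xn--" + label.encode("punycode")

print(idna2003_form)  # b'strasse'
print(idna2008_form)  # b'xn--strae-oqa'
```

The two results are different DNS names, which is why separate
registration of both forms matters.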

---------------

FWIW, I continue to believe that the right way forward is one
that is largely consistent with all of the present approaches in
the long run.  It would be something like:

(1) Advise web page authors and tool-builders that hrefs, things
that map into them (e.g., IRIs), or equivalent that depend on
mappings are just a bad idea, have been a bad idea since
IDNA2003 was introduced, and that uses of them should be revised
out of existence as quickly as possible.  In other words,
unambiguously deprecate the practice without necessarily
stopping uses of it from working.

(2) Advise browser implementers to support a pair of "no
mapping" switches, one for user input and the other for hrefs
and equivalent.  Ideally, those switches should have values of
"yes, map", "no, don't map", and "warn in cases where mapping is
about to be applied and then do it".  By default, the "user
input" one should start at "yes" and the "href" one should start
at "warn", with the expectation of possibly migrating to "no"
over time, but it should be possible for users and those
specifying system configurations or national localizations to
set them differently.  
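
One way the pair of switches might look, as a rough sketch
rather than a proposal for any particular browser.  Everything
named here (MapPolicy, DEFAULT_POLICIES, apply_policy, and the
mapper argument) is hypothetical, and str.lower in the usage
lines merely stands in for a real RFC 5895 or UTS46 mapping.

```python
from enum import Enum


class MapPolicy(Enum):
    MAP = "yes, map"
    NO_MAP = "no, don't map"
    WARN = "warn, then map"


# Suggested starting values: map user input silently, warn on hrefs.
DEFAULT_POLICIES = {"user_input": MapPolicy.MAP, "href": MapPolicy.WARN}


def apply_policy(label, source, mapper, policies=None, warn=print):
    """Return the label to look up for a given source of input.

    `mapper` is whatever mapping function is in use (hypothetical);
    `source` is "user_input" or "href".
    """
    policies = DEFAULT_POLICIES if policies is None else policies
    mapped = mapper(label)
    if mapped == label:
        return label  # nothing would be mapped, so no switch applies
    policy = policies[source]
    if policy is MapPolicy.NO_MAP:
        return label  # pass through unmapped; later validation decides
    if policy is MapPolicy.WARN:
        warn(f"mapping {label!r} to {mapped!r} ({source})")
    return mapped


# Usage: defaults, then a localization that turns href mapping off.
print(apply_policy("Example", "user_input", str.lower))
print(apply_policy("Example", "href", str.lower))
strict = {"user_input": MapPolicy.MAP, "href": MapPolicy.NO_MAP}
print(apply_policy("Example", "href", str.lower, policies=strict))
```

Keeping the policies in a plain table is what would let a
localization or system configuration override them without a
change to the browser itself.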

That combination allows everyone to move forward and lets
browsers be agile relative to evolving usage and demands.  For
example, if a government did impose a requirement wrt
independent use of characters in a particular language, that
could be handled as a localization matter rather than a browser
revision, regardless of what one thought of the merits of their
position.  People working with sufficiently old HTML files could
set switches appropriately so that those pages would continue to
work in their environments.  And it would allow us to start
moving away from the "four competing standards" situation
because it really does provide the migration path that we don't
have now (and that has led to various versions of what some of
us describe as "IDNA2003, more or less, forever").

best,
   john
Received on Thursday, 16 January 2014 17:25:26 UTC