
Future of Internationalized Identifiers

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Mon, 12 Nov 2012 20:14:32 +0900
Message-ID: <50A0DA18.7010107@it.aoyama.ac.jp>
To: "public-iri@w3.org" <public-iri@w3.org>, Pete Resnick <presnick@qualcomm.com>, Peter Saint-Andre <stpeter@stpeter.im>, Chris Weber <chris@lookout.net>
CC: "uri@w3.org" <uri@w3.org>, Larry Masinter <masinter@adobe.com>, Anne van Kesteren <annevk@opera.com>, Ian Hickson <ian@hixie.ch>

Like others, I'd like to express my personal view of the future of 
internationalized identifiers.

At the start of internationalization, it was very clear that content had 
to come first. Fortunately, today, it's easy to write a Web page or an 
email in almost any language one wishes.

Identifiers were the next area of concern. In some contexts, e.g. file 
names, users now take them for granted. On any OS, it would be a big 
hassle if a user had to cook up a romanized or translated name for a 
document just because the OS was ASCII-only. I can say that from my own 
experience; working in Japan, I get lots of job-related documents with 
non-ASCII names, and create some myself. This makes me feel the need for 
internationalized identifiers every day.

In the IETF, there has also been a lot of work. Internationalized Domain 
Names have come a long way since I put out the first proposal in 1996. 
Email Address Internationalization (EAI) went through an experimental 
phase and is now very close to completing Proposed Standards. Other 
technologies such as XMPP use non-ASCII identifiers, too. With 
stringprep and precis, the IETF has also done a lot of work in the area 
of equivalence of internationalized identifiers.

For URIs and IDNs, internationalization is available via ACE 
(ASCII-compatible encoding). This addresses low-level 
backwards-compatibility issues (e.g. HTTP 1.1). But the user obviously 
wants to see räksmörgås, and is annoyed by xn--rksmrgs-5wao1o or 
r%C3%A4ksm%C3%B6rg%C3%A5s. EAI and XMPP take this a step further: they 
just use UTF-8. For EAI, this is amazing because for the longest time, 
there was only the mantra "email will always stay 7-bit".
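
The two ASCII forms quoted above can be reproduced with a minimal Python 
sketch. It assumes Python's built-in "idna" codec (which follows the older 
IDNA 2003 rules) for the domain-label form and urllib.parse.quote for the 
percent-encoded UTF-8 form; the helper names to_ace_label and 
percent_encode are made up for this illustration.

```python
from urllib.parse import quote, unquote


def to_ace_label(label: str) -> str:
    """Encode one domain label as ACE (Punycode with the xn-- prefix).

    Assumption: uses Python's built-in 'idna' codec (IDNA 2003 rules).
    """
    return label.encode("idna").decode("ascii")


def percent_encode(segment: str) -> str:
    """Percent-encode the UTF-8 bytes of a URI path segment."""
    return quote(segment, safe="")


word = "räksmörgås"

ace = to_ace_label(word)
pct = percent_encode(word)

print(ace)  # xn--rksmrgs-5wao1o
print(pct)  # r%C3%A4ksm%C3%B6rg%C3%A5s

# Both encodings are reversible, so the readable form can be restored
# for display even when the ASCII form is what travels on the wire.
print(ace.encode("ascii").decode("idna"))  # räksmörgås
print(unquote(pct))                        # räksmörgås
```

Neither ASCII form loses information, which is why they work for 
backwards compatibility; what they lose is readability.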

So things move, even if not at a very fast pace. And my prediction is 
that they will continue to move. Users prefer to see what they can read. 
Implementers prefer UTF-8 rather than a charset zoo. If a new protocol 
or format (in the IETF or elsewhere) is UTF-8 only, there is not much 
point in transferring URIs or IDNs in ACE form. But neither are these just 
presentation elements, or just something that needs pre-processing.


Based on this background:

* I support closing the IETF IRI WG. Most of the work on IRIs (from ca. 
1995 to 2009) was done without a WG. A WG is not a precondition for work 
to get done, and not a way to make work magically faster.

* Some time after RFC 3987 was published, I started to update it 
(http://tools.ietf.org/html/draft-duerst-iri-bis-00) to take into 
account errata and feedback from implementers. I plan to continue this work.

* I will continue to support Anne in his work on the WHATWG URL spec. In 
particular, documenting browser bugwards-compatibility and getting the 
browsers to align their implementations in this area is very important, 
and very hard.

* There are implementations other than browsers and technologies other 
than HTML and its surroundings. Browsers have very peculiar market 
pressures on bugwards compatibility that fortunately don't apply in the 
same way to other implementations. Also, other implementations process 
URIs/IRIs/URLs differently from browsers. I plan to work to 
make sure these needs are covered, too, in whatever form that may take.

* I hope that we can find a good way to proceed with RFC 4395bis 
(registration), and am willing to contribute. There is a lot of good 
stuff in there registration-wise and internationalization-wise. Of the 
four WG specs, it is the one with the most open issues, but probably the 
one which can be moved forward most quickly.


Regards,   Martin.
Received on Monday, 12 November 2012 11:15:05 UTC
