- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Mon, 12 Nov 2012 20:14:32 +0900
- To: "public-iri@w3.org" <public-iri@w3.org>, Pete Resnick <presnick@qualcomm.com>, Peter Saint-Andre <stpeter@stpeter.im>, Chris Weber <chris@lookout.net>
- CC: "uri@w3.org" <uri@w3.org>, Larry Masinter <masinter@adobe.com>, Anne van Kesteren <annevk@opera.com>, Ian Hickson <ian@hixie.ch>
Like others, I'd like to express my personal view of the future of internationalized identifiers.

At the start of internationalization, it was very clear that content had to come first. Fortunately, today, it's easy to write a Web page or an email in almost any language one wishes.

Identifiers were the next area of concern. In some contexts, e.g. file names, users now take them for granted. On any OS, it would be a big hassle if a user had to cook up a romanized or translated name for a document just because the OS was ASCII-only. I can say that from my own experience; working in Japan, I get lots of job-related documents with non-ASCII names, and create some myself. This makes me feel the need for internationalized identifiers every day.

In the IETF, there has also been a lot of work. Internationalized Domain Names have come a long way since I put out the first proposal in 1996. Email Address Internationalization (EAI) went through an experimental phase and is now very close to completing Proposed Standards. Other technologies such as XMPP use non-ASCII identifiers, too. With stringprep and precis, the IETF has also done a lot of work in the area of equivalence of internationalized identifiers.

For URIs and IDNs, internationalization is available via ACE (ASCII-compatible encoding). This addresses low-level backwards-compatibility issues (e.g. HTTP 1.1). But the user obviously wants to see räksmörgås, and is annoyed by xn--rksmrgs-5wao1o or r%C3%A4ksm%C3%B6rg%C3%A5s. EAI and XMPP take this a step further: they just use UTF-8. For EAI, this is amazing because for the longest time, there was only the mantra "email will always stay 7-bit".

So things move, even if not at a very fast pace. And my prediction is that they will continue to move. Users prefer to see what they can read. Implementers prefer UTF-8 to a charset zoo. If a new protocol or format (in the IETF or elsewhere) is UTF-8 only, there is not much point in transferring URIs or IDNs in ACE form.
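To make the two ASCII-compatible forms of räksmörgås mentioned above concrete, here is a minimal Python sketch. It assumes the standard library's built-in `idna` codec (which implements the older IDNA 2003 rules) and `urllib.parse.quote` with its default UTF-8 encoding; the variable names are illustrative only.

```python
from urllib.parse import quote, unquote

# The label the user wants to see, written with escapes so that the
# source file's own encoding and normalization cannot alter it.
label = "r\u00e4ksm\u00f6rg\u00e5s"  # räksmörgås

# Domain-name ACE form: Punycode with the "xn--" prefix.
# Assumption: Python's built-in "idna" codec (IDNA 2003) is sufficient here.
idn_ace = label.encode("idna").decode("ascii")

# URI ACE form: percent-encoding of the UTF-8 bytes.
uri_ace = quote(label)

print(idn_ace)  # xn--rksmrgs-5wao1o
print(uri_ace)  # r%C3%A4ksm%C3%B6rg%C3%A5s

# Both forms are reversible, so the readable form can be recovered.
assert idn_ace.encode("ascii").decode("idna") == label
assert unquote(uri_ace) == label
```

Both outputs carry the same information as the readable label, which is why a UTF-8-only protocol or format can transfer the readable form directly instead.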
But neither are internationalized identifiers just presentation elements, or just something that needs pre-processing.

Based on this background:

* I support closing the IETF IRI WG. Most of the work on IRIs (from ca. 1995 to 2009) was done without a WG. A WG is not a precondition for work to get done, and not a way to make work magically faster.

* Some time after RFC 3987 was published, I started to update it (http://tools.ietf.org/html/draft-duerst-iri-bis-00) to take into account errata and feedback from implementers. I plan to continue this work.

* I will continue to support Anne in his work on the WHATWG URL spec. In particular, documenting browser bugwards-compatibility and getting the browsers to align their implementations in this area is very important, and very hard.

* There are implementations other than browsers, and technologies other than HTML and its surroundings. Browsers have very peculiar market pressures on bugwards compatibility that fortunately don't apply in the same way to other implementations. Also, other implementations process URIs/IRIs/URLs in ways that differ from what browsers do. I plan to work to make sure these needs are covered, too, in whatever form that may take.

* I hope that we can find a good way to proceed with RFC 4395bis (registration), and am willing to contribute. There is a lot of good stuff in there registration-wise and internationalization-wise. Of the four WG specs, it is the one with the most open issues, but probably the one which can be moved forward most quickly.

Regards, Martin.
Received on Monday, 12 November 2012 11:15:05 UTC