Re: Future of Internationalized Identifiers

Hi Martin,

On 11/12/12 4:14 AM, "Martin J. Dürst" wrote:
> Like others, I'd like to express my personal view of the future of
> internationalized identifiers.
> 
> At the start of internationalization, it was very clear that content had
> to come first. Fortunately, today, it's easy to write a Web page or an
> email in almost any language one wishes.
> 
> Identifiers were a next area of concern. In some contexts, e.g. file
> names, users now take them for granted. On any OS, it would be a big
> hassle if a user had to cook up a romanized or translated name for a
> document just because the OS was ASCII-only. I can say that from my own
> experience; working in Japan, I get lots of job-related documents with
> non-ASCII names, and create some myself. This lets me feel the need for
> internationalized identifiers every day.
> 
> In the IETF, there has also been a lot of work. Internationalized Domain
> Names have come a long way since I put out the first proposal in 1996.
> Email Address Internationalization (EAI) went through an experimental
> phase and is now very close to completing Proposed Standards. Other
> technologies such as XMPP use non-ASCII identifiers, too. With
> stringprep and precis, the IETF has also done a lot of work in the area
> of equivalence of internationalized identifiers.
> 
> For URIs and IDNs, internationalization is available via ACE
> (ASCII-compatible encoding). This addresses low-level
> backwards-compatibility issues (e.g. HTTP 1.1). But the user obviously
> wants to see räksmörgås, and is annoyed by xn--rksmrgs-5wao1o or
> r%C3%A4ksm%C3%B6rg%C3%A5s. EAI and XMPP take this a step further: they
> just use UTF-8. For EAI, this is amazing because for the longest time,
> there was only the mantra "email will always stay 7-bit".
> 
> So things move, even if not at a very fast pace. And my prediction is
> that they will continue to move. Users prefer to see what they can read.
> Implementers prefer UTF-8 rather than a charset zoo. If a new protocol
> or format (in the IETF or elsewhere) is UTF-8 only, there is not much of
> a point to transfer URIs or IDNs in ACE form. But neither are these just
> presentation elements, or just something that needs pre-processing.

That all seems reasonable.
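
For readers who want to see the three forms from your example side by
side, here is a minimal Python sketch. It is only an illustration under
stated assumptions: the helper names to_ace and to_percent are
hypothetical, the label is assumed to be already normalized (NFC,
lowercase), and the raw Punycode codec is used without any of the
IDNA/nameprep mapping steps a real resolver would apply.

```python
from urllib.parse import quote, unquote

# The label from the example above, written with escapes so the
# precomposed (NFC) code points are unambiguous.
label = "r\u00e4ksm\u00f6rg\u00e5s"


def to_ace(lbl: str) -> str:
    """Hypothetical helper: ACE form of one label (prefix + Punycode).

    Assumes lbl is already normalized; no IDNA mapping is performed.
    """
    return "xn--" + lbl.encode("punycode").decode("ascii")


def to_percent(lbl: str) -> str:
    """Hypothetical helper: percent-encode the UTF-8 bytes of lbl."""
    return quote(lbl, safe="")


ace = to_ace(label)          # domain-name style ACE
pct = to_percent(label)      # URI style percent-encoding
utf8 = label.encode("utf-8") # what a UTF-8-only protocol would carry

print(ace)   # xn--rksmrgs-5wao1o
print(pct)   # r%C3%A4ksm%C3%B6rg%C3%A5s
print(utf8)

# Both ASCII forms round-trip back to the readable label.
assert ace[4:].encode("ascii").decode("punycode") == label
assert unquote(pct) == label
```

Both ASCII forms carry the same information as the UTF-8 bytes; they
differ only in what legacy software (and the user) has to look at.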

> Based on this background:
> 
> * I support closing the IETF IRI WG. 

Sadly, I concur with you and Larry here.

> Most of the work on IRIs (from ca.
> 1995 to 2009) was done without a WG. A WG is not a precondition for work
> to get done, and not a way to make work magically faster.

True. As you know, I've put a lot of work into reaching a successful
conclusion to the IRI WG (not as much as you and Larry, for sure), and
I'm disappointed that we were not able to involve more participants
and contributors. As SM notes in his reply to Larry, that seems to be
the way of the world with regard to internationalization (and even
something as fundamental as URIs).

> * Some time after RFC 3987 was published, I started to update it
> (http://tools.ietf.org/html/draft-duerst-iri-bis-00) to take into
> account errata and feedback from implementers. I plan to continue this
> work.

I'm happy to hear it.

> * I will continue to support Anne in his work on the WHATWG URL spec. In
> particular, documenting browser bugwards-compatibility and getting the
> browsers to align their implementations in this area is very important,
> and very hard.

Indeed.

> * There are other implementations than browsers and other technologies
> than HTML and its surroundings. Browsers have very peculiar market
> pressures on bugwards compatibility that fortunately don't apply in the
> same way to other implementations. Also, other implementations are
> processing URIs/IRIs/URLs in other ways than browsers. I plan to work to
> make sure these needs are covered, too, in whatever form that may take.

Great.

> * I hope that we can find a good way to proceed with RFC 4395bis
> (registration), and am willing to contribute. There is a lot of good
> stuff in there registration-wise and internationalization-wise. Of the
> four WG specs, it is the one with the most open issues, but probably the
> one which can be moved forward most quickly.

Agreed. One possibility is spinning up a small WG in the IETF
Applications Area dedicated to simplifying and modernizing registration
requirements for a variety of technologies (URIs/IRIs, link relations,
etc.). I think it would be good to work on these updates in a
semi-coordinated fashion.

Peter

-- 
Peter Saint-Andre
https://stpeter.im/

Received on Tuesday, 13 November 2012 16:16:33 UTC