Concerns about new domain names, particularly non-Latin-scripts -- getting the tech community together

Hi

This was originally an informal email sent to a perhaps rather random collection of people at the IETF, W3C and Unicode Consortium, to see whether we need to kick off or re-open a conversation. I am now re-posting it to public-iri to enable conversation there (if the chairs approve).

I think that we have here one of those awkward areas that straddle technology and policy, and historically the technology groups have steered away from ‘policy questions’. Unfortunately, I think this is a grey area: some policy answers have technical impacts, and it’s possible to conceive of some that are ‘bad for the Internet’ or ‘break the web’. I think we need to find a way to enable the technical community to get more involved.

* * *

As I am sure you are aware, ICANN has introduced, and will introduce more, top-level domains, of which a number are or will be non-Latin-script.

I have a suspicion that some of the RFCs and other documents that exist were written ‘knowing’ that the top-level domains were essentially just the historic 7 (com, org, net, edu, gov, mil and arpa) and the geographic ones.

It also seems that some of the treatment of ‘structured text’ — that has a structure and meaning associated with that structure, such as URLs and mail addresses — was defined assuming that we would not, or did not need to, treat it differently from regular text.

Attached you will find a PDF document (sorry; since appearance is an important part of the discussion, PDF seemed best, and I hope that the formatting has not got messed up). It outlines some issues we have noticed recently and concludes with some recommendations based on those issues. I rather suspect that there are more issues than I outline. I do wonder whether we should also be taking more positive steps to build up a shared set of test cases that check for resolution, presentation, entry, selection, and other problems in domain names. I am also aware that in some places the Public Suffix List is used for a secondary purpose, as a way to sanity-check host names.

The document doesn’t “dig deeper” into motivations or policy. But I think it’s worth asking “what would we prefer we had done; how could we have met the needs better?”. We might not be able to get there from here any more, but it’s always (in my opinion) worth knowing what you’d really like and what its characteristics are. There is also a presumption in the ICANN community that Universal Acceptance of whatever is introduced is desirable and expected, and I fear that there may be real technical or human issues (e.g. readability, phishing) that call that assumption into question.

* * *

As examples of directions we might have taken, I wonder whether we’d be better off if we had done something like the following.

We have an undeniable need to change the net so that it is not Latin-script (really English) centric. The response at the moment is to introduce new domains with names written in other scripts. However, this seems to assume that people will only be exposed to their ‘local’ internet, i.e. that names written in Korean will not ‘leak out’ of Korea. I think this is both unrealistic and contrary to the spirit of ‘one web’. In the current regime, anyone in the world can read any email address or URL as long as they can read English. But we seem to be heading towards a world in which everyone will need to be able to read every script, which is, I think, unrealistic. Simple questions (is this the right address? is it plausible? is it phishing?) may be unanswerable if the user cannot read the script(s) the URL is written in.
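As a rough illustration of the readability problem, software can at least report which scripts appear in each label of a hostname, even when the user cannot read them. The sketch below is only an approximation: Python's standard library does not expose the Unicode Script property, so it uses the first word of each character's Unicode name as a stand-in. The function names are made up for this example.

```python
import unicodedata


def scripts_in_label(label):
    """Return a rough set of script names used in one DNS label.

    Assumption: the first word of the Unicode character name
    (LATIN, CYRILLIC, HANGUL, CJK, ...) approximates the script.
    This is not the real Unicode Script property.
    """
    scripts = set()
    for ch in label:
        # Digits and hyphens are shared across scripts; skip them.
        if ch.isdigit() or ch == "-":
            continue
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


def scripts_by_label(hostname):
    """Map each label of a dotted hostname to its rough script set."""
    return {label: scripts_in_label(label) for label in hostname.split(".")}


def is_mixed_script(label):
    """True if a single label appears to mix more than one script."""
    return len(scripts_in_label(label)) > 1
```

A label that mixes scripts, such as a Latin word with one Cyrillic look-alike letter substituted, is flagged by `is_mixed_script`. A label written wholly in a script the reader does not know is not flagged at all, which is exactly the case the paragraph above worries about.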

What could we do? We could introduce into DNS the possibility that a domain name can have ‘aliases’, and those aliases can be in other scripts (maybe using CNAME/DNAME, but probably a new record type that carries a script code). Then it would be *possible* to take a hostname, look for aliases of each domain name in it that are written in a script preferred by the user we’re presenting to, and know that the resulting hostname would be, from a resolution point of view, functionally identical. (It probably would not be functionally identical if the URL or email address is used as an identifier.) (There are obvious issues here with what the canonical form is, how long such lookup and translation would take, and so on.)
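To make the idea concrete, here is a minimal sketch of the presentation step. Everything in it is an assumption for illustration: no such DNS record exists, the alias table is an in-memory dictionary standing in for real lookups, the alias entries are invented, and the use of ISO 15924 style script codes such as `Latn` is just one plausible choice for the 'script code'.

```python
# Hypothetical alias data, keyed by (canonical domain name, script code).
# The value is the alias for that domain's own (leftmost) label.
# These entries are invented; a real system would query DNS instead.
EXAMPLE_ALIASES = {
    ("한국", "Latn"): "hanguk",
    ("가게.한국", "Latn"): "gage",
}


def present_in_script(hostname, script, aliases):
    """Rewrite a hostname using aliases in the user's preferred script.

    Walks the name from the top-level domain downwards. For each
    domain name along the way, it substitutes the alias label when
    one exists for the requested script and otherwise keeps the
    original label.
    """
    labels = hostname.split(".")
    presented = []
    for i in range(len(labels) - 1, -1, -1):
        domain = ".".join(labels[i:])  # canonical name at this level
        alias = aliases.get((domain, script))
        presented.append(alias if alias is not None else labels[i])
    return ".".join(reversed(presented))


if __name__ == "__main__":
    print(present_in_script("www.가게.한국", "Latn", EXAMPLE_ALIASES))
```

Labels with no alias in the requested script are left as they are, so the result may still mix scripts. The sketch also looks aliases up by the canonical name at each level, which sidesteps rather than answers the question of what the canonical form should be.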

This is a ‘thought experiment’, surely not a proposal.

* * *

In summary: do we need to raise the level of discussion in the technical community, and if so, how?


David Singer
Manager, Software Standards, Apple Inc.

Received on Monday, 26 October 2015 00:07:01 UTC