Re: Concerns about new domain names, particularly non-Latin-scripts -- getting the tech community together from Larry Masinter on 2015-10-28 (public-iri@w3.org from October 2015)

From: Larry Masinter <masinter@adobe.com>
Date: Wed, 28 Oct 2015 05:16:18 +0000
To: David Singer <singer@apple.com>, "public-iri@w3.org" <public-iri@w3.org>
CC: Sam Ruby <rubys@intertwingly.net>
Message-ID: <DM2PR02MB132237D1A94DA292F1636660C3210@DM2PR02MB1322.namprd02.prod.outlook.com>

References: you might want to look at the (expired) draft
https://tools.ietf.org/html/draft-ruby-url-problem-01
(from https://github.com/webspecs/url -- also see issues
labeled IETF) as (dated) background.

Until the implementors of URL processing are willing to
work on the many interoperability problems in general,
it seems just wishful thinking that they would implement
improved presentation of bidi URLs.

I think the best that can be accomplished in the
current environment would be to advise ICANN and
domain registrars to avoid bidi in URLs.
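
A minimal sketch of how a registrar-side check might flag
bidi host names, assuming the name is supplied in Unicode
form (not as xn-- Punycode) and using Python's standard
unicodedata module. The function name has_rtl_label is
hypothetical; the rule that a label counts as right-to-left
when it contains a character of bidi class R, AL or AN
follows the definition in RFC 5893. This is an illustration
only, not a complete implementation of that RFC's Bidi rule.

```python
import unicodedata

# Bidi classes that make a label "right-to-left" per RFC 5893:
# R (e.g. Hebrew letters), AL (Arabic letters), AN (Arabic-Indic digits).
RTL_CLASSES = {"R", "AL", "AN"}


def has_rtl_label(hostname: str) -> bool:
    """Return True if any dot-separated label contains an RTL character.

    Assumes `hostname` is in Unicode form; Punycode (xn--) labels
    are plain ASCII and would not be flagged by this sketch.
    """
    for label in hostname.split("."):
        if any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label):
            return True
    return False


if __name__ == "__main__":
    # Escapes are used so the sample does not depend on how bidi text renders.
    samples = [
        "example.com",
        "\u05e9\u05dc\u05d5\u05dd.example",  # Hebrew label
        "\u0645\u062b\u0627\u0644.com",      # Arabic label
    ]
    for name in samples:
        print(ascii(name), has_rtl_label(name))
```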




________________________________________
From: singer@apple.com <singer@apple.com> on behalf of David Singer <singer@apple.com>
Sent: Sunday, October 25, 2015 5:06 PM
To: public-iri@w3.org
Subject: Concerns about new domain names, particularly non-Latin-scripts -- getting the tech community together

Hi

This was an informal email sent to a perhaps rather random collection of people at the IETF, W3C and Unicode Consortium, to see whether we need to kick off or re-open a conversation. It is now being re-posted to public-iri to enable conversation there (if the chairs approve).

I think that we have here one of those awkward areas that straddle technology and policy, and historically the technology groups have steered away from ‘policy questions’. Unfortunately, I think it’s a grey area and some policy answers have technology impacts, and that it’s possible to conceive of some that are ‘bad for the Internet’ or ‘break the web’. I think we need to find a way to enable the technical community to get more involved.

* * *

As I am sure you are aware, ICANN has introduced, and will introduce more, top-level domains, of which a number are or will be non-Latin-script.

I have a suspicion that some of the RFCs and other documents that exist were written ‘knowing’ that the top-level domains were essentially just the historic 6 (com, mil, net, org, edu and arpa) and the geographic ones.

It also seems that some of the treatment of ‘structured text’ — that has a structure and meaning associated with that structure, such as URLs and mail addresses — was defined assuming that we would not, or did not need to, treat it differently from regular text.

Attached you will find a PDF document (sorry, since appearance is an important part of the discussion, PDF seemed best; I hope that the formatting and so on has not got messed up), outlining some issues we have noticed recently, and concluding with some recommendations based on those issues. I rather suspect that there are more issues than I outline. I do wonder if we should be taking more positive steps to build up a shared set of test cases as well, that check for resolution, presentation, entry, selection, and other problems in domain names. I am also aware that in some places the Public Suffix List is used for a secondary purpose, as a way to sanity-check host names.

Received on Wednesday, 28 October 2015 05:16:55 UTC