RE: Concerns about new domain names, particularly non-Latin-scripts -- getting the tech community together

It wouldn't hurt to improve guidance around Bidi IRIs. We're trying to break them into labels and arrange the labels in right-to-left (or left-to-right) order.  E.g., in a bidi context http://www.microsoft.com/some/url.html?this=that would be displayed as that=this?html.url/some/com.microsoft.www//:http  (Presumably that would only be triggered if there was bidi content in the IRI, and could perhaps be overridden by user locale or preference.)

Our investigation has shown that users expect the precedence to be consistently ordered from left to right or right to left regardless of the RTL/LTR letters in any particular label or component of the IRI.
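
A rough sketch of the label-by-label reordering described above, in Python. This is only an illustration under assumptions, not what any shipping implementation does: the delimiter set and the function names (split_labels, has_rtl, reorder_rtl, display_order) are made up for this example, and "bidi content" is approximated as any character whose Unicode bidi class is R or AL. It reorders the labels only; how the characters inside each label are rendered is still left to the normal bidi algorithm.

```python
import re
import unicodedata

# Delimiters assumed for illustration only; the message does not list them.
_DELIMS = re.compile(r"([:/?#@&=.])")


def split_labels(iri):
    """Split an IRI into labels and single-character delimiters, in order."""
    return [token for token in _DELIMS.split(iri) if token]


def has_rtl(iri):
    """True if any character has a right-to-left bidi class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in iri)


def reorder_rtl(iri):
    """Reverse the order of labels and delimiters, keeping each label intact."""
    return "".join(reversed(split_labels(iri)))


def display_order(iri, force_rtl=False):
    """Reorder only when the IRI has RTL content or the caller forces it.

    force_rtl stands in for a user locale or preference override.
    """
    if force_rtl or has_rtl(iri):
        return reorder_rtl(iri)
    return iri


if __name__ == "__main__":
    print(reorder_rtl("http://www.microsoft.com/some/url.html?this=that"))
```

Run on the example above, reorder_rtl gives that=this?html.url/some/com.microsoft.www//:http, while display_order leaves an all-Latin IRI untouched unless force_rtl is set.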

Of course this leads to other interesting questions.  The one thing that is clear is that "mixing" RTL and LTR runs within the same IRI is just tremendously confusing, and potentially spoofable.

-Shawn

-----Original Message-----
From: Larry Masinter [mailto:masinter@adobe.com] 
Sent: October 27, 2015 10:16 PM
To: David Singer <singer@apple.com>; public-iri@w3.org
Cc: Sam Ruby <rubys@intertwingly.net>
Subject: Re: Concerns about new domain names, particularly non-Latin-scripts -- getting the tech community together


References: you might want to look at the (expired) draft https://tools.ietf.org/html/draft-ruby-url-problem-01 (also see issues labeled IETF) as (dated) background.

Until the implementors of URL processing are willing to work on the many interoperability problems in general, it seems just wishful thinking that they would implement improved presentation of bidi URLs.

I think the best that can be accomplished in the current environment would be to advise ICANN and domain registrars to avoid bidi in URLs.




________________________________________
From: singer@apple.com <singer@apple.com> on behalf of David Singer <singer@apple.com>
Sent: Sunday, October 25, 2015 5:06 PM
To: public-iri@w3.org
Subject: Concerns about new domain names,  particularly non-Latin-scripts -- getting the tech community together

Hi

This was an informal email sent to a perhaps rather random collection of people at the IETF, W3C and Unicode Consortium, to see whether we need to kick off or re-open a conversation. It is now being re-posted to public-iri to enable conversation there (if the chairs approve).

I think that we have here one of those awkward areas that straddle technology and policy, and historically the technology groups have steered away from 'policy questions'. Unfortunately, I think it's a grey area and some policy answers have technology impacts, and that it's possible to conceive of some that are 'bad for the Internet' or 'break the web'. I think we need to find a way to enable the technical community to get more involved.

* * *

As I am sure you are aware, ICANN has introduced, and will introduce more, top-level domains, of which a number are or will be non-Latin-script.

I have a suspicion that some of the RFCs and other documents that exist were written 'knowing' that the top-level domains were essentially just the historic 6 (com, mil, net, org, edu and arpa) and the geographic ones.

It also seems that some of the treatment of 'structured text' - that has a structure and meaning associated with that structure, such as URLs and mail addresses - was defined assuming that we would not, or did not need to, treat it differently from regular text.

Attached you will find a PDF document (sorry, since appearance is an important part of the discussion, PDF seemed best; I hope that the formatting and so on has not got messed up), outlining some issues we have noticed recently, and concluding with some recommendations based on those issues. I rather suspect that there are more issues than I outline.  I do wonder if we should be taking more positive steps to build up a shared set of test cases as well, that check for resolution, presentation, entry, selection, and other problems in domain names. I am also aware that in some places the Public Suffix List is used for a secondary purpose, as a way to sanity check host names.

Received on Wednesday, 28 October 2015 05:40:44 UTC