Re: [bidi] Special ordering for BIDI URLs

From: Mark Davis ☕ <mark@macchiato.com>
Date: Thu, 27 May 2010 15:22:50 -0700
Message-ID: <AANLkTilrxMZs63_Umi5WPwD4x8OrvtowuHWzmpWuii4y@mail.gmail.com>
To: Shawn Steele <Shawn.Steele@microsoft.com>, Adil Allawi <adil@diwan.com>, "public-iri@w3.org" <public-iri@w3.org>, "bidi@unicode.org" <bidi@unicode.org>, Murray Sargent <murrays@exchange.microsoft.com>, "aharon@google.com" <aharon@google.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>
A few comments on various issues.

*1. Market Forces. Make it possible for URLs (actually IRIs) to be
completely RTL*
A. Shawn raised the issue of .html. As I think about it, there are a couple
of ways to deal with this. First, even today servers don't need to use
those suffixes: http://unicode.org/reports/ doesn't contain a .html.
Second, we could establish Hebrew and Arabic-script suffixes as
equivalents to take their place.
*2. Specialized BIDI. Force a consistent order on URLs, using a
higher-level protocol on top of the UBA*

A. The proponents of specialized reordering really need to come up with a
good story for how to deal with the security and interoperability issues
presented by plaintext applications and by applications that do not
implement the new URL ordering.

B. There are actually two variants of this:

a. have the consistent order be LTR.
b. have the consistent order be the paragraph direction.

(a) is a simpler approach technically, since the generated plaintext can
have a single direction associated with the label separators. It can be
implemented in display and cut/paste by having LRMs around each label that
contains an RTL character or no LTR characters.
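
To make the LRM rule in (a) concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a proposed specification: the set of separator characters, the function names, and the choice to treat bidi classes R and AL as "RTL" and class L as "LTR" are all assumptions made for this example.

```python
import re
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

# Assumption: labels are the runs between these separator characters.
SEPARATORS = re.compile(r"([/.:?#&=@]+)")


def has_bidi_class(text, classes):
    """True if any character in text has one of the given bidi classes."""
    return any(unicodedata.bidirectional(ch) in classes for ch in text)


def needs_lrm(label):
    """A label is wrapped if it has an RTL character or no LTR character."""
    has_rtl = has_bidi_class(label, {"R", "AL"})
    has_ltr = has_bidi_class(label, {"L"})
    return has_rtl or not has_ltr


def wrap_labels_ltr(iri):
    """Return a display form of iri with LRMs around the labels that need them."""
    # With a capturing group, re.split alternates label, separator, label, ...
    parts = SEPARATORS.split(iri)
    out = []
    for index, part in enumerate(parts):
        is_label = index % 2 == 0
        if is_label and part and needs_lrm(part):
            out.append(LRM + part + LRM)
        else:
            out.append(part)
    return "".join(out)


def strip_lrm(text):
    """Undo the wrapping, e.g. when the display form is copied back out."""
    return text.replace(LRM, "")
```

A purely Latin IRI passes through unchanged, while a Hebrew or all-digit label picks up an LRM on each side. Note that strip_lrm as written removes every LRM, including any that were present before wrapping; whether that is acceptable for cut/paste is a design question this sketch does not settle.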

While this may not be quite as natural for users, the most important feature
is having a predictable ordering (the ordering of labels in URLs is already
somewhat screwy, since the domain name is Little-Endian, and the rest is
Big-Endian).

*3. New Characters (Adil's proposal)*

While an interesting proposal, the problems would be:

   - introducing security risks with the new characters.
   - a significant change to the UBA; even extremely minor changes have
   caused enough problems that the UTC has grown quite leery of rocking the
   boat.
   - it takes at least a couple of years to get characters accepted by both
   Unicode and ISO.
   - none of the old URL-aware software would handle the new URLs (a problem
   also for the LRM approach).

Received on Thursday, 27 May 2010 22:23:24 UTC