
Re: [bidi] Special ordering for BIDI URLs

From: Mark Davis ☕ <mark@macchiato.com>
Date: Thu, 27 May 2010 15:22:50 -0700
Message-ID: <AANLkTilrxMZs63_Umi5WPwD4x8OrvtowuHWzmpWuii4y@mail.gmail.com>
To: Shawn Steele <Shawn.Steele@microsoft.com>, Adil Allawi <adil@diwan.com>, "public-iri@w3.org" <public-iri@w3.org>, "bidi@unicode.org" <bidi@unicode.org>, Murray Sargent <murrays@exchange.microsoft.com>, "aharon@google.com" <aharon@google.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>
A few comments on various issues.

*1. Market Forces. Make it possible for URLs (actually IRIs) to be
completely RTL*

A. Shawn raised the issue of .html. As I think about it, there are a couple
of ways to deal with this. First, even currently servers don't need to use
those suffixes: http://unicode.org/reports/ doesn't contain a .html.
Secondly, we could establish equivalences for some Hebrew and Arabic-script
suffixes to take the place of those.

*2. Specialized BIDI. Force a consistent order on URLs, using a
higher-level protocol on top of the UBA.*

A. The proponents of specialized reordering really need to come up with a
good story for how to deal with the security and interoperability issues
presented by plaintext applications and by applications that don't
implement the new URL ordering.

B. There are actually two variants of this:

a. have the consistent order be LTR.
b. have the consistent order be the paragraph direction.


(a) is a simpler approach technically, since the generated plaintext can
have a single direction associated with the label separators. It can be
implemented in display and cut/paste by having LRMs around each label that
contains an RTL character or no LTR characters.

While for users this may not be quite as natural, the most important feature
is having a predictable ordering (the ordering of labels in URLs is already
somewhat screwy, since the domain name is Little-Endian, and the rest is
Big-Endian).
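
To make the LRM approach in (a) concrete, here is a rough Python sketch. It is
only an illustration under stated assumptions, not a proposed algorithm: the
set of label separators, the choice of bidi classes R and AL as "RTL" and L as
"LTR", and the function names needs_lrm and wrap_labels are all assumptions
made for this example.

```python
import re
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

# Assumption: these characters delimit labels. The thread does not
# enumerate the separators, so this set is illustrative only.
SEPARATORS = re.compile(r"([:/?#.&=@]+)")


def needs_lrm(label: str) -> bool:
    """True if the label contains an RTL character or no LTR characters."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    has_rtl = bool(classes & {"R", "AL"})  # Hebrew-type and Arabic-type letters
    has_ltr = "L" in classes
    return has_rtl or not has_ltr


def wrap_labels(iri: str) -> str:
    """Put an LRM on each side of every label that needs one."""
    # Splitting on a capturing group keeps the separators, at odd indices.
    parts = SEPARATORS.split(iri)
    out = []
    for i, part in enumerate(parts):
        is_separator = i % 2 == 1
        if is_separator or part == "" or not needs_lrm(part):
            out.append(part)
        else:
            out.append(LRM + part + LRM)
    return "".join(out)
```

Under these assumptions, a purely Latin-script URL passes through unchanged,
while a Hebrew label or an all-digit label (which has no LTR characters) gets
an LRM on each side, so the separators around it resolve left-to-right.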

*3. New Characters (Adil's proposal).*

While an interesting proposal, the problems would be:

   - introducing security risks with the new characters.
   - a significant change to the UBA - and even extremely minor changes have
   caused enough problems that the UTC has grown quite leery of rocking the
   boat.
   - it takes at least a couple of years to get characters accepted by both
   Unicode and ISO.
   - none of the old URL-aware software would handle the new URLs (a problem
   also for the LRM approach).

Mark
Received on Thursday, 27 May 2010 22:23:24 GMT
