
Re: [bidi] Special ordering for BIDI URLs

From: Ted Hardie <ted.ietf@gmail.com>
Date: Thu, 27 May 2010 15:54:02 -0700
Message-ID: <AANLkTim89v40Nxh6kaiI3QqygAAwW8kNsfd22uDaaX2P@mail.gmail.com>
To: Mark Davis ☕ <mark@macchiato.com>
Cc: Shawn Steele <Shawn.Steele@microsoft.com>, Adil Allawi <adil@diwan.com>, "public-iri@w3.org" <public-iri@w3.org>, "bidi@unicode.org" <bidi@unicode.org>, Murray Sargent <murrays@exchange.microsoft.com>, "aharon@google.com" <aharon@google.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>
Some questions and comments in-line.

regards,

Ted Hardie

On Thu, May 27, 2010 at 3:22 PM, Mark Davis ☕ <mark@macchiato.com> wrote:
> A few comments on various issues.
> 1. Market Forces. Make it possible for URLs (actually IRIs) to be completely
> RTL
> A. Shawn raised the issue of .html. As I think about it, there are a couple
> of ways to deal with this. First, even currently servers don't need to use
> those suffixes: http://unicode.org/reports/ doesn't contain a .html.

Those suffixes are commonly used to indicate the MIME type of the
resource; while the page replacing a directory listing may be returned
without the underlying resource name being displayed, it would be a big
change to require that the suffix be removed for all listings.  Look at
your page and the referenced files, e.g.
http://www.unicode.org/reports/tr36/tr36-8.html. The suffix could be
eliminated and the MIME type returned only in the HTTP headers, but this
is both a big user-education issue and would hinder some existing uses.

> Secondly, we could establish equivalences for some Hebrew and Arabic-script
> suffixes to take the place of those.

This is certainly possible, but it makes interoperability more fragile
if those are substitutable at the identifier layer, since MIME dispatch
must now recognize both; so must users who decide whether or not to
download something based on the MIME type (e.g. ignoring PDFs on a
smartphone).
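To make the fragility concrete, here is a minimal Python sketch of suffix-based MIME dispatch, assuming a hypothetical equivalence table in which a Hebrew-letter string (he, tet, mem, lamed, chosen purely as an illustration, not a proposed or registered suffix) stands in for ".html". The names guess_mime, SUFFIX_TO_MIME and EQUIVALENT_SUFFIXES are hypothetical. A dispatcher that has not been updated with the equivalents simply fails to recognize the new suffix:

```python
import posixpath
from urllib.parse import unquote, urlsplit

# Hypothetical suffix -> MIME type table used by a dispatcher.
SUFFIX_TO_MIME = {
    ".html": "text/html",
    ".pdf": "application/pdf",
}

# Hypothetical RTL-script equivalents (not a real or proposed standard).
# ".\u05D4\u05D8\u05DE\u05DC" is the Hebrew letters he, tet, mem, lamed,
# used here only as an illustrative stand-in for ".html".
EQUIVALENT_SUFFIXES = {
    ".\u05D4\u05D8\u05DE\u05DC": ".html",
}


def guess_mime(url, honor_equivalents=True):
    """Guess a MIME type from the suffix of the URL path, or return None."""
    # Decode percent-escapes so an IRI and its URI form are treated alike.
    path = unquote(urlsplit(url).path)
    _, ext = posixpath.splitext(path)
    if honor_equivalents:
        # Map an equivalent suffix onto its canonical Latin form.
        ext = EQUIVALENT_SUFFIXES.get(ext, ext)
    return SUFFIX_TO_MIME.get(ext)


if __name__ == "__main__":
    hebrew_doc = "http://example.org/doc.\u05D4\u05D8\u05DE\u05DC"
    print(guess_mime("http://www.unicode.org/reports/tr36/tr36-8.html"))  # text/html
    print(guess_mime(hebrew_doc))                           # text/html
    print(guess_mime(hebrew_doc, honor_equivalents=False))  # None
```

The last call stands in for existing software that knows only the Latin suffixes: it gets None, which is exactly the case where both dispatch code and users would have to learn a second spelling.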

> 2. Specialized BIDI. Force a consistent order on URLs, using a higher-level
> protocol on top of the UBA.
> A. The proponents of specialized reordering really need to come up with a
> good story for how to deal with the security and interoperability issues
> presented by plaintext applications and non-new-URL-ordering applications.
> B. There are actually two variants of this:
>
> a. have the consistent order be LTR.

I'm trying to understand what this would mean for domain name
registration.  If a registrant wishes to register some string that is
commonly RTL, does this mean they register it in RTL order but expect
it to be displayed in LTR order inside IRI contexts?  Or do they
register it in LTR order, since LTR order is what the URI will display?
If it is truly bidirectional (because it contains both Hindu-Arabic
numerals and RTL characters within a label), what do we expect to be
registered?
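One way to frame the registration question is by the Unicode bidi classes of the characters in a label: a label made only of LTR letters keeps its stored (logical) order on display, one containing strong RTL characters (classes R or AL) is reordered, and any digit runs (EN or AN) inside it stay left-to-right, which is the "truly bidirectional" case. The Python sketch below uses unicodedata.bidirectional to sort labels into rough categories; the category names and the function direction_kind are hypothetical, and this is not the IDNA Bidi Rule, only an illustration:

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}     # strong right-to-left (Hebrew, Arabic letters)
DIGIT_CLASSES = {"EN", "AN"}  # European and Arabic-Indic digits


def bidi_classes(label):
    """Return the set of Unicode bidi classes of the characters in label."""
    return {unicodedata.bidirectional(ch) for ch in label}


def direction_kind(label):
    """Roughly categorize a label by the bidi classes it contains (a sketch)."""
    classes = bidi_classes(label)
    has_rtl = bool(classes & RTL_CLASSES)
    has_ltr = "L" in classes
    if has_rtl and has_ltr:
        return "mixed"
    if has_rtl and classes & DIGIT_CLASSES:
        # Digit runs stay left-to-right inside an otherwise RTL label,
        # so the displayed order is neither purely RTL nor purely LTR.
        return "rtl-with-digits"
    if has_rtl:
        return "rtl"
    if has_ltr:
        return "ltr"
    return "weak-or-neutral"
```

For example, a Hebrew word followed by "123" comes out as "rtl-with-digits"; for such a label the registry would hold one (logical) order while users see another, and the question above is which of the two a registrant is expected to supply.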

> b. have the consistent order be the paragraph direction.

>
> (a) is a simpler approach technically, since the generated plaintext can
> have a single direction associated with the label separators. It can be
> implemented in display and cut/paste by having LRMs around each label that
> contains an RTL character or no LTR characters.


> While for users this may not be quite as natural, the most important feature
> is having a predictable ordering (the ordering of labels in URLs is already
> somewhat screwy, since the domain name is Little-Endian, and the rest is
> Big-Endian).

So, I think we may be about to get into trouble over the term "labels".
When you use that term, I assume you mean DNS labels, but your "the rest
is Big-Endian" confused me here.  Do you mean each of the parts of a URI?
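To illustrate the endianness remark, and why "labels" is ambiguous, here is a Python sketch that lists the components of an ASCII URL from most to least significant: the DNS labels have to be reversed, while the path segments are already in that order. The function name significance_order is hypothetical, and which of these pieces "labels" should cover is exactly the open question; the sketch simply shows both:

```python
from urllib.parse import urlsplit


def significance_order(url):
    """List URL components from most to least significant (a sketch)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # DNS labels are written least-significant first ("little-endian"),
    # so reverse them: the top-level domain is the most significant.
    host_labels = [label for label in reversed(host.split(".")) if label]
    # Path segments are already written most-significant first.
    path_segments = [seg for seg in parts.path.split("/") if seg]
    return host_labels + path_segments


if __name__ == "__main__":
    print(significance_order("http://www.unicode.org/reports/tr36/tr36-8.html"))
    # ['org', 'unicode', 'www', 'reports', 'tr36', 'tr36-8.html']
```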

regards,

Ted Hardie

> 3. New Characters (Adil's proposal).
> While an interesting proposal, the problems would be:
>
> introducing security risks with the new characters.
> a significant change to the UBA - and even extremely minor changes have
> caused enough problems that the UTC has grown quite leery of rocking the
> boat.
> it takes at least a couple of years to get characters accepted by both
> Unicode and ISO.
> none of the old URL-aware software would handle the new URLs (a problem also
> for the LRM approach).
>
> Mark
Received on Thursday, 27 May 2010 22:54:34 GMT
