- From: Ted Hardie <ted.ietf@gmail.com>
- Date: Thu, 27 May 2010 15:54:02 -0700
- To: Mark Davis ☕ <mark@macchiato.com>
- Cc: Shawn Steele <Shawn.Steele@microsoft.com>, Adil Allawi <adil@diwan.com>, "public-iri@w3.org" <public-iri@w3.org>, "bidi@unicode.org" <bidi@unicode.org>, Murray Sargent <murrays@exchange.microsoft.com>, "aharon@google.com" <aharon@google.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>
Some questions and comments in-line.

regards,

Ted Hardie

On Thu, May 27, 2010 at 3:22 PM, Mark Davis ☕ <mark@macchiato.com> wrote:
> A few comments on various issues.
> 1. Market Forces. Make it possible for URLs (actually IRIs) to be completely
> RTL
> A. Shawn raised the issue of .html. As I think about it, there are a couple
> of ways to deal with this. First, even currently servers don't need to use
> those suffixes: http://unicode.org/reports/ doesn't contain a .html.

Commonly those suffixes are currently used to indicate the MIME type of the resource; while the page replacing a directory listing may be returned without the underlying resource name being displayed, it would be a big change to require that it be removed for all listings. Look at your page and the referenced files, e.g. http://www.unicode.org/reports/tr36/tr36-8.html

It could be eliminated and the MIME type returned in the HTTP headers, but this is both a big user education issue and will hinder some existing uses.

> Secondly, we could establish equivalences for some Hebrew and Arabic-script
> suffixes to take the place of those.

This is certainly possible, but it makes interoperability more fragile if those are substitutable at the identifier layer, as MIME dispatch must now recognize both; so must users who decide whether or not to download something based on the MIME type (e.g. ignoring PDFs on a smartphone).

> 2. Specialized BIDI. Force a consistent order on URLs, using a higher-level
> protocol on top of the UBA.
> A. The proponents of specialized reordering really need to come up with a
> good story for how to deal with the security and interoperability issues
> presented by plaintext applications and non-new-URL-ordering applications.
> B. There are actually two variants of this:
>
> a. have the consistent order be LTR.

I'm trying to understand what this would mean for domain name registration. If a registrant wishes to register some string that is commonly RTL, does this mean they register it in RTL order but expect it to be displayed in LTR inside IRI contexts? Or do they register it in LTR order, since LTR order is what the URI will display? If it is truly bidirectional (because it contains both Hindu-Arabic numerals and RTL characters within a label), what do we expect to be registered?

> b. have the consistent order be the paragraph direction.
>
> (a) is a simpler approach technically, since the generated plaintext can
> have single direction associated with the label separators. It can be
> implemented in display and cut/paste by having LRMs around each label that
> contains a RTL character or no LTR characters.
> While for users this may not be quite as natural, the most important feature
> is having a predictable ordering (the ordering of labels in URLs is already
> somewhat screwy, since the domain name is Little-Endian, and the rest is
> Big-Endian).

So, I think we may be about to get in trouble in re: the term "labels". When you use that term, I assume you mean DNS labels, but your "the rest is Big-Endian" confused me here. Do you mean each of the parts of a URI?

regards,

Ted Hardie

> 3. New Characters (Adil's proposal).
> While an interesting proposal, the problems would be:
>
> introducing security risks with the new characters.
> a significant change to the UBA - and even extremely minor changes have
> caused enough problems that the UTC has grown quite leery of rocking the
> boat.
> it takes at least a couple of years to get characters accepted by both
> Unicode and ISO.
> none of the old URL-aware software would handle the new URLs (a problem also
> for the LRM approach).
>
> Mark
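
For readers following the thread, the LRM wrapping described under 2(a) might be sketched roughly as below. This is only an illustration under stated assumptions, not anything proposed in the thread itself: the function names needs_lrm and wrap_labels are hypothetical, "label" is taken to mean a dot-separated DNS label, "RTL character" is read as Unicode bidi class R or AL, "LTR character" as bidi class L, and empty labels are left alone.

```python
import unicodedata

LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK


def needs_lrm(label: str) -> bool:
    """Return True if the label contains an RTL character or no LTR characters.

    Assumption: RTL means bidi class R or AL, LTR means bidi class L.
    """
    if not label:
        # Assumption: an empty label (e.g. from a trailing dot) is left untouched.
        return False
    classes = {unicodedata.bidirectional(ch) for ch in label}
    has_rtl = bool(classes & {"R", "AL"})
    has_ltr = "L" in classes
    return has_rtl or not has_ltr


def wrap_labels(name: str, sep: str = ".") -> str:
    """Surround each qualifying label with LRMs, keeping the separators as-is."""
    return sep.join(
        LRM + label + LRM if needs_lrm(label) else label
        for label in name.split(sep)
    )
```

A purely Latin name such as "example.com" comes back unchanged, while a Hebrew label or an all-digit label gets an LRM on each side. The sketch covers only the domain-name part; how path segments or other URI components would be treated is exactly the open question about "labels" raised above.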
Received on Thursday, 27 May 2010 22:54:34 UTC