- From: Mark Davis ☕ <mark@macchiato.com>
- Date: Tue, 25 May 2010 15:42:39 -0700
- To: Shawn Steele <Shawn.Steele@microsoft.com>
- Cc: "Phillips, Addison" <addison@lab126.com>, "Aharon (Vladimir) Lanin" <aharon@google.com>, Martin J. Dürst <duerst@it.aoyama.ac.jp>, "public-iri@w3.org" <public-iri@w3.org>, "bidi@unicode.org" <bidi@unicode.org>, Murray Sargent <murrays@exchange.microsoft.com>, Nasser Kettani <Nasser.Kettani@microsoft.com>
- Message-ID: <AANLkTilTw3rLrqN32H6mMKmsRkG9JUZxIwk6x8JmxBEh@mail.gmail.com>
I agree that http and html are issues. http is probably not a big one; if the rest of the URL were ok, it wouldn't matter much if that were at the start or end. html (htm, pdf, ...) are more of a problem.

It is unclear whether you mean that the ordering of the labels is always LTR, or that the URL is treated as if it were in a LTR context.

Mark

— Il meglio è l’inimico del bene —

On Tue, May 25, 2010 at 13:42, Shawn Steele <Shawn.Steele@microsoft.com> wrote:

> “http://” and “.html” would probably make pure RTL IRIs difficult.
> Personally, I think a “if it has RTL, then render the pieces in RTL order”
> approach is simplest. (so http://a.B.C/d.e.F.html would render as
> http://a.B.C/d.e.F.html or html.F.e.d/C.B.a//:http, but not some mixed
> form.) I’m probably oversimplifying it though.
>
> -Shawn
>
> *From:* Phillips, Addison [mailto:addison@lab126.com]
> *Sent:* Pōʻ, Mei 25, 2010 12:17 PM
> *To:* Mark Davis ☕; Aharon (Vladimir) Lanin
> *Cc:* Shawn Steele; Martin J. Dürst; public-iri@w3.org; bidi@unicode.org;
> Murray Sargent
> *Subject:* RE: [bidi] Re: Special ordering for BIDI URLs
>
> (chair hat off)
>
> Adding RTL scheme identifiers is not going to be wholly effective. Only if
> you can have a completely pure RTL URI (all parts: path, query, scheme,
> etc.) can you completely avoid ambiguity in display of unadorned plain text
> URIs. But I don’t think that’s a reasonable approach: we don’t call them
> “bi-directional” languages for no reason. There is a lot of LTR data in the
> world that would like to be expressed in a URI.
>
> I see Mark's point that requiring URI-awareness in plain text is a
> non-starter. I think limiting to unidirectional IRIs (either all LTR or all
> RTL) is a non-starter: there is no migration except for *total* migration.
>
> Thinking about “specialized bidi”, the simplest solution I can think of is:
> give URIs an inherent LTR directionality (which is implied, at least, by a
> strongly LTR scheme and the tendency of DNS names to be LTR). I think this
> is what Slim is suggesting. It means that you would need to insert a
> left-to-right override in front of a bidi URI in running plain text, or, in
> the case of things like address bars, behave with an inherent LTR reading
> order. As a rule this could be understandable to users, and, since URIs
> today are in the main ASCII, it might be the "least surprising" to users as
> they migrate to placing RTL text into a URI.
>
> Here's an experiment (although I used actual Arabic text, I present it as
> ASCII for convenience here. I use <lro> for the Unicode character). I typed
> four URIs:
>
> http://example.com/1CIBARA/2CIBARA
> http://CIBARA.com/1CIBARA/2CIBARA
> <lro>http://example.com/1CIBARA/2CIBARA
> <lro>http://CIBARA.com/1CIBARA/2CIBARA
>
> I typed the above into Notepad and set the reading order to right to left
> and saw:
>
> 2CIBARA/1CIBARA/http://example.com
> 2CIBARA/1CIBARA/com.CIBARA//:http
> http://example.com/1CIBARA/2CIBARA
> http://CIBARA.com/1CIBARA/2CIBARA
>
> Note that the first two, in a left-to-right reading order, display as:
>
> http://example.com/2CIBARA/1CIBARA
> http://CIBARA.com/2CIBARA/1CIBARA
>
> The LRO-bearing versions display the same in both RTL and LTR contexts,
> although the path element order appears backwards to RTL readers. The
> unadorned text versions display "normally" (to an RTL reader) only when they
> are predominantly right-to-left with isolated LTR runs. They look broken (I
> suspect even to RTL readers) when there are successive left-to-right runs.
>
> One downside is that it doesn't work very well in a markup environment.
> Consider:
>
> <a href="<lro>http://CIBARA.com">Is the LRO part of the uri?</a>
>
> If we are to print URIs on the sides of buses or on napkins under our tea
> cups, I'm not sure if it would be that bad to require the left-to-right
> reading order inherent in URI today as a "carryover" to IRIs. While
> unnatural to RTL speakers in the abstract, perhaps in practice "//:http"
> would seem unnatural to users (because they never see URIs like that) and it
> doesn't require any knowledge of the interior structure of a URI to apply an
> overall reading order in many (but not all) contexts.
>
> I also see the other side of this argument. I must admit that I am in
> agreement with the sentiments in the email John Klensin just sent [1]. I
> think I tend to favor a solution that is more universal over one that
> requires a lot of specialized handling for bidi, but in practice this
> ensnares us in the corner cases inherent in UBA and disadvantages, at least
> to some degree, speakers of languages written in RTL scripts.
>
> Addison
>
> [1] http://lists.w3.org/Archives/Public/public-iri/2010May/0039.html
>
> Addison Phillips
> Globalization Architect (Lab126)
> Chair (W3C I18N, IETF IRI WGs)
>
> Internationalization is not a feature.
> It is an architecture.
>
> *From:* public-iri-request@w3.org [mailto:public-iri-request@w3.org] *On
> Behalf Of* Mark Davis ☕
> *Sent:* Tuesday, May 25, 2010 11:31 AM
> *To:* Aharon (Vladimir) Lanin
> *Cc:* Shawn Steele; Martin J. Dürst; public-iri@w3.org; bidi@unicode.org;
> Murray Sargent
> *Subject:* Re: [bidi] Re: Special ordering for BIDI URLs
>
> It looks like we are having some useful discussions. Let me try to clarify
> a bit of what I said. My original message was getting longish, and I know
> people's eyes glaze when it gets too long, so I think I wasn't clear on a
> couple of matters.
>
> At a high level, there are two choices (as far as I know):
>
> *1. Market Forces.* Make it possible for URLs (actually IRIs) to be
> completely RTL, and push sites and programs to use them. Note that part of
> this can be adding mechanisms to URL-aware programs to flag to users when
> BIDI reordering is changing the order of labels, such as flagging them with
> a special format.
>
> *2. Specialized BIDI.* Force a consistent order on URLs, using a
> higher-level protocol on top of the UBA.
>
> You mention %, which is relevant to #1, and RLM/LRMs, which are relevant to
> #2.
>
> *A.* As far as % goes, what that means is that every label can be
> constructed so as to contain no LTR characters. By "label", I mean in a
> broad sense, so each of the three-letter sequences below counts as a label.
>
> http://abc.def.ghi/jkl/mno?pqr=stu&vwx=yza#bcd
>
> (The scheme is an exception: it has problems that Martin and John point
> out, but if that alone is LTR, it is not too bad; people can handle that
> being reordered if it is limited to it.)
>
> The % is an issue, although in an ideal world its use would be minimized in
> what the user sees. Although the characters have to be % encoded or
> punycoded to go over the web, they can be restored for display to the user.
> That is, % would occur only in a label where the character would have to be
> quoted in order to not have the label be terminated. We can discuss how to
> handle the cases where they cannot be minimized; how sites can work around
> it, whether the remaining cases represent a significant problem, and if so,
> whether there is some alternative syntax that could be used.
>
> Where the query string contains LTR characters, there are a couple of
> choices. For most people, the query part is just technical gorp. And
> websites are able to put whatever they want into those strings; their
> interpretation is private to that site.
> So there are a couple of approaches (at least):
>
> - Not really bother with it: if it contains LTR characters then it
> reorders in a funny way, but since it is technical gorp we don't care.
> - Have some simple standardized way of mapping LTR characters in the
> query part into bidi characters that sites can use if they want to be wholly
> RTL.
>
> *B.* As for RLM/LRMs, they are relevant to the Specialized BIDI
> approach. (As I said before, I have doubts as to whether this approach is
> viable, but it is worth pursuing how it could be).
>
> What we recommend in the UBA is that if people are going to override the
> BIDI algorithm for any purpose, that they effectively do so by the insertion
> of bidi controls (we should make that recommendation clearer, however). So
> how would this play out with URLs?
>
> 1. I type a URL into an address bar. Since the program is URL-aware*,
> it parses out the labels. Based on whatever standard mechanism is defined
> (e.g. the URL contains a RTL character), it is detected as a BIDI label, and
> ordered consistently. Effectively, that is done by inserting RLM at the
> start of each label that doesn't begin with a RTL character and at the end
> of each label that doesn't end with a RTL character. One could use the
> embedding codes, but they are more dangerous.
> 2. This is the display form: when the URL is looked up, the RLMs have
> to be stripped before it is transformed into punycode and %escaped.
> 3. If I cut or copy that URL, then the RLMs go with it into plain text
> on the clipboard.
> 4. When I paste that address into plain text, it then appears in the
> same order as it was in the address bar.
>
> Take another case:
>
> 1. I see a URL in some plain text (whether or not it is consistently
> ordered), and cut and paste that plaintext URL into an address bar (or other
> URL-aware* program). In that case, the program *renormalizes* the URL.
> That is, it strips out all bidi controls, and then reapplies the BIDI
> detection and RLM insertion. I then end up with consistent ordering in the
> result.
>
> Note that in no cases would we expect people to manually put in the RLMs.
>
> By URL-aware*, I mean that not only is it able to parse out URLs, but it
> also applies the special ordering. Initially, there are no such programs.
> And there are many problems with this approach: the old URL-aware programs
> would choke on the RLMs; old programs would behave differently from new
> programs; &c.
>
> Mark
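
The RLM insertion and renormalization steps described under *B.* above can be sketched roughly as follows. This is only an illustration under stated assumptions, not anything proposed in the thread: the function names (mark_labels, strip_bidi_controls, renormalize), the delimiter set used to split out "labels", and the exact set of characters treated as bidi controls are all hypothetical choices. The scheme is marked like any other label here, and detection is simply "the URL contains a character of bidi class R or AL".

```python
import re
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# Assumed set of bidi controls to strip: LRM, RLM, the embedding/override
# codes (U+202A..U+202E) and the isolate codes (U+2066..U+2069).
BIDI_CONTROLS = re.compile("[\u200e\u200f\u202a-\u202e\u2066-\u2069]")

# Assumed label delimiters; the capturing group makes re.split keep them.
DELIMS = re.compile(r"(://|[./?#&=:])")


def is_rtl(ch):
    """True if the character has strong RTL bidi class (R or AL)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


def strip_bidi_controls(url):
    """Lookup form: remove the bidi controls before punycode/%-escaping."""
    return BIDI_CONTROLS.sub("", url)


def mark_labels(url):
    """Display form: wrap each label in RLMs where it lacks an RTL edge."""
    if not any(is_rtl(ch) for ch in url):
        return url  # not detected as a bidi URL; leave it alone
    parts = DELIMS.split(url)
    out = []
    for i, part in enumerate(parts):
        # Odd indexes are delimiters; empty strings are gaps between them.
        if i % 2 == 1 or not part:
            out.append(part)
            continue
        if not is_rtl(part[0]):
            part = RLM + part
        if not is_rtl(part[-1]):
            part = part + RLM
        out.append(part)
    return "".join(out)


def renormalize(url):
    """Pasted text: strip all bidi controls, then reapply the marking."""
    return mark_labels(strip_bidi_controls(url))
```

With these definitions, a URL copied from the address bar carries its RLMs into plain text, strip_bidi_controls gives back the form to look up, and renormalize applied to an already marked URL returns it unchanged.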
Received on Tuesday, 25 May 2010 22:43:14 UTC