- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Thu, 05 Nov 2009 20:26:17 +0900
- To: Shawn Steele <Shawn.Steele@microsoft.com>
- CC: "Abdulrahman I. ALGhadir" <aghadir@citc.gov.sa>, Alireza Saleh <saleh@nic.ir>, "muhtaseb@kfupm.edu.sa" <muhtaseb@kfupm.edu.sa>, "idna-update@alvestrand.no" <idna-update@alvestrand.no>, Lisa Dusseault <lisa.dusseault@gmail.com>, "public-iri@w3.org" <public-iri@w3.org>
Hello Shawn, others,

[cc-ing public-iri@w3.org, because this is essentially an IRI issue, not (only) an IDN issue]

On 2009/11/05 2:57, Shawn Steele wrote:

> 2) http://microsoft.com gets displayed http://microsoft.com because in this case the direction between two runs didn't change so :// will take the run direction which is LTR.
>
> I understand why :) I'm not sure it's right. Certainly http://L1.R2 doesn't render right in IE (R2.http://L1 makes no sense). I could accept that LTR only should maybe do that, but once you have any RTL I think a different rule is needed.
>
> I think that labels are like a list. If I have a list (a, b, c, d), then I expect the list to be in order a, b, c, d. In an RTL context I would reasonably expect the list to be rendered (d, c, b, a). If the individual values happen to be in different scripts, that's not going to change the fact that I expect the list to have each element progress in an orderly fashion from least significant to most significant.
>
> So http://R1.L2.L3.R4, I think that the expectation of R4.L3.L2.R1//:http makes sense. The list progresses from 1 through 4.

R4.L3.L2.R1//:http makes a lot of sense in particular to people like us who know exactly what the components are, what the syntactically significant boundaries are, and so on. They may or may not make sense to everyday bidi users, the same way http://microsoft.com didn't make any sense to an average computer user around 1993.

> Unfortunately the character properties and rendering engines don't help much with that.

Indeed they don't help at all. I think there are essentially two ways to deal with this:

a) Try to get smart: Invent tweaks to the Unicode Bidi Algorithm, heuristics for detecting IRIs in context, special treatment for IRIs e.g. in browser address fields, and so on. This way, we may be able to improve some specific cases, but that could easily come at the expense of some other cases, and the solutions may not be applied everywhere in the same way, with risks to produce quite a lot of confusion (the same domain looking different in different contexts, and different domains looking the same).

b) Try to use a simple and clear way to display IRIs within the context of the Unicode Bidi Algorithm, e.g. as currently specified in RFC 3987 (or a suitable variant thereof if we can agree on it quickly; I think in particular for absolute IRIs, there isn't necessarily a need for requiring an LTR embedding direction). Help people understand how to read these things (groups of consecutive RTL components are read RTL, groups of consecutive LTR components are read LTR, which is *the same way* this is done in plain text with groups of words unless there's an embedding structure). This will reduce the potential for confusion. Average computer users may not have to learn that much (just read these things like you read sentences with words from different directionalities). Specialists such as us may have to work a bit harder, but it may be worth it. Once people get used to it, they will have gotten used to it, the same way they got used to http:// and similar cryptic stuff in the first place. And in most cases, domain names should be RTL.RTL.RTL or LTR.LTR.LTR anyway, and I don't mind if we put a bit more pressure on that.

In summary, overall, less may be more, even if it may be difficult to admit for experts like us.

Regards, Martin.

> -Shawn
> _______________________________________________
> Idna-update mailing list
> Idna-update@alvestrand.no
> http://www.alvestrand.no/mailman/listinfo/idna-update
>

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
Received on Thursday, 5 November 2009 11:27:19 UTC