- From: Mark Davis ☕ <mark@macchiato.com>
- Date: Wed, 27 Jan 2010 18:06:02 -0800
- To: Shawn Steele <Shawn.Steele@microsoft.com>
- Cc: Slim Amamou <slim@alixsys.com>, "public-iri@w3.org" <public-iri@w3.org>
- Message-ID: <30b660a21001271806w743e7f10kcfece2e6f3f44c76@mail.gmail.com>
Various people have considered a special reordering of labels before. The problem is that while one could give the order of the labels special handling in the address bar or other special locations, it is really bad if the labels aren't in the same order everywhere the URL could appear; the spoofing possibilities are unpleasant. And *everywhere* means in everyone's address bar in every browser, and in plaintext, and in emailers, and so on.

Mark

On Wed, Jan 27, 2010 at 15:44, Shawn Steele <Shawn.Steele@microsoft.com> wrote:

> I'm not sure that solves the problem. Specifically with these examples:
>
> Logical representation: "http://ab.CDE.FGH/ij/kl/mn/op.html"
> Visual representation: "http://ab.HGF.EDC/ij/kl/mn/op.html"
>
> "Real users" seem to get confused by the HGF.EDC behavior, and instead
> expect the data to have the hierarchy remain in a consistent direction,
> e.g. http://ab.EDC.HGF/ij/kl/mn/op.html seems to be the expected behavior.
> As far as I can tell, the swapping of the 2 in the hierarchy is not
> intuitive to those who don't understand the Unicode bidi algorithm. There
> also seems to be little variation in the users' expectations in this
> respect. This is based on feedback from the Saudi gov't on IE's IDN
> behavior, and some casual user feedback. We have yet to conduct more
> formal usability testing.
>
> Furthermore, it isn't clear to me that users in a bidi context would
> really prefer the individual labels/elements to be represented with the
> hierarchy reading from LTR. This is less clear, though. Specifically, it
> was suggested that the elements render from RTL if they include RTL
> elements, e.g. html.op/mn/kl/ij/HGF.CDE.ab//:http. As I said, this
> expectation seems less certain. Furthermore, some users expressed a desire
> for even ASCII URLs to read in RTL order when displayed on an RTL machine
> with RTL UI, e.g. com.microsoft//:http.
>
> I don't think that it's appropriate for the WG as "engineers" to state
> what's best here; I believe we have biases and an understanding of
> computers not available to the average user. I'd prefer a real usability
> study to determine the user expectations:
>
> * Validate that the list/hierarchy model fits user expectations.
> * Determine whether that list should be displayed in LTR or RTL when the
>   list contains elements that are RTL.
> * Determine if there are times when a general LTR or RTL directionality of
>   the list elements is unexpected (e.g. all-ASCII, but on an RTL system).
>
> -Shawn
>
> SSDE
> Windows UX
> Microsoft
>
> -----Original Message-----
> From: public-iri-request@w3.org [mailto:public-iri-request@w3.org] On
> Behalf Of Slim Amamou
> Sent: Wednesday, January 27, 2010 9:53 AM
> To: public-iri@w3.org
> Subject: BIDI : tackling the delimiter weirdness
>
> Hello everybody,
> congratulations on the WG.
>
> Sometimes BIDI IRIs look really weird. For instance, the most advanced
> examples in section 4.4, beginning with example 5, are really confusing
> for an Arabic script reader like me. But I have had time to think about it
> since 2007, when the IDN wiki first started, and I think I have nailed the
> problem, so I am coming with a proposal.
>
> http://www.ietf.org/id/draft-duerst-iri-bis-07.txt
>
> section 4.2. Bidi IRI Structure
>
> > (...) some restrictions on bidirectional IRIs
> > are necessary. These restrictions are given in terms of delimiters
> > (structural characters, mostly punctuation such as "@", ".", ":", and
> > "/") and components (usually consisting mostly of letters and digits).
>
> Delimiters are at the core of the issue. I suggest a more in-depth
> explanation of their usage in conjunction with components. For most IRI
> schemes, delimiters define a relationship between their left component and
> their right component. Most of the time this relationship is a
> hierarchical relationship.
>
> For example, for http: the "/" defines a hierarchy between the path
> components, where A/B/C actually means: A includes B, which in turn
> includes C. Note here that the inclusion relationship is *directional*:
> the left component includes the right component, and thus the "/"
> delimiter in the http: scheme has an LTR "directionality". It is this
> directionality that is broken by the examples in the IRI draft and that
> creates confusion.
>
> Another example: in domain names, the "." delimiter also defines a
> hierarchy, but this time the directionality is RTL.
>
> I think the IRI draft should state that scheme definitions MUST define
> their delimiters' relationships and directionality. That would solve the
> problem.
>
> section 4.4. Examples
>
> > (...)
> >
> > Example 5: Example 2, applied to components of different kinds:
> > Logical representation: "http://ab.cd.EF/GH/ij/kl.html"
> > Visual representation: "http://ab.cd.HG/FE/ij/kl.html"
> > The inversion of the domain name label and the path component may be
> > unexpected, but it is consistent with other bidi behavior. For
> > reassurance that the domain component really is "ab.cd.EF", it may be
> > helpful to read aloud the visual representation following the bidi
> > algorithm. After "http://ab.cd." one reads the RTL block
> > "E-F-slash-G-H", which corresponds to the logical representation.
> >
> > Example 6: Same as Example 5, with more rtl components:
> > Logical representation: "http://ab.CD.EF/GH/IJ/kl.html"
> > Visual representation: "http://ab.JI/HG/FE.DC/kl.html"
> > The inversion of the domain name labels and the path components may
> > be easier to identify because the delimiters also move.
> >
> > Example 7: A single rtl component includes digits:
> > Logical representation: "http://ab.CDE123FGH.ij/kl/mn/op.html"
> > Visual representation: "http://ab.HGF123EDC.ij/kl/mn/op.html"
> > Numbers are written ltr in all cases but are treated as an additional
> > embedding inside a run of rtl characters. This is completely
> > consistent with usual bidirectional text.
> >
> > Example 8 (not allowed): Numbers are at the start or end of an rtl
> > component:
> > Logical representation: "http://ab.cd.ef/GH1/2IJ/KL.html"
> > Visual representation: "http://ab.cd.ef/LK/JI1/2HG.html"
> > The sequence "1/2" is interpreted by the bidi algorithm as a
> > fraction, fragmenting the components and leading to confusion. There
> > are other characters that are interpreted in a special way close to
> > numbers; in particular, "+", "-", "#", "$", "%", ",", ".", and ":".
> >
> > Example 9 (not allowed): The numbers in the previous example are
> > percent-encoded:
> > Logical representation: "http://ab.cd.ef/GH%31/%32IJ/KL.html",
> > Visual representation: "http://ab.cd.ef/LK/JI%32/%31HG.html"
> >
> > Example 10 (allowed but not recommended):
> > Logical representation: "http://ab.CDEFGH.123/kl/mn/op.html"
> > Visual representation: "http://ab.123.HGFEDC/kl/mn/op.html"
> > Components consisting of only numbers are allowed (it would be rather
> > difficult to prohibit them), but these may interact with adjacent RTL
> > components in ways that are not easy to predict.
> >
> > Example 11 (allowed but not recommended):
> > Logical representation: "http://ab.CDEFGH.123ij/kl/mn/op.html"
> > Visual representation: "http://ab.123.HGFEDCij/kl/mn/op.html"
> > Components consisting of numbers and left-to-right characters are
> > allowed, but these may interact with adjacent RTL components in ways
> > that are not easy to predict.
>
> --
> Slim Amamou | سليم عمامو
> http://alixsys.com
Received on Thursday, 28 January 2010 02:06:36 UTC