Re: BIDI : tackling the delimiter weirdness

Various people have considered a special reordering of labels before. The
problem is that while in the address bar or other special locations one
could have a special handling for the order of the labels, it is really bad
if the labels aren't in the same order everywhere the URL could appear - the
spoofing possibilities are unpleasant. And *everywhere* means in everyone's
address bar in every browser, and in plaintext, and in emailers, and so on.

Mark


On Wed, Jan 27, 2010 at 15:44, Shawn Steele <Shawn.Steele@microsoft.com> wrote:

> I'm not sure that solves the problem.  Specifically with these examples:
>
>   Logical representation: "http://ab.CDE.FGH/ij/kl/mn/op.html"
>   Visual representation: "http://ab.HGF.EDC/ij/kl/mn/op.html"
>
> "real users" seem to get confused by the HGF.EDC behavior, and instead
> expect the data to have the hierarchy remain in a consistent direction, eg:
> http://ab.EDC.HGF/ij/kl/mn/op.html seems to be the expected behavior.  As
> far as I can tell, the swapping of the two labels in the hierarchy is not
> intuitive to those who don't understand the Unicode bidi algorithm.  There
> also seems to be little variation in users' expectations in this respect.
> This is based on feedback from the Saudi gov't on IE's IDN behavior, and
> some casual user feedback.  We have yet to conduct more formal usability
> testing.
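
The two displays contrasted above can be reproduced with a minimal sketch.
It assumes the draft's convention that uppercase ASCII stands in for RTL
letters, and it is a deliberate simplification rather than the Unicode bidi
algorithm. The function names run_level_display and label_level_display are
hypothetical, chosen only for illustration.

```python
import re

# Assumption: uppercase ASCII letters stand in for RTL letters, as in the
# examples above. This is a simplification, not the Unicode bidi algorithm.
RTL_LABEL = re.compile(r"[A-Z]+")
# An RTL run: RTL labels joined by delimiters that sit between two RTL labels.
RTL_RUN = re.compile(r"[A-Z]+(?:[^A-Za-z0-9]+[A-Z]+)*")


def _reverse(match):
    """Return the matched text with its characters in reverse order."""
    return match.group(0)[::-1]


def run_level_display(logical):
    """Reverse each whole RTL run, so labels inside a run swap places."""
    return RTL_RUN.sub(_reverse, logical)


def label_level_display(logical):
    """Reverse characters inside each RTL label but keep the label order."""
    return RTL_LABEL.sub(_reverse, logical)


logical = "http://ab.CDE.FGH/ij/kl/mn/op.html"
print(run_level_display(logical))    # http://ab.HGF.EDC/ij/kl/mn/op.html
print(label_level_display(logical))  # http://ab.EDC.HGF/ij/kl/mn/op.html
```

The first function gives the "HGF.EDC" form that confused users; the second
gives the "EDC.HGF" form they reportedly expected.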
>
> Furthermore it isn't clear to me that users in a bidi context would really
> prefer the individual labels/elements to be represented with the hierarchy
> reading from LTR.  This is less clear though.  Specifically it was suggested
> that the elements render from RTL if they include RTL elements, eg:
> html.op/mn/kl/ij/HGF.CDE.ab//:http  -  As I said, this expectation seems
> less certain.  Furthermore, some users expressed a desire for even ASCII URLs
> to read in RTL order when displayed on an RTL machine with an RTL UI, e.g.:
> com.microsoft//:http.
>
> I don't think that it's appropriate for the WG as "engineers" to state
> what's best here; I believe we have biases and an understanding of computers
> not available to the average user.  I'd prefer a real usability study to
> determine the user expectations:
>
> * Validate that the list/hierarchy model fits user expectations.
> * Determine whether that list should be displayed in LTR or RTL when the
> list contains elements that are RTL.
> * Determine if there are times when a general LTR or RTL directionality of
> the list elements is unexpected (e.g. all-ASCII, but on an RTL system).
>
> -Shawn
>
> SSDE
> Windows UX
> Microsoft
>
> -----Original Message-----
> From: public-iri-request@w3.org [mailto:public-iri-request@w3.org] On
> Behalf Of Slim Amamou
> Sent: Wednesday, January 27, 2010 9:53 AM
> To: public-iri@w3.org
> Subject: BIDI : tackling the delimiter weirdness
>
> Hello everybody,
> congratulations for the WG.
>
> Sometimes BIDI IRIs look really weird. For instance, the most advanced
> examples in section 4.4, beginning with example 5, are really confusing for
> an Arabic script reader like me. But I have had time to think about it since
> 2007, when the IDN wiki first started, and I think I have nailed the problem
> and I have a proposal.
>
> http://www.ietf.org/id/draft-duerst-iri-bis-07.txt
>
> section 4.2.  Bidi IRI Structure
> >
> >   (...) some restrictions on bidirectional IRIs
> >   are necessary.  These restrictions are given in terms of delimiters
> >   (structural characters, mostly punctuation such as "@", ".", ":", and
> >   "/") and components (usually consisting mostly of letters and
> >   digits).
>
> Delimiters are at the core of the issue. I suggest a more in-depth
> explanation of their usage in conjunction with components. For most IRI
> schemes, delimiters define a relationship between their left component and
> their right component. Most of the time this relationship is a hierarchical
> relationship.
>
> For example, in the http: scheme the "/" defines a hierarchy between the
> path components: A/B/C actually means that A includes B, which in turn
> includes C. Note that the inclusion relationship is *directional*: the left
> component includes the right component, and thus the "/" delimiter in the
> http: scheme has an LTR "directionality". It is this directionality that is
> broken by the examples in the IRI draft, and that creates the confusion.
>
> Another example: in domain names, the "." delimiter also defines a hierarchy,
> but this time the directionality is RTL.
>
> I think the IRI draft should state that scheme definitions MUST define
> their delimiters' relationships and directionality. That would solve the
> problem.
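
As a rough illustration of the proposal above, the sketch below records, for
each delimiter, which side holds the more general component, and uses that
to list components from most general to most specific. The DELIMITERS table
and the hierarchy function are hypothetical names introduced here; nothing
like them is defined in the IRI draft.

```python
from urllib.parse import urlsplit

# Hypothetical table: for each kind of component, the delimiter and the side
# on which the more general (including) component sits.
DELIMITERS = {
    "host": (".", "right"),  # in ab.cd.ef, "ef" includes "cd" includes "ab"
    "path": ("/", "left"),   # in gh/ij/kl, "gh" includes "ij" includes "kl"
}


def hierarchy(component, kind):
    """List the parts of a component from most general to most specific."""
    delimiter, general_side = DELIMITERS[kind]
    parts = [part for part in component.split(delimiter) if part]
    return parts[::-1] if general_side == "right" else parts


url = urlsplit("http://ab.cd.ef/gh/ij/kl.html")
print(hierarchy(url.hostname, "host"))  # ['ef', 'cd', 'ab']
print(hierarchy(url.path, "path"))      # ['gh', 'ij', 'kl.html']
```

The point of the table is only that the two delimiters carry opposite
directionality, which is what the message argues a scheme definition should
state explicitly.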
>
> section 4.4.  Examples
> > (...)
> >   Example 5: Example 2, applied to components of different kinds:
> >   Logical representation: "http://ab.cd.EF/GH/ij/kl.html"
> >   Visual representation: "http://ab.cd.HG/FE/ij/kl.html"
> >   The inversion of the domain name label and the path component may be
> >   unexpected, but it is consistent with other bidi behavior.  For
> >   reassurance that the domain component really is "ab.cd.EF", it may be
> >   helpful to read aloud the visual representation following the bidi
> >   algorithm.  After "http://ab.cd." one reads the RTL block
> >   "E-F-slash-G-H", which corresponds to the logical representation.
> >
> >   Example 6: Same as Example 5, with more rtl components:
> >   Logical representation: "http://ab.CD.EF/GH/IJ/kl.html"
> >   Visual representation: "http://ab.JI/HG/FE.DC/kl.html"
> >   The inversion of the domain name labels and the path components may
> >   be easier to identify because the delimiters also move.
> >
> >   Example 7: A single rtl component includes digits:
> >   Logical representation: "http://ab.CDE123FGH.ij/kl/mn/op.html"
> >   Visual representation: "http://ab.HGF123EDC.ij/kl/mn/op.html"
> >   Numbers are written ltr in all cases but are treated as an additional
> >   embedding inside a run of rtl characters.  This is completely
> >   consistent with usual bidirectional text.
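
The digit behaviour in Example 7 can be sketched for a single label as
follows. This again assumes uppercase ASCII stands in for RTL letters and
handles only digits embedded inside one RTL label; it does not model the
Unicode bidi algorithm or the problem cases in the later examples. The
function name reverse_rtl_label is hypothetical.

```python
import re


def reverse_rtl_label(label):
    """Reverse an RTL label while keeping each digit sequence left-to-right.

    Assumption: the whole label is one RTL run (uppercase letters stand in
    for RTL letters) with digit sequences embedded in it.
    """
    # Split into tokens: a whole digit sequence, or a single non-digit.
    tokens = re.findall(r"\d+|\D", label)
    # Reversing the token order flips the letters but leaves "123" intact.
    return "".join(reversed(tokens))


print(reverse_rtl_label("CDE123FGH"))  # HGF123EDC
```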
> >
> >   Example 8 (not allowed): Numbers are at the start or end of an rtl
> >   component:
> >   Logical representation: "http://ab.cd.ef/GH1/2IJ/KL.html"
> >   Visual representation: "http://ab.cd.ef/LK/JI1/2HG.html"
> >   The sequence "1/2" is interpreted by the bidi algorithm as a
> >   fraction, fragmenting the components and leading to confusion.  There
> >   are other characters that are interpreted in a special way close to
> >   numbers; in particular, "+", "-", "#", "$", "%", ",", ".", and ":".
> >
> >   Example 9 (not allowed): The numbers in the previous example are
> >   percent-encoded:
> >   Logical representation: "http://ab.cd.ef/GH%31/%32IJ/KL.html",
> >   Visual representation: "http://ab.cd.ef/LK/JI%32/%31HG.html"
> >
> >   Example 10 (allowed but not recommended):
> >   Logical representation: "http://ab.CDEFGH.123/kl/mn/op.html"
> >   Visual representation: "http://ab.123.HGFEDC/kl/mn/op.html"
> >   Components consisting of only numbers are allowed (it would be rather
> >   difficult to prohibit them), but these may interact with adjacent RTL
> >   components in ways that are not easy to predict.
> >
> >   Example 11 (allowed but not recommended):
> >   Logical representation: "http://ab.CDEFGH.123ij/kl/mn/op.html"
> >   Visual representation: "http://ab.123.HGFEDCij/kl/mn/op.html"
> >   Components consisting of numbers and left-to-right characters are
> >   allowed, but these may interact with adjacent RTL components in ways
> >   that are not easy to predict.
>
>
> --
> Slim Amamou | سليم عمامو
> http://alixsys.com
>
>
>

Received on Thursday, 28 January 2010 02:06:36 UTC