Re: [bidi] BIDI? from Adil Allawi on 2011-06-07 (public-iri@w3.org from June 2011)

From: Adil Allawi <adil@diwan.com>
Date: Tue, 07 Jun 2011 14:14:09 +0100
To: public-iri@w3.org
Message-ID: <4DEE2421.1090706@diwan.com>

Following is a summary of the discussion so far.

I tried to write it as text but got lost in the details so I created a 
mind map. You can see it as an image here: 
http://ironymark.diwan.com/2011/06/the-trials-of-bidi-iris/ and the text 
of the map is below. Please tell me if you see any omissions or want to 
add any points:

    * Bidi IRI
          o Consistency
                + usage across different applications
                + copy the contents of an address bar into an email
          o Migration
                + There will be a long migration period, so the negative
                  effects must be mitigated as much as possible.
          o Usability
                + simple, comprehensible way to recognize IRIs in plaintext
                + easy and unambiguous human translation from a
                  displayed IRI (napkin, bus side) to the corresponding
                  logical string
          o Security

                WWW.HACKERS.COM/com.bank.www would be displayed as
                www.bank.com/MOC.SREKCAH.WWW, the same as
                www.bank.com/COM.HACKERS.WWW.

                Furthermore,
                http://WWW.HACKERS.COM?path/boring/and/long/very/a/com.bank.www//:http
                would be displayed as
                http://www.bank.com/a/very/long/and/boring/path?MOC.SREKCAH.WWW//:http,
                the same as
                http://www.bank.com/a/very/long/and/boring/path?COM.HACKERS.WWW//:http.

          o *Proposal 1: UBA Extension* - /make the “fields” of an IRI
            flow in a consistent direction./
                + A /separator/ is defined as any instance of the quoted
                  strings in the bidi_IRI BNF:

                      right after a scheme: “://”
                      in a domain: IDNSep
                      right after a domain: “/” , “?”, “#”
                      in a path: “/”
                      right after a path: “?”, “#”
                      in a query: “=” or “&”
                      right after a query: “#”

                + A /field/ is defined as any text between separators,
                  or at the front or end.
                + Ordering options:
                     1. Each bidi_IRI is displayed with fields from left
                        to right. Thus the following will always appear
                        with the same display, whether in a RTL or LTR
                        environment.

                            http://ab.cd.com/mn/op
                            http://ab.cd.*FE.HG.*com/*JI/LK/*mn/op
                            http://*FE.HG/JI/LK*

                     2. the ordering of fields could be subject to the
                        environment (whether the current embedding level
                        is RTL or LTR). In that case, the display would
                        be something like:

                            LTR:
                            http://ab.cd.com/mn/op
                            http://ab.cd.*FE.HG.*com/*JI/LK/*mn/op
                            http://FE.HG/JI/LK
                            RTL:
                            op/mn/com.cd.ab//:http
                            op/mn*/LK/JI*/com*.HG.FE*.cd.ab//:http
                            *LK/JI/HG.FE*//:http

                     3. the ordering could depend not on the environment
                        but on whether there are any RTL characters in
                        the IRI.

                            http://ab.cd.com/mn/op
                            op/mn*/LK/JI*/com*.HG.FE*.cd.ab//:http
                            *LK/JI/HG.FE*//:http

                + Method:
                      # the entire bidi_IRI is embedded in <LRE>...<PDF>
                      # each field is surrounded by LRMs or RLMs
                        depending on the main direction.
          o Definition of a Bidi-IRI
                + characters

                      *bidiIri* := ((scheme “://” domain) | domain2)
                                   (“/” path)? (“?” query)? (“#” fragment)?
                      *domain* := UTS46Chars+ (IDNSep UTS46Chars+)* IDNSep?
                      *domain2* := domain IDNSep TLD IDNSep?
                      *path* := (char - “?” - “#”)*
                      *query* := (char - “#”)*
                      *fragment* := char*
                      *IDNSep* := [\u002E \uFF0E \u3002 \uFF61]
                                  // see http://unicode.org/reports/tr46/#Notation
                      *TLD* := <list on http://www.iana.org/domains/root/db/>
                      *char* := percentEncodedUTF8
                                | [[:L:][:N:][:M:][:S:][:Pd:][:Pc:][:Cf:]
                                   inclusionChar - exclusionChar]
                      *inclusionChar* :=
                      U+0021 ( ! ) EXCLAMATION MARK
                      U+0022 ( " ) QUOTATION MARK
                      U+0023 ( # ) NUMBER SIGN
                      U+0025 ( % ) PERCENT SIGN
                      U+0026 ( & ) AMPERSAND
                      U+0027 ( ' ) APOSTROPHE
                      U+002A ( * ) ASTERISK
                      U+002C ( , ) COMMA
                      U+002E ( . ) FULL STOP
                      U+002F ( / ) SOLIDUS
                      U+003A ( : ) COLON
                      U+003B ( ; ) SEMICOLON
                      U+003F ( ? ) QUESTION MARK
                      U+0040 ( @ ) COMMERCIAL AT
                      U+005C ( \ ) REVERSE SOLIDUS
                      U+00A1 ( ¡ ) INVERTED EXCLAMATION MARK
                      U+00B7 ( · ) MIDDLE DOT
                      U+00BF ( ¿ ) INVERTED QUESTION MARK
                      *exclusionChar* :=
                      U+003C ( < ) LESS-THAN SIGN
                      U+003E ( > ) GREATER-THAN SIGN

                + termination

                      1. Unassigned, surrogates, private-use, control codes
                      2. Whitespace
                      3. Open, close or most ‘other’ punctuation, plus
                         special cases < and >.

                      U+003C ( < ) LESS-THAN SIGN
                      U+003E ( > ) GREATER-THAN SIGN
                      U+0028 ( ( ) LEFT PARENTHESIS
                      U+0029 ( ) ) RIGHT PARENTHESIS
                      U+005B ( [ ) LEFT SQUARE BRACKET
                      U+005D ( ] ) RIGHT SQUARE BRACKET
                      U+007B ( { ) LEFT CURLY BRACKET
                      U+007D ( } ) RIGHT CURLY BRACKET
                      U+00AB ( « ) LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
                      U+00BB ( » ) RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

                + Continuation

                      1. Letters, Marks, Numbers
                      2. Dash and connector punctuation
                      3. Symbols (except terminating symbols)

          o issues
                + TLD
                      # Is a TLD fixed or are custom patterns allowed?
                + Scheme
                      # require a scheme for Bidi-IRI recognizer
                      # enforce a direction based on the scheme
                + UBA Extension
                      # The same URL (IRI) will be displayed differently
                        according to the embedding level. This is confusing.
                      # Pure Latin-character URLs will be displayed in a
                        new and strange way when the embedding level is
                        odd. For instance, "http://docs.google.com"
                        will be displayed as "com.google.docs//:http".
                            * feedback that this is actually preferred
                            * preferences seem to be dependent partially
                              on the user’s culture and partially on
                              other life experiences
                + define "mostly Latin" and "mostly Arabic or Hebrew".
                      # first strong in the domain name?
          o Proposals
                + Enforce Direction on the basis of the Domain language
                      # all of domain?
                      # part of domain?
                + * always order the labels/fields either from left to
                  right or right to left.
                  * pick the initial direction from the user environment
                  (eg: English gets left to right fields, Arabic gets
                  right to left fields).
                  * allow the user to override the direction in their
                  preferences.
                + *locale-based ordering*
                      # Eg: visiting an en-US web page may have a
                        different behavior than an ar-EG web page
                      # get all-RTL IRIs displayed RTL overall, not in a
                        constant back-and-forth at every separator. This
                        should be based on the presence of RTL characters
                        in the domain name
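
To make the three ordering options under Proposal 1 concrete, here is a
minimal Python sketch. It is an illustration only, not part of any
proposal text. Assumptions: separators are matched anywhere in the
string rather than per component, upper-case ASCII stands in for RTL
letters as in the examples above, and each field is written in its
display form, so the sketch only decides the order of fields. The
function names are hypothetical. The last function is a literal reading
of the "Method" item, and whether those marks produce the intended
display in a real renderer has not been checked here.

```python
import re
import unicodedata

LRE, PDF, LRM, RLM = "\u202a", "\u202c", "\u200e", "\u200f"

# Assumption: one flat separator set, not the per-component rules.
SEPARATOR = re.compile(r"(://|[./?#=&])")


def split_fields(iri):
    """Return the fields and separators of iri, in order."""
    return [tok for tok in SEPARATOR.split(iri) if tok]


def has_rtl(text):
    """True if text has an RTL character (or an upper-case stand-in)."""
    return any(
        (c.isascii() and c.isupper())
        or unicodedata.bidirectional(c) in ("R", "AL")
        for c in text
    )


def display_fields(iri, rtl):
    """Lay the fields out left to right, or right to left if rtl."""
    tokens = split_fields(iri)
    if not rtl:
        return "".join(tokens)
    # Reverse the field order; "://" reads "//:" when laid out RTL.
    return "".join(
        t[::-1] if SEPARATOR.fullmatch(t) else t for t in reversed(tokens)
    )


def option1(iri, env_rtl):
    return display_fields(iri, rtl=False)         # always left to right


def option2(iri, env_rtl):
    return display_fields(iri, rtl=env_rtl)       # follow the environment


def option3(iri, env_rtl):
    return display_fields(iri, rtl=has_rtl(iri))  # follow the content


def add_marks(iri, rtl):
    """Wrap the IRI in LRE...PDF and each field in LRMs or RLMs."""
    mark = RLM if rtl else LRM
    body = "".join(
        t if SEPARATOR.fullmatch(t) else mark + t + mark
        for t in split_fields(iri)
    )
    return LRE + body + PDF


print(option2("http://ab.cd.com/mn/op", env_rtl=True))
print(option3("http://FE.HG/JI/LK", env_rtl=False))
```

With these inputs, option 2 in an RTL environment gives
op/mn/com.cd.ab//:http and option 3 gives LK/JI/HG.FE//:http, matching
the examples listed under those options.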

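The char, termination and continuation rules in the "Definition of a
Bidi-IRI" branch can be sketched in the same way. This is a rough
Python illustration under stated assumptions, not a full bidi_IRI
parser: it checks only the character classes, ignores the
scheme/domain/path structure and the TLD list, and uses the Unicode
general categories from Python's unicodedata module. The function names
and the example.com address are made up for the example.

```python
import unicodedata

# inclusionChar and exclusionChar as listed in the definition.
INCLUSION = set("!\"#%&'*,./:;?@\\\u00a1\u00b7\u00bf")
EXCLUSION = set("<>")


def is_iri_char(ch):
    """True if ch may continue a bidi-IRI under the char rule."""
    if ch in EXCLUSION:
        return False
    if ch in INCLUSION:
        return True
    cat = unicodedata.category(ch)
    # Letters, Numbers, Marks, Symbols, plus dash and connector
    # punctuation and format characters.
    return cat[0] in "LNMS" or cat in ("Pd", "Pc", "Cf")


def scan_iri(text, start=0):
    """Return the run of IRI characters in text beginning at start."""
    end = start
    while end < len(text) and is_iri_char(text[end]):
        end += 1
    return text[start:end]


print(scan_iri("see <http://example.com/a?b=c#d> now", 5))
print(scan_iri("(http://example.com/x) ok", 1))
```

Because FULL STOP and COMMA are in inclusionChar, a scan like this also
swallows sentence punctuation that directly follows an IRI in running
text, so a real recognizer would need an extra rule for trailing
punctuation.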


On 06/06/2011 13:53, Larry Masinter wrote:
>
> Could someone summarize the requirements for BIDI representation and 
> display, and the design choices we’re facing and how they match up 
> against the requirements?
>
Received on Tuesday, 7 June 2011 13:14:36 UTC