- From: Adil Allawi <adil@diwan.com>
- Date: Tue, 07 Jun 2011 14:14:09 +0100
- To: public-iri@w3.org
- Message-ID: <4DEE2421.1090706@diwan.com>
Following is a summary of the discussion so far. I tried to write it as text but got lost in the details, so I created a mind map. You can see it as an image here: http://ironymark.diwan.com/2011/06/the-trials-of-bidi-iris/ and the text of the map is below. Please tell me if you see any omissions or want to add any points:

* Bidi IRI
  o Consistency
    + usage across different applications
    + copy the contents of an address bar into an email
  o Migration
    + There will be a long migration period, so the negative effects should be mitigated as much as possible.
  o Usability
    + simple, comprehensible way to recognize IRIs in plaintext
    + easy and unambiguous human translation from a displayed IRI (napkin, bus side) to the corresponding logical string
  o Security
    WWW.HACKERS.COM/com.bank.www would be displayed as www.bank.com/MOC.SREKCAH.WWW, the same as www.bank.com/COM.HACKERS.WWW.
    Furthermore, http://WWW.HACKERS.COM?path/boring/and/long/very/a/com.bank.www//:http would be displayed as http://www.bank.com/a/very/long/and/boring/path?MOC.SREKCAH.WWW//:http, the same as http://www.bank.com/a/very/long/and/boring/path?COM.HACKERS.WWW//:http.
  o *Proposal 1: UBA Extension -* /the “fields” of an IRI to flow in a consistent direction./
    + A /separator/ is defined as any instance of the quoted strings in the bidi_IRI BNF:
        right after a scheme: “://”
        in a domain: IDNSep
        right after a domain: “/”, “?”, “#”
        in a path: “/”
        right after a path: “?”, “#”
        in a query: “=” or “&”
        right after a query: “#”
    + A /field/ is defined as any text between separators, or at the front or end.
    + Ordering options:
      1. Each bidi_IRI is displayed with fields from left to right. Thus the following will always appear with the same display, whether in an RTL or LTR environment.
         http://ab.cd.com/mn/op
         http://ab.cd.*FE.HG.*com/*JI/LK/*mn/op
         http://*FE.HG/JI/LK*
      2. The ordering of fields could be subject to the environment (whether the current embedding level is RTL or LTR). In that case, the display would be something like:
         LTR:
         http://ab.cd.com/mn/op
         http://ab.cd.*FE.HG.*com/*JI/LK/*mn/op
         http://FE.HG/JI/LK
         RTL:
         op/mn/com.cd.ab//:http
         op/mn*/LK/JI*/com*.HG.FE*.cd.ab//:http
         *LK/JI/HG.FE*//:http
      3. The ordering would not depend only on the environment, but instead on whether there were any RTL characters in the IRI.
         http://ab.cd.com/mn/op
         op/mn*/LK/JI*/com*.HG.FE*.cd.ab//:http
         *LK/JI/HG.FE*//:http
    + Method:
      # the entire bidi_IRI is embedded in <LRE>...<PDF>
      # each field is surrounded by LRMs or RLMs depending on the main direction.
  o Definition of a Bidi-IRI
    + characters
      *bidiIri* := ((scheme “://” domain) | domain2) (“/” path)? (“?” query)? (“#” fragment)?
      *domain* := UTS46Chars+ ( IDNSep UTS46Chars+)* IDNSep?
      *domain2* := domain IDNSep TLD IDNSep?
      *path* := (char - “?” - “#”)*
      *query* := (char - “#”)*
      *fragment* := char*
      *IDNSep* := [\u002E \uFF0E \u3002 \uFF61] // see http://unicode.org/reports/tr46/#Notation
      *TLD* := <list on http://www.iana.org/domains/root/db/>
      *char* := percentEncodedUTF8 | [[:L:][:N:][:M:][:S:][:Pd:][:Pc:][:Cf:] inclusionChar - exclusionChar]
      *inclusionChar* :=
         U+0021 ( ! ) EXCLAMATION MARK
         U+0022 ( " ) QUOTATION MARK
         U+0023 ( # ) NUMBER SIGN
         U+0025 ( % ) PERCENT SIGN
         U+0026 ( & ) AMPERSAND
         U+0027 ( ' ) APOSTROPHE
         U+002A ( * ) ASTERISK
         U+002C ( , ) COMMA
         U+002E ( . ) FULL STOP
         U+002F ( / ) SOLIDUS
         U+003A ( : ) COLON
         U+003B ( ; ) SEMICOLON
         U+003F ( ? ) QUESTION MARK
         U+0040 ( @ ) COMMERCIAL AT
         U+005C ( \ ) REVERSE SOLIDUS
         U+00A1 ( ¡ ) INVERTED EXCLAMATION MARK
         U+00B7 ( · ) MIDDLE DOT
         U+00BF ( ¿ ) INVERTED QUESTION MARK
      *exclusionChar* :=
         U+003C ( < ) LESS-THAN SIGN
         U+003E ( > ) GREATER-THAN SIGN
    + termination
      1. Unassigned, surrogates, private-use, control codes
         Whitespace
         Open, close or most ‘other’ punctuation, plus special cases < and >.
         U+003C ( < ) LESS-THAN SIGN
         U+003E ( > ) GREATER-THAN SIGN
         U+0028 ( ( ) LEFT PARENTHESIS
         U+0029 ( ) ) RIGHT PARENTHESIS
         U+005B ( [ ) LEFT SQUARE BRACKET
         U+005D ( ] ) RIGHT SQUARE BRACKET
         U+007B ( { ) LEFT CURLY BRACKET
         U+007D ( } ) RIGHT CURLY BRACKET
         U+00AB ( « ) LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
         U+00BB ( » ) RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    + Continuation
      1. Letters, Marks, Numbers
         Dash and connector punctuation
         Symbols (except terminating symbols)
  o issues
    + TLD
      # Is a TLD fixed or are custom patterns allowed?
    + Scheme
      # require a scheme for Bidi-IRI recognizer
      # enforce a direction based on the scheme
    + UBA Extension
      # The same URL (IRI) will be displayed differently according to the embedding level. This is confusing.
      # Pure Latin-character URLs will be displayed in a new and strange way when the embedding level is odd. For instance, "http://docs.google.com" will be displayed as "com.google.docs//:http".
        * feedback that this is actually preferred
        * preferences seem to depend partially on the user’s culture and partially on other life experiences
    + define "mostly Latin" and "mostly Arabic or Hebrew".
      # first strong in the domain name?
  o Proposals
    + Enforce Direction on the basis of the Domain language
      # all of domain?
      # part of domain?
    + * always order the labels/fields either from left to right or right to left.
      * pick the initial direction from the user environment (e.g. English gets left-to-right fields, Arabic gets right-to-left fields).
      * allow the user to override the direction in their preferences.
    + *locale-based ordering*
      # E.g. visiting an en-US web page may have a different behavior than an ar-EG web page
      # get all-RTL IRIs displayed RTL overall, not in a constant back-and-forth at every separator. This should be based on the presence of RTL in the domain name.

On 06/06/2011 13:53, Larry Masinter wrote:
>
> Could someone summarize the requirements for BIDI representation and
> display, and the design choices we’re facing and how they match up
> against the requirements?
>
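
For anyone who wants to experiment with the "Method" bullets of Proposal 1 in the map above, here is a rough Python sketch of one possible reading: split the IRI into fields at the separators, surround each field with LRM (or RLM), and embed the whole string in LRE...PDF. It is only an illustration under stated assumptions, not an agreed algorithm. The names split_fields, wrap_for_display and rtl_fields are hypothetical, and the separator matching is deliberately simplified: it treats every separator character as a separator wherever it occurs, whereas the map only treats IDNSep as a separator inside the domain and "=" / "&" inside the query.

```python
import re

# Unicode bidi control characters named in the Method bullets.
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK

# Simplifying assumption: any of these is a separator anywhere in the IRI.
# The mind map is stricter (e.g. IDNSep only counts inside the domain).
SEPARATOR = re.compile(r"(://|[.\uFF0E\u3002\uFF61/?#=&])")


def split_fields(iri):
    """Return fields and separators alternately; fields sit at even indexes."""
    return SEPARATOR.split(iri)


def wrap_for_display(iri, rtl_fields=False):
    """Embed the IRI in LRE...PDF and surround each non-empty field with marks.

    rtl_fields is a hypothetical switch standing in for "the main direction":
    False surrounds fields with LRM, True with RLM.
    """
    mark = RLM if rtl_fields else LRM
    out = []
    for index, part in enumerate(split_fields(iri)):
        if index % 2 == 0 and part:
            out.append(mark + part + mark)  # a field
        else:
            out.append(part)                # a separator (or an empty field)
    return LRE + "".join(out) + PDF


if __name__ == "__main__":
    print(split_fields("http://ab.cd.com/mn/op"))
    print(ascii(wrap_for_display("http://ab.cd")))
```

Removing the five control characters from the result gives back the original logical string, so a sketch like this only changes how the IRI is displayed, not what it identifies.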
Received on Tuesday, 7 June 2011 13:14:36 UTC