- From: Aharon (Vladimir) Lanin <aharon@google.com>
- Date: Mon, 6 Jun 2011 09:21:49 +0300
- To: Shawn Steele <Shawn.Steele@microsoft.com>
- Cc: Matitiahu Allouche <matial@il.ibm.com>, "bidi@unicode.org" <bidi@unicode.org>, "bidi-bounce@unicode.org" <bidi-bounce@unicode.org>, Mark Davis ☕ <mark@macchiato.com>, Mohamed Mohie <MOHIEM@eg.ibm.com>, "public-iri@w3.org" <public-iri@w3.org>, "public-iri-request@w3.org" <public-iri-request@w3.org>
- Message-ID: <BANLkTi=_pykwjUQxhAL1RoX5qPwCd2DZfQ@mail.gmail.com>
> I think that the "side of a bus" case often skips the http:// part, so it
> does matter a bit.

I didn't mean to sound like it doesn't. It does matter, and a lot.

I was going to propose a modified approach, that the IRI be displayed RTL overall if the domain is all-RTL (i.e. contains no LTR characters), but this too has the napkin-cum-bus problem: WWW.HACKERS.COM/com.bank.www would be displayed as www.bank.com/MOC.SREKCAH.WWW, the same as www.bank.com/COM.HACKERS.WWW.

Furthermore, http://WWW.HACKERS.COM?path/boring/and/long/very/a/com.bank.www//:http would be displayed as http://www.bank.com/a/very/long/and/boring/path?MOC.SREKCAH.WWW//:http, the same as http://www.bank.com/a/very/long/and/boring/path?COM.HACKERS.WWW//:http.

This does seem like a fatal problem.

> FWIW: I don't think we have to worry about this case so much:
> [...]
> but rather simpler cases like "msnbc.com", "biz.host.com", or
> "host.com/biz", since that's what's on the side of a bus.

In some RTL countries, we will probably soon be seeing HOST.CO.XX on the side of a bus quite a bit. We do have to worry about it.

> I don't know if that "simplifies" the problem any, but a few RTL
> characters deep in an obscure file path in an otherwise LTR string
> probably aren't very interesting.

> Also, anecdotal evidence suggests that the "average" user may not
> be aware that www.msnbc.com means "the www server at msnbc,
> which registered with .com". It can be misinterpreted as "msnbc's part
> of the web (www)", eg, msnbc somehow registered with www. So I
> don't think we can ensure that LTR or RTL ordering preserves some
> sort of security hierarchy, at least for the average user.

The average user can tell the security difference between www.hackers.com and www.bank.com. The case above is such a killer because it is designed to fool precisely the LTR user living in an LTR country who has never heard of RTL.
The WWW.HACKERS.COM IRI sent to such a user in some spam looks to that user exactly like a www.bank.com IRI - both in the spam and in the browser address bar. This is unacceptable.

Thus, it seems I am now converted to the "IRI is always LTR overall" camp.

Aharon

On Sun, Jun 5, 2011 at 10:19 PM, Shawn Steele <Shawn.Steele@microsoft.com> wrote:

> I think that the "side of a bus" case often skips the http:// part, so it
> does matter a bit.
>
> FWIW: I don't think we have to worry about this case so much:
>
> http://worldblog.msnbc.msn.com/_news/2011/06/05/6789539-amid-the-ruins-a-fisherman-contemplates-a-daunting-future
>
> but rather simpler cases like "msnbc.com", "biz.host.com", or
> "host.com/biz", since that's what's on the side of a bus. And, of course,
> the email variations.
>
> I don't know if that "simplifies" the problem any, but a few RTL characters
> deep in an obscure file path in an otherwise LTR string probably aren't very
> interesting.
>
> Also, anecdotal evidence suggests that the "average" user may not be aware
> that www.msnbc.com means "the www server at msnbc, which registered with
> .com". It can be misinterpreted as "msnbc's part of the web (www)", eg,
> msnbc somehow registered with www. So I don't think we can ensure that LTR
> or RTL ordering preserves some sort of security hierarchy, at least for the
> average user.
>
> I think the key point is "how do we get someone to write it down and key it
> in later without any mistakes"?
>
> -Shawn
>
> http://blogs.msdn.com/shawnste
>
> ------------------------------
> *From:* Aharon (Vladimir) Lanin [aharon@google.com]
> *Sent:* Sunday, June 05, 2011 11:07 AM
> *To:* Matitiahu Allouche
> *Cc:* bidi@unicode.org; bidi-bounce@unicode.org; Mark Davis ☕; Mohamed
> Mohie; public-iri@w3.org; public-iri-request@w3.org; Shawn Steele
> *Subject:* Re: [bidi] BIDI?
>
> You have a point, although for http://MY.DOMAIN.org and
> http://org.DOMAIN.MY, the results would be different:
> org.NIAMOD.YM//:http and http://org.NIAMOD.YM, respectively.
>
> Aharon
>
> On Sun, Jun 5, 2011 at 7:17 PM, Matitiahu Allouche <matial@il.ibm.com> wrote:
>
>> Aharon (Vladimir) Lanin wrote: "To my taste, first strong in the domain
>> name is best".
>> First strong in the domain name fails the napkin test. If the logical name
>> is (upper case = RTL):
>> MY.DOMAIN.org
>> it would be displayed
>> org.NIAMOD.YM
>>
>> Such a display could come from the logical name "MY.DOMAIN.org", but also
>> from "org.MY.DOMAIN", thus it is not unambiguous.
>>
>> Shalom (Regards), Mati
>>
>> From: "Aharon (Vladimir) Lanin" <aharon@google.com>
>> To: Matitiahu Allouche/Israel/IBM@IBMIL
>> Cc: Shawn Steele <Shawn.Steele@microsoft.com>, bidi@unicode.org,
>> bidi-bounce@unicode.org, "public-iri@w3.org" <public-iri@w3.org>, Mohamed
>> Mohie <MOHIEM@eg.ibm.com>, public-iri-request@w3.org, Mark Davis ☕
>> <mark@macchiato.com>
>> Date: 05/06/2011 18:43
>> Subject: Re: [bidi] Re: BIDI?
>> ------------------------------
>>
>> I think that there needs to be a secondary objective: to get all-rtl iris
>> displayed rtl overall, not in a constant back-and-forth at every separator.
>> Like Mohammed, I think that this should be based on the presence of rtl in
>> the domain name. To my taste, first strong in the domain name is best, but I
>> think that the exact algorithm to use (on the domain name) is less
>> important.
>>
>> Aharon
>>
>> On Jun 5, 2011 10:27 AM, "Matitiahu Allouche" <matial@il.ibm.com>
>> wrote:
>> > Please define "mostly Latin" and "mostly Arabic or Hebrew".
>> >
>> > Are you suggesting to count LTR and RTL characters? Are they all equally
>> > weighted?
>> > Does the counting include the scheme (e.g. "http")? the TLD?
>> >
>> > Please consider that the prime objective, IMHO, is to enable easy and
>> > unambiguous human translation from a displayed IRI (napkin, bus side) to
>> > the corresponding logical string.
>> >
>> > Shalom (Regards), Mati
>> > Bidi Architect
>> > Globalization Center Of Competency - Bidirectional Scripts
>> > IBM Israel
>> > Fax: +972 2 5870333 Mobile: +972 52 2554160
>> >
>> > From: Mohamed Mohie <MOHIEM@eg.ibm.com>
>> > To: Matitiahu Allouche/Israel/IBM@IBMIL
>> > Cc: bidi@unicode.org, bidi-bounce@unicode.org, Mark Davis ☕
>> > <mark@macchiato.com>, "public-iri@w3.org" <public-iri@w3.org>, Shawn
>> > Steele <Shawn.Steele@microsoft.com>
>> > Date: 03/06/2011 22:06
>> > Subject: Re: [bidi] Re: BIDI?
>> > Sent by: public-iri-request@w3.org
>> >
>> > Hello Mati,
>> > To overcome the problem you highlighted below I have a suggestion to be
>> > added for the URL design which is to set the embedding level according to
>> > the directionality of the domain name.
>> > 1- If the domain name "MY.OWN.DOMAIN" is mostly Latin set the embedding
>> > level to even.
>> > 2- If the domain name "MY.OWN.DOMAIN" is mostly Arabic or Hebrew set the
>> > embedding level to odd.
>> >
>> > Thanks And Best regards,
>> > Mohamed Mohie , PMP®
>> > ________________________________________________
>> > GCoC BIDI ,
>> > Advisory Software Engineer, Project Manager, M.Sc.
>> > Cairo Technology Development Center (CTDC)
>> > IBM Egypt
>> > email : mohiem@eg.ibm.com
>> >
>> > From: Matitiahu Allouche <matial@il.ibm.com>
>> > To: Mark Davis ☕ <mark@macchiato.com>
>> > Cc: bidi@unicode.org, bidi-bounce@unicode.org, "public-iri@w3.org"
>> > <public-iri@w3.org>, Shawn Steele <Shawn.Steele@microsoft.com>
>> > Date: 27/04/2011 10:38 ص
>> > Subject: [bidi] Re: BIDI?
>> > Sent by: bidi-bounce@unicode.org
>> >
>> > Hello, Mark!
>> >
>> > I am glad to see somebody daring to tackle this issue.
>> >
>> > You wrote: <quote>
>> > If a bidiIri is recognized, then it is handled by the UBA as if each
>> > separator is surrounded by:
>> > LRM (if the embedding level is even) or
>> > RLM (if the embedding level is odd)
>> > <end of quote>
>> >
>> > This design has the following consequences, which IMHO are not optimal:
>> > a) The same URL (IRI) will be displayed differently according to the
>> > embedding level. This is confusing.
>> > b) Pure Latin-character URLs will be displayed in a new and strange way
>> > when the embedding level is odd. For instance, "http://docs.google.com"
>> > will be displayed as "com.google.docs//:http".
>> >
>> > Consequently, I second Slim Amamou's suggestion to "have a
>> > predefined/enforced directionality in the specs for each scheme? (ex. LTR
>> > for URLs)".
>> > It is true that pure or mostly Hebrew or Arabic URLs will be displayed in
>> > a way which may seem strange. For instance, "http://MY.OWN.DOMAIN.com"
>> > (where upper case letters represent RTL letters) will be displayed as
>> > "http://YM.NWO.NIAMOD.com", but
>> > 1. The scheme and the TLD currently are pure LTR, and I guess that this is
>> > not going to change soon, so the display of mixed LTR/RTL URLs will be
>> > strange anyway.
>> > 2. The use of domain names with RTL labels is still scarce, there is no
>> > common usage to overcome, so the public will get accustomed to the
>> > "strange" display right from the beginning.
>> >
>> > Shalom (Regards), Mati
>> > Bidi Architect
>> > Globalization Center Of Competency - Bidirectional Scripts
>> > IBM Israel
>> > Fax: +972 2 5870333 Mobile: +972 52 2554160
>> >
>> > From: Mark Davis ☕ <mark@macchiato.com>
>> > To: Shawn Steele <Shawn.Steele@microsoft.com>
>> > Cc: "public-iri@w3.org" <public-iri@w3.org>, bidi@unicode.org
>> > Date: 27/04/2011 02:24
>> > Subject: [bidi] Re: BIDI?
>> > Sent by: bidi-bounce@unicode.org
>> >
>> > Here are some rough thoughts on how we could handle bidi IRIs.
>> >
>> > http://goo.gl/QwSoo
>> >
>> > Feedback is welcome.
>> >
>> > Mark
>> >
>> > On Wed, Apr 20, 2011 at 23:20, Shawn Steele <Shawn.Steele@microsoft.com>
>> > wrote:
>> > I'm wondering what the current thinking around BIDI IRIs is? A few things
>> > in draft-ietf-iri-3987bis-05 jump out at me.
>> >
>> > -Shawn
Received on Monday, 6 June 2011 06:22:37 UTC
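
The spoofing cases in the message above can be reproduced with a rough sketch. This is not the UBA and not the text of any proposal in the thread; it assumes a simplified field-by-field model in which every non-alphanumeric character is a separator, each all-upper-case field stands for an RTL field (the thread's convention) and is shown reversed, and an RTL-overall display lays the fields and separators out in reverse order. The function names (tokenize, display, domain_of, display_modified) are hypothetical and exist only for this illustration.

```python
import re


def tokenize(iri):
    # Fields are runs of letters/digits; every other character is treated
    # as a one-character separator (a simplifying assumption).
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", iri)


def display(iri, overall_rtl):
    # Upper case stands in for RTL letters, so an upper-case field is shown
    # with its characters reversed; LTR fields and separators are unchanged.
    shown = [t[::-1] if t.isupper() else t for t in tokenize(iri)]
    if overall_rtl:
        # RTL overall: fields and separators are laid out right to left.
        shown.reverse()
    return "".join(shown)


def domain_of(iri):
    # Crude domain extraction: drop an optional scheme, stop at / ? or #.
    rest = iri.split("://", 1)[-1]
    return re.split(r"[/?#]", rest, maxsplit=1)[0]


def display_modified(iri):
    # The "modified approach" from the message: RTL overall only when the
    # domain contains no LTR (lower-case) characters.
    all_rtl = not any(c.islower() for c in domain_of(iri))
    return display(iri, overall_rtl=all_rtl)


spoof = "WWW.HACKERS.COM/com.bank.www"
legit = "www.bank.com/COM.HACKERS.WWW"
print(display_modified(spoof))  # www.bank.com/MOC.SREKCAH.WWW
print(display_modified(legit))  # www.bank.com/MOC.SREKCAH.WWW
```

Under these assumptions the two different logical strings produce the same displayed string, which is the ambiguity the message calls fatal. Calling display(iri, overall_rtl=False) for every IRI corresponds to the "always LTR overall" position: the spoofed IRI is then shown as SREKCAH.WWW-first text that no longer resembles www.bank.com.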