Re: [bidi] BIDI?

> I think that the "side of a bus" case often skips the http:// part, so it
> does matter a bit.

I didn't mean to sound like it doesn't. It does matter, and a lot. I was
going to propose a modified approach, that the IRI be displayed RTL overall
if the domain is all-RTL (i.e. contains no LTR characters), but this too has
the napkin-cum-bus problem:

WWW.HACKERS.COM/com.bank.www would be displayed as
www.bank.com/MOC.SREKCAH.WWW, the same as www.bank.com/COM.HACKERS.WWW.

Furthermore,
http://WWW.HACKERS.COM?path/boring/and/long/very/a/com.bank.www//:http would
be displayed as
http://www.bank.com/a/very/long/and/boring/path?MOC.SREKCAH.WWW//:http,
the same as
http://www.bank.com/a/very/long/and/boring/path?COM.HACKERS.WWW//:http.

This does seem like a fatal problem.
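
To make the collision concrete, here is a minimal sketch in Python. It
assumes the per-separator display model discussed in this thread: every
separator acts as if surrounded by LRM or RLM, so each component is laid out
on its own and the components follow the overall direction. Upper case stands
for RTL letters, as elsewhere in the thread. The function names (domain_of,
rtl_overall, display) are made up for illustration, and components that mix
directions are deliberately not handled.

```python
import re

def domain_of(iri: str) -> str:
    """Host part: after an optional 'scheme://', up to the first '/', '?' or '#'."""
    rest = iri.split("://", 1)[1] if "://" in iri else iri
    return re.split(r"[/?#]", rest, maxsplit=1)[0]

def rtl_overall(iri: str) -> bool:
    """The modified rule: RTL overall iff the domain has RTL (upper-case)
    letters and no LTR (lower-case) letters."""
    d = domain_of(iri)
    return any(c.isupper() for c in d) and not any(c.islower() for c in d)

def display(iri: str, rtl: bool) -> str:
    """Visual order, read left to right, under the assumed model."""
    # Split into alphanumeric components and single separator characters.
    tokens = re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", iri)
    # An RTL (upper-case) component is shown reversed; everything else as-is.
    shown = [t[::-1] if t.isupper() else t for t in tokens]
    # RTL overall lays the components out from right to left.
    return "".join(reversed(shown) if rtl else shown)

spoof = "WWW.HACKERS.COM/com.bank.www"
honest = "www.bank.com/COM.HACKERS.WWW"
print(display(spoof, rtl_overall(spoof)))    # www.bank.com/MOC.SREKCAH.WWW
print(display(honest, rtl_overall(honest)))  # www.bank.com/MOC.SREKCAH.WWW
print(display(spoof, False))                 # WWW.SREKCAH.MOC/com.bank.www
```

The first two lines print the same string, which is the collision. The last
line shows the same spoof forced LTR overall: it no longer starts with
www.bank.com.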

> FWIW: I don't think we have to worry about this case so much:
> [...]
> but rather simpler cases like "msnbc.com", "biz.host.com", or
> "host.com/biz", since that's what's on the side of a bus.

In some RTL countries, we will probably soon be seeing HOST.CO.XX on the
side of a bus quite a bit. We do have to worry about it.

> I don't know if that "simplifies" the problem any, but a few RTL
> characters deep in an obscure file path in an otherwise LTR string
> probably aren't very interesting.
> Also, anecdotal evidence suggests that the "average" user may not
> be aware that www.msnbc.com means "the www server at msnbc,
> which registered with .com".  It can be misinterpreted as "msnbc's part
> of the web (www)", e.g., msnbc somehow registered with www.  So I
> don't think we can ensure that LTR or RTL ordering preserves some
> sort of security hierarchy, at least for the average user.

The average user can tell the security difference between www.hackers.com and
www.bank.com. The case above is such a killer because it is designed to fool
precisely the LTR user living in an LTR country who has never heard of RTL.
The WWW.HACKERS.COM IRI sent to such a user in some spam looks to that user
exactly like a www.bank.com IRI - both in the spam and in the browser
address bar. This is unacceptable.

Thus, it seems I am now converted to the "IRI is always LTR overall" camp.

Aharon

On Sun, Jun 5, 2011 at 10:19 PM, Shawn Steele <Shawn.Steele@microsoft.com> wrote:

>  I think that the "side of a bus" case often skips the http:// part, so it
> does matter a bit.
>
>
>
> FWIW: I don't think we have to worry about this case so much:
>
>
> http://worldblog.msnbc.msn.com/_news/2011/06/05/6789539-amid-the-ruins-a-fisherman-contemplates-a-daunting-future
>
> but rather simpler cases like "msnbc.com", "biz.host.com", or
> "host.com/biz", since that's what's on the side of a bus.  And, of course,
> the email variations.
>
>
>
> I don't know if that "simplifies" the problem any, but a few RTL characters
> deep in an obscure file path in an otherwise LTR string probably aren't very
> interesting.
>
>
>
> Also, anecdotal evidence suggests that the "average" user may not be aware
> that www.msnbc.com means "the www server at msnbc, which registered with
> .com".  It can be misinterpreted as "msnbc's part of the web (www)", e.g.,
> msnbc somehow registered with www.  So I don't think we can ensure that LTR
> or RTL ordering preserves some sort of security hierarchy, at least for the
> average user.
>
>
>
> I think the key point is "how do we get someone to write it down and key it
> in later without any mistakes"?
>
>
>
> -Shawn
>
>
>
>  
>
> http://blogs.msdn.com/shawnste
>
>
>   ------------------------------
> *From:* Aharon (Vladimir) Lanin [aharon@google.com]
> *Sent:* Sunday, June 05, 2011 11:07 AM
> *To:* Matitiahu Allouche
> *Cc:* bidi@unicode.org; bidi-bounce@unicode.org; Mark Davis ☕; Mohamed
> Mohie; public-iri@w3.org; public-iri-request@w3.org; Shawn Steele
> *Subject:* Re: [bidi] BIDI?
>
>   You have a point, although for http://MY.DOMAIN.org and
> http://org.DOMAIN.MY, the results would be different:
> org.NIAMOD.YM//:http and http://org.NIAMOD.YM, respectively.
>
>  Aharon
>
> On Sun, Jun 5, 2011 at 7:17 PM, Matitiahu Allouche <matial@il.ibm.com> wrote:
>
>> Aharon (Vladimir) Lanin wrote: "To my taste, first strong in the domain
>> name is best".
>> First strong in the domain name fails the napkin test. If the logical name
>> is (upper case = RTL):
>>       MY.DOMAIN.org
>> it would be displayed
>>       org.NIAMOD.YM
>>
>> Such a display could come from the logical name "MY.DOMAIN.org", but also
>> from "org.MY.DOMAIN", thus it is not unambiguous.
>>
>>
>> Shalom (Regards),  Mati
>>
>>
>>
>> From:        "Aharon (Vladimir) Lanin" <aharon@google.com>
>> To:        Matitiahu Allouche/Israel/IBM@IBMIL
>>  Cc:        Shawn Steele <Shawn.Steele@microsoft.com>, bidi@unicode.org,
>> bidi-bounce@unicode.org, "public-iri@w3.org" <public-iri@w3.org>, Mohamed
>> Mohie <MOHIEM@eg.ibm.com>, public-iri-request@w3.org, Mark Davis ☕ <
>> mark@macchiato.com>
>> Date:        05/06/2011 18:43
>> Subject:        Re: [bidi] Re: BIDI?
>>  ------------------------------
>>
>>
>>
>> I think that there needs to be a secondary objective: to get all-rtl iris
>> displayed rtl overall, not in a constant back-and-forth at every separator.
>> Like Mohammed, I think that this should be based on the presence of rtl in
>> the domain name. To my taste, first strong in the domain name is best, but I
>> think that the exact algorithm to use (on the domain name) is less
>> important.
>>
>> Aharon
>>
>> On Jun 5, 2011 10:27 AM, "Matitiahu Allouche" <matial@il.ibm.com> wrote:
>> > Please define "mostly Latin" and "mostly Arabic or Hebrew".
>> >
>> > Are you suggesting to count LTR and RTL characters? Are they all equally
>> > weighted?
>> > Does the counting include the scheme (e.g. "http")? the TLD?
>> >
>> > Please consider that the prime objective, IMHO, is to enable easy and
>> > unambiguous human translation from a displayed IRI (napkin, bus side) to
>> > the corresponding logical string.
>> >
>> > Shalom (Regards), Mati
>> > Bidi Architect
>> > Globalization Center Of Competency - Bidirectional Scripts
>> > IBM Israel
>> > Fax: +972 2 5870333 Mobile: +972 52 2554160
>> >
>> >
>> >
>> >
>> > From: Mohamed Mohie <MOHIEM@eg.ibm.com>
>> > To: Matitiahu Allouche/Israel/IBM@IBMIL
>> > Cc: bidi@unicode.org, bidi-bounce@unicode.org, Mark Davis ☕
>> > <mark@macchiato.com>, "public-iri@w3.org" <public-iri@w3.org>, Shawn
>> > Steele <Shawn.Steele@microsoft.com>
>> > Date: 03/06/2011 22:06
>> > Subject: Re: [bidi] Re: BIDI?
>> > Sent by: public-iri-request@w3.org
>> >
>> >
>> >
>> > Hello Mati,
>> > To overcome the problem you highlighted below I have a suggestion to be
>> > added for the URL design which is to set the embedding level according to
>> > the directionality of the domain name.
>> > 1- If the domain name "MY.OWN.DOMAIN" is mostly Latin set the embedding
>> > level to even.
>> > 2- If the domain name "MY.OWN.DOMAIN" is mostly Arabic or Hebrew set the
>> > embedding level to odd.
>> >
>> > Thanks And Best regards,
>> > Mohamed Mohie , PMP®
>> > ________________________________________________
>> > GCoC BIDI ,
>> > Advisory Software Engineer, Project Manager, M.Sc.
>> > Cairo Technology Development Center (CTDC)
>> > IBM Egypt
>> > email : mohiem@eg.ibm.com
>> >
>> >
>> >
>> >
>> >
>> > From: Matitiahu Allouche <matial@il.ibm.com>
>> > To: Mark Davis ☕ <mark@macchiato.com>
>> > Cc: bidi@unicode.org, bidi-bounce@unicode.org, "public-iri@w3.org"
>> > <public-iri@w3.org>, Shawn Steele <Shawn.Steele@microsoft.com>
>> > Date: 27/04/2011 10:38 AM
>> > Subject: [bidi] Re: BIDI?
>> > Sent by: bidi-bounce@unicode.org
>> >
>> >
>> >
>> > Hello, Mark!
>> >
>> > I am glad to see somebody daring to tackle this issue.
>> >
>> > You wrote: <quote>
>> > If a bidiIri is recognized, then it is handled by the UBA as if each
>> > separator is surrounded by:
>> > LRM (if the embedding level is even) or
>> > RLM (if the embedding level is odd)
>> > <end of quote>
>> >
>> > This design has the following consequences, which IMHO are not optimal:
>> > a) The same URL (IRI) will be displayed differently according to the
>> > embedding level. This is confusing.
>> > b) Pure Latin-character URLs will be displayed in a new and strange way
>> > when the embedding level is odd. For instance, "http://docs.google.com"
>> > will be displayed as "com.google.docs//:http".
>> >
>> > Consequently, I second Slim Amamou's suggestion to "have a
>> > predefined/enforced directionality in the specs for each scheme? (ex. LTR
>> > for URLs)".
>> > It is true that pure or mostly Hebrew or Arabic URLs will be displayed in a
>> > way which may seem strange. For instance, "http://MY.OWN.DOMAIN.com" (where
>> > upper case letters represent RTL letters) will be displayed as
>> > "http://YM.NWO.NIAMOD.com", but
>> > 1. The scheme and the TLD currently are pure LTR, and I guess that this is
>> > not going to change soon, so the display of mixed LTR/RTL URLs will be
>> > strange anyway.
>> > 2. The use of domain names with RTL labels is still scarce, there is no
>> > common usage to overcome, so the public will get accustomed to the
>> > "strange" display right from the beginning.
>> >
>> >
>> > Shalom (Regards), Mati
>> > Bidi Architect
>> > Globalization Center Of Competency - Bidirectional Scripts
>> > IBM Israel
>> > Fax: +972 2 5870333 Mobile: +972 52 2554160
>> >
>> >
>> >
>> >
>> > From: Mark Davis ☕ <mark@macchiato.com>
>> > To: Shawn Steele <Shawn.Steele@microsoft.com>
>> > Cc: "public-iri@w3.org" <public-iri@w3.org>, bidi@unicode.org
>> > Date: 27/04/2011 02:24
>> > Subject: [bidi] Re: BIDI?
>> > Sent by: bidi-bounce@unicode.org
>> >
>> >
>> >
>> > Here are some rough thoughts on how we could handle bidi IRIs.
>> >
>> > http://goo.gl/QwSoo
>> >
>> > Feedback is welcome.
>> >
>> > Mark
>> >
>> > On Wed, Apr 20, 2011 at 23:20, Shawn Steele <Shawn.Steele@microsoft.com>
>> > wrote:
>> > I'm wondering what the current thinking around BIDI IRIs is? A few things
>> > in draft-ietf-iri-3987bis-05 jump out at me.
>> >
>> >
>> > -Shawn
>>
>>
>

Received on Monday, 6 June 2011 06:22:37 UTC