RE: Transcription requirements

Sure...  I did add an issue "27 hours ago" though, since the public-iri discussion seems to keep dropping this...

http://trac.tools.ietf.org/wg/iri/trac/ticket/121


From: Larry Masinter [mailto:masinter@adobe.com]
Sent: Wednesday, March 14, 2012 12:02 PM
To: Shawn Steele
Subject: RE: Transcription requirements

Would you mind forwarding this discussion to the public-iri list, if events haven't taken over? Or raising this as an issue on whatever drafts you think are appropriate?

Thanks...



From: Shawn Steele [mailto:Shawn.Steele@microsoft.com]
Sent: Monday, March 05, 2012 10:05 AM
To: Larry Masinter
Subject: RE: Transcription requirements

Yes, transcription by "some" human is my point :)  It cannot be global, though; it must be a person literate in the language.

IDN doesn't really discuss the display.  (It forbids some things that'd mess up a single label, but it doesn't handle the display of multiple labels; it's really only about the logical order.)

Currently, the IRI specification says that the display order for an entirely RTL domain name must be http://RTLLABEL1.RTLLABEL2.  I believe that, for the typical human reader in at least some of those languages, that requirement is very artificial.
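To make "an entirely RTL domain name" concrete, here is a minimal Python sketch that classifies each label by whether it contains a strong right-to-left character. The helper names (label_direction, is_entirely_rtl) are hypothetical and the rule is a simplifying assumption for illustration; it is not the IDNA Bidi rule and not anything the IRI drafts define.

```python
import unicodedata

# Bidi classes for strong right-to-left characters (Hebrew is "R", Arabic is "AL").
RTL_CLASSES = {"R", "AL"}


def label_direction(label: str) -> str:
    """Return "rtl" if the label has any strong RTL character, else "ltr".

    Simplifying assumption for illustration only.
    """
    if any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label):
        return "rtl"
    return "ltr"


def is_entirely_rtl(domain: str) -> bool:
    """True when every dot-separated label is classified as RTL."""
    return all(label_direction(label) == "rtl" for label in domain.split("."))


if __name__ == "__main__":
    # Escapes keep the source ASCII-only: two labels of Arabic letters.
    arabic = "\u0645\u0635\u0631.\u0645\u0635\u0631"
    print(is_entirely_rtl(arabic))
    print(is_entirely_rtl("pyramids.eg"))
```

Only names for which is_entirely_rtl is true fall under the display-order requirement being questioned here; mixed names raise separate issues.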

I think your point is about "specific use cases."  I think the "Side of a bus in Cairo" scenario is a reasonable one.


* The side of the bus usually shows just the domain, skipping the http:// part.
* Sometimes www is present, sometimes not.
* So it's often "company.tld".
* My assertion is that someone seeing Arabic on the side of the bus would read the entire thing right to left, regardless of where the dots are.  That would mimic how they read it, and how they transcribe/type it.
   - From the information I have, that may not apply to Hebrew.
   - It also may not apply to all users :(
   - That's pretty much what the Saudi gov't asked us for.
   - It's what my boss (native Arabic speaker) finds natural.
   - Even "worse," there's some evidence that com.microsoft would be preferred if the rest of the UI is all Arabic.  In this case it doesn't impact the transcription order, even between an Arabic and English speaker, as it would be right-aligned and the reader would still read the "microsoft" label first.
   - Either way, clearly the order of the labels must be consistent, regardless of whether the "first" label is on the left or the right.

The second, similar case is how you type it if someone is reading it to you.

* I think that so long as the reader is expecting the "first" label on either the right or the left, and then continues reading to the other end, the person typing it will enter it in the correct order, regardless of whether the display mixes the left and the right.

The third case is cross-culture, which is probably where 99% of the problem is.

* If I see a bus in Cairo that wants to take me to the "pyramids" web site, should the Latin form be aimed at a tourist and say "pyramids.eg"?  And if there's an Arabic form, should it say "EG.PYRAMIDS" even though tourists may see it?  (Presumably only Arabic-literate tourists would be able to type that though).  IMO it's OK to say "pyramids.eg" and "EG.PYRAMIDS" on the same ad, because the target scripts/audiences are obvious.
* If the ad doesn't have an Arabic form and is aimed at local users (the site owner doesn't know about ccTLD yet), then the order is less obvious.  (E.g., the ad should be an Arabic domain name, but technical limitations until now have prevented it.)
   - Legacy names should have it in the legacy order.
   - However, there's some indication that speakers immersed in an otherwise-Arabic UI would prefer for the labels to be arranged from right to left, which would be a breaking change for that ad on the bus.

-Shawn

From: Larry Masinter [mailto:masinter@adobe.com]
Sent: Sunday, March 04, 2012 11:32 PM
To: Shawn Steele
Subject: Transcription requirements

Shawn wrote:


> global transcription cannot happen. I don't know how to write (or type) Arabic or Chinese or...

"Global transcription" cannot happen for IRIs.

The context was the distinction between URIs and IRIs. URIs were designed for "global transcription"; this is very explicit in section 1.2.1 of RFC 3986. The character repertoire was limited based on our model at the time of what could be typed, reliably, in most locales.


My point was that even if, for IRIs, "global transcription" is taken as a lower priority, there still remain "transcription" requirements for IRIs. I do not know what those transcription requirements are.

Most of the discussion in the IRI document, most of the restrictions, are based on transcription requirements; characters and forms are (or should be) disallowed if they cannot be rendered, read, copied, and transcribed by *some* human.  It's a matter of judgment (and consensus of judgment) about how many humans need to be able to transcribe a form in order for us to allow it.

Bidi adds a new category of transcription requirements.

One thing I think might help a lot would be if we had specific use cases against which to evaluate transcription requirements, and, in particular, the contexts of RTL vs LTR running text.

I'm not sure these use cases need to be a formal document, but I think it would be helpful.

For example, I do not understand at all the comment "I shouldn't force my ltr bias on someone who uses that language."

If party A creates a document with a bidi IRI which is then displayed, party B sees the visual representation of that IRI, and attempts to transcribe it by entering it on a keyboard ... who is forcing a "ltr bias" on whom?

Were there any explicit discussions of transcription in the IDN discussions?
Are there any use cases?


From: Shawn Steele [mailto:Shawn.Steele@microsoft.com]
Sent: Friday, March 02, 2012 12:58 PM
To: Larry Masinter; Adil Allawi; Najib Tounsi
Cc: PUBLIC-IRI@W3.ORG
Subject: RE: Bidi Doc

Quick comment: global transcription cannot happen.  I don't know how to write (or type) Arabic or Chinese or...

However someone who can read it does understand rtl behavior.  I shouldn't force my ltr bias on someone who uses that language.

There's also no real support issue here... So long as all the pieces are ordered consistently, it's easy to recognize which label is first or last.


Sent from my Windows Phone 7
________________________________
From: Larry Masinter
Sent: 3/2/2012 11:31 AM
To: Shawn Steele; Adil Allawi; Najib Tounsi
Cc: PUBLIC-IRI@W3.ORG
Subject: RE: Bidi Doc
I'm not sure it is a "user" preference as much as it is a situational one. And I think we need to really back away from the idea that IRIs can meet a requirement of "intuitive presentation".

The problem is over-constrained. Between URI and IRI there is a difference in how they meet the conflicting requirements of "global transcription" vs. "local ease of use", where IRIs reflect a design choice to allow more reasonable local names.

However, we still need to maintain at least some level of transcription interoperability.... It should at least be possible to construct IRIs such that, when one is displayed by ordinary Unicode display methods and the user then enters what they see by keyboard or speech or some other method, the result will be the "same" IRI.

You are questioning a "MUST", though, in "Bidirectional IRIs MUST be rendered by using the ...".

> That forbids display such as FED.CBA//:http, or more to the point FED.CBA, which many bidi speakers find more intuitive.

I'm not sure "FED.CBA" counts as "rendered", though.

Would it help if we said that the -bidi- document should be thought of as "best practices" rather than standards track?  Usually when you write a "MUST" it's clear what implementations might be affected. I don't know who would be non-compliant or how to test "Bidirectional IRIs MUST be rendered ...".

Larry


-----Original Message-----
From: Shawn Steele [mailto:Shawn.Steele@microsoft.com]
Sent: Friday, March 02, 2012 9:25 AM
To: Larry Masinter; Adil Allawi; Najib Tounsi
Cc: PUBLIC-IRI@W3.ORG
Subject: RE: Bidi Doc

The whole thing?  Starting with 2:

   "Bidirectional IRIs MUST be rendered by using the Unicode
   Bidirectional Algorithm [UNIV6], [UNI9].  Bidirectional IRIs MUST be
   rendered in the same way as they would be if they were in a left-to-
   right embedding; i.e., as if they were preceded by U+202A, LEFT-TO-
   RIGHT EMBEDDING (LRE), and followed by U+202C, POP DIRECTIONAL
   FORMATTING (PDF).  Setting the embedding direction can also be done
   in a higher-level protocol (e.g., the dir='ltr' attribute in HTML)."

That forbids display such as FED.CBA//:http, or more to the point FED.CBA, which many bidi speakers find more intuitive.
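As a rough illustration of what the quoted text asks for, the following Python sketch wraps an IRI in the two formatting characters it names, U+202A (LRE) and U+202C (PDF). The helper names embed_ltr and embed_rtl are hypothetical. The RLE variant (U+202B) is included only to show the alternative embedding direction; the quoted text does not allow it. This only builds the string; how it is actually displayed depends on the renderer's implementation of the Unicode Bidirectional Algorithm.

```python
import unicodedata

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed_ltr(iri: str) -> str:
    """Wrap the IRI as the quoted text describes: LRE, then the IRI, then PDF."""
    return f"{LRE}{iri}{PDF}"


def embed_rtl(iri: str) -> str:
    """Hypothetical alternative: the same wrapping with an RTL embedding."""
    return f"{RLE}{iri}{PDF}"


if __name__ == "__main__":
    wrapped = embed_ltr("http://example.com")
    # Print the code points so the invisible formatting characters are visible.
    print([f"U+{ord(ch):04X}" for ch in (wrapped[0], wrapped[-1])])
    print(unicodedata.name(LRE), "/", unicodedata.name(PDF))
```

In HTML the same intent would normally be expressed with the dir attribute mentioned in the quote rather than with embedded formatting characters.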

Like I said, our investigation has indicated that this is a user preference.  People with strong math, CS, or similar backgrounds are happy with the more computer-like display.  Laymen (for lack of a better word), and some professionals, seem much happier with the RTL label ordering.  There also appears to be a cultural bias, not just math/CS, but it's not perfect.

I think the best case is the "tell me what site you went to" scenario.  For the logical name ABC.DEF, someone on the phone is going to say "I went to A B C <dot> D E F".  And that's what the user is going to type.  If an RTL speaker is transcribing that logical order "ABC.DEF" off the side of the bus, they're going to be reading and writing it naturally from RTL, "FED.CBA" (visual order).  If they were to read the visual "CBA.FED" from the side of a bus, they'd naturally say "D E F <dot> A B C", which is wrong for logical order, and won't work when the guy on the other end of the phone tries to type it in.  They might be able to be trained to read it in a funny way, but I don't think that's at all natural for many speakers.
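The scenario above can be sketched as a toy model in Python. It uses the same convention as the text (capital letters stand in for RTL characters) and plain string reversal; it does not implement the Unicode Bidirectional Algorithm, and all three function names are hypothetical.

```python
def display_rtl_label_order(logical: str) -> str:
    """Whole name laid out right to left: logical ABC.DEF is shown as FED.CBA."""
    return logical[::-1]


def display_ltr_label_order(logical: str) -> str:
    """Each label runs right to left, but labels stay left to right:
    logical ABC.DEF is shown as CBA.FED."""
    return ".".join(label[::-1] for label in logical.split("."))


def read_right_to_left(visual: str) -> str:
    """What a reader who scans the whole display right to left says or types."""
    return visual[::-1]


if __name__ == "__main__":
    logical = "ABC.DEF"
    for layout in (display_rtl_label_order, display_ltr_label_order):
        visual = layout(logical)
        typed = read_right_to_left(visual)
        print(f"{visual!r} is read as {typed!r}; round trip ok: {typed == logical}")
```

Under this model only the fully right-to-left layout survives the read-aloud round trip; the mixed layout comes back as DEF.ABC, which is the failure described above.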

To this point, how an IRI such as http://buy.stuff.com/get/your/stuff/here.html?user=23456&account=abcd is displayed is far less important to most people than the display ads which say "pepsi.com".

Us computer scientists will be able to read the IRI no matter how it's displayed, we'll know what the spec says, or close enough anyway.  It's the user trying to go to "pepsi.com" that is the most important case.  How we handle the rest of the IRI should be based on that.  We shouldn't force some behavior on the domain name because we have trouble figuring out what to do with query strings.

-Shawn

-----Original Message-----
From: Larry Masinter [mailto:masinter@adobe.com]
Sent: Friday, March 02, 2012 12:32 AM
To: Shawn Steele; Adil Allawi; Najib Tounsi
Cc: PUBLIC-IRI@W3.ORG
Subject: RE: Bidi Doc

Shawn, I'm not sure what part of the Bidi IRI spec would be affected by your comments.... could you be more specific about which section it refers to?

Thanks,

Larry


-----Original Message-----
From: Shawn Steele [mailto:Shawn.Steele@microsoft.com]
Sent: Monday, February 27, 2012 8:39 AM
To: Adil Allawi; Najib Tounsi
Cc: PUBLIC-IRI@W3.ORG
Subject: Bidi Doc

It's been a while since I've taken a look at this document.

IMO the embedding display forcing LTR behavior still doesn't match the feedback I've received from many Arabic speakers, though it does seem to fit the expectations of other BIDI speakers.  It would appear that some users would be best served by RLE type behavior instead.

Unfortunately this appears to be a user preference and appears to be influenced by their life experience, not necessarily tied directly to the content language.  Users with strong math or CS backgrounds seem more likely to find the LTR behavior acceptable.  If someone's going to read an IRI to someone over the phone, it needs to be in the order they'd read/type it.

FWIW:  Outside of skilled users, the structure of the actual IRI is opaque.  E.g., to a PhD-educated user in a different field, www.foo.com means "foo company's spot on the web", somehow reversing the order of the domain.

-Shawn

Received on Wednesday, 14 March 2012 19:08:00 UTC