[Bug 10807] i18n comment 1 : new attribute: ubi


--- Comment #21 from Aryeh Gregor <Simetrical+w3cbug@gmail.com> 2010-10-19 17:03:41 UTC ---
(In reply to comment #17)
> Is it the case that all these cases should also have a language specified? All
> the examples so far seem to be english text mixed in with hebrew; would it be
> correct to say that they should all be marked up with lang="" attributes? If
> so, can we just make all elements with lang="" attributes have
> unicode-bidi:isolate? Or are there examples of where setting the language
> doesn't change (and you do know the language doesn't change, it's not just that
> you don't know the language) but you still want this isolation behaviour?

Assuming that every language is written either LTR or RTL, never both
interchangeably (this should hold at least if you use script-qualified language
codes like kk-Arab and kk-Cyrl to distinguish), then clearly you don't need
isolation if the language of the whole string is known to be the same as the
language of the surrounding page.

However, the provided text is generally going to be in an unknown language, and
might be in a mix of languages.  For instance, on an English page, a user might
submit a one-line input (like a wiki edit summary, or a username) that contains
a Hebrew word.  If that input isn't contained somehow, it can scramble the
display order of the text around it, e.g. (uppercase stands for RTL letters):

Logical:          comments: "abc", "def GHI", "KJL mno"
Expected display: comments: "abc", "def IHG", "LJK mno"
Actual display:   comments: "abc", "def LJK" ,"IHG mno"
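
To make the example above easy to reproduce, here is a rough sketch of the
reordering involved.  It is a toy model, not the real Unicode Bidirectional
Algorithm: it assumes an LTR paragraph, treats uppercase letters as RTL and
lowercase as LTR (the same convention as the example), treats everything else
as neutral, and ignores digits, mirroring and nested levels.  The names
toy_display, render_plain and render_isolated are made up for illustration.

```python
def classify(ch):
    """Toy bidi class: uppercase letter = R, lowercase letter = L, else neutral."""
    if ch.isalpha():
        return "R" if ch.isupper() else "L"
    return "N"


def toy_display(logical):
    """Return the visual order of `logical` in an LTR paragraph (toy rules only)."""
    types = [classify(c) for c in logical]
    n = len(types)

    # Resolve neutrals: a neutral run between two R characters becomes R;
    # otherwise it takes the paragraph direction (L). Text edges count as L.
    i = 0
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        before = types[i - 1] if i > 0 else "L"
        after = types[j] if j < n else "L"
        fill = "R" if before == after == "R" else "L"
        types[i:j] = [fill] * (j - i)
        i = j

    # Reverse each maximal R run; L runs keep their order.
    out = []
    i = 0
    while i < n:
        j = i
        while j < n and types[j] == types[i]:
            j += 1
        run = logical[i:j]
        out.append(run[::-1] if types[i] == "R" else run)
        i = j
    return "".join(out)


def render_plain(comments):
    """Concatenate the comments with no containment at all."""
    return "comments: " + ", ".join('"' + c + '"' for c in comments)


def render_isolated(comments):
    """Reorder each comment on its own (LTR base), then join the results.

    This stands in for isolation: the surrounding text here has no RTL
    characters, so the isolated pieces simply sit in LTR order.
    """
    return "comments: " + ", ".join('"' + toy_display(c) + '"' for c in comments)


comments = ["abc", "def GHI", "KJL mno"]
print(toy_display(render_plain(comments)))  # comments: "abc", "def LJK" ,"IHG mno"
print(render_isolated(comments))            # comments: "abc", "def IHG", "LJK mno"
```

The first line printed is the "Actual display" above: the punctuation between
the two Hebrew words is neutral, so it gets absorbed into one RTL run and the
two words swap places.  The second is the "Expected display", where each
comment is laid out independently of its neighbours.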

There's also the possibility that the comments themselves contain
directionality marks.  Again, this can mostly be fixed by inserting control
characters, but those are a pain to work with, e.g., they get caught in
copy-paste.  Note that this affects even LTR text on an LTR page, if the app is
bidi-aware: the simplest approach is to output the isolation characters
unconditionally, and then copy-paste will pick up invisible garbage that will
foil simple string matches and so on.  (Bidi control characters are supposed to
be ignored for string matching, but that's generally not done in practice.)
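
A minimal sketch of the copy-paste problem, under some assumptions: it uses
U+2068 (FIRST STRONG ISOLATE) and U+2069 (POP DIRECTIONAL ISOLATE) as the
"isolation characters", which were added to Unicode after this comment was
written, and the helper names isolate and strip_bidi_controls are hypothetical.
It shows a wrapped username failing a plain equality check, and one way a
consumer could strip the controls before matching.

```python
# Bidi control characters: marks, embeddings/overrides, and isolates.
BIDI_CONTROLS = (
    "\u200e\u200f\u061c"              # LRM, RLM, ALM
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)
# str.translate deletes any character whose code point maps to None.
_STRIP_TABLE = dict.fromkeys(map(ord, BIDI_CONTROLS))

FSI = "\u2068"
PDI = "\u2069"


def isolate(text):
    """Wrap user-supplied text so it cannot reorder its surroundings."""
    return FSI + text + PDI


def strip_bidi_controls(text):
    """Remove bidi control characters, e.g. before comparing strings."""
    return text.translate(_STRIP_TABLE)


username = "abc"
pasted = isolate(username)  # what a user would copy from the rendered page

print(pasted == username)                       # False: invisible characters differ
print(len(pasted))                              # 5, although only 3 are visible
print(strip_bidi_controls(pasted) == username)  # True
```

Every consumer of the pasted text would need something like
strip_bidi_controls for matching to work, which is the burden that an
attribute or CSS property avoids, since nothing extra ends up in the text.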

Received on Tuesday, 19 October 2010 17:03:45 UTC