[inline bidi update]. from Jan Nelson on 2014-02-12 (www-international@w3.org from January to March 2014)

From: Jan Nelson <Jan.Nelson@microsoft.com>
Date: Wed, 12 Feb 2014 22:25:46 +0000
To: "www-international@w3.org" <www-international@w3.org>
Message-ID: <7ae15a2de8ed43f7b0b56b48e5ac6df8@BL2PR03MB180.namprd03.prod.outlook.com>
Hi Richard, all,



I had our bidi folks take a look at the doc. They feel it looks good overall for the Arabic, Urdu, Persian and Dari languages (from the developer’s guide standpoint).



I am including their suggestions, questions and comments for your consideration:



1)      Should the shim for browsers (that don’t support CSS) use bdi instead of bdo, given that we use bdi everywhere else?

bdo[dir='ltr'], bdo[dir='rtl'] {
    unicode-bidi: bidi-override;
    unicode-bidi: -webkit-isolate-override;
    unicode-bidi: -moz-isolate-override;
    unicode-bidi: -ms-isolate-override;
    unicode-bidi: isolate-override;
}



2)      Does the scope of this feedback include collecting a set of rules for advanced text processing and driving features in HTML5 (a text analysis engine)? The good thing is that we are empowering users by letting them control the direction of specific content (even though it may be a bit more complex than it should be) and override the default rendering algorithms.



a.      One concern is that we (HTML5) are still making the developer do a lot of work that could be identified automatically, i.e. by a set of rules applied in an intelligent fashion. For example, if someone writes “Introduction to C++” and there is no space between the directional character “C” and the weak/directionally neutral “+” characters, then neutral characters that follow a directional character without a space should take the direction of that preceding character. The same rule applies to “Introduction to C++!”.



Evaluating the candidate position becomes a bit trickier if there is a space between “C++” and “!”. In that case, more intelligent contextual rules need to be applied, first to detect the base language (at the word, line, paragraph or page level) and then to identify the relevance and position of the neutral character, i.e. whether it falls at the start of the statement, at the end, or somewhere between LTR and RTL text.



b.      If HTML5’s text processing engine followed the space / no-space rule between a directionally neutral character and a directional letter (2.a), “Fixing use case 2 in HTML5”<http://www.w3.org/International/articles/inline-bidi-markup/update#uc2html5> could be handled automatically, without the need to specify tags to control the directionality, as a space exists between “5” and “International Activity” in content that has more LTR text in it.



c.      The base directionality of the page should be determined by evaluating the overall content and ranking it (at the word, sentence, paragraph and page level). I don’t know, however, the design details of the “auto” value of the dir attribute, i.e. which attributes of the page / content are taken into account and how the ranking is done.



d.      Similarly, in use case 3, if we have contextual rules, e.g. for when the context is a list of items (“A, B or C” or “A, B and C”), we can control the direction automatically as soon as we determine that the statement probably contains a list of items. Intelligently identifying keywords like “in”, “on”, “for”, “في” and “من” can help the rendering engine put the information in the correct place (before or after).



e.      Another example in the contextual space is identifying phone numbers accurately (which the Skype Click to Call plugin<https://support.skype.com/en/faq/FA12006/how-do-i-script-webpages-to-find-phone-numbers-using-click-to-call> already does) and then rendering them in the correct order. Use case 4 falls under this category.



3)      We may be able to determine the correct context based on rules and then determine where to place the characters. SAT<http://toolbox/SAT> was one of the rules-based engines developed to identify contextual issues in a given string. If we can define the rules and detect issues, we should be able to catch them as well.
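
To make points 2.a and 2.c more concrete, here is a minimal Python sketch. It is not the HTML5 or Unicode bidi algorithm itself. It uses the standard library's unicodedata.bidirectional to list the bidi classes of the characters in “C++!”, applies a first-strong heuristic to guess a base direction (broadly the approach the HTML5 specification describes for dir="auto", stated here as an assumption rather than a full account), and implements a toy version of the space / no-space rule proposed in 2.a. The names bidi_classes, first_strong_direction and resolve_by_adjacency are hypothetical and exist only for illustration.

```python
import unicodedata

STRONG_LTR = {"L"}          # strong left-to-right class
STRONG_RTL = {"R", "AL"}    # strong right-to-left classes (Hebrew, Arabic)
SEPARATORS = {"WS", "B", "S"}  # whitespace, paragraph and segment separators


def bidi_classes(text):
    """Return (character, Unicode bidi class) pairs for each character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def first_strong_direction(text, default="ltr"):
    """Guess a base direction from the first strong character in the text."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in STRONG_LTR:
            return "ltr"
        if cls in STRONG_RTL:
            return "rtl"
    return default  # no strong character found


def resolve_by_adjacency(text):
    """Toy version of the rule proposed in 2.a (an assumption, not the UBA).

    A weak or neutral character that directly follows a resolved character,
    with no separator in between, inherits that character's direction.
    A separator, and anything after it up to the next strong character,
    is left unresolved (None).
    """
    resolved = []
    previous = None
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in STRONG_LTR:
            previous = "ltr"
        elif cls in STRONG_RTL:
            previous = "rtl"
        elif cls in SEPARATORS:
            previous = None
        # any other class keeps the direction carried in `previous`
        resolved.append((ch, previous))
    return resolved


if __name__ == "__main__":
    print(bidi_classes("C++!"))
    print(first_strong_direction("\u0641\u064a C++"))
    print(resolve_by_adjacency("C++ !"))
```

With this sketch, resolve_by_adjacency("C++!") marks all four characters as ltr, while for "C++ !" the space and the “!” stay unresolved, which is the case described above as needing additional contextual rules.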



Respectfully,



Jan





-----Original Message-----

From: Richard Ishida [mailto:ishida@w3.org]

Sent: Friday, January 17, 2014 9:04 AM

To: www International; public-i18n-bidi@w3.org<mailto:public-i18n-bidi@w3.org>

Subject: For review: Update to What you need to know about the bidi algorithm and inline markup



An updated version of What you need to know about the bidi algorithm and inline markup[1] is out for wide review. We are looking for comments over the next two weeks. After the review period is over, this content will be copied to the same location as the current version of What you need to know about the bidi algorithm and inline markup[2] and the URL of the updated version will cease to exist.



The update rewrites the article to reflect the recent changes in bidi markup in the HTML5 specification.



Technically speaking, the main change is that the dir attribute now isolates text by default with respect to the bidi algorithm. Isolation as a default is the recommendation of the Unicode Standard as of version 6.3.
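
For plain text outside markup, Unicode 6.3 added isolate control characters that play a comparable role to an isolating dir attribute. The following is a minimal Python sketch of wrapping a string in those characters. The helper name isolate and its direction parameter are hypothetical, and the mapping of "ltr", "rtl" and "no direction given" to LRI, RLI and FSI is an illustrative assumption about how one might mirror dir="ltr", dir="rtl" and dir="auto".

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text, direction=None):
    """Wrap text in Unicode isolate controls.

    direction="ltr" or "rtl" forces a direction; None uses FSI, which
    lets the first strong character of the text decide.
    """
    openers = {"ltr": LRI, "rtl": RLI, None: FSI}
    if direction not in openers:
        raise ValueError("direction must be 'ltr', 'rtl' or None")
    return openers[direction] + text + PDI


if __name__ == "__main__":
    # repr() makes the invisible control characters visible as escapes
    print(repr(isolate("C++", "ltr")))
    print(repr(isolate("\u0641\u064a")))
```

Isolating a run this way keeps its contents from affecting the ordering of the surrounding text, which is the property the updated article relies on.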



From a less technical point of view, the main advantages of the update are that the new methods introduced here reduce the need to use a new approach when the direction of content is known, and therefore make for a much simpler transition for both content authors and browser developers to support the advances in the handling of bidirectional text content. At the same time, these approaches have good results for existing legacy content.



Please send comments to www-international@w3.org<mailto:www-international@w3.org> and start the subject line with



Thank you.







[1] http://www.w3.org/International/articles/inline-bidi-markup/update




[2] http://www.w3.org/International/articles/inline-bidi-markup/
Received on Wednesday, 12 February 2014 22:26:16 UTC