- From: Tomer Mahlin <TOMERM@il.ibm.com>
- Date: Sun, 21 Jun 2015 22:29:30 +0300
- To: Richard Ishida <ishida@w3.org>
- Cc: www International <www-international@w3.org>
- Message-ID: <OFC5E0EEDD.35611948-ONC2257E6B.0065BEE2-C2257E6B.006B12F9@il.ibm.com>
Hello Ri,
Very interesting article. Thanks for sharing.
Several observations with your kind permission.
1. Identification of proper direction of text.
If we manage to accurately identify the language to which the text
belongs, we can easily find out the optimal text direction. This
information (the natural text direction for a given language) is
available from many services (including the iOS API and ICU). Please
observe that identifying the language of a text is not the same as
identifying its code page, or even the language of each specific word.
Several languages share, to a large degree, the same alphabet (e.g.
Arabic / Farsi, Russian / Bulgarian etc.).
In general, I believe we all agree that even first strong (aka
contextual or auto) is just a heuristic. In other words, it does not
accurately identify the language to which the text (sentence) belongs.
Much more accurate identification is provided by language
identification services (e.g. Cortana from Microsoft, Siri from Apple,
Google Language tools from Google, Watson from IBM).
I realize that at this moment those services might be too
computationally expensive to be leveraged at the text rendering level.
However, it may change in the future.
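To make the point about first strong concrete, here is a minimal Python
sketch of that heuristic. It is only an illustration: the function name
first_strong_direction and the "ltr" default are my own assumptions, and
the sketch ignores the rule in the Unicode Bidirectional Algorithm about
skipping characters inside isolates.

```python
import unicodedata


def first_strong_direction(text, default="ltr"):
    """Guess direction from the first strongly directional character.

    Hypothetical helper for illustration; simplified compared to the
    full Unicode Bidirectional Algorithm (isolates are not skipped).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):  # Hebrew, Arabic and similar letters
            return "rtl"
        if bidi_class == "L":  # Latin, Cyrillic and similar letters
            return "ltr"
    # No strong character found (e.g. only digits and punctuation).
    return default


# A Hebrew sentence that happens to start with a Latin acronym is
# classified as left-to-right, which is why this is only a heuristic.
hebrew_sentence_with_latin_start = "HTML היא שפת סימון"
guessed = first_strong_direction(hebrew_sentence_with_latin_start)
```

Here guessed is "ltr" even though the sentence is Hebrew, while a string
such as "123 مرحبا" gives "rtl" because digits are not strong characters.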
2. Specification of text direction at authoring time
I think that specification of text direction at authoring time via
metadata should be improved in general. Very good support is already
provided in various rich text editors (e.g. http://www.tinymce.com/ or
http://ckeditor.com/), on the assumption that they have a built-in
capability to store metadata alongside the text data.
However, as opposed to rich text editors, simple plain text input
fields (HTML input fields) don't provide any such mechanism. Various
browsers (IE, FF) and native platforms (including iOS, Windows) allow
changing the text direction on the fly, but there is no way for
consuming code to retrieve this information from the input field (let
alone a way to store it with the text).
I believe that, ideally, the JS string object should support
getTextDirection and setTextDirection methods (just like the other
methods it currently supports:
http://www.w3schools.com/jsref/jsref_obj_string.asp).
At authoring time in an HTML input field, when the end user
interactively changes the direction of the text, the setTextDirection
method could be used to store this information.
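JS strings have no such methods today, so the following is only a sketch
of the idea, written in Python rather than JS. The class DirectedText and
its method names are hypothetical; they simply mirror the proposed
getTextDirection / setTextDirection pair and show the direction
travelling with the text as metadata rather than inside it.

```python
from dataclasses import asdict, dataclass

# Assumed set of allowed values, modelled on the HTML dir attribute.
VALID_DIRECTIONS = ("ltr", "rtl", "auto")


@dataclass
class DirectedText:
    """Hypothetical container pairing text with its direction."""

    text: str
    direction: str = "auto"

    def get_text_direction(self):
        return self.direction

    def set_text_direction(self, direction):
        if direction not in VALID_DIRECTIONS:
            raise ValueError(f"unsupported direction: {direction!r}")
        self.direction = direction


# The end user flips the field to right-to-left while typing.
field_value = DirectedText("שלום")
field_value.set_text_direction("rtl")

# Consuming code can store the direction next to the unchanged text.
record = asdict(field_value)
```

The text itself stays untouched; only the accompanying record carries
the direction.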
3. Usage of UCC as metadata for specification of text direction
I don't think that, in general, using Unicode Control Characters (UCC)
is an optimal way to specify the direction of text. It may be
appropriate only if this is done "on the glass", so to speak, for
display purposes only.
However, if UCC are injected into the string itself, this will have
far-reaching consequences, simply because all consuming sides
(storage, string manipulation, search etc.) must be aware of the fact
that UCC were added on the authoring side as metadata (not as part of
the string data). If one controls both the authoring and the consuming
side, it is probably doable. Otherwise it is not.
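A small Python sketch of the consequences described above, assuming the
authoring side wraps the text in RLE ... PDF. The helper names mark_rtl
and strip_bidi_controls are hypothetical, as is the choice of which
control characters to strip.

```python
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Directional marks, embeddings, overrides and isolates.
BIDI_CONTROLS = {
    "\u200e", "\u200f", "\u061c",
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def mark_rtl(text):
    """Authoring side: inject direction into the string itself."""
    return f"{RLE}{text}{PDF}"


def strip_bidi_controls(text):
    """Consuming side: needed by every consumer that knows the convention."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


original = "שלום"
stored = mark_rtl(original)

length_changed = len(stored) != len(original)        # 6 versus 4
exact_match_fails = stored != original               # equality breaks
prefix_match_fails = not stored.startswith(original) # prefix search breaks
recovered = strip_bidi_controls(stored)              # equals original again
```

Every consumer that does not call something like strip_bidi_controls
sees a different string from the one the user typed.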
4. Proper display of a sequence of tokens separated by commas in a CSV file
This case belongs to the so-called "structured text" category (the same
category includes file paths, URLs, breadcrumbs, lists of tags separated
by commas etc.).
Such cases are described in more detail in this design document:
https://docs.google.com/document/d/1y9LhT7rbGGVHjh2uqTAYHzN5PfbAkPxO5sMJygOPc3I/edit?usp=sharing
It is suggested that proper display of such patterns be addressed by a
higher-level protocol. More details are in the document above.
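As a rough illustration of what such a higher-level protocol might do
for a comma-separated row, here is a Python sketch. It is not the method
from the design document: the function display_form and the choice of
Unicode isolates (LRI, FSI, PDI) are my own assumptions, and the result
is meant for display only, leaving the stored data unchanged.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def display_form(tokens, separator=","):
    """Build a display-only string for a row of tokens.

    Each token is isolated so that it resolves its own direction, and
    the whole row is wrapped in a left-to-right isolate so that tokens
    and separators keep their logical left-to-right order.
    """
    isolated = [f"{FSI}{token}{PDI}" for token in tokens]
    return f"{LRI}{separator.join(isolated)}{PDI}"


row = ["abc", "שלום", "123"]
stored_form = ",".join(row)   # what is kept in the CSV file
shown_form = display_form(row)  # what is handed to the renderer
```

The stored row contains no control characters; only the string passed to
the display layer does.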
I am also attaching it to this message, just in case.
PS. I apologize for extending this discussion in directions which go
beyond the "cell" context. Nevertheless, I would highly appreciate your
thoughts / feedback.
Best Regards,
Tomer Mahlin
Bidi Champion, Globalization Manager, GCoC
Bidi Development Lab
Phone: +972-2-6491784 | Mobile: +972-54-3368122 E-mail: tomerm@il.ibm.com
IBM R&D Labs
Malcha Technology Park
Jerusalem 96951
Israel
From: Richard Ishida <ishida@w3.org>
To: www International <www-international@w3.org>
Date: 19/06/2015 16:32
Subject: determining cell text direction
to help resolve this github issue https://github.com/w3c/csvw/issues/620
i have begun making notes about handling bidi in plain text environments
on the Web – particularly, so far, CSV data.
it's just a start. Comments welcome, as i go.
ri
Attachments
- image/gif attachment: 01-part
- image/gif attachment: 02-part
- application/octet-stream attachment: Properdisplayofbidirectionalstructuredtext.pdf
Received on Sunday, 21 June 2015 19:30:19 UTC