- From: r12a via GitHub <sysbot+gh@w3.org>
- Date: Mon, 04 Apr 2016 17:34:05 +0000
- To: www-international@w3.org
r12a has just created a new issue for https://github.com/w3c/charmod-norm:

== Arabic & Hebrew issues with 2.4 ==

[ moved here from issue #80 ] raised by tomerm

**On Unicode Controls and Invisible Markers**

Languages written in bidirectional scripts may contain sections (called directional runs) that have different directions: for example, Arabic words run from right to left, while Latin words and numbers run from left to right. Arabic and Hebrew sentences quite often include Latin words or numbers.

The readability of such a sentence depends heavily on the direction with which the text is displayed, because that direction determines the relative order in which the directional runs are laid out on the screen. If the display direction differs from the natural direction of the sentence's language, the sentence can become incomprehensible.

Unfortunately, current technologies generally offer no way to attach a direction to a piece of text (for example, String in Java is a final class and carries no information about text directionality). Thus, unless a higher-level protocol is available for that purpose (for example, HTML markup with the dir attribute), there is no way to persist text direction information.

Consequently, many solutions rely on Unicode control characters (UCCs). These are defined in the Unicode Bidirectional Algorithm specification, UAX #9: http://unicode.org/reports/tr9/. They are valid Unicode characters that have no associated glyph (that is, they are invisible), but they do affect how text is displayed. To enforce LTR direction, text is usually enclosed between LRE (U+202A LEFT-TO-RIGHT EMBEDDING) and PDF (U+202C POP DIRECTIONAL FORMATTING); to enforce RTL direction, it is usually enclosed between RLE (U+202B RIGHT-TO-LEFT EMBEDDING) and PDF.

As a result, the text can contain UCCs, which will certainly affect both searching and sorting. The suggested approach is to ignore, during sorting and searching, any UCCs that can be used to store text directionality.
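The wrapping technique and the suggested workaround can be sketched in Python as follows. This is a minimal illustration, not part of the original issue, and it rests on several assumptions: the helper names `embed`, `strip_bidi_controls`, `contains` and `sort_ignoring_bidi` are hypothetical; the set of ignorable controls (which here also covers the marks LRM/RLM/ALM, the overrides LRO/RLO and the isolates LRI/RLI/FSI/PDI, not only LRE/RLE/PDF) is an assumption; and sorting uses plain code-point order rather than a locale-aware collation.

```python
"""Sketch: bidi embedding controls, and ignoring them when searching/sorting."""

# Embedding controls from UAX #9 used to force a direction.
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING

# Assumption: treat all of these directional controls as ignorable.
BIDI_CONTROLS = frozenset(
    "\u200E\u200F\u061C"        # LRM, RLM, ALM
    "\u202A\u202B\u202C"        # LRE, RLE, PDF
    "\u202D\u202E"              # LRO, RLO
    "\u2066\u2067\u2068\u2069"  # LRI, RLI, FSI, PDI
)

# str.translate table that maps each control to None, i.e. deletes it.
_DELETE_BIDI = {ord(ch): None for ch in BIDI_CONTROLS}


def embed(text: str, rtl: bool) -> str:
    """Hypothetical helper: enclose text in RLE...PDF (rtl) or LRE...PDF (ltr)."""
    return (RLE if rtl else LRE) + text + PDF


def strip_bidi_controls(text: str) -> str:
    """Return text with every directional control character removed."""
    return text.translate(_DELETE_BIDI)


def contains(haystack: str, needle: str) -> bool:
    """Substring search that ignores directional controls on both sides."""
    return strip_bidi_controls(needle) in strip_bidi_controls(haystack)


def sort_ignoring_bidi(items: list[str]) -> list[str]:
    """Sort by code-point order of the text with controls removed (no locale collation)."""
    return sorted(items, key=strip_bidi_controls)


if __name__ == "__main__":
    haystack = "version " + embed("1.2", rtl=False)
    print("version 1.2" in haystack)          # False: the LRE breaks the match
    print(contains(haystack, "version 1.2"))  # True

    items = [embed("b", rtl=True), "a", embed("c", rtl=False)]
    print([strip_bidi_controls(s) for s in sorted(items)])              # ['a', 'c', 'b']
    print([strip_bidi_controls(s) for s in sort_ignoring_bidi(items)])  # ['a', 'b', 'c']
```

Without stripping, the embedded "1.2" is not found, and the wrapped items sort by their invisible leading control characters rather than by their visible text. A real system would more likely apply the same filtering inside its collation or search-index normalization than in ad hoc helpers like these.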
See https://github.com/w3c/charmod-norm/issues/82
Received on Monday, 4 April 2016 17:34:07 UTC