- From: tomerm via GitHub <sysbot+gh@w3.org>
- Date: Sun, 10 Apr 2016 08:19:29 +0000
- To: public-i18n-archive@w3.org
Code page is not the problem. The problem is the different bidi layouts in which data can be stored. Different bidi layouts place characters and text segments in different positions, so comparing the same data stored in different bidi layouts will almost certainly produce incorrect results.

I mentioned both because, for historical reasons, different bidi layouts are associated with EBCDIC code pages and legacy systems. In the general case, however, even data stored in Unicode can be present in different bidi layouts. Bidi layouts and code pages are completely orthogonal concepts (from both the functional and the technical perspective).

The suggested textual amendment is as follows: ensure that the text being sorted or searched is in the same bidi layout.

Normalization to a common bidi layout is conceptually similar to code page conversion. Before you can compare two pieces of text, you must ensure they are encoded with the same code page (e.g. Unicode). Similarly, if you wish to compare two pieces of bidi text, you must ensure they have been transformed to a common bidi layout.

For more information on bidi layouts, please see: http://www.ibm.com/developerworks/websphere/library/techarticles/bidi/bidigen.html

PS. The encodings you mentioned are relevant for display (browsers interpret data encoded in such encodings differently). When we talk about search and sort, we are referring to text as it is stored.

-- 
GitHub Notification of comment by tomerm
Please view or discuss this issue at https://github.com/w3c/charmod-norm/issues/80#issuecomment-207942376 using your GitHub account
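
To illustrate the point about comparing text stored in different bidi layouts, here is a minimal Python sketch. It is not taken from the comment or from any specification. The function names `is_pure_rtl` and `visual_ltr_to_logical` are hypothetical, and the conversion assumes the simplest possible case: a single run of strongly right-to-left characters stored in visual left-to-right order, where converting to logical order is a plain reversal. Mixed-direction text, digits, and punctuation need a full layout transformation based on the Unicode Bidirectional Algorithm, which this sketch does not attempt.

```python
import unicodedata


def is_pure_rtl(text: str) -> bool:
    """Return True if every character is strongly RTL (R, AL) or whitespace (WS)."""
    return all(unicodedata.bidirectional(ch) in ("R", "AL", "WS") for ch in text)


def visual_ltr_to_logical(text: str) -> str:
    """Convert a pure-RTL run from visual LTR layout to logical layout.

    Assumption: for a single pure-RTL run, the visual layout is the
    logical layout reversed. Anything else is rejected rather than guessed.
    """
    if not is_pure_rtl(text):
        raise ValueError("only pure right-to-left runs are handled in this sketch")
    return text[::-1]


# The Hebrew word "shalom" in logical (reading) order: shin, lamed, vav, final mem.
logical = "\u05e9\u05dc\u05d5\u05dd"

# The same word as a legacy system might store it in visual LTR layout.
visual = logical[::-1]

# Same code points, same encoding, different layout: a direct comparison fails.
same_before = logical == visual

# After normalizing both values to the logical layout, the comparison succeeds.
same_after = logical == visual_ltr_to_logical(visual)
```

Both strings here are Unicode, which matches the point that the layout mismatch is independent of the code page.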
Received on Sunday, 10 April 2016 08:19:32 UTC