- From: tomerm via GitHub <sysbot+gh@w3.org>
- Date: Sun, 10 Apr 2016 08:19:29 +0000
- To: public-i18n-archive@w3.org
The code page is not the problem. The problem is the different bidi
layouts in which data can be stored. Different bidi layouts place
characters / text segments in different positions, so comparing the
same data in different bidi layouts will most definitely produce
incorrect results. Since, for historical reasons, different bidi
layouts are associated with EBCDIC code pages and legacy systems, I
mentioned both. However, in the general case, even data stored in
Unicode can be present in different bidi layouts. Bidi layouts and
code pages are completely orthogonal concepts (from the functional /
technical perspective).
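To illustrate, here is a minimal sketch in Python. It assumes a word
made only of Hebrew letters, for which the visual left-to-right layout
is simply the reverse of the logical layout. The variable names are
made up for this example, and real mixed-direction text needs a full
bidi transform, not a plain reversal.

```python
# Hebrew word "shalom" in logical (typing) order:
# shin, lamed, vav, final mem.
logical = "\u05E9\u05DC\u05D5\u05DD"

# Assumption: the same word stored in visual left-to-right order.
# For a run of Hebrew letters only, that is the reversed string.
visual = logical[::-1]

# Both strings are Unicode and hold exactly the same code points...
print(sorted(logical) == sorted(visual))  # True

# ...yet a direct comparison fails because the layouts differ.
print(logical == visual)  # False
```

Both strings use the same encoding, so the mismatch comes from the
layout alone.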
The suggested textual amendment is as follows:
ensure that text being sorted / searched is present in the same
bidi layout.
Normalization to the same bidi layout is conceptually similar to code
page conversion. Before you can compare two pieces of text you must
ensure they are encoded with the same code page (e.g. Unicode). Very
similarly, if you wish to compare two pieces of bidi text, you must
ensure they are transformed to a common bidi layout. For more
information on bidi layouts please see:
http://www.ibm.com/developerworks/websphere/library/techarticles/bidi/bidigen.html
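A rough sketch of "transform to a common bidi layout, then compare",
again in Python. The Layout labels and the helper names (to_logical,
same_text) are hypothetical, invented for this example. The transform
is deliberately simplified: it only handles text made entirely of
right-to-left letters, where visual left-to-right order is the reverse
of logical order. A real system would need a complete bidi layout
transformation.

```python
from enum import Enum


class Layout(Enum):
    # Hypothetical labels for two of the possible bidi layouts.
    LOGICAL = "logical"
    VISUAL_LTR = "visual-ltr"


def to_logical(text: str, layout: Layout) -> str:
    """Bring text to the logical layout (simplified, RTL-only text)."""
    if layout is Layout.LOGICAL:
        return text
    if layout is Layout.VISUAL_LTR:
        # Assumption: a pure right-to-left run, so reversing suffices.
        return text[::-1]
    raise ValueError(f"unsupported layout: {layout}")


def same_text(a: str, a_layout: Layout, b: str, b_layout: Layout) -> bool:
    """Compare two strings only after normalizing both layouts."""
    return to_logical(a, a_layout) == to_logical(b, b_layout)


stored_logical = "\u05E9\u05DC\u05D5\u05DD"  # "shalom", logical order
stored_visual = stored_logical[::-1]         # same word, visual order

# Naive comparison of the stored data fails.
print(stored_logical == stored_visual)  # False

# Comparison after normalization to a common layout succeeds.
print(same_text(stored_logical, Layout.LOGICAL,
                stored_visual, Layout.VISUAL_LTR))  # True
```

The same normalization step would be applied to sort keys before
sorting, for the same reason.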
P.S. The encodings you mentioned are relevant for display (browsers
interpret data in such encodings differently). When we talk about
search / sort, we refer to text in storage.
--
GitHub Notification of comment by tomerm
Please view or discuss this issue at
https://github.com/w3c/charmod-norm/issues/80#issuecomment-207942376
using your GitHub account
Received on Sunday, 10 April 2016 08:19:32 UTC