- From: Suzanne Topping <stopping@rochester.rr.com>
- Date: Tue, 11 Apr 2000 11:35:43 -0400
- To: "www" <www-international@w3.org>
There was recently a lengthy discussion about sorting Asian Unicode characters on the Unicode list, in case you'd like to take a look through the archives.

----- Original Message -----
From: Stockett, Jeff <stockett@quadralay.com>
To: <www-international@w3.org>
Sent: Tuesday, April 11, 2000 11:01 AM
Subject: word breaking CJK languages

> Can any one point me to books/RFCs/websites that explain the proper
> way to break words for building a full text search database when parsing
> HTML/XML in any of the following MBCS encodings:
>
> UTF-8
> GB2312
> Shift-JIS
> EUC-KR
> Big5
>
> Thanks, Jeff
Received on Tuesday, 11 April 2000 11:41:52 UTC