word breaking CJK languages

From: Stockett, Jeff <stockett@quadralay.com>
Date: Tue, 11 Apr 2000 10:01:08 -0500
Message-ID: <1265B09067C4D311A8E3009027D39997017C0E@khan.quadralay.com>
To: "'www-international@w3.org'" <www-international@w3.org>
Can anyone point me to books, RFCs, or websites that explain the proper
way to break text into words when building a full-text search database
from HTML/XML in any of the following multi-byte (MBCS) encodings:

UTF-8
GB2312
Shift-JIS
EUC-KR
Big5

Thanks, Jeff
Received on Tuesday, 11 April 2000 10:58:45 GMT
