RE: word breaking CJK languages

If you decide to investigate products that handle word breaking, my company
offers a Chinese Morphological Analyzer and a Japanese Morphological Analyzer.
Product information can be found at: http://www.basistech.com/products/

>Can anyone point me to books/RFCs/websites that explain the proper
>way to break words when building a full-text search database from parsed
>HTML/XML in any of the following MBCS encodings:

>UTF-8
>GB2312
>Shift-JIS
>EUC-KR
>Big5

>Thanks,  Jeff


> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
> Jeff Halperin           One Kendall Square      Tel: 617-252-5636
> Basis Technology Corp.  Cambridge, MA 02139     Fax: 617-252-9150
> jeff@basistech.com      U.S.A.                  www.basistech.com 
> 

Received on Thursday, 27 April 2000 17:07:12 UTC