Word breaking for CJK languages

Can anyone point me to books, RFCs, or websites that explain the proper
way to break words when building a full-text search database from parsed
HTML/XML in any of the following MBCS encodings:

UTF-8
GB2312
Shift-JIS
EUC-KR
Big5

Thanks,  Jeff

Received on Tuesday, 11 April 2000 10:58:45 UTC