Re: word breaking CJK languages

There was recently a lengthy discussion about sorting Asian Unicode
characters on the Unicode list, in case you'd like to take a look through
the archives.

----- Original Message -----
From: Stockett, Jeff <stockett@quadralay.com>
To: <www-international@w3.org>
Sent: Tuesday, April 11, 2000 11:01 AM
Subject: word breaking CJK languages


> Can anyone point me to books/RFCs/websites that explain the proper
> way to break words for building a full text search database when parsing
> HTML/XML in any of the following MBCS encodings:
>
> UTF-8
> GB2312
> Shift-JIS
> EUC-KR
> Big5
>
> Thanks, Jeff

Received on Tuesday, 11 April 2000 11:41:52 UTC