Re: word breaking CJK languages

From: Suzanne Topping <stopping@rochester.rr.com>
Date: Tue, 11 Apr 2000 11:35:43 -0400
Message-ID: <014301bfa3cb$99ed0840$ab261818@rochester.rr.com>
To: "www" <www-international@w3.org>

There was recently a lengthy discussion on the Unicode list about sorting
Asian Unicode characters, in case you'd like to look through the
archives.

----- Original Message -----
From: Stockett, Jeff <stockett@quadralay.com>
To: <www-international@w3.org>
Sent: Tuesday, April 11, 2000 11:01 AM
Subject: word breaking CJK languages


> Can anyone point me to books/RFCs/websites that explain the proper
> way to break words for building a full text search database when parsing
> HTML/XML in any of the following MBCS encodings:
>
> UTF-8
> GB2312
> Shift-JIS
> EUC-KR
> Big5
>
> Thanks, Jeff
>
>
Received on Tuesday, 11 April 2000 11:41:52 GMT