Re: What is a language detection algorithm?

One way to detect or infer a language (and, as a by-product, the character
encoding) is to use N-grams. This technique relies on statistics of how
often particular byte sequences are likely to appear
in a given language (and encoding).
Basis Technology, for example, has a product:
http://www.basistech.com/language-identification/
I'm sure there are other companies and open source projects that
use N-gram algorithms.
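To make the byte N-gram idea concrete, here is a minimal Python sketch. It assumes
byte trigrams and cosine similarity as the scoring method. That is one common choice
and not necessarily what Basis Technology's product does. The labels and training
strings in SAMPLES are hypothetical toy data; a real system would train on much
larger corpora.

```python
from collections import Counter
import math

def byte_ngrams(data: bytes, n: int = 3) -> Counter:
    """Count every overlapping run of n bytes in data."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    # Counter returns 0 for missing keys, so unseen n-grams add nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def train(samples: dict) -> dict:
    """Build one byte n-gram profile per (language, encoding) label."""
    return {label: byte_ngrams(data) for label, data in samples.items()}

def identify(profiles: dict, data: bytes) -> str:
    """Return the label whose profile is most similar to the input bytes."""
    target = byte_ngrams(data)
    return max(profiles, key=lambda label: cosine(profiles[label], target))

# Hypothetical toy training data: the same Japanese text in two encodings
# yields two different byte profiles.
JA_TEXT = "日本語のテキストです。これは例文です。"
SAMPLES = {
    "en/utf-8": "the quick brown fox jumps over the lazy dog and then the dog sleeps".encode("utf-8"),
    "fr/utf-8": "le renard brun rapide saute par-dessus le chien paresseux et le chien dort".encode("utf-8"),
    "ja/shift_jis": JA_TEXT.encode("shift_jis"),
    "ja/euc-jp": JA_TEXT.encode("euc_jp"),
}

if __name__ == "__main__":
    profiles = train(SAMPLES)
    print(identify(profiles, "the dog and the fox".encode("utf-8")))
    print(identify(profiles, "これは日本語です。".encode("euc_jp")))
```

Because the profiles are built from raw bytes rather than decoded characters, the
winning label identifies an encoding as well as a language. That is why the encoding
comes out as a by-product of the same statistics.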
-- 
KUROSAKA ("Kuro") Teruhiko, San Francisco, California, USA
Internationalization Consultant
http://www.bhlab.com/

Received on Thursday, 4 November 2004 07:01:52 UTC