One way to detect or infer a language (and, as a by-product, the character encoding) is to use N-grams. This technique relies on statistics about particular combinations of bytes that are likely to appear in a given language (and encoding). Basis Technology, for example, has a product (http://www.basistech.com/language-identification/). I'm sure there are other companies and open-source projects that use N-gram algorithms.

-- 
KUROSAKA ("Kuro") Teruhiko, San Francisco, California, USA
Internationalization Consultant http://www.bhlab.com/

Received on Thursday, 4 November 2004 07:01:52 GMT
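To illustrate the byte N-gram idea, here is a toy sketch in Python. It is not Basis Technology's method or any product's algorithm. The class name, the trigram size, the add-one smoothing, and the training sentences are all assumptions chosen for illustration. Each (language, encoding) label gets a profile of byte-trigram counts, and an unknown input is assigned to the label under which its trigrams are most probable. Because the profiles are built from raw bytes, the same Japanese text encoded as UTF-8 and as Shift_JIS produces unrelated profiles, so the winning label identifies the encoding as well as the language.

```python
import math
from collections import Counter
from collections.abc import Iterator


def byte_ngrams(data: bytes, n: int = 3) -> Iterator[bytes]:
    """Yield every overlapping run of n bytes in data."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


class NGramIdentifier:
    """Toy byte n-gram classifier: one count profile per (language, encoding) label."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.profiles: dict[str, tuple[Counter, int]] = {}

    def train(self, label: str, sample: bytes) -> None:
        counts = Counter(byte_ngrams(sample, self.n))
        self.profiles[label] = (counts, sum(counts.values()))

    def log_likelihood(self, label: str, data: bytes) -> float:
        counts, total = self.profiles[label]
        # Add-one (Laplace) smoothing over all 256**n possible byte n-grams,
        # so an unseen n-gram lowers the score instead of zeroing it out.
        vocab = 256 ** self.n
        return sum(math.log((counts[g] + 1) / (total + vocab))
                   for g in byte_ngrams(data, self.n))

    def identify(self, data: bytes) -> str:
        """Return the label whose profile makes data most probable."""
        return max(self.profiles, key=lambda label: self.log_likelihood(label, data))


def demo_identifier() -> NGramIdentifier:
    # Hypothetical, far-too-small training samples, for illustration only.
    ident = NGramIdentifier(n=3)
    ident.train("en", b"the quick brown fox jumps over the lazy dog. "
                      b"the weather is nice and the sun is shining. there is the house.")
    ident.train("fr", "le renard brun saute par-dessus le chien paresseux. "
                      "il fait beau et le soleil brille. voici la maison.".encode("utf-8"))
    ja = "これは日本語の文章です。私は東京に住んでいます。"
    # Same text, two encodings: the byte patterns differ completely.
    ident.train("ja/UTF-8", ja.encode("utf-8"))
    ident.train("ja/Shift_JIS", ja.encode("shift_jis"))
    return ident


if __name__ == "__main__":
    ident = demo_identifier()
    print(ident.identify(b"the sun and the dog"))
    print(ident.identify("日本語です".encode("shift_jis")))
```

A real identifier would train on much larger corpora and usually prune or rank its profiles; the tiny samples here only show the mechanism of scoring raw byte combinations against per-language, per-encoding statistics.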