Re: charset and language of C strings?

Hi Bill!

Bill Janssen wrote:

> I'd like to find an algorithm to determine the charset and language
> (in the sense of those terms defined by IETF RFC 2277,
> http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2277.txt) of a C
> string, probably using the information returned by a call to setlocale:
>
>         current_locale = setlocale (LC_ALL, NULL);
>
> Is this in any way standardized?  Are there good heuristics that
> can be used?

Which platform(s)? Windows? Mac? Unix?

Where does the C string come from? String literal? Keyboard? Network? File?
Resources?

setlocale is not equally useful on every platform. It is somewhat useful on
Unix, and maybe also on NT. I don't know about Win95. Probably not on the Mac.

There are organizations that have worked on systems that guess the charset
and/or language of a piece of text. Some of those organizations have people on
this mailing list. Maybe they will reply.

Erik

Received on Friday, 6 March 1998 12:46:57 UTC