Re: charset and language of C strings?

From: Erik van der Poel <erik@netscape.com>
Date: Fri, 06 Mar 1998 09:46:27 -0800
Message-ID: <35003672.71AA1269@netscape.com>
To: Bill Janssen <janssen@parc.xerox.com>
CC: www-international@w3.org
Hi Bill!

Bill Janssen wrote:

> I'd like to find an algorithm to determine the charset and language
> (in the sense of those terms defined by IETF RFC 2277,
> http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2277.txt) of a C
> string, probably using the information returned by a call to setlocale:
>
>         current_locale = setlocale (LC_ALL, NULL);
>
> Is this in any way standardized?  Are there good heuristics that
> can be used?

Which platform(s)? Windows? Mac? Unix?

Where does the C string come from? String literal? Keyboard? Network? File?
Resources?

setlocale is not useful on all platforms. It is somewhat useful on Unix, and maybe
also on NT. I don't know about Win95. It is probably not useful on the Mac.
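On Unix, one possible approach is a sketch under stated assumptions, not a standardized algorithm. It queries a single category instead of LC_ALL, because on some systems (glibc, for example) setlocale(LC_ALL, NULL) returns a composite string when the categories differ. It then splits the name, assuming it follows the common X/Open form language[_territory][.codeset][@modifier]. POSIX nl_langinfo(CODESET) reports the codeset directly, but it is not available on every platform. The codeset names it returns are platform-specific and may still need mapping to IANA charset names. The helpers split_locale_name, copy_part and current_codeset, and the struct locale_parts, are hypothetical names for illustration.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical result type: the four parts of a locale name of the form
   language[_territory][.codeset][@modifier]. Missing parts stay empty. */
struct locale_parts {
    char language[32];
    char territory[32];
    char codeset[64];
    char modifier[32];
};

/* Copy len bytes into dst, truncating if needed, always NUL-terminating. */
static void copy_part(char *dst, size_t dstsize, const char *start, size_t len)
{
    assert(dstsize > 0);
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, start, len);
    dst[len] = '\0';
}

/* Split a locale name such as "ja_JP.eucJP" into its parts.
   This assumes the X/Open naming convention; other names (e.g. "C")
   end up entirely in the language field. */
void split_locale_name(const char *name, struct locale_parts *out)
{
    const char *p = name;
    size_t n;

    memset(out, 0, sizeof *out);

    n = strcspn(p, "_.@");            /* language ends at first separator */
    copy_part(out->language, sizeof out->language, p, n);
    p += n;

    if (*p == '_') {                  /* optional territory */
        p++;
        n = strcspn(p, ".@");
        copy_part(out->territory, sizeof out->territory, p, n);
        p += n;
    }
    if (*p == '.') {                  /* optional codeset */
        p++;
        n = strcspn(p, "@");
        copy_part(out->codeset, sizeof out->codeset, p, n);
        p += n;
    }
    if (*p == '@') {                  /* optional modifier */
        p++;
        copy_part(out->modifier, sizeof out->modifier, p, strlen(p));
    }
}

/* Adopt the user's environment (LANG, LC_*) and ask the C library for the
   codeset of the LC_CTYPE locale. nl_langinfo(CODESET) is POSIX/XSI and
   never returns NULL; it returns "" if the information is unavailable. */
const char *current_codeset(void)
{
    setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}
```

For example, splitting "de_DE.ISO-8859-1@euro" yields language "de", territory "DE", codeset "ISO-8859-1" and modifier "euro". A string typed at that user's terminal might then be tentatively labeled charset ISO-8859-1, language de-DE. This is only a guess about the user's environment. It says nothing reliable about a string that arrived from the network, a file, or a compiled-in literal.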

There are organizations that have worked on systems that guess the charset
and/or language of a piece of text. Some of those organizations have people on
this mailing list. Maybe they will reply.
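The following is one very small piece of such a guesser, sketched under our own assumptions. It is not any particular organization's system. Text with no bytes above 0x7F can be labeled US-ASCII. Text that is well-formed UTF-8 is quite likely UTF-8, because legacy 8-bit text rarely happens to form valid multi-byte sequences. Anything else needs statistical, language-specific methods. The check follows the later RFC 3629 rules, which allow at most 4 bytes and no overlong forms or surrogates, rather than the longer sequences RFC 2279 still allowed. The names looks_like_utf8, guess_charset and enum charset_guess are hypothetical.

```c
#include <stddef.h>

enum charset_guess { GUESS_US_ASCII, GUESS_UTF8, GUESS_UNKNOWN };

/* Return 1 if s[0..len) is well-formed UTF-8 (RFC 3629 rules), else 0. */
int looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;          /* number of continuation bytes expected */
        unsigned long cp;     /* decoded code point */

        if (c < 0x80) { i++; continue; }
        else if (c >= 0xC2 && c <= 0xDF) { need = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { need = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { need = 3; cp = c & 0x07; }
        else return 0;        /* 0x80-0xC1 and 0xF5-0xFF cannot start a sequence */

        if (len - i - 1 < need)
            return 0;         /* sequence truncated at end of buffer */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;     /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        /* Reject overlong 3- and 4-byte forms, UTF-16 surrogates, and
           values above U+10FFFF. (2-byte overlongs were excluded by
           rejecting lead bytes 0xC0 and 0xC1.) */
        if ((need == 2 && cp < 0x800) || (need == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;

        i += need + 1;
    }
    return 1;
}

/* Coarse first-pass classification; GUESS_UNKNOWN means a real guesser
   would have to fall back on statistics about byte and character usage. */
enum charset_guess guess_charset(const unsigned char *s, size_t len)
{
    int has_high_bytes = 0;

    for (size_t i = 0; i < len; i++)
        if (s[i] >= 0x80) { has_high_bytes = 1; break; }

    if (!has_high_bytes)
        return GUESS_US_ASCII;
    if (looks_like_utf8(s, len))
        return GUESS_UTF8;
    return GUESS_UNKNOWN;
}
```

With this sketch, "caf\xC3\xA9" is classified as UTF-8. The Latin-1 spelling "caf\xE9" is classified as unknown, because 0xE9 announces a 3-byte sequence that never arrives. The heuristic says nothing about the language of the text.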

Erik
Received on Friday, 6 March 1998 12:46:57 GMT