
Problem on finding charset

From: Paul Lin <linjinbo@bj.tom.com>
Date: Sun, 16 Nov 2003 13:07:34 +0800
To: www-talk@w3.org
Message-Id: <1068959253.5653.8.camel@nb.nearbyweb.net>
I use libwww to fetch web pages, but I am having difficulty finding the
charset of a page.
I have the following code to get the charset:
========================
/* pCharsetStr is a char array declared elsewhere. */
anchor = HTAnchor_parent((HTAnchor *) HTRequest_anchor(request));
HTCharset charset = HTAnchor_charset(anchor);
if (charset)
    strcpy(pCharsetStr, HTAtom_name(charset));
else
    strcpy(pCharsetStr, "NONE");

HTPrint("charset %s\n", pCharsetStr);
=========================
Most of the time the result is "NONE", so it seems the charset cannot be found.
But when I check the source of the web page, it contains
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
or a similar tag with another charset,
so I thought it should return "gb2312" or whatever follows "charset=".
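
One workaround I could fall back on is to scan the downloaded HTML for "charset=" myself. This is only a rough sketch: it assumes that HTAnchor_charset reflects only the charset parameter of the HTTP Content-Type header and not the <meta> tag in the page body, which I have not confirmed. The helper name find_meta_charset is hypothetical and is not part of libwww.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libwww: scans an HTML buffer for the
 * first "charset=" (case-insensitive) and copies the value that follows
 * it into out. Returns 1 if a non-empty value was found, 0 otherwise. */
static int find_meta_charset(const char *html, char *out, size_t out_size)
{
    static const char key[] = "charset=";
    const size_t key_len = sizeof key - 1;
    const char *p;

    assert(html != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    for (p = html; *p != '\0'; ++p) {
        size_t i = 0;

        /* Case-insensitive match of "charset=" starting at p. */
        while (i < key_len && p[i] != '\0' &&
               tolower((unsigned char) p[i]) == key[i])
            ++i;

        if (i == key_len) {
            const char *v = p + key_len;
            size_t n = 0;

            /* Skip an optional opening quote, as in charset="gb2312". */
            if (*v == '"' || *v == '\'')
                ++v;

            /* Copy the characters that can appear in a charset label,
             * leaving room for the terminating NUL. */
            while (n + 1 < out_size &&
                   (isalnum((unsigned char) *v) || *v == '-' ||
                    *v == '_' || *v == '.' || *v == ':'))
                out[n++] = *v++;

            out[n] = '\0';
            return n > 0;
        }
    }
    return 0;
}
```

The idea would be to call find_meta_charset(body, pCharsetStr, sizeof pCharsetStr) on the fetched page only when HTAnchor_charset returns nothing, and to keep "NONE" if the helper also returns 0. The scan does not parse HTML, so it would also match "charset=" inside comments or scripts.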

Does anyone know how to solve this problem? Thanks in advance.

Paul Lin
Received on Sunday, 16 November 2003 20:02:20 GMT
