- From: Paul Lin <linjinbo@bj.tom.com>
- Date: Sun, 16 Nov 2003 13:07:34 +0800
- To: www-talk@w3.org
Received on Sunday, 16 November 2003 20:02:20 UTC
I use libwww to grab web pages, but I am having difficulty finding the
charset of a page.
I have the following code to get the charset.
========================
anchor = HTAnchor_parent ( (HTAnchor *)HTRequest_anchor ( request ) );
HTCharset charset = HTAnchor_charset(anchor);
if (charset)
    strcpy ( pCharsetStr, HTAtom_name(charset) );
else
    strcpy ( pCharsetStr, "NONE" ); // pCharsetStr is a char array.
HTPrint ("charset %s\n", pCharsetStr );
=========================
Most of the time the result is "NONE", as if no charset could be found.
But when I check the source of the web page, it contains
<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
or a similar line with another charset.
So I thought it should return "gb2312" or whatever follows "charset=".
Does anyone know how to solve this problem? Thanks in advance.
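
To make the expectation concrete, here is a minimal sketch of the fallback I have in mind: scan the downloaded page text for "charset=" and take the token that follows it. find_meta_charset and starts_with_nocase are hypothetical helpers written for this illustration, not libwww functions, and the sketch assumes the page body is already available as a NUL-terminated buffer. It does not parse HTML properly, so it would also match a "charset=" that appears outside a meta tag.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libwww: case-insensitive prefix match. */
static int starts_with_nocase(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/*
 * Hypothetical helper, not part of libwww: copy the token that follows the
 * first "charset=" in html into out. Returns 1 on success, 0 if not found.
 */
int find_meta_charset(const char *html, char *out, size_t outlen)
{
    const char *p;
    size_t n = 0;

    assert(html != NULL && out != NULL && outlen > 0);
    out[0] = '\0';

    for (p = html; *p; p++) {
        if (!starts_with_nocase(p, "charset="))
            continue;
        p += strlen("charset=");
        /* Skip an optional opening quote, as in charset="gb2312". */
        if (*p == '"' || *p == '\'')
            p++;
        /* Copy until a character that cannot be part of a charset name. */
        while (*p && !isspace((unsigned char)*p) && *p != '"' && *p != '\'' &&
               *p != ';' && *p != '>' && *p != '/' && n + 1 < outlen)
            out[n++] = *p++;
        out[n] = '\0';
        return n > 0;
    }
    return 0;
}
```

Given the meta line quoted above as input, this should fill the output buffer with gb2312, which could then be used in place of "NONE" whenever HTAnchor_charset returns nothing.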
Paul Lin