
Bug in http header parsing ?

From: Francois Nicot <fnicot@silicom.fr>
Date: Mon, 24 Jul 2000 18:25:11 +0200
Message-ID: <397C6DE7.6F1F14BD@silicom.fr>
To: www-lib@w3.org
Hi all,

There is something odd about HTAnchor_charset(): it never returns
anything for me.

I use libwww-pre.5.3.0, but the behaviour was the same in 5.2.8.

Please consider the following test case.

Here is the part of the page whose charset I cannot retrieve
(http://www.silicom.fr/sams/sams_present.htm):
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>sams_present</title>
</head>

I am trying to fetch the URL above (everything else works; this is the
only problem).

The following xxgdb debugger trace shows the values extracted from the
Anchor being processed in my terminate_handler:

  content_type = 0x27ac08,
  type_parameters = 0x0,
  meta_tags = 0x28cc48,
  content_base = 0x28d610 "http://www.silicom.fr/sams/sams_present.htm",

  content_encoding = 0x0,
  content_language = 0x0,
  content_length = 16047,

The content_type field of my anchor contains only text/html, as the
atom it points to shows:
(xxgdb) print *((HTAtom *) 0x27ac08)
$7 = {
  next = 0x0,
  name = 0x2795c8 "text/html"
}

So, obviously, there is no charset in it.
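
To make the expectation concrete, here is a minimal standalone C sketch of
pulling the charset parameter out of a Content-Type value such as the one in
the meta tag above. It is not libwww code and does not show how libwww parses
headers: extract_charset is a hypothetical helper introduced only for
illustration, and it assumes the value has the simple form
"type/subtype; name=value; ...".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libwww): copy the value of the
 * "charset" parameter of a Content-Type value into out.
 * Returns 1 if a non-empty charset was found, 0 otherwise. */
int extract_charset(const char *content_type, char *out, size_t out_size)
{
    static const char key[] = "charset";
    const size_t key_len = sizeof key - 1;
    const char *p;
    const char *v;
    size_t i;
    size_t n = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    if (content_type == NULL)
        return 0;

    /* Parameters follow the media type, each introduced by ';'. */
    p = strchr(content_type, ';');
    while (p != NULL) {
        p++;                                    /* skip the ';' */
        while (isspace((unsigned char)*p))
            p++;

        /* Case-insensitive match of the parameter name. */
        i = 0;
        while (i < key_len && tolower((unsigned char)p[i]) == key[i])
            i++;

        if (i == key_len) {
            v = p + key_len;
            while (isspace((unsigned char)*v))
                v++;
            if (*v == '=') {
                v++;
                while (isspace((unsigned char)*v))
                    v++;
                if (*v == '"')                  /* tolerate a quoted value */
                    v++;
                while (*v != '\0' && *v != ';' && *v != '"' &&
                       !isspace((unsigned char)*v) && n + 1 < out_size)
                    out[n++] = *v++;
                out[n] = '\0';
                return n > 0;
            }
        }
        p = strchr(p, ';');                     /* move to the next parameter */
    }
    return 0;
}
```

Called with "text/html; charset=iso-8859-1" it fills the buffer with
iso-8859-1; called with plain "text/html", which is what my anchor holds, it
returns 0 and leaves the buffer empty.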

The meta_tags field (an HTAssocList) contains only "GENERATOR" and
"Microsoft FrontPage Express 2.0".

As far as I know, all these values are extracted from the Response
object by HTAnchor_update(), so my application is not responsible
for handling these items.

Where is my charset?

Has anybody had the same problem? Am I wrong in my analysis? Is it a
known problem? Do you have a solution?

Thanks a lot for your answers.

Francois.
Received on Monday, 24 July 2000 12:21:43 GMT
