Re: Problems with RAW GET from Roger McCalman on 1999-11-25 (www-lib@w3.org from October to December 1999)

From: Roger McCalman <r.mccalman@elsevier.co.uk>
Date: Thu, 25 Nov 1999 09:14:04 +0000
To: www-lib@w3.org
Message-ID: <19991125091404.C11798@rheidol.elsevier.co.uk>

I would say that the results you are seeing are due to HTTP/1.1 chunked
transfer encoding. Each chunk is preceded by its size in hexadecimal (the
"19c" in your output is one such size line). Try using WWW_SOURCE rather
than WWW_RAW.
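
To illustrate what those size lines mean, here is a minimal sketch of a
chunked-body decoder working on a buffer already held in memory. The names
dechunk() and hex_value() are hypothetical helpers and are not part of
libwww; the sketch ignores trailers and does not guard against absurdly
large size values, so treat it as an illustration of the framing only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: strip HTTP/1.1 chunk framing from in[0..in_len).
 * out must have room for at least in_len bytes.
 * Returns the number of body bytes written, or -1 on malformed input. */
long dechunk(const char *in, size_t in_len, char *out)
{
    size_t pos = 0, written = 0;

    for (;;) {
        size_t size = 0;
        int digits = 0, v;

        /* Chunk size: one or more hexadecimal digits, e.g. "19c". */
        while (pos < in_len &&
               (v = hex_value((unsigned char)in[pos])) >= 0) {
            size = size * 16 + (size_t)v;
            pos++;
            digits++;
        }
        if (digits == 0) return -1;

        /* Skip any chunk extension up to the end of the size line. */
        while (pos < in_len && in[pos] != '\n') pos++;
        if (pos == in_len) return -1;
        pos++; /* consume '\n' */

        if (size == 0) break; /* last chunk; trailers are ignored here */

        /* Copy the chunk data itself. */
        if (size > in_len - pos) return -1;
        memcpy(out + written, in + pos, size);
        written += size;
        pos += size;

        /* Chunk data is followed by CRLF. */
        if (pos < in_len && in[pos] == '\r') pos++;
        if (pos < in_len && in[pos] == '\n') pos++;
        else return -1;
    }
    return (long)written;
}
```

Given a body such as "9\r\n<img src=\r\n1f\r\n'../grafx/nw_letter_nieuws.gif'\r\n0\r\n\r\n",
the function writes the 40 bytes of the rejoined tag to out.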

Cheers, Roger

On Thu, Nov 25, 1999 at 10:04:51AM +0100, Mark Wormgoor wrote:
> Hi,
> 
> Using libwww I am trying to write a small application that will fetch
> news headers from sites like slashdot and such.  For this reason I'm
> using a raw get of the page (slashdot and freshmeat use XML).  The
> platform is Redhat 6.0 with libwww 5.2.8.  I've attached the test source
> for the program.
> 
> The problem is this.  When I try to fetch the URL in the source code (a
> Dutch news site), strange characters appear in the middle of the raw
> output, for example:
> <img src=
> 19c
> '../grafx/nw_letter_nieuws.gif'
> When I download the same page in Netscape, it prints:
> <img src='../grafx/nw_letter_nieuws.gif'
> which is the correct code.  Every time I download the page, these things
> appear at the same place.  When the page changes, I get different
> characters at different locations.
> 
> If somebody knows what's causing this, I would really like to know.
> BTW, I compile this using:
> gcc -O6 `libwww-config --cflags` -Wall `libwww-config --libs` -o test
> test.c
> 
> Kind regards,
> 
>                 Mark Wormgoor
>

Received on Thursday, 25 November 1999 04:14:15 UTC