Bugs: mod_php, w3c-libwww, "content-length", "chunked" and RFC 2068

So, I was using w3c-libwww to download some XML documents generated by
mod_php, and I kept losing the last 5 or 6 bytes.  Ethereal said those
bytes were on the wire, so I dug into w3c-libwww's guts.

Half a day later, here's what I've learned:

  mod_php will happily generate messages that include both a content-length
  header and a "chunked" transfer encoding *if* a script supplies its own
  "content-length" header.  The content-length value counts only the body
  data, not the chunk-size lines and CRLFs that the chunked encoding adds.
  I don't know where or how this happens in mod_php.

  w3c-libwww 5.2.8 mishandles messages that have both a content-length and
  a "chunked" transfer encoding.  I *think* the problem exists in the CVS
  version, too, but I'm not entirely sure.
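This isn't mod_php's code, and it doesn't locate the bug there. It's a hypothetical Python sketch (`frame_response` is an invented name) of the rule a server-side output layer has to follow: once it decides to send the body chunked, any script-supplied content-length must be dropped, because HTTP/1.1 forbids sending both.

```python
from typing import Dict, Tuple

def frame_response(headers: Dict[str, str], body: bytes,
                   chunked: bool) -> Tuple[Dict[str, str], bytes]:
    """Hypothetical helper: emit exactly one of Content-Length or
    Transfer-Encoding: chunked, never both."""
    # Drop any framing headers the script supplied (names are case-insensitive).
    out = {k: v for k, v in headers.items()
           if k.lower() not in ("content-length", "transfer-encoding")}
    if not chunked:
        out["Content-Length"] = str(len(body))
        return out, body
    out["Transfer-Encoding"] = "chunked"
    encoded = b""
    if body:
        # A single chunk: hex size line, the data, then CRLF.
        encoded += b"%x\r\n" % len(body) + body + b"\r\n"
    encoded += b"0\r\n\r\n"  # last-chunk followed by an empty trailer
    return out, encoded

hdrs, wire = frame_response({"Content-Type": "text/xml", "Content-length": "32"},
                            b"x" * 32, chunked=True)
print(hdrs)  # {'Content-Type': 'text/xml', 'Transfer-Encoding': 'chunked'}
```

For a 32-byte body the size line is `20\r\n`, which is exactly the framing a content-length of 32 fails to account for.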

According to RFC 2068:

  Messages MUST NOT include both a Content-Length header field and the
  "chunked" transfer coding. If both are received, the Content-Length
  MUST be ignored.
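On the receiving side, the rule amounts to checking Transfer-Encoding before Content-Length. Below is a minimal Python sketch of that decision. `body_framing` and its return values are made up for illustration, and it deliberately skips the other cases RFC 2068 section 4.4 covers (HEAD responses, 1xx/204/304 status codes, multipart/byteranges).

```python
from typing import Mapping, Optional, Tuple

def body_framing(headers: Mapping[str, str]) -> Tuple[str, Optional[int]]:
    """Decide how a client should delimit a response body (simplified)."""
    # HTTP header names are case-insensitive ("Content-length" == "Content-Length").
    lowered = {k.lower(): v for k, v in headers.items()}
    codings = [c.strip().lower()
               for c in lowered.get("transfer-encoding", "").split(",")]
    if "chunked" in codings:
        # Both present: the Content-Length MUST be ignored.
        return ("chunked", None)
    if "content-length" in lowered:
        return ("length", int(lowered["content-length"]))
    # Neither header: read until the server closes the connection.
    return ("until-close", None)

print(body_framing({"Transfer-Encoding": "chunked",
                    "Content-Type": "text/xml",
                    "Content-length": "32"}))  # ('chunked', None)
```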

But w3c-libwww gets this precisely backwards.  If, for example, a server
sends an illegal message of the form:

  Transfer-Encoding: chunked\r\n
  Content-Type: text/xml\r\n
  Content-length: 32\r\n
  \r\n
  20\r\n
  (32 bytes of chunk data)\r\n
  0\r\n
  \r\n

...then w3c-libwww feeds it through HTMIME, which honors the content-length
header and reads 32 bytes of the still-encoded body.  The first four of those
bytes are the chunk-size line "20\r\n", so the last four bytes of the actual
data get chopped off.  (It then does various unspeakable things with those
four bytes and the five trailer bytes, but this isn't important.)
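To make the byte arithmetic concrete, here's a small Python model. It is not HTMIME's actual code: `read_by_content_length` just mimics the described behavior by taking the first content-length bytes of the encoded stream, and `decode_chunked` is a simplified decoder (chunk extensions skipped, trailers discarded) showing what should have happened. The 32-byte payload is a made-up stand-in for the XML document.

```python
from typing import Tuple

def decode_chunked(raw: bytes) -> bytes:
    """Simplified chunked decoder: ignores chunk extensions, drops trailers."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)                # end of chunk-size line
        size = int(raw[pos:eol].split(b";")[0], 16)  # size is hexadecimal
        pos = eol + 2
        if size == 0:                                # last-chunk
            break
        chunk = raw[pos:pos + size]
        if len(chunk) != size or raw[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body += chunk
        pos += size + 2                              # skip data and its CRLF
    # Skip any trailer headers up to the final empty line.
    while True:
        eol = raw.index(b"\r\n", pos)
        if eol == pos:
            return bytes(body)
        pos = eol + 2

def read_by_content_length(raw: bytes, length: int) -> Tuple[bytes, bytes]:
    """Model of the buggy path: treat the first `length` bytes of the
    still-encoded stream as the body; return (body, leftover)."""
    return raw[:length], raw[length:]

payload = b"0123456789abcdefghijklmnopqrstuv"  # 32 bytes, hypothetical body
raw = b"20\r\n" + payload + b"\r\n" + b"0\r\n\r\n"

print(decode_chunked(raw) == payload)  # True
body, leftover = read_by_content_length(raw, 32)
print(body)      # b'20\r\n0123456789abcdefghijklmnopqr'
print(leftover)  # b'stuv\r\n0\r\n\r\n'
```

The "body" starts with the chunk-size line and is missing `stuv`, the last four data bytes.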

So this is technically a bug in mod_php *and* w3c-libwww. ;-)

Cheers,
Eric

(If you need any more information from me, please contact me via e-mail.
I'm not on the php-dev or w3c-libwww mailing lists.)

Received on Thursday, 28 June 2001 03:29:48 UTC