Tidy nov99 segfaults on some HTML files

Dear sir,

I've downloaded the latest version of html-tidy to use as a preprocessor
in our web-based publishing system, but found that it segfaults on some
ill-formed HTML. The particular piece of HTML was produced by the
RTF::Parser Perl module ver 1.07, but I suspect that other converters
from office formats would give similar results.

When linked with the Electric Fence debugging library, it gives the
following in gdb:
(gdb) run bad.html
Starting program: /usr/local/src/tidy24nov99/tidy bad.html

  Electric Fence 2.0.5 Copyright (C) 1987-1995 Bruce Perens.

Tidy (vers 24th November 1999) Parsing "bad.html"
line 8 column 35 - Warning: missing </b> before <p>
line 8 column 35 - Warning: missing </i> before <p>
line 8 column 37 - Warning: <i> is probably intended as </i>
line 8 column 37 - Warning: trimming empty <p>

Program received signal SIGSEGV, Segmentation fault.
0x804f26f in GetToken (lexer=0x40933f80, mode=0) at lexer.c:1195
1195            if (lexer->token->type != TextNode || (!lexer->insert && !lexer->inode))
(gdb) bt
#0  0x804f26f in GetToken (lexer=0x40933f80, mode=0) at lexer.c:1195
#1  0x804c5a6 in ParseBody (lexer=0x40933f80, body=0x4094afc8, mode=0)
    at parser.c:2411
#2  0x8049f77 in ParseTag (lexer=0x40933f80, node=0x4094afc8, mode=0)
    at parser.c:357
#3  0x804cf45 in ParseHTML (lexer=0x40933f80, html=0x4093cfc8, mode=0)
    at parser.c:2908
#4  0x804d027 in ParseDocument (lexer=0x40933f80) at parser.c:2955
#5  0x805811e in main (argc=2, argv=0xbffffca4) at tidy.c:983
(gdb)  
  
Without debugging info and -lefence, tidy still crashes on this file,
but in some obscure place inside malloc. Disabling optimization
doesn't help.

My platform is Linux x86, glibc 2.0.7, gcc 2.7.2.

A piece of the ill-formed HTML and my tidy_config.txt are attached.
--------------------------------------------------
Victor Wagner			vitus@ice.ru
Programmer			Office:7-(095)-203-50-60
Institute for Commerce 		Home: 7-(095)-135-46-61
Engineering                     http://www.ice.ru/~vitus

Received on Monday, 29 November 1999 05:00:35 UTC