HTML parser and incomplete entities



I am using an old version of the W3C lib for parsing HTML files, and
I have discovered that the SGML parser is unable to handle entities that
are not terminated with ';'.

For example:

	d&eacute;ja	->	déja

	d&eacuteja	->	d&eacuteja

Unfortunately, most of the HTML source I am using is written for Netscape,
and the entities are not terminated with ';'.

Is there a newer version of the W3C HTML parser that removes this
limitation, or has anybody modified the parser sources to handle it?
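
To make the request concrete, the lenient behaviour I am after could be
sketched roughly as follows. This is Python rather than the library's C,
purely for illustration; ENTITIES and decode_lenient are hypothetical names,
not part of the W3C library, and the entity table is deliberately tiny.

```python
import re

# Hypothetical, deliberately small entity table (assumption: a real parser
# would carry the full ISO Latin-1 entity set).
ENTITIES = {
    "eacute": "\u00e9",
    "egrave": "\u00e8",
    "agrave": "\u00e0",
    "amp": "&",
    "lt": "<",
    "gt": ">",
}

# Try longer names first so the longest known prefix wins.
_NAMES = sorted(ENTITIES, key=len, reverse=True)

# An ampersand, a run of letters, and an optional terminating ';'.
_ENTITY_RE = re.compile(r"&([A-Za-z]+);?")


def decode_lenient(text):
    """Decode known entities whether or not they end with ';'."""

    def replace(match):
        run = match.group(1)
        if match.group(0).endswith(";"):
            # Properly terminated: decode only on an exact name match.
            return ENTITIES.get(run, match.group(0))
        # Unterminated: accept a known name that is a prefix of the run
        # and keep the remaining letters as ordinary text.
        for name in _NAMES:
            if run.startswith(name):
                return ENTITIES[name] + run[len(name):]
        # Unknown name: leave the input untouched.
        return match.group(0)

    return _ENTITY_RE.sub(replace, text)


print(decode_lenient("d&eacute;ja"))  # terminated form
print(decode_lenient("d&eacuteja"))   # unterminated form
```

Matching by prefix is a guess at what the author of such a page intended,
and it can misfire: an unterminated "&ltd" would decode as "<d". A stricter
variant could restrict the prefix rule to the accented-letter entities.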

Thanks in advance!

--
______________________________________________________________________
Romain Vignes                                     Computer Answer Line
Macintosh Software Engineer                           92, cours Vitton
E-mail: rvignes@cal.fr                             69006 LYON - FRANCE
Tel: +33 72 83 10 18                                 http://www.cal.fr
______________________________________________________________________