HTML Parser problem from Shashank Kavishwar on 2005-03-09 (www-lib@w3.org from January to March 2005)

From: Shashank Kavishwar <skavishwar@frontbridge.com>
Date: Wed, 9 Mar 2005 12:10:04 -0800
To: <www-lib@w3.org>
Message-ID: <048501c524e3$fe88c700$a801a8c0@internal.bigfish.com>

I have a problem parsing HTML data.

If the HTML data contains a '<' with no closing '>', the rest of the HTML
is not parsed.

For example:

<html>

<body>

<p>This is a test paragraph </p>


Normal text here

<<<<<<<

John Doe

e-mail: jdoe@hotmail.com

web site: www.johndoe.com <http://www.johndoe.com/>

</body>

</html>

When I parse this HTML using the HTMLToPlain() call, I only get the output
up to 'Normal text here'. Everything after the '<<<<<<<' is skipped.
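
One possible workaround, assuming the parser treats every '<' as the start
of a tag, would be to escape stray '<' characters before the data reaches
the parser. The following is only a minimal sketch in Python, not libwww
code; the helper name escape_stray_lt and the rule "a tag starts with a
letter, '/', '!' or '?'" are assumptions made for illustration.

```python
import re

# Assumption: a '<' starts markup only when followed by a letter, '/', '!' or '?'.
# Any other '<' (including one at the very end of the input) is literal text.
_STRAY_LT = re.compile(r"<(?![A-Za-z/!?])")


def escape_stray_lt(html: str) -> str:
    """Replace '<' characters that cannot start a tag with '&lt;'."""
    return _STRAY_LT.sub("&lt;", html)


if __name__ == "__main__":
    sample = "<p>This is a test paragraph </p>\nNormal text here\n<<<<<<<\nJohn Doe\n"
    print(escape_stray_lt(sample))
```

With this rule the '<<<<<<<' line becomes seven '&lt;' entities, while
'<p>' and '</p>' are left alone. It does not help with a '<' that is
followed by a letter but never closed, so a link written as
'<http://www.johndoe.com/>' would still look like a tag to this check.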

Am I missing something?

Thanks,

Shashank




Received on Thursday, 10 March 2005 01:47:42 UTC