RE: error when clean html with tidy (fwd)

At 1:17 AM -0500 8/24/01, Chunbo Shao wrote:
>thanks for reply.
>
>Attached files are some html page on some university. I didn't make the
>page. I just use Tidy to parse it.
>
>The file "48-washington.edu" gave the error "Error: <meta> missing '>' for
>end of tag".
>
>The file "42-upenn.edu" gave me the error "Error: <a> missing '>' for end
>of tag".
>
>At the beginning of each file, you can see the url link address for this
>url file. I already took out these extra lines before I use Tidy to clean
>this url file.
>
>I cannot see any clue to figure out why the error happans.
>thanks for help.

I have seen some horrible HTML generated in my time, but those 2 documents
are terrible, and I'm not surprized Tidy has a problem. Hopefully the HTML
below won't get mangled by the mail.

1) "48-washington.edu" :

(a) There are 3 <body>...</body> which are nested.

(b) The META tag Tidy is complaining about is in the following (line #45) :

line 45 column 16 - Error: <meta> missing '>' for end of tag

<META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=iso-8859-1"<BR>

Note the missing >. If you fix this error, Tidy will produce output.

(c) I won't comment on the rest of the horrible HTML code.

2) "42-upenn.edu" :

(a) The A tag Tidy is complaining about is in the following (line #192) :

line 192 column 5 - Error: <a> missing '>' for end of tag
line 192 column 59 - Error: discarding unexpected </a>

<LI><A href="http://www.arstechnica.com/"</A>Ars Technica</A> <FONT
Size=-2>Ars Technica</FONT><br>

Note the missing >. And the extra </A>. If you fix this error, Tidy will
produce output.

I used the current (23 Aug 01) C version of Tidy, so you might see
different behaviour from an older version. But as I see it, Tidy tells you
exactly what and where the error is - all you need to do is look at the
rest of the line that Tidy is complaining about, to see the problem,.

Regards, Terry

Received on Friday, 24 August 2001 03:15:44 UTC