Re: HTML Parsing Step?

/ Alex Milowski <alex@milowski.org> was heard to say:
| On 7/3/07, Norman Walsh <ndw@nwalsh.com> wrote:
|> / Alex Milowski <alex@milowski.org> was heard to say:
|> | At the end of our e-mail discussion in May I suggested we have a separate
|> | step for parsing HTML.  I still think this is a good idea.  Anyone else?
|>
|> So this is the equivalent of "tidy" not the equivalent of "tagsoup",
|> right?
|
| I don't understand this question.
|
| Tidy and Tagsoup clean up HTML.

You're right. Brain cramp. I was thinking that tidy had knowledge of
the HTML vocabulary (that img and hr are empty, for example) whereas
tagsoup just cleaned up not-well-formed XML. But that's not the case.
So never mind.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | You must not think me necessarily
http://nwalsh.com/            | foolish because I am facetious, nor
                              | will I consider you necessarily wise
                              | because you are grave.--Sydney Smith

Received on Tuesday, 3 July 2007 15:14:35 UTC