Re: End tags in TidyLib from Bjoern Hoehrmann on 2005-06-09 (html-tidy@w3.org from April to June 2005)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Thu, 09 Jun 2005 16:17:06 +0200
To: Vacuum Joe <vacuumjoe@yahoo.com>
Cc: html-tidy@w3.org
Message-ID: <42be4e7d.74948062@smtp.bjoern.hoehrmann.de>

* Vacuum Joe wrote:
>I have a simple question, but it's puzzling me: I have
>used LibTidy to parse some HTML files, and then I
>recurse through them, exploring every node in the
>tree.  I can never find any nodes of type
>TidyNode_End.

That's as it should be. Tidy has a Node struct that's really
misnamed: it is used for all sorts of tokens, like start-tags,
end-tags, document type declarations, etc. End-tags are not useful
to keep, so they are dropped at parse time.
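
To illustrate the idea, here is a toy sketch in C. It is not Tidy's
actual parser: the names (Kind, Token, TreeNode, build_tree,
count_kind, free_tree) are made up for this example, and it assumes
well-formed input, which real Tidy does not. The point is only that an
end-tag token moves the builder back up to the parent and is never
linked into the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical token/node kinds; these are NOT TidyLib's enum values. */
typedef enum { K_ROOT, K_START, K_END, K_START_END, K_TEXT, K_COMMENT } Kind;

/* One lexer token: its kind plus a tag name or text payload. */
typedef struct { Kind kind; const char *name; } Token;

/* One tree node. Only tokens worth keeping ever become nodes. */
typedef struct TreeNode {
    Kind kind;
    const char *name;
    struct TreeNode *parent, *first_child, *last_child, *next;
} TreeNode;

/* Allocate a node and append it to parent's child list (if any). */
static TreeNode *new_node(Kind kind, const char *name, TreeNode *parent)
{
    TreeNode *n = calloc(1, sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    n->name = name;
    n->parent = parent;
    if (parent != NULL) {
        if (parent->last_child != NULL)
            parent->last_child->next = n;
        else
            parent->first_child = n;
        parent->last_child = n;
    }
    return n;
}

/* Build a tree from a token array. Assumes well-formed input. */
static TreeNode *build_tree(const Token *toks, size_t count)
{
    assert(toks != NULL || count == 0);
    TreeNode *root = new_node(K_ROOT, "#root", NULL);
    TreeNode *cur = root;
    for (size_t i = 0; i < count; i++) {
        switch (toks[i].kind) {
        case K_START:
            /* Open an element: later tokens become its children. */
            cur = new_node(K_START, toks[i].name, cur);
            break;
        case K_END:
            /* Close the element: step back up, store nothing. */
            if (cur != root)
                cur = cur->parent;
            break;
        default:
            /* Text, comments, <img/>-style tags: leaf nodes. */
            new_node(toks[i].kind, toks[i].name, cur);
            break;
        }
    }
    return root;
}

/* Count nodes of the given kind in the subtree rooted at n. */
static size_t count_kind(const TreeNode *n, Kind kind)
{
    size_t total = (n->kind == kind) ? 1 : 0;
    for (const TreeNode *c = n->first_child; c != NULL; c = c->next)
        total += count_kind(c, kind);
    return total;
}

/* Release the subtree rooted at n. */
static void free_tree(TreeNode *n)
{
    TreeNode *c = n->first_child;
    while (c != NULL) {
        TreeNode *next = c->next;
        free_tree(c);
        c = next;
    }
    free(n);
}
```

Feeding build_tree the five tokens for <p>hi<img/><!--c--></p> gives a
tree in which count_kind(root, K_END) is 0, while the start, text,
start-end and comment kinds each appear once.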

>It does make sense: there should never be an "end"
>node, because when you have a node like a P, then all
>the content is children of that P node, and therefore
>the "end" of this P node should never be encountered. 
>In other words, all nodes should be either starts, or
>text.

Well, you will probably get different types for <img> and <img/>,
and there are processing instructions, CDATA sections, comments,
etc.; you should get those too.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Thursday, 9 June 2005 14:16:17 UTC