- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Thu, 09 Jun 2005 16:17:06 +0200
- To: Vacuum Joe <vacuumjoe@yahoo.com>
- Cc: html-tidy@w3.org
* Vacuum Joe wrote:
>I have a simple question, but it's puzzling me: I have
>used LibTidy to parse some HTML files, and then I
>recurse through them, exploring every node in the
>tree. I can never find any nodes of type
>TidyNode_End.

That's very good then. In Tidy we have a Node struct that's really
misnamed; it's used for all sorts of tokens like start-tags, end-tags,
document type declarations, etc. End-tags are not useful to keep, so
these are dropped at parse time.

>It does make sense: there should never be an "end"
>node, because when you have a node like a P, then all
>the content is children of that P node, and therefore
>the "end" of this P node should never be encountered.
>In other words, all nodes should be either starts, or
>text.

Well, you probably get different types for <img> and <img/>, and there
are processing instructions, CDATA sections, comments, etc.; you should
get those too.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Thursday, 9 June 2005 14:16:17 UTC