* Vacuum Joe wrote:
>I have a simple question, but it's puzzling me: I have
>used LibTidy to parse some HTML files, and then I
>recurse through them, exploring every node in the
>tree. I can never find any nodes of type
>TidyNode_End.

That's very good, then. In Tidy we have a Node struct that is really
misnamed: it is used for all sorts of tokens, such as start-tags,
end-tags, document type declarations, etc. End-tags are not useful to
keep, so they are dropped at parse time.

>It does make sense: there should never be an "end"
>node, because when you have a node like a P, then all
>the content is children of that P node, and therefore
>the "end" of this P node should never be encountered.
>In other words, all nodes should be either starts, or
>text.

Well, you will probably get different node types for <img> and <img/>,
and there are also processing instructions, CDATA sections, comments,
etc.; you should get those too.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Thursday, 9 June 2005 14:16:17 GMT