- From: Karl Dubost <karl@w3.org>
- Date: Thu, 7 Dec 2006 14:55:07 +0900
Sam,

On 6 Dec 2006, at 23:13, Sam Ruby wrote:
> My original interest was to write a replacement for Python's
> SGMLLIB, i.e., one that was not based on the theoretical ideal of
> how SGML vocabularies work, but one based on the practical notion
> of how HTML actually is parsed.

I'm not sure sgmllib would be the best target, especially if it's used in many other products. But maybe you are talking about a new library altogether.

http://docs.python.org/lib/module-sgmllib.html

  8.2 sgmllib -- Simple SGML parser

  This module defines a class SGMLParser which serves as the basis for parsing text files formatted in SGML (Standard Generalized Mark-up Language). In fact, it does not provide a full SGML parser -- it only parses SGML insofar as it is used by HTML, and the module only exists as a base for the htmllib module. Another HTML parser which supports XHTML and offers a somewhat different interface is available in the HTMLParser module.

HTMLParser seems a better candidate.

http://docs.python.org/lib/module-HTMLParser.html

  8.1 HTMLParser -- Simple HTML and XHTML parser

  New in version 2.2.

  This module defines a class HTMLParser which serves as the basis for parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML. Unlike the parser in htmllib, this parser is not based on the SGML parser in sgmllib.

I'm adding them to the list of HTML parsers.

http://esw.w3.org/topic/HTMLAsSheAreSpoke

-- 
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
QA Weblog - http://www.w3.org/QA/
*** Be Strict To Be Cool ***
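
For readers unfamiliar with the HTMLParser interface quoted above, here is a minimal sketch of how a subclass receives parsing events. It assumes Python 3, where the module documented above as HTMLParser is available as html.parser. The names TagCollector and collect_events are hypothetical and exist only for this illustration. Note that the parser only reports what it sees; it does not repair unbalanced markup.

```python
from html.parser import HTMLParser  # Python 3 name of the HTMLParser module


class TagCollector(HTMLParser):
    """Hypothetical helper: records parser events in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only text so the event list stays readable
        text = data.strip()
        if text:
            self.events.append(("data", text))


def collect_events(markup):
    """Feed a markup string to the parser and return the recorded events."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # The <b> element is deliberately left unclosed
    for event in collect_events('<p class="x">Hello <b>world</p>'):
        print(event)
```

The unclosed b element in the usage example produces a start event with no matching end event, so any recovery policy for real-world HTML would have to be layered on top of these callbacks.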
Received on Wednesday, 6 December 2006 21:55:07 UTC