- From: Laurent Carcone <laurent@w3.org>
- Date: Wed, 30 Jun 2004 15:16:50 +0200
- To: olczyk@interaccess.com
- Cc: www-amaya-dev@w3.org
Hello Thaddeus,

In fact, Amaya uses two different parsers: expat for XHTML documents (and for XML documents in general) and an ad hoc parser for other HTML documents. The latter is specific to Amaya and has no well-defined API. Nevertheless, you can have a look at it in the module 'amaya/html2thot.c', particularly at the definition of the automaton.

Hope this helps,
Laurent Carcone

> Hi.
> I've been going nuts looking for a non-Perl HTML parser
> which handles "real world" HTML. On the libwww page,
> it says that their parser is primitive and if you are looking
> for a robust HTML parser, look at Amaya.
>
> So I've gotten Amaya. I've skimmed through the documentation.
> It seems rather vague on where the parser is and what its API
> is.
>
> So three questions.
> For a person for whom expat, libxml and libwww used with (or without)
> HTML Tidy is not good enough, will the parser in Amaya be sufficient?
>
> Is the Amaya code modularised enough to extract the parser?
>
> In terms of the code, where would I start with the procedure?
>
> Thank You
> --
> Thaddeus L. Olczyk
> -----------------------
> Think twice, code once.
Received on Wednesday, 30 June 2004 09:17:05 UTC