- From: Thaddeus L. Olczyk <olczyk@interaccess.com>
- Date: Wed, 30 Jun 2004 22:43:14 -0500
- To: www-amaya-dev@w3.org
- Cc: www-amaya-dev@w3.org
On Wed, 30 Jun 2004 15:16:50 +0200, Laurent Carcone <laurent@w3.org> wrote:

>
>Hello Thaddeus,
>
>In fact, Amaya uses 2 different parsers, expat for XHTML documents (and for
>XML documents in general)

Which is of minor interest, because this is something I can already do
quite easily.

>and an ad'hoc parser for other HTML documents.
>This parser is specific to Amaya and has no well-defined API. Nevertheless,
>you can have a look on it in the module 'amaya/html2thot.c', and particularly
>on the definition of the automaton.
>

Ok. So you've basically answered my last question, but the first two
are still unanswered.

Is the parser relatively bulletproof? I find the combination of
Tidy+expat simply unusable. Tidy chokes on some rather simple input.
From previous experience, if a system has problems with simple input,
it is going to have a lot more problems when the input scales up.
I don't want to be dealing with tons of special cases that Tidy can't
handle. That's the way to disaster.

Is the parser code in Amaya easily extractable? I once tried to do the
same thing with the pile of -- they call mozilla, and it was a disaster.
Now that was mozilla, and using anything from there is asking for
trouble. The question is, what about Amaya?

Thaddeus L. Olczyk
-----------------------
Think twice, code once.
Received on Thursday, 1 July 2004 00:32:02 UTC