- From: Chris Ridpath <chris.ridpath@utoronto.ca>
- Date: Tue, 11 Apr 2000 09:46:58 -0400
- To: <w3c-wai-er-ig@w3.org>, "Leonard R. Kasday" <kasday@acm.org>
For A-Prompt we wrote our own tokenizer/parser. It gives us complete control over what comes out of the program. It is also quite robust and can handle all sorts of bad HTML. However, it's a bit of work to create something like this. There is an HTML parser control that can be used by C++ or visual Basic programs (Windows based) that performs HTML parsing. It's easy to use and seems to work quite well. Chris ----- Original Message ----- From: Leonard R. Kasday <kasday@acm.org> To: <w3c-wai-er-ig@w3.org> Sent: Monday, April 10, 2000 10:06 AM Subject: Software Architecture > I'd like to start some discussions about the software architecture for the > tools we're building. > > For starters, here's a low level question: how to parse and process the HTML. > > Personally, for WAVE, I've been using the perl HTML::Parser module, which > does a lexical parse into start tags, end tags, comments, and text. That > makes it easy to do a tag by tag analysis, but I wind up doing a home made > state machine to get beyond tags, and it gets a bit klugy. > > I'm about to recode it to use a proper tree. One possibility is perl's > HTML::TreeBuilder which makes it easy to walk the tree, find parents, > children, attribute values, etc. > > Is there a better way to do this? What would it buy to switch to Java or > C++? How much can we do with XSL? > > Len > > > -- > Leonard R. Kasday, Ph.D. > Institute on Disabilities/UAP, and > Department of Electrical Engineering > Temple University > 423 Ritter Annex, Philadelphia, PA 19122 > > kasday@acm.org > http://astro.temple.edu/~kasday > > (215) 204-2247 (voice) > (800) 750-7428 (TTY) >
Received on Tuesday, 11 April 2000 09:47:19 UTC