Re: Software Architecture

For A-Prompt we wrote our own tokenizer/parser. It gives us complete control
over what comes out of the program. It is also quite robust and can handle
all sorts of bad HTML. However, it's a bit of work to create something like
that.

There is a Windows-based HTML parser control, usable from C++ or Visual Basic
programs, that performs HTML parsing. It's easy to use and
seems to work quite well.
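
To make the tag-by-tag approach discussed in this thread concrete, here is a
minimal sketch of a lexical pass that splits a page into start tags, end tags,
comments, and text. It is not the A-Prompt tokenizer and not HTML::Parser; it
assumes Python's standard html.parser module as a stand-in, and the names
TokenCollector and tokenize are made up for the illustration.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collects a flat list of (kind, value, attrs) tokens.

    Hypothetical helper for illustration only; the kinds mirror the
    start tag / end tag / comment / text split described in the thread.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tokens.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag, {}))

    def handle_comment(self, data):
        self.tokens.append(("comment", data, {}))

    def handle_data(self, data):
        # Skip whitespace-only runs between tags
        text = data.strip()
        if text:
            self.tokens.append(("text", text, {}))


def tokenize(html):
    """Return the token list for an HTML string, tolerating unclosed tags."""
    parser = TokenCollector()
    parser.feed(html)
    parser.close()
    return parser.tokens


if __name__ == "__main__":
    # The <b> element is never closed; the tokenizer reports what it sees.
    for token in tokenize('<p>Hello <b>world</p><img src="a.gif">'):
        print(token)
```

A flat token list like this is enough for per-tag checks (for example, looking
at the attributes of each img start tag). Anything that depends on nesting
still needs a state machine or a tree built on top of it.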


----- Original Message -----
From: Leonard R. Kasday <>
To: <>
Sent: Monday, April 10, 2000 10:06 AM
Subject: Software Architecture

> I'd like to start some discussions about the software architecture for the
> tools we're building.
> For starters, here's a low level question: how to parse and process the
> Personally, for WAVE, I've been using the perl HTML::Parser module, which
> does a lexical parse into start tags, end tags, comments, and text.  That
> makes it easy to do a tag by tag analysis, but I wind up doing a home made
> state machine to get beyond tags, and it gets a bit klugy.
> I'm about to recode it to use a proper tree.  One possibility is perl's
> HTML::TreeBuilder which makes it easy to walk the tree, find parents,
> children, attribute values, etc.
> Is there a better way to do this?  What would it buy to switch to Java or
> C++?  How much can we do with XSL?
> Len
> --
> Leonard R. Kasday, Ph.D.
> Institute on Disabilities/UAP, and
> Department of Electrical Engineering
> Temple University
> 423 Ritter Annex, Philadelphia, PA 19122
> (215) 204-2247 (voice)
> (800) 750-7428 (TTY)

Received on Tuesday, 11 April 2000 09:47:19 UTC