
Re: Software Architecture

From: Chris Ridpath <chris.ridpath@utoronto.ca>
Date: Tue, 11 Apr 2000 09:46:58 -0400
Message-ID: <09aa01bfa3bc$696c7d40$b040968e@ic.utoronto.ca>
To: <w3c-wai-er-ig@w3.org>, "Leonard R. Kasday" <kasday@acm.org>
For A-Prompt we wrote our own tokenizer/parser. It gives us complete control
over what comes out of the program. It is also quite robust and can handle
all sorts of bad HTML. However, it's a bit of work to create something like
this.
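
To give a rough idea of what a forgiving hand-written tokenizer involves, here is a minimal sketch in Python. It is not the A-Prompt code, and the names TOKEN_RE and tokenize are made up for illustration. It assumes that tolerating unclosed tags and stray "<" characters is enough to count as handling bad HTML, and it discards attributes for brevity.

```python
import re

# One alternative per token kind: comment, tag (start or end), or text.
# The trailing ">" is optional so an unterminated tag such as "<br " is
# still reported as a tag, and a stray "<" falls through to the text case.
TOKEN_RE = re.compile(
    r"<!--(?P<comment>.*?)-->"
    r"|<(?P<close>/?)(?P<name>[A-Za-z][A-Za-z0-9]*)(?P<attrs>[^<>]*)>?"
    r"|(?P<text>[^<]+|<)",
    re.DOTALL,
)


def tokenize(html):
    """Yield (kind, value) pairs; malformed input never raises."""
    for m in TOKEN_RE.finditer(html):
        if m.group("comment") is not None:
            yield ("comment", m.group("comment"))
        elif m.group("name") is not None:
            kind = "end" if m.group("close") else "start"
            yield (kind, m.group("name").lower())
        else:
            yield ("text", m.group("text"))


if __name__ == "__main__":
    # Deliberately broken markup: unclosed <br and an unterminated </p
    for token in tokenize("<P>Hi <b>there<br <!-- c --> </p"):
        print(token)
```

A real tool would also need to keep attributes, entities and source positions, which is where most of the work goes.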

There is also an HTML parser control for Windows that can be used from C++
or Visual Basic programs. It's easy to use and seems to work quite well.

Chris


----- Original Message -----
From: Leonard R. Kasday <kasday@acm.org>
To: <w3c-wai-er-ig@w3.org>
Sent: Monday, April 10, 2000 10:06 AM
Subject: Software Architecture


> I'd like to start some discussions about the software architecture for the
> tools we're building.
>
> For starters, here's a low level question: how to parse and process the
> HTML.
>
> Personally, for WAVE, I've been using the Perl HTML::Parser module, which
> does a lexical parse into start tags, end tags, comments, and text.  That
> makes it easy to do a tag-by-tag analysis, but I wind up doing a homemade
> state machine to get beyond tags, and it gets a bit kludgy.
>
> I'm about to recode it to use a proper tree.  One possibility is Perl's
> HTML::TreeBuilder which makes it easy to walk the tree, find parents,
> children, attribute values, etc.
>
> Is there a better way to do this?  What would it buy to switch to Java or
> C++?  How much can we do with XSL?
>
> Len
>
>
> --
> Leonard R. Kasday, Ph.D.
> Institute on Disabilities/UAP, and
> Department of Electrical Engineering
> Temple University
> 423 Ritter Annex, Philadelphia, PA 19122
>
> kasday@acm.org
> http://astro.temple.edu/~kasday
>
> (215) 204-2247 (voice)
> (800) 750-7428 (TTY)
>
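
The step from a lexical parse to a tree, as raised in the quoted message, can be sketched as follows. This is only an illustration in Python: the standard html.parser module stands in for HTML::Parser, and the Node, TreeBuilder, find_all and build_tree names are hypothetical, not the HTML::TreeBuilder API. The list of empty elements in VOID is deliberately incomplete.

```python
from html.parser import HTMLParser

# Elements that never take an end tag (partial list, assumed for the sketch).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # Node objects or text strings


class TreeBuilder(HTMLParser):
    """Turns start/end/text events into a tree with parent links."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element of this name; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data)


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_all(node, tag):
    """Yield every descendant element with the given tag, in document order."""
    for child in node.children:
        if isinstance(child, Node):
            if child.tag == tag:
                yield child
            yield from find_all(child, tag)
```

With a tree in hand, a check such as "which images lack alt text, and what contains them" needs no state machine:

```python
root = build_tree('<p>Text <img src="a.gif"><a href="x"><img src="b.gif" alt="B"></a></p>')
for img in find_all(root, "img"):
    if "alt" not in img.attrs:
        print(img.attrs.get("src"), "inside", img.parent.tag)
```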
Received on Tuesday, 11 April 2000 09:47:19 GMT
