Parsing HTML: Easiest way?

Bowden Wise (wiseb@cs.rpi.edu)
Tue, 17 Oct 1995 13:53:07 -0400


Message-Id: <199510171753.AA10064@cs.rpi.edu>
To: www-html@www0.cern.ch


Hello HTML colleagues:

I am doing some research in multimodal interfaces and want to develop
a multimodal Web browser. I want to use sound/speech as well as
visual graphics to present HTML to users.

Since this is a demonstration app for my PhD, I do not need to develop
a full featured browser. 

What I would like to do is parse an HTML file into a data structure
that my app can use as the basis for its presentation (either
auditory or visual), so that both presentations are driven by the
same high-level information about the HTML file.
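The idea of one parsed structure driving two presentations can be sketched roughly as follows. This is only an illustration under stated assumptions. It uses Python 3 and its standard html.parser module, neither of which runs on 16-bit Windows 3.x. It is not how sgmls or the W3C Reference Library work. The names Node, TreeBuilder, parse, render_visual, render_auditory, and the SAY/TONE cue strings are all hypothetical.

```python
# Sketch only: assumes Python 3's standard html.parser module.
# All names and the SAY/TONE cue format are hypothetical.
from html.parser import HTMLParser

class Node:
    """One element, or a run of text ("#text"), in the parsed tree."""
    def __init__(self, tag, attrs=None, text=None):
        self.tag = tag
        self.attrs = dict(attrs or [])   # attribute name -> value
        self.text = text
        self.children = []

# HTML elements that never take an end tag; never left open on the stack.
VOID = {"br", "hr", "img", "input", "meta", "link"}

class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]          # currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():                  # drop whitespace-only runs
            text = " ".join(data.split())
            self.stack[-1].children.append(Node("#text", text=text))

def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

def render_visual(node, depth=0, out=None):
    """Visual presentation: an indented outline of the same tree."""
    out = [] if out is None else out
    for child in node.children:
        if child.tag == "#text":
            out.append("  " * depth + child.text)
        else:
            out.append("  " * depth + "<" + child.tag + ">")
            render_visual(child, depth + 1, out)
    return out

def render_auditory(node, out=None):
    """Auditory presentation: a flat script of speech and tone cues."""
    out = [] if out is None else out
    for child in node.children:
        if child.tag == "#text":
            out.append("SAY " + child.text)
        elif child.tag in HEADINGS:
            out.append("TONE heading")
            render_auditory(child, out)
        elif child.tag == "a":
            out.append("TONE link " + (child.attrs.get("href") or ""))
            render_auditory(child, out)
        else:
            render_auditory(child, out)
    return out
```

Because both renderers walk the same tree, a new cue (say, a distinct tone for list items) changes only the presentation code, not the parser.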

I do not have an existing Web browser to build mine on, so my
question is: what is the best way to parse HTML for my purposes? I am
using a Windows 3.x platform (16-bit).

Some options I have considered include:

- using sgmls
- using the W3C Reference Library
- using the W3C Line Mode Browser as a base
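For the sgmls option, sgmls reports the parsed document on standard output as ESIS, a line-oriented format in which each line starts with a command character. A rough sketch of reading that output back into a tree might look like the following. It assumes Python 3 and handles only four commands: 'A' (an attribute of the next element), '(' (start of an element), ')' (end of an element), and '-' (character data). All other commands are ignored, and so are all data escapes except \n (record end) and \\ (backslash). The names Element, unescape, and read_esis are hypothetical.

```python
# Sketch only: reads a subset of sgmls ESIS output into a tree.
# Handles the A, (, ), and - commands; everything else is ignored.

class Element:
    def __init__(self, gi, attrs):
        self.gi = gi              # generic identifier (element name)
        self.attrs = attrs        # attribute name -> value (None if IMPLIED)
        self.children = []        # Element objects or data strings

def unescape(data):
    """Decode only the \\n (record end) and \\\\ (backslash) escapes."""
    out, i = [], 0
    while i < len(data):
        if data[i] == "\\" and i + 1 < len(data) and data[i + 1] in "n\\":
            out.append("\n" if data[i + 1] == "n" else "\\")
            i += 2
        else:
            out.append(data[i])
            i += 1
    return "".join(out)

def read_esis(lines):
    root = Element("#document", {})
    stack = [root]
    pending = {}                  # attributes seen before the next "(" line
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        cmd, rest = line[0], line[1:]
        if cmd == "A":
            # e.g. "AHREF CDATA x.html" or "ATYPE IMPLIED"
            name, _, value = rest.partition(" ")
            kind, _, text = value.partition(" ")
            pending[name] = None if kind == "IMPLIED" else text
        elif cmd == "(":
            element = Element(rest, pending)
            pending = {}
            stack[-1].children.append(element)
            stack.append(element)
        elif cmd == ")":
            stack.pop()
        elif cmd == "-":
            stack[-1].children.append(unescape(rest))
    return root
```

In practice the lines would come from a pipe or a file of sgmls output. The resulting tree could then feed both presentations, just as a hand-written parser's tree would.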

Are there any other mechanisms I might use? I don't have much
experience with coding browsers or parsers for HTML, so I welcome
your insights into this dilemma.

Also, I have not subscribed to www-html, so please reply via e-mail.

Many thanks in advance. 
Bowden

--------------------------------------------------------------------
G. Bowden Wise
Computer Science Dept, Rensselaer Polytechnic Inst, Troy, NY 12180
Email: wiseb@cs.rpi.edu         WWW: http://www.cs.rpi.edu/~wiseb/