Re: Parsing HTML: Easiest way? from Daniel W. Connolly on 1995-10-17 (www-html@w3.org from October 1995)

From: Daniel W. Connolly <connolly@beach.w3.org>
Date: Tue, 17 Oct 1995 14:25:49 -0400
To: Bowden Wise <wiseb@cs.rpi.edu>
Cc: www-html@www0.cern.ch
Cc: frystyk@w3.org
Message-Id: <199510171825.OAA17111@beach.w3.org>

In message <199510171753.AA10064@cs.rpi.edu>, Bowden Wise writes:
>
>What I would like to do is parse an HTML file into some structure that
>I can use in my app to base my presentation

>I do not have a Web browser to base my browser on, so my question is
>what is the best way to parse HTML for my purposes?  I am using a
>Windows 3.x platform (16-bit).
>
>Some ideas I have thought of doing include:
>
>- using sgmls

This will work, but it may not be convenient.

>- using the W3C Reference Library

The HTML parsing code in the W3C reference library has gotten
kinda crufty. Henrik has been concentrating on protocols
for quite some time, and the SGML/HTML stuff hasn't been
revised much, even though we've found some bugs and changed
our minds about the best way to do some things.

I've been working on some code to update the library. I have
it working, but I haven't done much integration with the
library.

A tech report describing my work is in progress at:

"A Lexical Analyzer for HTML and Basic SGML"
$Id: sgml-lex.html,v 1.8 1995/10/11 21:47:30 connolly Exp $
http://www.w3.org/pub/WWW/MarkUp/SGML/sgml-lex/sgml-lex.html

It includes a lex spec. You probably can't run lex on a 16-bit
platform, but you should be able to use the C code that lex
spits out when I run it.
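
To give a rough idea of what a lexical analyzer for HTML hands to
the rest of an app, here is a minimal sketch in Python. It is not
the lex spec from the report and not the library's code; the
pattern, the token names (start_tag, end_tag, data) and the
tokenize function are all made up for illustration, and it ignores
comments, declarations, entity references and other SGML details
a real lexer has to handle.

```python
import re

# Hypothetical token patterns, for illustration only. A real SGML/HTML
# lexer also handles comments, declarations, entity references, and
# quoted attribute values that contain '>'.
TOKEN_RE = re.compile(
    r"(?P<end_tag></[A-Za-z][A-Za-z0-9.-]*\s*>)"
    r"|(?P<start_tag><[A-Za-z][A-Za-z0-9.-]*[^<>]*>)"
    r"|(?P<data>[^<]+|<)"   # a run of text, or a stray '<' treated as data
)


def tokenize(text):
    """Split text into (kind, lexeme) pairs that cover the whole input."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for kind, lexeme in tokenize('<P>Hello <B>world</B>'):
        print(kind, repr(lexeme))
```

The app can then walk the token list and build whatever structure
it needs for presentation, without the lexer knowing anything
about the HTML DTD.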

Let me know if you want to be an alpha tester. I don't have
a public distribution ready.

Dan

Received on Tuesday, 17 October 1995 14:26:36 UTC