- From: Daniel W. Connolly <connolly@beach.w3.org>
- Date: Tue, 17 Oct 1995 14:25:49 -0400
- To: Bowden Wise <wiseb@cs.rpi.edu>
- Cc: www-html@www0.cern.ch
- Cc: frystyk@w3.org
In message <199510171753.AA10064@cs.rpi.edu>, Bowden Wise writes:
>
>What I would like to do is parse an HTML file into some structure that
>I can use in my app to base my presentation
>I do not have a Web browser to base my browser on, so my question is
>what is the best way to parse HTML for my purposes? I am using a
>Windows 3.x platform (16-bit).
>
>Some ideas I have thought of doing include:
>
>- using sgmls

This will work, but it may not be convenient.

>- using the W3C Reference Library

The HTML parsing code in the W3C reference library has gotten kinda
crufty. Henrik has been concentrating on protocols for quite some
time, and the SGML/HTML stuff hasn't been revised much, even though
we've found some bugs and changed our minds about the best way to do
some things.

I've been working on some code to update the library. I have it
working, but I haven't done much integration with the library.

A tech report describing my work is in progress at:

    "A Lexical Analyzer for HTML and Basic SGML"
    $Id: sgml-lex.html,v 1.8 1995/10/11 21:47:30 connolly Exp $
    http://www.w3.org/pub/WWW/MarkUp/SGML/sgml-lex/sgml-lex.html

It includes a lex spec. You probably can't run lex on a 16bit
platform, but you should be able to use the code that lex spits out
when I run it.

Let me know if you want to be an alpha tester. I don't have a public
distribution ready.

Dan
Received on Tuesday, 17 October 1995 14:26:36 UTC