SGML/HTML Lexical analyzer update

Daniel W. Connolly (
Wed, 07 Feb 1996 12:31:37 -0500

Message-Id: <>
To: "Bob Peterson" <>,
Subject: SGML/HTML Lexical analyzer update
Date: Wed, 07 Feb 1996 12:31:37 -0500
From: "Daniel W. Connolly" <>


First, thanks for all the great feedback on the sgml-lex report and
code. I am happy to announce this release, which incorporates much of
it. Stay tuned to

for details (including the tech report and source distribution).

The relevant excerpt is attached.

Recent changes include:

revision 1.8
date: 1996/02/07 15:32:31;  author: connolly;  state: Exp;  lines: +25 -14
* SGML_lexCase -> SGML_lexNorm, which covers whitespace etc.
	as well as case conversion. This allows pass-thru filtering.

	This involved changing the way whitespace is handled in the lexer.

	Also, tag close tokens (>) are explicitly reported.

	sgml_lex -c becomes sgml_lex -n

	@@ problem remaining: erroneous markup is reported out
		of order

* added filter test

* Fixed a bug in main.c reported in:
	From: Joris Roling <>
	To: "'Connolly, Dan'" <>
	Subject: Remarks on 'A Lexical Analyzer for HTML and Basic SGML'
	Date: Fri, 19 Jan 96 14:16:00 CET
	Message-Id: <30FF9AC4@msmsmtp>

* fixed lex spec bug reported in:

	Message-Id: <v01530502ad25cc1a251b@[]>
	From: (Chris Lovett)
	Subject: Re: Daniel Connolly's SGML Lex Specification

* fixed memory leak reported in:

	Message-Id: <>
	From: Simon Watfa <>
	To: "''" <>
	Subject: sgml-lex
	Date: Thu, 25 Jan 1996 21:01:28 -0700

The one remaining major bug is that malloc() returning NULL is
treated as a fatal error (i.e. abort() is called).
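
One common way to avoid aborting is to report allocation failure to the
caller through a status code. The following is only a sketch of that
idea: the names buf, buf_status and buf_append are hypothetical and are
not part of the sgml-lex code or API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes; not part of the sgml-lex API. */
typedef enum { BUF_OK = 0, BUF_NOMEM = -1 } buf_status;

/* Hypothetical growable, NUL-terminated text buffer. */
typedef struct {
    char  *data;   /* NULL until the first successful append */
    size_t len;    /* bytes in use, not counting the NUL */
    size_t cap;    /* bytes allocated */
} buf;

/*
 * Append n bytes from src to b. On allocation failure, return
 * BUF_NOMEM and leave b unchanged, so the caller decides what to do
 * instead of the library calling abort().
 */
buf_status buf_append(buf *b, const char *src, size_t n)
{
    size_t need, cap;
    char *p;

    /* Reject sizes whose sum (plus the NUL) would overflow size_t. */
    if (n >= (size_t)-1 - b->len)
        return BUF_NOMEM;
    need = b->len + n + 1;

    if (need > b->cap) {
        cap = b->cap ? b->cap : 16;
        while (cap < need) {
            if (cap > (size_t)-1 / 2) {   /* doubling would overflow */
                cap = need;
                break;
            }
            cap *= 2;
        }
        p = realloc(b->data, cap);
        if (p == NULL)
            return BUF_NOMEM;             /* old buffer is still valid */
        b->data = p;
        b->cap = cap;
    }

    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
    return BUF_OK;
}
```

A caller would start from a zeroed buf, check each return value, and
free(b.data) when done. Whether a lexer callback can usefully recover
from BUF_NOMEM is a separate design question.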

The Python support is still spotty. In fact, I haven't really tested
the Python module this time.

A number of higher-level APIs are needed, and to some extent planned.
The first is simply something to reduce an attribute value literal such as:


to its value:
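
As an illustration only, such a reduction might be sketched in C as
follows. The function name attr_literal_value is hypothetical and not
the planned API, and the rules are deliberately simplified assumptions:
the surrounding quotes are stripped, decimal character references in
the range 1..255 (e.g. &#38;) are replaced, and each tab or line break
becomes one space. Named entity references are left untouched.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the sgml-lex API.
 * Takes a quoted attribute value literal (quotes included) and returns
 * a newly malloc'd string holding its value, or NULL if the input is
 * not a properly quoted literal or if allocation fails.
 */
char *attr_literal_value(const char *lit)
{
    size_t n;
    char quote, *out, *w;
    const char *r, *end;

    assert(lit != NULL);
    n = strlen(lit);
    quote = lit[0];
    if (n < 2 || (quote != '"' && quote != '\'') || lit[n - 1] != quote)
        return NULL;

    /* Each input byte yields at most one output byte; n-2 plus NUL. */
    out = malloc(n - 1);
    if (out == NULL)
        return NULL;

    w = out;
    end = lit + n - 1;                     /* points at the closing quote */
    for (r = lit + 1; r < end; ) {
        if (r[0] == '&' && r[1] == '#' && isdigit((unsigned char)r[2])) {
            unsigned long code = 0;
            const char *p = r + 2;

            /* Stop accumulating once the value is clearly out of range. */
            while (isdigit((unsigned char)*p) && code < 1000) {
                code = code * 10 + (unsigned long)(*p - '0');
                p++;
            }
            if (code >= 1 && code <= 255 && !isdigit((unsigned char)*p)) {
                if (*p == ';')
                    p++;                   /* optional reference close */
                *w++ = (char)code;
                r = p;
                continue;
            }
            /* Out of range: fall through and copy the '&' literally. */
        }
        if (*r == '\t' || *r == '\n' || *r == '\r')
            *w++ = ' ';
        else
            *w++ = *r;
        r++;
    }
    *w = '\0';
    return out;
}
```

The caller owns the returned string and must free() it. A fuller
version would also need to handle named entities and the exact
record-end rules of the SGML standard.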


Content-Type: text/html

<base href="">
<TITLE>SGML and the Web</TITLE>


<DT><A HREF="sgml-lex/sgml-lex"> A Lexical Analyzer for HTML and Basic SGML</A>

<DD>W3C Tech Report on SGML <A ID=sgml-lex>low-level parsing
details</A>.  Includes <A href="sgml-lex/sgml.l">flex spec</A>, <A
href="sgml-lex/lex-test.sgm">test file</A>, and source distribution:

-rw-rw-r--   1 connolly 69          50650 Feb  7 11:59 <A HREF="sgml-lex/sgml-lex-19960207.tar.gz">sgml-lex-19960207.tar.gz</A>
-rw-rw-r--   1 connolly 69          57182 Feb  7 12:00 <A HREF="sgml-lex/"></A>
21f7b70ec7135531bc84fd4c5e3cdf3d  <A HREF="sgml-lex/sgml-lex-19960207.tar.gz">sgml-lex-19960207.tar.gz</A> (<A HREF="sgml-lex/sgml-lex-19960207.tar.gz.asc">pgp sig</A>)
083e21759d223b1005402120cdbf8169  <A HREF="sgml-lex/"></A> (<A HREF="sgml-lex/">pgp sig</A>)