
SGML/HTML Lexical analyzer update

From: Daniel W. Connolly <connolly@beach.w3.org>
Date: Wed, 07 Feb 1996 12:31:37 -0500
Message-Id: <m0tkDiT-0002U6C@beach.w3.org>
To: "Bob Peterson" <peterson@openmarket.com>, Art Pollard <pollarda@hawaii.edu>, David Megginson <dmeggins@aix1.uottawa.ca>, David Ornstein <davido@apocalypse.org>, Donald Beaudry <Donald.Beaudry@sgibos.boston.sgi.com>, Hakon Lie <Hakon.Lie@sophia.inria.fr>, Ian Burrell <iburrell@leland.stanford.edu>, Joris Roling <joris@altair.nl>, Martijn Koster <m.koster@webcrawler.com>, Rick Jelliffe <ricko@allette.com.au>, Simon Watfa <simonw@quadrus.com>, Tom Christiansen <tchrist@mox.perl.com>, chris@walkaboutsoft.com (Chris Lovett), combee@techwood.org (Ben Combee), donpark@telewise.com (Don Park), iburrell@loki.stanford.edu (Ian Burrell), jbottoms@world.std.com (John W Bottoms), pch@mystech.com (Pete Halverson)
Cc: www-html@w3.org, w3c-tech@w3.org, www-lib@w3.org
First, thanks for all the great feedback on the sgml-lex report and
code. I am happy to announce this release, which incorporates much of
it. Stay tuned to

	http://www.w3.org/pub/WWW/MarkUp/SGML/#sgml-lex

for details (including the tech report and source distribution).

The relevant excerpt is attached.

Recent changes include:


revision 1.8
date: 1996/02/07 15:32:31;  author: connolly;  state: Exp;  lines: +25 -14
* SGML_lexCase -> SGML_lexNorm, which covers whitespace etc.
	as well as case conversion. This allows pass-thru filtering.

	This involved changing the way whitespace is handled in the lexer.

	Also, tag close tokens (>) are explicitly reported.

	sgml_lex -c becomes sgml_lex -n

	@@ problem remaining: erroneous markup is reported out
		of order

* added filter test

* Fixed a bug in main.c reported in:
	From: Joris Roling <joris@altair.nl>
	To: "'Connolly, Dan'" <connolly@w3.org>
	Subject: Remarks on 'A Lexical Analyzer for HTML and Basic SGML'
	Date: Fri, 19 Jan 96 14:16:00 CET
	Message-Id: <30FF9AC4@msmsmtp>

* fixed lex spec bug reported in:

	Message-Id: <v01530502ad25cc1a251b@[206.86.76.80]>
	To: www-html@w3.org
	From: chris@walkaboutsoft.com (Chris Lovett)
	Subject: Re: Daniel Connolly's SGML Lex Specification

* fixed memory leak reported in:

	Message-Id: <01BAEB69.31095AA0@cadc140.cadvision.com>
	From: Simon Watfa <simonw@quadrus.com>
	To: "'www-html@w3.org'" <www-html@w3.org>
	Subject: sgml-lex
	Date: Thu, 25 Jan 1996 21:01:28 -0700


The one remaining major bug is that malloc() returning NULL is
treated as a fatal error (i.e. abort() is called).
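
One way around this would be to report allocation failure to the caller
rather than aborting. The following is only a sketch of that idea: the
token_buf type and token_buf_append() are hypothetical names for
illustration and are not part of the sgml-lex distribution.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer; not an sgml-lex type. */
typedef struct {
    char  *data;  /* NUL-terminated once anything has been appended */
    size_t len;   /* bytes in use, not counting the terminator */
    size_t cap;   /* bytes allocated */
} token_buf;

/* Append n bytes from s. Returns 0 on success, or -1 if the memory
 * could not be obtained. On failure the buffer is left unchanged, so
 * the caller decides how to recover instead of the library calling
 * abort(). */
int token_buf_append(token_buf *b, const char *s, size_t n)
{
    size_t need;

    if (n > SIZE_MAX - b->len - 1)
        return -1;                    /* size computation would overflow */
    need = b->len + n + 1;

    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        char *p;

        while (cap < need) {
            if (cap > SIZE_MAX / 2) { /* doubling would overflow */
                cap = need;
                break;
            }
            cap *= 2;
        }
        p = realloc(b->data, cap);    /* realloc(NULL, n) acts like malloc(n) */
        if (p == NULL)
            return -1;                /* old block is still valid */
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```

A caller would start from a zeroed token_buf, check each return value,
and free() the data member when done.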

The python support is still spotty. In fact, I haven't really tested
the python module this time.

A number of higher-level APIs are needed and, to some extent, planned.
The first is simply a function to reduce an attribute value literal such as:

	"abc&#65;&quot;def"

to its value:

	abcA"def
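A minimal sketch of such a reduction follows. It is not the planned
API: attr_value_reduce() and named_entity() are hypothetical names, and
the sketch assumes decimal character references in the range 1-255 and
only the four entities amp, lt, gt and quot. Anything else is copied
through unchanged, and whitespace normalization is not attempted.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Look up one of the few entity names this sketch assumes.
 * Returns the replacement character, or -1 if the name is unknown. */
static int named_entity(const char *name, size_t n)
{
    static const struct { const char *name; char ch; } table[] = {
        { "amp", '&' }, { "lt", '<' }, { "gt", '>' }, { "quot", '"' }
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strlen(table[i].name) == n &&
            strncmp(table[i].name, name, n) == 0)
            return (unsigned char)table[i].ch;
    }
    return -1;
}

/* Reduce the attribute value literal lit into out (capacity outsize).
 * Returns the length of the value, or -1 if out is too small. */
long attr_value_reduce(const char *lit, char *out, size_t outsize)
{
    size_t len = strlen(lit);
    const char *p = lit;
    const char *end = lit + len;
    size_t n = 0;

    /* Drop a matching pair of literal delimiters, if present. */
    if (len >= 2 && (lit[0] == '"' || lit[0] == '\'') &&
        lit[len - 1] == lit[0]) {
        p++;
        end--;
    }

    while (p < end) {
        int ch = (unsigned char)*p;
        const char *next = p + 1;

        if (*p == '&' && p + 2 < end && p[1] == '#' &&
            isdigit((unsigned char)p[2])) {
            /* Numeric character reference: &#NNN with optional ';' */
            const char *q = p + 2;
            long code = 0;

            while (q < end && isdigit((unsigned char)*q)) {
                if (code <= 255)      /* stop accumulating once out of range */
                    code = code * 10 + (*q - '0');
                q++;
            }
            if (code >= 1 && code <= 255) {
                if (q < end && *q == ';')
                    q++;
                ch = (int)code;
                next = q;
            }
        } else if (*p == '&' && p + 1 < end &&
                   isalpha((unsigned char)p[1])) {
            /* Named entity reference: &name with optional ';' */
            const char *q = p + 1;
            int e;

            while (q < end && isalnum((unsigned char)*q))
                q++;
            e = named_entity(p + 1, (size_t)(q - (p + 1)));
            if (e >= 0) {
                if (q < end && *q == ';')
                    q++;
                ch = e;
                next = q;
            }
        }

        if (n + 1 >= outsize)
            return -1;                /* no room for char plus terminator */
        out[n++] = (char)ch;
        p = next;
    }
    if (n >= outsize)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

Given the literal above and a sufficiently large buffer, this writes
abcA"def and returns 8.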



--part
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<base href="http://www.w3.org/pub/WWW/MarkUp/SGML/">
<TITLE>SGML and the Web</TITLE>
</HEAD>
<BODY>

<DL>

<DT><A HREF="sgml-lex/sgml-lex"> A Lexical Analyzer for HTML and Basic
SGML</A>

<DD>W3C Tech Report on SGML <A ID=sgml-lex>low-level parsing
details</A>.  Includes <A href="sgml-lex/sgml.l">flex spec</A>, <A
href="sgml-lex/lex-test.sgm">test file</A>, and source distribution:

<PRE>
-rw-rw-r--   1 connolly 69          50650 Feb  7 11:59 <A HREF="sgml-lex/sgml-lex-19960207.tar.gz">sgml-lex-19960207.tar.gz</A>
-rw-rw-r--   1 connolly 69          57182 Feb  7 12:00 <A HREF="sgml-lex/sgml-lex-19960207.zip">sgml-lex-19960207.zip</A>
21f7b70ec7135531bc84fd4c5e3cdf3d  <A HREF="sgml-lex/sgml-lex-19960207.tar.gz">sgml-lex-19960207.tar.gz</A> (<A HREF="sgml-lex/sgml-lex-19960207.tar.gz.asc">pgp sig</A>)
083e21759d223b1005402120cdbf8169  <A HREF="sgml-lex/sgml-lex-19960207.zip">sgml-lex-19960207.zip</A> (<A HREF="sgml-lex/sgml-lex-19960207.zip.asc">pgp sig</A>)
</PRE>
</dl>

--part--
Received on Wednesday, 7 February 1996 12:36:50 GMT
