Re: HTML/SGML parsing (re: sgml-lex)

Daniel W. Connolly (connolly@w3.org)
Thu, 11 Jul 1996 02:21:16 -0400


Message-Id: <199607110621.CAA06909@anansi.w3.org>
To: James Tauber <jtauber@library.uwa.edu.au>
cc: Jim Taylor <JHTaylor@videodiscovery.com>, www-html@w3.org
Subject: Re: HTML/SGML parsing (re: sgml-lex) 
In-reply-to: Your message of "Thu, 11 Jul 1996 13:49:51 +0800."
             <Pine.SUN.3.91.960711132659.29532A-100000@docker.library.uwa.edu.au> 
Date: Thu, 11 Jul 1996 02:21:16 -0400
From: "Daniel W. Connolly" <connolly@w3.org>

In message <Pine.SUN.3.91.960711132659.29532A-100000@docker.library.uwa.edu.au>,
James Tauber writes:
>
>The relevant production for a start-tag is
>
>start-tag = "<", gi, att-spec-list, s*, ">"	(modified [14])

By the way, these productions are available at:

http://www.w3.org/pub/WWW/MarkUp/SGML/productions.html

(thanks to Erik Naggum; HTML-ization by yours truly, with
thanks to perl by Larry Wall et al.)


>> The flex input file seems to indicate that spaces but not other
>> whitespace can come after the attribute name and the =. Is this part
>> of SGML syntax?
>
>No. See the above production.

James is right, but so is the flex spec. The {s}* there means not
just the space character, but s as defined in section 6.2.1 of the
SGML standard:

===================
http://www.w3.org/pub/WWW/MarkUp/SGML/sgml-lex/sgml.l
$Id: sgml.l,v 1.9 1996/02/07 15:32:28 connolly Exp $

/* 6.2.1 Space */
s		{SPACE}|{RE}|{RS}|{SEPCHAR}

...

  /* <a ^href = "xxx"> -- attribute name */
<ATTR>{name}{s}*={ws}
===================
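
To try this outside of flex, here is a rough Python sketch of the {s}
class and the attribute-name rule. It assumes the reference concrete
syntax character codes (SPACE=32, RE=13, RS=10, SEPCHAR=TAB=9), expands
{ws} to zero or more of the same characters, and uses a simplified
stand-in for {name}; NAME, ATTR_NAME and match_attr_name are
illustrative names, not anything in sgml.l.

```python
import re

# 6.2.1 Space: SPACE | RE | RS | SEPCHAR
# (character codes assumed from the reference concrete syntax)
S = r"[\x20\x0d\x0a\x09]"

# Simplified stand-in for {name}; the real definition lives in sgml.l.
NAME = r"[A-Za-z][A-Za-z0-9.-]*"

# <ATTR>{name}{s}*={ws}, with {ws} written out as {s}*
ATTR_NAME = re.compile(NAME + S + "*=" + S + "*")


def match_attr_name(text):
    """Return the text consumed by the attribute-name rule, or None."""
    m = ATTR_NAME.match(text)
    return m.group(0) if m else None
```

With these assumptions, a tab or record boundary between the attribute
name and the = is consumed just like a space, while a character outside
the class (a vertical tab, say) makes the rule fail.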

Perhaps what confused you is {ws}, which is an invention of my own for
convenience. I used it in lots of places, but between attribute name
and =, I used {s}* instead, for no reason that I can recall. {s}* and
{ws} are equivalent by definition:

/* trailing white space */
ws		({SPACE}|{RE}|{RS}|{SEPCHAR})*
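
The equivalence can also be checked mechanically. The sketch below is
only an illustration: it transliterates the two definitions into Python
regular expressions, again assuming the reference concrete syntax
character codes, and compares them on every short string over a small
test alphabet. S_STAR, WS, ALPHABET and disagreements are made-up names.

```python
import re
from itertools import product

# Assumed character codes: SPACE=0x20, RE=0x0d, RS=0x0a, SEPCHAR=0x09
S = r"(?:\x20|\x0d|\x0a|\x09)"

S_STAR = re.compile(S + "*")                      # {s}*
WS = re.compile(r"(?:\x20|\x0d|\x0a|\x09)*")      # {ws}

# The four s characters plus one non-space character.
ALPHABET = " \r\n\tx"


def disagreements(max_len=3):
    """List strings (up to max_len) that only one of the patterns accepts."""
    bad = []
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            text = "".join(chars)
            if bool(S_STAR.fullmatch(text)) != bool(WS.fullmatch(text)):
                bad.append(text)
    return bad
```

An empty list from disagreements() is what the definitions predict,
since {ws} is just the body of s with the * folded in.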


Dan