Re: HTML/SGML parsing (re: sgml-lex)

In message <Pine.SUN.3.91.960711132659.29532A-100000@docker.library.uwa.edu.au>,
James Tauber writes:
>
>The relevant production for a start-tag is
>
>start-tag = "<", gi, att-spec-list, s*, ">"	(modified [14])

By the way, these productions are available at:

http://www.w3.org/pub/WWW/MarkUp/SGML/productions.html

(thanks to Erik Naggum. HTML-ization by yours truly, with
thanks to Perl by Larry Wall et al.)


>> The flex input file seems to indicate that spaces but not other
>> whitespace can come after the attribute name and the =. Is this part
>> of SGML syntax?
>
>No. See the above production.

James is right, but so is the flex spec. {s}* matches not just
spaces, but any run of s as defined in section 6.2.1 of the SGML standard:

===================
http://www.w3.org/pub/WWW/MarkUp/SGML/sgml-lex/sgml.l
$Id: sgml.l,v 1.9 1996/02/07 15:32:28 connolly Exp $

/* 6.2.1 Space */
s		{SPACE}|{RE}|{RS}|{SEPCHAR}

...

  /* <a ^href = "xxx"> -- attribute name */
<ATTR>{name}{s}*={ws}
===================

Perhaps what confused you is {ws}, which is an invention of my own for
convenience. I used it in lots of places, but between the attribute name
and the =, I used {s}* instead -- for no reason that I can recall. {ws}
and {s}* are equivalent by definition:

/* trailing white space */
ws		({SPACE}|{RE}|{RS}|{SEPCHAR})*
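To make the equivalence concrete, here is a minimal sketch in Python (not
part of sgml.l) that transliterates the s and ws definitions into regular
expressions and checks that {name}{s}*={ws} and {name}{ws}={ws} accept the
same strings. The character codes assumed for SPACE, RE, RS and SEPCHAR
(32, 13, 10 and 9) are those of the reference concrete syntax, and the NAME
pattern is a hypothetical stand-in for {name}; neither is copied from the
flex file.

```python
import re

# Assumption: reference concrete syntax character assignments.
# s  = {SPACE}|{RE}|{RS}|{SEPCHAR}
S = r"(?: |\r|\n|\t)"
# ws = ({SPACE}|{RE}|{RS}|{SEPCHAR})*
WS = r"(?: |\r|\n|\t)*"
# Hypothetical stand-in for {name}; not the definition used in sgml.l.
NAME = r"[A-Za-z][A-Za-z0-9.-]*"

# <ATTR>{name}{s}*={ws}  as written in the flex spec
with_s_star = re.compile(NAME + S + "*" + "=" + WS)
# The same rule with {ws} substituted for {s}*
with_ws = re.compile(NAME + WS + "=" + WS)

samples = ["href = ", "href\t=\n", "href\r\n=", "href=", "href x="]


def agree(text):
    """True if both patterns accept or both reject the whole string."""
    return bool(with_s_star.fullmatch(text)) == bool(with_ws.fullmatch(text))


if __name__ == "__main__":
    for text in samples:
        print(repr(text), bool(with_s_star.fullmatch(text)), agree(text))
```

Tabs and record boundaries between the attribute name and the = are
accepted by both forms, which is the point of the production James quoted.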


Dan

Received on Thursday, 11 July 1996 02:21:22 UTC