Re: HTML/SGML parsing (re: sgml-lex)

James Tauber (jtauber@library.uwa.edu.au)
Thu, 11 Jul 1996 13:49:51 +0800 (WST)


Date: Thu, 11 Jul 1996 13:49:51 +0800 (WST)
From: James Tauber <jtauber@library.uwa.edu.au>
To: Jim Taylor <JHTaylor@videodiscovery.com>
cc: connolly@beach.w3.org, www-html@w3.org
Subject: Re: HTML/SGML parsing (re: sgml-lex)
In-Reply-To: <s1e3f52e.009@videodiscovery.com>
Message-ID: <Pine.SUN.3.91.960711132659.29532A-100000@docker.library.uwa.edu.au>

On Wed, 10 Jul 1996, Jim Taylor wrote:
> The question is, what separates tag names from attribute
> specifications? Does SGML explicitly state that attribute
> specifications must be delimited by whitespace, or can any unexpected
> character act as delimiter? 

It must be a delimiter (in the formal SGML sense of "A character string 
assigned to a delimiter role by the concrete syntax" [4.91]) rather than 
just any unexpected character. That is why <abc_def> will fail: in the 
Reference Concrete Syntax "_" is neither a name character nor a delimiter.

The relevant production for a start-tag is

start-tag = "<", gi, att-spec-list, s*, ">"	(modified [14])

where

att-spec-list = att-spec*			(modified [31])

and

att-spec = s*, (att-name, s*, "=", s*)?, att-val-spec	(modified [32])

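Furthermore, "The leading s can only be omitted from an attribute 
specification that follows a delimiter" [7.9]. The gi is not a delimiter, 
so at least one s must separate it from the first attribute specification.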

> And what exactly is whitespace? SPACE, RE, RS, and SEPCHAR only?

Yes, production [5] defines s to be SPACE, RE, RS or SEPCHAR (i.e. TAB in 
the Reference Concrete Syntax).
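
As a rough illustration of how the modified productions and the definition 
of s fit together, here is a minimal Python sketch. It is not taken from 
sgml-lex or any real parser. The function name parse_start_tag and the 
helper skip_s are made up for this example, and it assumes the Reference 
Concrete Syntax name characters (letters, digits, "." and "-"). It ignores 
entity references inside literals, name length limits and tag minimization.

```python
import re

# Production [5]: s is SPACE, RE, RS or SEPCHAR (TAB in the Reference Concrete Syntax).
S = " \r\n\t"
NAME = re.compile(r"[A-Za-z][A-Za-z0-9.\-]*")    # name start character, then name characters
TOKEN = re.compile(r"[A-Za-z0-9.\-]+")           # undelimited name token
LITERAL = re.compile(r'"[^"]*"|\'[^\']*\'')      # attribute value literal (no entity handling)


def skip_s(text, pos):
    """Skip s* from pos; return (new position, whether any s was seen)."""
    start = pos
    while pos < len(text) and text[pos] in S:
        pos += 1
    return pos, pos > start


def parse_start_tag(text):
    """Return (gi, [(att_name or None, value), ...]), or None if text is not a start-tag."""
    if not text.startswith("<"):
        return None
    m = NAME.match(text, 1)
    if m is None:
        return None
    gi, pos = m.group(), m.end()
    attrs = []
    after_delimiter = False          # the gi itself is not a delimiter
    while True:
        pos, had_s = skip_s(text, pos)
        if text[pos:] == ">":
            return gi, attrs
        # [7.9]: the leading s may be omitted only after a delimiter.
        if not (had_s or after_delimiter):
            return None
        m = TOKEN.match(text, pos)
        if m is None:
            return None
        eq, _ = skip_s(text, m.end())
        if text[eq:eq + 1] != "=":
            # Attribute name omitted: the token is the whole specification.
            attrs.append((None, m.group()))
            pos, after_delimiter = m.end(), False
            continue
        if NAME.fullmatch(m.group()) is None:
            return None              # an attribute name must start with a letter
        pos, _ = skip_s(text, eq + 1)
        lit = LITERAL.match(text, pos)
        val = lit or TOKEN.match(text, pos)
        if val is None:
            return None
        attrs.append((m.group(), val.group()[1:-1] if lit else val.group()))
        pos, after_delimiter = val.end(), lit is not None


print(parse_start_tag("<abc_def>"))
print(parse_start_tag("<abc\ndef>"))
print(parse_start_tag('<a href\t=\n"x.html">'))
```

Under these assumptions the first call is rejected because "_" is neither s 
nor a delimiter, while the other two are accepted: any s, not only SPACE, 
may separate the gi from an attribute or surround the "=".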

> Does that mean <abc
> def>
> is ok? If not, this could be dangerous if an editor or other process
> hard wraps HTML at spaces. 

This is fine as long as def is "an undelimited name token that is a 
member of a group specified in the declared value for that attribute" 
[7.9.1.2].
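
To make that condition concrete, here is a small Python sketch of how a 
parser might work out which attribute a bare token such as def belongs to. 
The element abc, its attributes KIND and MODE and their name token groups 
are all hypothetical, as is the function name. The sketch assumes general 
names are folded to upper case and that a token appears in at most one 
group for the element.

```python
# Hypothetical declared value groups for the attributes of element "abc".
ABC_ATTRIBUTES = {
    "KIND": {"DEF", "GHI"},
    "MODE": {"ON", "OFF"},
}


def resolve_omitted_name(token, declared_groups):
    """Return the attribute whose declared group contains token, or None."""
    token = token.upper()  # assumption: general names are case-folded
    for att_name, group in declared_groups.items():
        if token in group:
            return att_name
    return None  # not a member of any group, so the name cannot be omitted


print(resolve_omitted_name("def", ABC_ATTRIBUTES))
print(resolve_omitted_name("xyz", ABC_ATTRIBUTES))
```

With these made-up declarations, def resolves to KIND, whereas xyz matches 
no group and so would not be a valid attribute specification on its own.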

> The flex input file seems to indicate that spaces but not other
> whitespace can come after the attribute name and the =. Is this part
> of SGML syntax?

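No. As the att-spec production above shows, s* (any s, not only SPACE) is 
allowed between the attribute name and the "=", and again after the "=".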

James K. Tauber / jtauber@library.uwa.edu.au
University CWIS Coordination Officer
The University of Western Australia