Re: KISS (was: Parameter entity references in WF docs) from Dave Peterson on 1997-06-06 (w3c-sgml-wg@w3.org from June 1997)

From: Dave Peterson <davep@acm.org>
Date: Fri, 6 Jun 1997 11:16:12 -0400
To: w3c-sgml-wg@w3.org
Message-Id: <v01540b18afbdd1b24378@[206.119.33.177]>

At 11:49 AM 6/5/97, lee@sq.com wrote:

>The simplest way to implement them that I can see is to push the input
>stack on %name; and to pop it on the end of the corresponding string.
>This isn't exactly the SGML way.

It's not?  What's different?  Are you thinking of the Ee?  That's just
a device to help describe "semantic" conditions limiting where entities
can begin and end.
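
For concreteness, here is a minimal sketch in Python of the push/pop
scheme quoted above. It is an illustration only, not anyone's actual
implementation: the function name expand, the entities dictionary, and
the simplified name pattern in REF are all assumptions made for the
example. The pop is the point at which SGML would signal Ee.

```python
import re

# Simplified (assumed) pattern for a parameter entity reference: %name;
REF = re.compile(r"%([A-Za-z_][A-Za-z0-9._-]*);")

def expand(text, entities):
    """Expand %name; references in text using an explicit input stack."""
    # Each frame is [string, position, entity name (None for the document)].
    stack = [[text, 0, None]]
    out = []
    while stack:
        frame = stack[-1]
        s, pos, _ = frame
        if pos >= len(s):
            # End of the replacement string: pop the input stack.
            stack.pop()
            continue
        m = REF.match(s, pos)
        if m:
            name = m.group(1)
            if name not in entities:
                raise KeyError("undeclared entity: " + name)
            if any(f[2] == name for f in stack):
                raise ValueError("recursive entity reference: " + name)
            # Resume after the reference once the entity has been read.
            frame[1] = m.end()
            # Push the replacement text as the new current input.
            stack.append([entities[name], 0, name])
            continue
        out.append(s[pos])
        frame[1] = pos + 1
    return "".join(out)
```

Calling expand("<!ELEMENT a %content;>", {"content": "(b|%c;)", "c": "c"})
walks into "content", then into "c", and pops back out of each in turn.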

>I therefore think that implementation would be eased if S were removed
>from all productions and replaced by a set of tokenisation rules.
>The input could then be considered as a token stream rather than as
>a byte stream with delimiters -- that is to say, both views would be
>equally conformant and correct.

>People who have implemented parsers -- might that have been easier?

In SGML, tokenization beyond the "each character is a token" level
is difficult without feedback information from the state of the parser.
Do we think XML has eliminated the feedback requirement?

There has been some thought given to describing SGML parsing as a
two-stage affair, in which the second parser reads the tokens emitted by
the first.  The first parser can hide the boundaries of marked sections
and entities (by not emitting them), and the non-traditional parsing
requirements can be isolated in the second parser.  Because the second
parser is not also handling all the parsing that can be done
"traditionally", it might be easier to construct.
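
A rough sketch of the two-stage idea in Python, under heavy
simplifying assumptions: stage_one, stage_two, the entities dictionary,
and the NAME/DELIM token kinds are hypothetical names chosen for the
example, marked sections are ignored, and there is no check for
recursive entities. It shows only the shape of the pipeline, with the
first stage hiding entity boundaries from the second.

```python
import re

# Simplified (assumed) pattern for a parameter entity reference: %name;
REF = re.compile(r"%([A-Za-z_][A-Za-z0-9._-]*);")

def stage_one(text, entities):
    """First parser: yield characters, expanding %name; references.

    Entity boundaries are not emitted, so the second stage never sees them.
    """
    pos = 0
    while pos < len(text):
        m = REF.match(text, pos)
        if m:
            # Splice in the replacement text without marking its start or end.
            yield from stage_one(entities[m.group(1)], entities)
            pos = m.end()
        else:
            yield text[pos]
            pos += 1

def stage_two(chars):
    """Second parser: group characters into NAME and DELIM tokens."""
    name = []
    for ch in chars:
        if ch.isalnum():
            name.append(ch)
            continue
        if name:
            yield ("NAME", "".join(name))
            name = []
        if not ch.isspace():
            yield ("DELIM", ch)
    if name:
        yield ("NAME", "".join(name))

tokens = list(stage_two(stage_one("(a | %rest;)", {"rest": "b, c"})))
```

Here the second stage sees one flat character stream and cannot tell
that "b, c" came from an entity.  A real SGML parser would still need
feedback from parser state to the first stage, which this sketch omits.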

Yet more of the stuff that needs to be worked out for the revision.
(Gasp, Pant, "Help!")

Dave Peterson
SGMLWorks!

davep@acm.org

Received on Friday, 6 June 1997 11:16:29 UTC