
Re: A17: keep or drop entities?



> BTW, James is totally right that supporting external text entities does
> increase the plumbing complexity in the parser - the one time I did this it
> seemed not to break the grad-could-do-it-in-a-week barrier, but it is
> outside the scope of the 5-line perl hack job. 

This is true.  It's one reason that having the C preprocessor available as a
separate program is so useful: macros (like SGML general/character entities)
get expanded before the parser does its job.  This is good because it
simplifies the parser & encourages reuse, and horribly bad because it means
that the parser reports errors in terms of an intermediate representation
(the expanded text) that is alien to the user.
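To make the trade-off concrete, here is a minimal Python sketch of a
preprocessor-style expansion pass that also keeps a line map, so an error
found in the expanded text can be reported against the file the user wrote.
It assumes, to keep the bookkeeping short, that each entity reference stands
on a line of its own (as #include does in C) and that the entity table is a
plain name-to-text dictionary rather than something parsed from <!ENTITY>
declarations.  The names expand and report are illustrative only, not part
of any existing tool.

```python
import re
from typing import Dict, List, Tuple

# A line consisting solely of an entity reference such as "&chap1;".
# (Simplified: real XML Name characters are broader than this pattern.)
REF_LINE = re.compile(r"^\s*&([A-Za-z_][\w.-]*);\s*$")

Origin = Tuple[str, int]  # (file or entity name, 1-based line number)


def expand(text: str, label: str, entities: Dict[str, str],
           _stack: Tuple[str, ...] = ()) -> List[Tuple[str, Origin]]:
    """Expand entity-reference lines, recording where every output line came from.

    `entities` is a hypothetical name -> replacement-text table.
    References to unknown entities are left in place for the parser to report.
    """
    result: List[Tuple[str, Origin]] = []
    for n, line in enumerate(text.splitlines(), start=1):
        m = REF_LINE.match(line)
        if m and m.group(1) in entities:
            name = m.group(1)
            if name in _stack:  # an entity must not (indirectly) include itself
                raise ValueError(f"{label}:{n}: recursive reference to &{name};")
            result.extend(expand(entities[name], name, entities, _stack + (name,)))
        else:
            result.append((line, (label, n)))
    return result


def report(expanded: List[Tuple[str, Origin]], out_line: int, msg: str) -> str:
    """Translate an error on expanded line `out_line` back to the user's source."""
    where, n = expanded[out_line - 1][1]
    return f"{where}:{n}: {msg}"


if __name__ == "__main__":
    doc = "<book>\n&chap1;\n</book>"
    flat = expand(doc, "book.xml", {"chap1": "<chap>\n<p>Hello</chap>"})
    # The parser sees only the flat text; line 3 holds the bad end tag.
    print(report(flat, 3, "mismatched end tag"))
```

The parser itself stays simple; the cost is that every later tool has to
carry the map along if it wants to speak in the user's terms.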

However, the ability to produce an "expanded" form of an XML document, with
all external entities in place, would reduce the onus on parser writers, if
only because it would demonstrate that it can be done in a separate pass.
Let's get this right.  It's hard to do in SGML because of marked sections
and the way that entity definitions work (although I have not written a
validating parser & am open to being corrected here).  It was impossible
in troff -- "soelim" does not work in the general case -- because of the
way that conditional text interacted with file inclusion.

If included files are visible to the parser both before and after inclusion,
error messages can be more helpful, since they can name the file the
offending text actually came from.  For example:
    <include location="myfile.xml" expanded="yes" when="1996/07/21@18:21.02">
       if the file is in place, its contents appear here, expanded="yes", and
       the optional "when" attribute gives the GMT date and time of expansion
    </include>
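A rough Python sketch of how a separate pass might fill in such <include>
elements, using the standard xml.etree.ElementTree module.  The <include>
element is only the proposal above, not part of any standard; the names
expand_includes, source_of and the load callback are hypothetical.  The
sketch assumes each included file is a single well-formed element, and it
expands one level per call (calling it again expands any includes that the
included files themselves contain).

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Callable


def expand_includes(root: ET.Element, load: Callable[[str], str]) -> None:
    """Place the parsed target of every unexpanded <include> inside it."""
    # Snapshot the list first: ElementTree does not define what happens
    # if the tree is modified while it is being iterated.
    for inc in list(root.iter("include")):
        loc = inc.get("location")
        if loc is None or inc.get("expanded") == "yes":
            continue  # nothing to load, or already expanded
        inc.append(ET.fromstring(load(loc)))
        inc.set("expanded", "yes")
        # GMT timestamp in the same shape as the example: 1996/07/21@18:21.02
        inc.set("when", datetime.now(timezone.utc).strftime("%Y/%m/%d@%H:%M.%S"))


def source_of(root: ET.Element, elem: ET.Element, default: str) -> str:
    """Name the file an element came from: its nearest enclosing <include>."""
    parent = {c: p for p in root.iter() for c in p}  # ElementTree has no parent links
    node = parent.get(elem)
    while node is not None:
        if node.tag == "include" and node.get("location"):
            return node.get("location")
        node = parent.get(node)
    return default  # the element was written in the top-level document


if __name__ == "__main__":
    files = {"myfile.xml": "<chap><p>Hello</p></chap>"}  # stand-in for the file system
    book = ET.fromstring('<book><include location="myfile.xml"/></book>')
    expand_includes(book, files.__getitem__)
    print(ET.tostring(book, encoding="unicode"))
    print(source_of(book, book.find("include/chap/p"), "book.xml"))
```

Because the location stays on the element after expansion, an error found
anywhere inside the included content can still be attributed to myfile.xml
rather than to the merged document.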

Lee