Re: KISS again

On Fri, 20 Jun 1997 18:12:18 -0400 (EDT) Peter Murray-Rust said:
>Most of the next generation of XML-techies are not going to create DTDs.
>They don't know how, haven't been apprenticed, and don't see the point.
>So the number of DTDs created will be quite small.  Most of these will
>come from the currently SGML-literate.  IMO this represents quite a
>small proportion of the XML community - if it doesn't then XML will not
>be a universal success.

All likely enough.  These are good reasons not to force people to use
DTDs in XML, and it is for reasons like these that DTDs were made
optional in the first place.

When did these become reasons to *forbid* DTDs to those who do know
how to make them, or who do want to use the DTDs made by others?

>Please also accept that some of the things labelled 'simple' in the XML
>spec are not necessarily simple for newcomers.  PE's may now be simple,
>but their implementation is not easy to determine from the spec.  If
>they really are as simple as DEFINE, then highlight the rules, please
>:-)

OK.  I've made one suggestion (define PE resolution as optional for
non-validating parsers).  If that is adopted, PE resolution can be
performed by a pre-processor which knows about nothing except
PE entity declarations, marked sections, and PE references (attending
to marked sections is presumably the one thing Henry Thompson's
preprocessor doesn't do:  if it did, Henry wouldn't be bothering to
expand PE references within ignored sections ...), and any copy of
Software Tools should allow you to build the PE pre-processor.  (You
can extend it to expand all general entities except < etc., too,
if you want to really simplify life downstream.)

Here's another.  I've said that the XML restrictions on where PE
refs are recognized and expanded are easy to enforce if the relevant
entity boundaries are exposed in the grammar (I'm assuming a
yacc/lex-style division of labor; those writing recursive-descent
parsers will need longer look-aheads than a single token, but
should get by with finite lookahead).

To make this process simpler, I've made an experimental version of
the XML spec, in which the % operator appears *only* when it
governs the entire right-hand side of a production.  Each rule
of the form

  lhs ::= %(r1 r2 r3)

can be translated into yacc in the form

  lhs ::= r1 r2 r3
      |   PE_START r1 r2 r3 PE_END
      ;

From there on out, the parser generator should take care of things,
if you're using one.

If you're not bothering to expand the PE refs, then the rule is
translated into yacc as:

  lhs ::= r1 r2 r3
      |   PE_REF
      ;

In which case the lexer is responsible for returning PEREF at the
right times.

If those interested would take a look at the resulting grammar and
see whether it helps or not, I'd be grateful for comments.  The
experimental version of the spec is at:

  http://www.uic.edu/~cmsmcq/non_public/tech/xml/xmlpe.html

(as the path suggests, this is not intended for public distribution;
please don't point at it from public pages, and remember I will
probably take it down without warning.)

Don't bother pointing out that the names of the new non-terminals
aren't very good ones:  I know that.  If you can think of better ones,
I'd like to hear them.

Tim asked whether we could at least forbid PE refs in the internal
subset.  Answer:  not without loss of function.  The customization of
DTDs via PEs uses the internal subset -- at the very least, as in the
TEI, you need to declare a couple PEs with magic names, so the parser
knows where to find your local modification files.  In other DTDs, all
the local customization typically takes place in the internal subset.

So the compromises which seem plausible to me are:
  - make PE resolution optional for non-validating parsers (somehow;
    this will require a language-lawyer's eye to ensure that the
    end of the internal subset can be detected)
  - make PE error-detection optional for non-validating parsers
    (i.e. say 'non-validating parsers can treat this as if it were
    macro expansion -- users, however, should not try to get clever')
  - simplify the grammar by restricting PEs further, if need be, or
    by restricting the % operator to action on entire right-hand-sides,
    as in my test version of the spec (*if* that change seems like a
    simplification to those concerned)

-C. M. Sperberg-McQueen

Received on Friday, 20 June 1997 19:24:12 UTC