Re: In defense of PEs (but not in XML) (was Re: Parameter entities vs. GI name groups)

The prose-impaired may search for SUMMARY.


Weichel Bernhard (K3/EES4) <Bernhard.Weichel@pcm.bosch.de> wrote:
> But it is understandable to claim (XML) complicated if one is confronted
> with this:
      [content model from hell deleted...!]
> But with:
>     <!ELEMENT sic - -(( (%par1; , %part2; ) | %full-text; )*)>
> It looks much simpler - divide an conquer!!

This is of course using PEs as a sop to work around SGML's lack of
explicit non-terminals.

I agree with those who have pointed out that macro string substitution
is not ideologically sound, and leads to semiotic epimorphism without
semantic veracity.  Er, in other words, it _looks_ as if the meaning
has been captured for the computer, but it hasn't really: the shapes
are only on the surface.

I wish my splitting headache was only skin deep.

One of the goals of C++ (according to Bjarne Stroustrop in his "rationale"
book) was to minimise the places where CPP was needed.

But it is no good removing a facility that is sometimes harmful if it
is also more often useful, unless the functionality is supplied elsehow.

SGML does not have user-defined non-terminals.  There are a few
user-settable things like non-terminals, e.g. ERO, but they use
yet another ad-hoc language (in the SGML declaration), are arbitrary
and incomplete.

It does not have abstract data types (%Type.Boolean;, %Type.Maybe).
As it stands, SGML can't handle Boolean attributes anyway, unles you
follow the CALS 0/1 approach and call them numbers, because of the
rule about multiple attribute values, so when designing abstractions
in this way with SGML, you need to be aware of the lower level of
implementation anyway.

It has syntactic file inclusion (%ISOLat1; and also SUBDOC: &myfile;)
but lacks control above a syntactic level -- it's like #include,
not like "module" or Java's "extends".

It has an expression language using PEs as variables and "INCLUDE"
and "IGNORE" as values -- this is the closest SGML comes to Boolean
logic, except that there is no ELSE and no way to combine values.
In a browsing world,
    #if (ProductId > 50) || (Season == Summer)
    #else
    #endif
makes some sense, but is gruesome for an editor.
In C, this was done largely because the compilers weren't advanced
enough to do compile-time optimisation, and since C lacked constants
anyway, the code would be left in until runtime.

For SGML, if there is any such compile/runtime distinction, perhaps
it's viewing/editing, with viewing being like runtime in that you
don't need access to the full source.  Even that may not always be true.

So it seems to me to come down to this:

Parameter Entities are used for lots of things in SGML.

We don't have a replacement mechanism that has anything other than
syntactic control.

Parameter Entities are specified in a way that means they have to be
implemented at the grammar level, not at the tokeniser or input level.

This may be a good thing for graphical DTD editors, but a C IDE has
to cope with it too, and may (like early versions of Near and Far)
simply punt and refuse to deal with it, or automatically expand those
PEs that it can't otherwise handle.

We are in danger of making the language complex for most implementors
in order to simplify a special case that may well turn out to be the
sole preserve of computer scientists and larger programming teams.

But we are here to praise SGML, not to bury it in email!
(leave my fingers alone!)

If a macro facility is needed, let's have one.

If not, we have too many issues to deal with before the end of the year.

SUMMARY

I suggest that either
(1) PEs are defined as pure unconstrained string replacement, with the
    possible single constraint that they can only appear in the DTD,
    internal subset or as a marked section guard;

or

(2) we remove PEs from XML 1.0 along with marked sections, and have
    a separate document describing a macro-processing language that
    may well be more or less the same as PEs.

In either case, all valid XML documents would remain valid SGML.
Even if we went with route (2), the processed file would be valid
SGML.

I don't think we can punt on PEs and remove them without offering a
replacement for the functionality, because that will force people to
derive their own solutions, and their _source_ files will then not
be shareable.  It would be much harder to write an XML editor, or
any other process that updates XML source files, if it had to know
about twelve flavours of preprocessor.

I have no personal preference between these two options I have suggested.

Lee

Received on Saturday, 21 June 1997 14:02:03 UTC