Re: marked sections from R A Milowski on 1996-09-18 (w3c-sgml-wg@w3.org from September 1996)

From: R A Milowski <milor001@gold.tc.umn.edu>
Date: Tue, 17 Sep 1996 23:03:34 -0500 (CDT)
To: w3c-sgml-wg@w3.org
Message-Id: <323f74975543002@gold.tc.umn.edu>
> I'm trying to get a grasp of the group's sentiment on marked sections,
> and am not sure what all the options and positions are.  This morning,
> Charles Goldfarb suggested that there are really two cases:
> 
>   a) conditional marked sections are needed in DTDs, for version
> control and tweaking
>   b) (R)CDATA marked sections are needed in instances, for non-SGML
> notations (and, for that matter, for SGML notation -- CDATA
> marked sections are the simplest way I know to do SGML examples
> in documentation of a DTD
> 
> But there appears to be another position:
> 
>   c) some argue that marked sections just complicate life too much
> for the parser, and can *always* be done away with, possibly by
> improving the tools we use
> 
> Have I left out any major positions?

Yes, we have a client who is successfully using marked sections to
handle the "all possible legal cases" problem.  They encode SGML documents
that model, via marked sections, the language that could appear in legal
banking documents (variants for each state and the federal government).
Each marked section is controlled by a parameter entity.  These parameter
entities are hooked into an expert system using storage managers
(available in James Clark's SP).

Thus, when they parse a document, they get the appropriate language for
the banking transaction at hand.  In short, they are using marked sections
to control variants.
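
A minimal sketch of the pattern, to make it concrete.  The document type,
element, and entity names are made up for illustration, and the entity
values are typed directly into the declaration subset here; in the real
system they would be supplied by the expert system through a storage
manager:

<!DOCTYPE loan SYSTEM "loan.dtd" [
   <!-- hypothetical switches, one per jurisdiction -->
   <!ENTITY % minnesota "INCLUDE">
   <!ENTITY % federal   "IGNORE">
]>
<loan>
<![ %minnesota; [
   <para>Language required for a Minnesota transaction.</para>
]]>
<![ %federal; [
   <para>Language required for a federal transaction.</para>
]]>
</loan>

With these values, the parser sees only the Minnesota paragraph.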

There is another solution to this.  The solution involves DSSSL and HyTime
and is what my talk at SGML '96 will be about.  It's basically a
transformation/merge model.

So, if you have DSSSL, the need for marked sections in this case goes away.

The reason I bring this up is that there are quite a few industries that have
to solve the documentation problem I outline above.  They want to be able
to have a publishing system that allows them to say: "here's my situation,
gimme my documents!"  ...and out poofs the correct legal language.

> It seems to me we are close to consensus on some things, but not all.
> 
>    - We seem to agree that instances don't need conditional sections.
>    - There is some sentiment, not unanimous, that instances don't
>      *have* to have CDATA or RCDATA marked sections, either; there
>      are other ways to get the job done.
>    - Some of us think DTDs don't need marked sections at all.
>    - Some of us think DTDs need marked sections at all.
>    - Some of us think XML must provide support for versioning in DTDs,
>      through marked sections or by some other means.

My opinion of marked sections in instances is that they are nasty.  You get
into this whole issue of computability.  I can never guarantee that my
documents will always parse under all possible variants.  The combinatorics
of the test cases become impractical very fast.

> C.  Possible Approaches
> 
> What are our options for XML?
> 
> We could keep marked sections on the grounds that they are useful and
> not too hard to parse.  We could simplify the parsing a bit, e.g. by
> insisting that CDATA and RCDATA be literals, not entity references,
> -- or do people switch between CDATA and IGNORE?!
> 
> If we want to lose marked sections, we need to say how to get the
> required function.  For CDATA sections, I think either of techniques
> 2 or 3 above would work, though I suspect someone will want to drop
> the empty comment from the language, leaving us with technique 2 (sigh).
> For conditional sections, it's harder:
> 
>   - we can revise the declaration rules to provide for customization
> by other means.  For example "Multiple element declarations are legal;
> the first one wins" etc.
>   - we can reinvent the C preprocessor, and say that conditional
> sections (and perhaps selective expansion of entity references)
> is handled by a simple pre-processor.  Dividing up the functionality
> this way helps keep it manageable; it also allows us to imagine a
> situation in which the server (normally a Big machine) does the
> preprocessing, to keep parsing simple for the client (often a Little
> machine).  If our motive for keeping XML as light-weight as possible
> is to ensure that we can write light-weight processors for low-end
> machines, then this would allow us to have an XML Extra-Lite for those
> of us with 8086s still on our desks.  The pre-processor can be
> pretty light-weight, too, if we make it easy to detect the relevant
> parameter-entity declarations and ensure that the DTD can otherwise
> be skipped.
> 
> Drawback:  separating some function into a pre-processor could be
> the thin end of a wedge leading to an XML with optional features,
> which would lead to interoperability problems.  We do *not* want
> optional features:  any XML processor handling any XML document is
> a much more attractive model.
> 

Preprocessors have all kinds of hack-like advantages.  My question would
be: if we need a preprocessor, what are we doing wrong?

In the case of the client I described above, it turns out that a transformation
model is much more elegant and provable.  You can parse every document
and be assured that it is valid.  It does need a leap in technology, though.

OTOH, it is very nice to be able to:

<![ IGNORE [
   <FOO>I don't have a clue what I should be doing at this point in
        the document.  So I left it to you.
]]>

There are all kinds of times when you need to just "shut off" a portion
of a document in a production system because it makes things explode.

==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex@copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608
Received on Wednesday, 18 September 1996 00:03:53 UTC