KISS (was: Parameter entity references in WF docs) from Peter Murray-Rust on 1997-05-31 (w3c-sgml-wg@w3.org from May 1997)

From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
Date: Sat, 31 May 1997 18:57:19 GMT
To: w3c-sgml-wg@w3.org
Message-Id: <7411@ursus.demon.co.uk>
In message <339050AB.549B@hiwaay.net> len bullard writes:
[...]
> > this because, I believe, somebody suggested dropping [PEs] at all).
> 
> I have.  

I did as well.  (With some trepidation, because I'm enormously appreciative
of what Norbert has done.  OTOH, we all know that the ERB proposes and disposes
and that the July 1 release of the specs *will* be different from what we are 
looking at now.)
 
> I do because under the murky requirements, no one can 
> state a strong enough case for using them.  At some point, the 
> *expense* of using SGML is trivial next to the cost of trying 
> to maintain an unstable code and content set based on a quickly moving 
> specification.  The content vs features cost curve is making 

I think Tim's request [for partial dispensation from PEs] mirrors the fact 
that:
	- it is difficult to define precisely what the syntax of PEs is.
	- their implementation may be (slightly) error-prone, due to this.
		
I do not believe that - looking at the current spec (Lang970331) - 
a competent programmer *unfamiliar with SGML* will unerringly write 
a parser that treats PEs 100% correctly.  For the newcomer to the spec,
productions such as [45] and [53] are not trivial.  I don't feel these are
consistent with Goal 4 (It shall be easy to write programs...)

> the Web a very difficult place to invest as it remains a 
> caveat emptor market.  This is a concern more serious than the 
> maintenance of a few overly complex DTDs that themselves 
> can be redesigned for a system without PEs.  IMO, this DTD 
> maintenance issue is very much overrated when compared 
> to the insertion of the technology into the market. 

The rest of my post will re-emphasise that we must constantly guard against
rocket science.  I believe that if/when XML is successful, >90% of the 
developers will never have developed SGML applications before - and I try 
to speak for that constituency.  Personally I believe that the DTD will be
of less value in XML than the SGML community is accustomed to and that
it will be neither understood nor required by a large number of applications
and developers.  CML probably falls into this category and some of the
considerations from there may be relevant here.

I am revising CML in the light of XML (and feeling very positive about both).
The intention is to publish a new release in about a month.  CML was developed
with traditional (self-taught) SGML, but involved a lot of hairiness in the
DTDs.  CML included HTML 2.0 and created a large number of content models and
name groups, all defined as PEs and read in from separate files.  All of this
was managed by a CATALOG with many components.  To someone who is not 
highly SGML-literate the formal spec of CML V1.0 is impenetrable.

In the revision (V1.1) the following has emerged:
	- it is impossible for me to put any important structural constraints
on CML documents.  Therefore one of the elements has a content model of ANY.  
An alternative approach is my increasing use of XML-LINK=SIMPLE to transclude 
information.  
	- it is impossible to constrain the possible attribute values by a 
DTD.  So the complex name groups in 1.0 are now collapsed to CDATA.  All
verification is semantic rather than syntactic and is linked to the existence
of (XML-based) glossaries in human-readable form.
	- a significant part of V1.0 is now directly covered by
XML-LINK and (assuming it flies) XML-TYPE.  This makes me very happy because
CML gets a lot simpler, and the XML community as a whole works out the generic
information objects.
	- an increasing amount of CML will be formally supported by other
disciplines - firstly MathML, and hopefully CGM and other standards.  A
CML document will almost always include some information objects from
another 'DTD'.  Therefore it is either unvalidatable against a DTD or the
DTD is so forgiving validation is no big deal.
	- most of the validation/processing is done outside the parsing
process.  IMO there is a real need for XML to address semantic validation
(both XML-LINK and XML-TYPE require this anyway).  Formalising this for
implementers would be a great help.
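
As a rough illustration of what validation "outside the parsing process"
can look like, here is a small Python sketch that parses a well-formed
document with no DTD and then checks attribute values against a glossary.
The element names, attribute names and glossary contents are hypothetical
and are not taken from any CML release; a real glossary would presumably
be loaded from an XML file rather than hard-coded.

```python
import xml.etree.ElementTree as ET

# Hypothetical glossary: allowed values for each (element, attribute) pair.
GLOSSARY = {
    ("atom", "elementType"): {"C", "H", "N", "O"},
    ("bond", "order"): {"1", "2", "3"},
}

def semantic_errors(xml_text, glossary=GLOSSARY):
    """Return messages for attribute values not listed in the glossary.

    Attributes the glossary does not mention are accepted, so unknown
    markup is tolerated rather than forbidden.
    """
    root = ET.fromstring(xml_text)  # well-formedness check only, no DTD
    errors = []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            allowed = glossary.get((elem.tag, name))
            if allowed is not None and value not in allowed:
                errors.append(f"<{elem.tag}> {name}={value!r} not in glossary")
    return errors

# Hypothetical sample document.
DOC = ('<molecule><atom elementType="C"/><atom elementType="Xx"/>'
       '<bond order="2"/></molecule>')

for message in semantic_errors(DOC):
    print(message)
```

The parser only has to establish well-formedness; everything the DTD used
to express through name groups moves into a check that can be changed
without touching the document syntax.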

	Because of this, XML validation of CML documents is less important
than semantic validation.  There will be a CML1.1 DTD, and I hope it's general
enough that it allows any reasonable document to be created, but will
detect really gross errors.  But I'm extremely wary of forbidding someone
from doing something they find useful - they'll do it anyway and switch off
validation.  A similar theme is shown in MathML where there is an
attribute OTHER.  This can have any value, including new attributes and
values - the MathML authors hope it will be used wisely :-).

I am not suggesting that the DTD concept, or validation, should be jettisoned
from XML - but it represents the limit of what can be included under
'easy'.

The point is that DTDs for XML can be constructed and tested with 
free standard tools. They are unlikely to change very frequently (if they do, 
the community has a lot of versions for their documents to contend with).  

A great deal is expected of XML.  IMO the only way that it can
deliver is to be as modular as possible and for those modules to be as 
simple as possible.  Obviously they must have well-defined APIs (or at least
clear terminology) for intercommunication.  IMO XML-LINK will be harder than
XML-LANG and it's still not clear what freedom the implementor has
or should have.  After that we have XML-STYLE, XML-TYPE, etc.  Unless these are
developed with interoperability in mind, a full XML implementation will be a
costly business if individuals have to do the lot.  So I would endorse Len's 
call for simplicity, and PEs would be a good place to start looking critically.

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
Received on Saturday, 31 May 1997 17:08:10 UTC