- From: Martin Pike <mp@stilo.demon.co.uk>
- Date: Tue, 3 Jun 1997 12:12:30 +0000
- To: w3c-sgml-wg@w3.org
Having implemented PEs in an SGML parser, I can appreciate Tim's
feeling about the amount of code that is required to implement them.
However, I feel that XML without them would be a much poorer language,
unable to cope with some of the problems that it will be called upon to solve.
Eve Maler and Terry Allen have written about the ability to build extensible
and configurable DTDs by using them. Len Bullard stated that they see the
future of XML in documents written using small neat DTDs, with no need for
PEs. Martin Bryan has argued that there are applications that do
need large, complex DTDs and that PEs are a godsend in writing and
maintaining these.
I would like to add that even in relatively simple DTDs I have found PEs
useful, much in the same way that even in a small computer program
I use subroutines or methods, to split the problem into manageable,
trackable, reusable chunks.
We have all used the argument for markup being important in the re-use of
data. I feel the same way about PEs allowing the re-use of DTD
fragments.
I know that the argument will come back that there are few people
writing DTDs. That is true, but those few are usually skilled and costly. If XML
is going to take off as we hope it will, then it is likely that there will be many
more DTDs required. I have talked to companies that have wanted to
adopt SGML but have been unwilling because of the cost and difficulty. XML will bring the
cost of implementation down and, I hope, the difficulty too. These companies,
I am sure, are going to want their data structured in a defined manner, i.e. via DTDs.
If common elements of a company's data structure are able to be shared between different
DTDs then the cost of building the corporate structure and maintaining it must be less.
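
As an illustration of that kind of sharing, here is a minimal sketch of two DTDs
pulling in one common fragment through an external parameter entity. The file names
(common.dtd, invoice.dtd, order.dtd) and all element names are hypothetical, chosen
only for the example; the three files are shown in one listing for brevity.

```xml
<!-- common.dtd (hypothetical): declarations shared across the company -->
<!ELEMENT customer (name, address)>
<!ELEMENT name     (#PCDATA)>
<!ELEMENT address  (#PCDATA)>

<!-- invoice.dtd (hypothetical): reuses the shared fragment -->
<!ENTITY % common SYSTEM "common.dtd">
%common;
<!ELEMENT invoice (customer, amount)>
<!ELEMENT amount  (#PCDATA)>

<!-- order.dtd (hypothetical): reuses the same fragment unchanged -->
<!ENTITY % common SYSTEM "common.dtd">
%common;
<!ELEMENT order (customer, item+)>
<!ELEMENT item  (#PCDATA)>
```

A change to the customer structure is then made once, in common.dtd, and both
DTDs pick it up.
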
As for the suggestion of adopting PEs at a later stage if necessary: they are already
out there, being used in XML applications now.
A project that benefits greatly from PEs is MathML. This project has
been mentioned a few times already in this forum. DTD fragments are being
created to represent mathematics in markup on the Web. The fragments are
XML conformant. The removal of PEs will complicate this effort.
Already the inability to use name groups to declare multiple elements with
the same content model makes the DTD much larger and more awkward to
read and maintain than its SGML equivalent. The inability to declare even
model groups as PEs will make it even more so.
The reason is that the representation of mathematics in a structural manner
is highly recursive. Therefore each element has a content model that contains all
the others. To cover all mathematical functions necessitates the declaration of
hundreds, if not thousands, of elements - all with the same content model. If
each element also had to be added to the content model individually, instead of
through the well-structured, comprehensible set of PEs that is used at the
moment (each representing a topic of mathematics), then the DTD would become
unmanageable and unreadable. This project is not one in which SGML can be
easily used instead. It is for the Web under the auspices of W3C.
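
The topic-based arrangement described above might look roughly like the following
sketch. The entity and element names are invented for illustration and are not taken
from the MathML DTD fragments; the declarations are assumed to sit in an external
DTD file.

```xml
<!-- Hypothetical names throughout; not the actual MathML fragments. -->

<!-- One PE per topic of mathematics -->
<!ENTITY % arith "plus | minus | times | divide">
<!ENTITY % trig  "sin | cos | tan">

<!-- One PE gathering every topic into the shared, recursive content model -->
<!ENTITY % expr  "#PCDATA | %arith; | %trig;">

<!-- Without name groups, each element needs its own declaration,
     but all of them can refer to the same content model -->
<!ELEMENT plus   (%expr;)*>
<!ELEMENT minus  (%expr;)*>
<!ELEMENT times  (%expr;)*>
<!ELEMENT divide (%expr;)*>
<!ELEMENT sin    (%expr;)*>
<!ELEMENT cos    (%expr;)*>
<!ELEMENT tan    (%expr;)*>
```

Adding a new function then means one new element declaration and one addition to
the relevant topic PE; without PEs, the new name would have to be added to every
content model by hand.
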
Of course pre-processors could be used, but as Eve Maler points out, this
'leaves DTDs as intermediate files that it's perilous to edit directly.'
This diatribe has been concerned with the use of PEs within a DTD
and its subset. Am I missing something, or is this the only place where they
need to be used, given the demise of all marked section types other than
CDATA in the markup? If this is the case, processors that are non-validating
will not have to decipher them anyway, will they? And isn't that the crux of
this argument: that it should be -possible- to build lightweight stand-alone
processors? As Norbert says, those people who need to use and verify more
complex DTDs will need more sophisticated tools.
Martin Pike, Stilo Technology
-------------------------------------
Email: mp@stilo.com or mp@stilo.demon.co.uk
Phone/Fax: +44 (0) 1222 483530
WWW: http://www.stilo.com/
-------------------------------------
Received on Tuesday, 3 June 1997 08:16:14 UTC