W3C home > Mailing lists > Public > w3c-sgml-wg@w3.org > June 1997

DTD parsing (a proposal)

From: David G. Durand <dgd@cs.bu.edu>
Date: Fri, 13 Jun 1997 10:23:41 -0500
Message-Id: <v03007801afc711c491c2@[]>
To: "XML Working Group (E-mail)" <w3c-sgml-wg@w3.org>
The current discussion reminds me of a idea that I had a long time ago, but
which was not very popular. Since we're now in the process of biting the
bullet the we put in our mouths with the RMD, and attribute value defaults,
perhaps it's time to revisit it.

The problem with the RMD is that it creates a class of syntax errors (DTD
not required, but needed) that are hard to detect with the class of
application that is likely to pay attention to the RMD. Further, the
optimization allowed by the RMD (not parsing the DTD) is exactly the design
decision that makes the error undetectable.

If we have attribute declarations and entity declarations that must be
parsed to get correct output, then we are stuck with them, and the
arguments that led us to put them in stiull seem valid to me. However, some
simplifications could make this pretty easy. So, to the proposal:

We require the parsing of entity declarations and attribute declarations in
the DTD subset for all applications. Default value declarations and
external entities must be placed in the subset if they are to be visible to
applications that only care about WF-ness. The RMD simply goes away. Now we
have eliminated a formal error requirement that has the drawback that the
error will frequently go undetected. In its place, we have added an
authoring responsibility, but it is a simple one that can be explained in
terms of a processing model, and we've saved the explanation of the RMD
(which I personally find confusing).

To make parsing the subset easy, we eliminate marked sections from it
altogether. MS are really useful in a DTD to let a subset control the DTD,
but they are not so useful in the subset itself. (Can anyone come up with a
non-contrived example? I couldn't.) Since we still have parameter entities
in the subset, people can externally store common attribute lists and
entity sets (e.g. XML-Link, ISO entities), and so they are not required to
put amazing redundancy at the top of every document.

This is also good for the DPH because, the stuff in most DTD subsets will
probably be constant (same AF attributes, same standard entity sets) and
the DPH can just hard-code them into the (desperate Perl) processor. To the
extent that authors take advantage of the subset to make documents vary
individually, the DPH is still screwed, but that's likely to be true
wherever wide document format variation is part of the working process.

Now that RE is gone, shall we take aim at the RMD?

  -- David

David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Friday, 13 June 1997 11:02:45 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 21:25:27 UTC