
Problem with syntax of external parsed entity

From: John Boyer <boyerj@ca.ibm.com>
Date: Tue, 2 Oct 2007 08:22:35 -0700
To: xml-editor@w3.org
Message-ID: <OFE07CB2CB.98465F0A-ON88257368.00507660-88257368.005477F2@ca.ibm.com>

Dear Editors,

The syntax for extParsedEnt should allow a prelude before "content" in 
which entities can be declared using a subset of the DTD notation.

An XML document A.xml may declare an entity 'b' and associate it with a 
SYSTEM literal B.xml.  A.xml may then use an entity reference &b; to 
include the content of B.xml.
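
To make the arrangement concrete, here is a minimal sketch of the two 
files, shown in one listing.  The element names (book, chapter, para) and 
the text content are made up for illustration; only the entity name 'b' 
and the file names come from the description above.

```xml
<!-- A.xml (document entity): declares 'b' and then references it -->
<!DOCTYPE book [
  <!ENTITY b SYSTEM "B.xml">
]>
<book>
  &b;
</book>

<!-- B.xml (external parsed entity): must match "TextDecl? content" -->
<chapter>
  <para>Text of the chapter.</para>
</chapter>
```

A processor that reads external entities replaces &b; with the content of 
B.xml, so the chapter element ends up as a child of book.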

Similarly, B.xml may use an entity reference &c; to include the content of 
C.xml.  However, B.xml is unable to *declare* the ENTITY 'c' and associate 
it with C.xml because the syntax rule for extParsedEnt is just "TextDecl? 
content".

As a result, anything that one may want to include by entity reference in 
B.xml or its descendants must be declared in A.xml, which hobbles the 
componentization of the XML.

In practical terms, this means that if one wants to write a 'book' that 
declares entities for chapters and includes those chapters by entity 
reference, the chapters are unable to declare and include their sections, 
and the sections will be unable to declare and include their subsections.
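
The book case might look like the following sketch under the current 
rules.  All file, entity, and element names here are hypothetical.  The 
point is that the declaration of the section entity has to live in 
book.xml, even though only chapter1.xml refers to it.

```xml
<!-- book.xml: every entity used anywhere below must be declared here -->
<!DOCTYPE book [
  <!ENTITY chapter1   SYSTEM "chapter1.xml">
  <!ENTITY section1.1 SYSTEM "section1.1.xml">
]>
<book>
  &chapter1;
</book>

<!-- chapter1.xml: may reference section1.1, but cannot declare it -->
<chapter>
  <title>Chapter 1</title>
  &section1.1;
</chapter>

<!-- section1.1.xml -->
<section>
  <title>Section 1.1</title>
</section>
```

Note also that a relative system identifier is resolved against the entity 
in which the declaration occurs, here book.xml, and not against the chapter 
file that uses the reference.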

Perhaps the simplest syntactic change, and one that would introduce no 
significant problems, would be to define a modified TextDecl for use in 
the extParsedEnt production, like this:

TextDeclParsedEnt ::= '<?xml' VersionInfo? (EncodingDecl | S)
                      (EntityDecl | DeclSep)* S? '?>'
extParsedEnt ::= TextDeclParsedEnt? content
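
As a sketch of what this would permit, a B.xml written against the 
proposed production might look like the listing below.  This is 
hypothetical syntax: it is not well-formed under XML 1.0 as it stands, and 
no existing parser should be expected to accept it.  The element names and 
text are again made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"
  <!ENTITY c SYSTEM "C.xml">
?>
<chapter>
  <para>Text of the chapter.</para>
  &c;
</chapter>
```

Everything between '<?xml' and '?>' is the prelude, and the character 
content of the entity begins only after the closing '?>'.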

One upside to this approach is that there would be no confusion between 
the whitespace separating entity declarations and the character content of 
the entity.  Another upside is that it does not disturb the definition of 
TextDecl, which is also used by extSubset.  The only downside is that one 
must write a TextDeclParsedEnt in order to declare entities.  Note, 
however, that EncodingDecl is optional in this production, so only the 
leading '<?xml' and whitespace must be written before the entity 
declarations.

In hindsight, two further observations might be made.  First, instead of 
the leading '<?xml', a distinct and required leading declaration such as 
'<?xmlentity' might have been useful, because an external parsed entity 
differs from a well-formed document and tools have trouble deciding which 
well-formedness constraints (WFCs) to impose on a file containing XML.  
Second, although the current design is useful because it allows arbitrary 
content, not just a well-formed XML document, to be included by entity 
reference, the downside is that a well-formed XML document cannot, in 
general, be included in another document using a simple entity declaration 
and reference.  This makes it hard to aggregate well-formed XML created by 
others.  In a Web 2.0 world, that hurts.
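
As an illustration of the last point, consider a file like the following 
(a hypothetical D.xml, with made-up names and content).  It is a 
well-formed XML document, yet it does not match "TextDecl? content", so it 
cannot be pulled in through an entity declared with SYSTEM "D.xml".

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE note [
  <!ENTITY author "hypothetical author name">
]>
<note>&author;</note>
```

The first line is a valid XMLDecl but not a valid TextDecl, since TextDecl 
requires an EncodingDecl and has no standalone declaration.  The DOCTYPE 
declaration is not permitted inside "content" either.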

Thanks for listening!

John M. Boyer, Ph.D.
STSM: Lotus Forms Architect and Researcher
Chair, W3C Forms Working Group
Workplace, Portal and Collaboration Software
IBM Victoria Software Lab
E-Mail: boyerj@ca.ibm.com 

Blog: http://www.ibm.com/developerworks/blogs/page/JohnBoyer
Received on Tuesday, 2 October 2007 15:22:56 GMT
