The external text entity problem from Tim Bray on 1996-10-30 (w3c-sgml-wg@w3.org from October 1996)

From: Tim Bray <tbray@textuality.com>
Date: Tue, 29 Oct 1996 17:49:38 -0800
To: w3c-sgml-wg@w3.org
Message-Id: <3.0b33.32.19961029174930.0076918c@pop.intergate.bc.ca>

At 12:47 AM 10/30/96 GMT, Charles F. Goldfarb wrote:

>The point... does concern whether an entity referenced by a syntactic entity
>reference must be accessed and parsed. According to 8879, it must...
>
>The only way to avoid this required access and parsing is to use an entity
>attribute.

Hence the [entirely valid] concern on the WG & ERB about the use
of external text entities in XML.  They are highly useful for authoring,
but given the requirements of 8879, if we include them we are in danger of
requiring any browser, no matter how lightweight, to pull in & interpolate
text entities from all over creation before it can call itself "XML compliant".

That would be kind of silly: it is clear that doing this is hard in modern
browser architectures, it is far from clear that this is the right
semantic for providing access to distributed modular documents, and it
is *extremely* clear that we'd like to make it easy for browsers to
achieve XML compliance.  Besides, the real reason for having external
text entities is to support authoring anyhow.

The first-cut solution was to make them optional, which carries its own
set of costs.

*BUT*, I think there's another way to avoid this required access.

This tactic, independently proposed by several people, is to
zero in on the word "parse" and say that an XML Processor is, in 8879
terms, a "parser" only when it's trying to validate against a real DTD,
in which case, of course, it has to go pull in the entities.  When it's
just reading a document and checking for well-formedness, not validity
(the natural behavior of a client browser), we can say that it's not,
in 8879 terms, "parsing", and is thus not required to interpolate
external text entities.  Authors will quickly learn to resolve external
text entities before sending instances down network pipes.
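
To make the two modes concrete, here is a rough sketch of the behavior
I have in mind.  It is a toy, not a real XML Processor: the names
(process, ENTITY_DECLS, fetch) and the regex-based reference matching
are assumptions made up for illustration, and the entity table stands
in for declarations that would really come from a DTD.

```python
import re

# Hypothetical declarations: entity name -> system identifier.
# In a real document these would come from the DTD.
ENTITY_DECLS = {"chap1": "chap1.xml"}

# Matches a general entity reference such as &chap1;
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def process(text, validating, fetch):
    """Return (output_text, list_of_fetched_system_ids).

    validating=True  : acts as a "parser" in the 8879 sense, so every
                       declared external text entity is fetched and
                       interpolated.
    validating=False : well-formedness checking only; references are
                       left in place and nothing is fetched.
    """
    fetched = []

    def replace(match):
        name = match.group(1)
        if name not in ENTITY_DECLS or not validating:
            # Not an external text entity, or we are not "parsing":
            # leave the reference exactly as written.
            return match.group(0)
        system_id = ENTITY_DECLS[name]
        fetched.append(system_id)
        return fetch(system_id)

    return ENTITY_REF.sub(replace, text), fetched
```

A lightweight browser would call process(doc, validating=False, ...)
and never touch the network; an authoring or validating tool would
pass validating=True along with whatever fetch function it uses to
retrieve entities.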

This works well in XML because the presence or absence of a text 
entity can never affect well-formedness - we've already
decided they have to be element-synchronous.
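
A small sketch of why this holds, assuming "element-synchronous" means
that an entity's replacement text must itself be a balanced run of
elements.  The checker below is a deliberately crude, hypothetical
helper (tags_balanced), not a well-formedness test from any spec; it
looks only at tag nesting and treats entity references as opaque
character data.

```python
import re

# Start tag, end tag, or empty-element tag; attributes are skipped over.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")


def tags_balanced(text):
    """Toy check that every start tag has a properly nested end tag."""
    stack = []
    for closing, name, empty in TAG.findall(text):
        if empty:
            # Empty-element tag such as <br/>: opens and closes at once.
            continue
        if closing:
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack


unexpanded = "<book><title>T</title>&chap1;</book>"
synchronous = "<chapter><p>One</p></chapter>"  # balanced replacement text
ragged = "<chapter><p>One</p>"                 # not element-synchronous

# Balanced with the reference left alone, and still balanced once a
# synchronous replacement is interpolated.
print(tags_balanced(unexpanded))
print(tags_balanced(unexpanded.replace("&chap1;", synchronous)))
# Only a non-synchronous entity could change the answer.
print(tags_balanced(unexpanded.replace("&chap1;", ragged)))
```

As long as replacement text is constrained this way, the browser gets
the same answer about tag balance whether or not it ever fetches the
entity.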

I see this as a solution that allows XML to standardize on SGML's
management architecture at the authoring end, bless browsers as
compliant without placing unreasonable demands on their authors, and
remove optional features from XML.  Also, it's easy to document in
the spec.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167

Received on Tuesday, 29 October 1996 20:50:02 UTC