Re: A17: keep or drop entities? from Paul Prescod on 1996-10-06 (w3c-sgml-wg@w3.org from October 1996)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Sun, 06 Oct 1996 19:34:52 -0400
To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-Id: <1.5.4.32.19961006233452.00894aa8@csclub.uwaterloo.ca>

At 06:29 PM 10/3/96 CDT, Michael Sperberg-McQueen wrote:
>On 9 October 1996, the ERB will vote to decide the following question.
>A non-binding preliminary vote indicates the question needs further
>discussion in the work group.
>
>A.17 Should XML have entities, or not?

We can divide entities into parse-time entities and application-request
entities. We should probably discuss those separately, because they have
very different characteristics, especially in a networked environment.

I wonder if we can do away with parse-time entities. I am going to argue
against them in this article, not because I am confident that we can do
without them, but because I am not sure whether we need them and would
like to get some feedback.

I can imagine complications regarding document validation, entity retrieval
failures, communication of those problems from the parser to the
application, etc. What do you do if an entity representing a paragraph
cannot be retrieved across the web? Declare the document invalid and not
show it? Report the failure to the application and allow it to put a
"missing paragraph" marker in? Insert a placeholder in the output of the
parser itself? It seems like the application and parser must have a fairly
sophisticated language for communicating these failures to each other and to
the user.

It might be simpler if the application controlled all network transactions
so that the only thing that could make a parse fail is a genuinely invalid
document or a network failure in the middle of a download. (This is the
model currently used by HTML browsers.)

So, if you wanted to include a paragraph, you would just do something along
the lines of

<INCLUDE TYPE=PARAGRAPH SRC=para.xml>
or
<PARA SRC=para.xml>
or
<PARA.INC SRC=para.xml>

The existence or non-existence of PARA.XML would not be the parser's
concern. It would merely report to the application that this element was
referenced, and the application would fetch it or not fetch it depending on
its needs, and the network availability of the object.

In this way, communications between the entity manager, the parser and the
application could be minimized, which would simplify the parser. Since the
application must already be able to handle entity downloading, I do not
think that it complicates the application code at all.
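
The division of labour described above might be sketched roughly as follows.
This is only an illustration under several assumptions: it uses Python's
xml.etree.ElementTree as a stand-in parser, it quotes the attribute values
(that parser requires quotes), and the function name resolve_includes, the
fetch callback and the "[missing paragraph]" marker are all hypothetical.
The parser only reports PARA.INC elements; the application decides whether
and how to fetch them.

```python
import xml.etree.ElementTree as ET


def resolve_includes(root, fetch):
    """Replace each PARA.INC element with the content its SRC names.

    `fetch` is a hypothetical application-supplied callable that takes the
    SRC value and returns XML text, or raises OSError/KeyError on failure.
    The parser never touches the network; only this function does.
    """
    # Snapshot the elements first so fetched fragments are not re-scanned.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "PARA.INC":
                continue
            try:
                fragment = ET.fromstring(fetch(child.get("SRC")))
            except (OSError, KeyError, ET.ParseError):
                # Application-level policy: show a marker, keep parsing.
                fragment = ET.Element("PARA")
                fragment.text = "[missing paragraph]"
            fragment.tail = child.tail
            parent[index] = fragment
    return root


# Usage with an in-memory store standing in for the network.
store = {"para.xml": "<PARA>Hello</PARA>"}
doc = ET.fromstring(
    '<DOC><PARA.INC SRC="para.xml"/><PARA.INC SRC="gone.xml"/></DOC>'
)
resolve_includes(doc, store.__getitem__)
```

Note that a missing para.xml never makes the parse itself fail here; what to
do about it is purely the application's choice.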

There are downsides:

a) an XML "document" could not be made up of multiple entities in the same
sense that an SGML document is. For instance, ID references across entities
could not be checked by the parser (in fact, you might have to use some form
of HyTime/application notation for these references). Same with
content-model adherence and inclusions/exclusions.

b) an XML "text entity" could not span elements. I think we've already
decided against that feature anyway, so this isn't a major loss.

c) It's not clear where the conventions I described above for including text
from another document would be standardized. In the XML standard itself? In
each application?

d) To what extent would these conventions tie application designers' and
writers' hands?
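
On downside (a): if the parser cannot check ID references across included
pieces, the application could do a check of its own once it has assembled
them. A minimal sketch, assuming Python's xml.etree.ElementTree and assuming
(hypothetically) that the attributes are literally named ID and IDREF; with
no DTD in play, the application has to know by convention which attributes
carry identifiers.

```python
import xml.etree.ElementTree as ET


def unresolved_refs(root, id_attr="ID", ref_attr="IDREF"):
    """Return the sorted reference values that match no ID in the tree."""
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    refs = {el.get(ref_attr) for el in root.iter() if el.get(ref_attr) is not None}
    return sorted(refs - ids)


# Usage: one reference resolves, one does not.
assembled = ET.fromstring(
    '<DOC><PARA ID="p1"/><XREF IDREF="p1"/><XREF IDREF="p9"/></DOC>'
)
dangling = unresolved_refs(assembled)
```

This only covers references; content-model adherence across included pieces
would still need some other mechanism.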

 Paul Prescod

Received on Sunday, 6 October 1996 19:39:35 UTC