- From: David Megginson <dmeggins@aix1.uottawa.ca>
- Date: Wed, 24 Jan 1996 07:15:59 -0500
- To: www-html@w3.org
- Cc: connolly@w3.org
I have just read Dan Connolly's report, and I think that it is quite
well done. He is absolutely right to take a pragmatic and tolerant
approach to parsing SGML-based documents, and I think that his
analyser will provide information at exactly the level of abstraction
required by parser designers. I would, however, like to make a few
suggestions:
1) It seems unnecessary to ban the DTD subset altogether, since this
is the logical place to declare entities. Why not allow the subset,
but limit its contents to <!ENTITY ...> and <!NOTATION ...> declarations?
I realise that there is a danger in allowing authors to define
parameter entities in the subset, since those can affect the structure
of the DTD, but browsers are free to ignore such fiddling. As a
compromise, you could allow only internal entities or (optionally)
external data entities with the URL as the system identifier.
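A minimal sketch of that restricted subset, assuming the analyser has already tokenized each declaration into the hypothetical `Declaration` record below (`Declaration`, `accept` and `looks_like_url` are illustrative names, not part of Connolly's report). It keeps notation declarations and internal entities, general or parameter, and admits an external entity only when it is a data entity whose system identifier looks like a URL, and only if that option is switched on.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Declaration:
    """Hypothetical, already-tokenized form of one declaration in the subset."""
    keyword: str                     # "ENTITY", "NOTATION", "ELEMENT", ...
    name: str
    parameter: bool = False          # True for <!ENTITY % name ...>
    literal: Optional[str] = None    # replacement text of an internal entity
    system_id: Optional[str] = None  # system identifier of an external entity
    data_entity: bool = False        # True if declared as CDATA, SDATA or NDATA


def looks_like_url(system_id: str) -> bool:
    # Crude test: require a scheme such as "http:" plus a host or a path.
    parts = urlparse(system_id)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


def accept(decl: Declaration, allow_external: bool = True) -> bool:
    """Return True if the declaration may appear in the restricted subset."""
    if decl.keyword == "NOTATION":
        return True
    if decl.keyword != "ENTITY":
        return False                 # element, attribute-list, etc. are refused
    if decl.literal is not None:
        return True                  # internal entity, general or parameter
    # External entity: only data entities named by a URL, and only if enabled.
    return (allow_external and decl.data_entity
            and decl.system_id is not None and looks_like_url(decl.system_id))
```

Anything `accept` rejects could simply be skipped, in the same tolerant spirit as the rest of the report.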
In fact, it would not even be necessary to return any of the DTD
subset information directly to the caller -- instead, you could simply
store it and feed it out when queried
(i.e. lookup_general_entity("foo");).
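The store-and-query idea might look roughly like this. Only `lookup_general_entity` comes from the suggestion above; `EntityTable`, `declare`, `lookup_parameter_entity` and the (type, value) return shape are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (type, value), e.g. ("internal", "Foo, Inc.") or ("NDATA", "http://...");
# this shape is an assumption made for the sketch.
EntityInfo = Tuple[str, str]


class EntityTable:
    """Hypothetical store the analyser fills while reading the DTD subset."""

    def __init__(self) -> None:
        # General (&foo;) and parameter (%foo;) entities are separate namespaces.
        self._general: Dict[str, EntityInfo] = {}
        self._parameter: Dict[str, EntityInfo] = {}

    def declare(self, name: str, kind: str, value: str,
                parameter: bool = False) -> None:
        table = self._parameter if parameter else self._general
        # As in SGML, the first declaration of a name is the one that counts;
        # later declarations of the same name are ignored.
        table.setdefault(name, (kind, value))

    def lookup_general_entity(self, name: str) -> Optional[EntityInfo]:
        return self._general.get(name)

    def lookup_parameter_entity(self, name: str) -> Optional[EntityInfo]:
        return self._parameter.get(name)
```

Because the first declaration wins, a browser could pre-declare its own parameter entities before reading the subset, and those would take precedence over the document's.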
2) Marked sections might be too useful to leave out: all you need to
do is return events for the start and the end of a marked section and
keep track of the special cases yourself when parsing CDATA,
RCDATA, or ignored sections. If you allow parameter entities to be
declared in the DTD subset, then you will be able to look them up
yourself and decide on the type of the marked section.
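A rough sketch of that marked-section handling, assuming a plain string as input and a dict of already-declared parameter entities (`events`, `status_of` and `find_close` are hypothetical names). It applies the SGML priority rule, in which IGNORE outranks CDATA, CDATA outranks RCDATA, and RCDATA outranks INCLUDE. Inside an ignored section it counts nested `<![` only to find the matching `]]>`, while CDATA and RCDATA content ends at the first `]]>`. An undeclared parameter entity is treated as empty, which is a tolerant choice rather than the standard's, and entity references inside RCDATA are not expanded.

```python
import re
from typing import Dict, Iterator, Tuple

MS_START = re.compile(r"<!\[([^\[]*)\[")              # '<![' status-spec '['
PE_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);?")   # %name; (';' may be omitted)
PRIORITY = ("IGNORE", "CDATA", "RCDATA", "INCLUDE")   # highest priority first


def status_of(spec: str, pes: Dict[str, str]) -> str:
    """Expand parameter entities in the status spec and pick the effective keyword."""
    # Undeclared entities expand to nothing (a tolerant choice, not the standard's).
    words = PE_REF.sub(lambda m: pes.get(m.group(1), ""), spec).upper().split()
    return next((s for s in PRIORITY if s in words), "INCLUDE")  # TEMP/none -> INCLUDE


def find_close(doc: str, pos: int, count_nested: bool) -> int:
    """Index of the ']]>' closing a section whose content starts at pos."""
    depth = 1
    while True:
        end = doc.find("]]>", pos)
        if end == -1:
            return len(doc)                       # tolerate a missing ']]>'
        start = doc.find("<![", pos) if count_nested else -1
        if start != -1 and start < end:
            depth, pos = depth + 1, start + 3     # nested start inside IGNORE
        else:
            depth, pos = depth - 1, end + 3
            if depth == 0:
                return end


def events(doc: str, pes: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield ("text", s), ("start_marked", status) and ("end_marked", status)."""
    pos, open_includes = 0, 0
    while pos < len(doc):
        m = MS_START.search(doc, pos)
        end = doc.find("]]>", pos) if open_includes else -1
        closes_first = end != -1 and (m is None or end < m.start())
        stop = end if closes_first else (m.start() if m else len(doc))
        if stop > pos:
            yield ("text", doc[pos:stop])
        if closes_first:                          # end of an INCLUDE section
            yield ("end_marked", "INCLUDE")
            open_includes, pos = open_includes - 1, end + 3
        elif m is None:
            pos = len(doc)
        else:
            status = status_of(m.group(1), pes)
            yield ("start_marked", status)
            if status == "INCLUDE":               # content is parsed as usual
                open_includes, pos = open_includes + 1, m.end()
                continue
            close = find_close(doc, m.end(), count_nested=(status == "IGNORE"))
            if status != "IGNORE" and close > m.end():
                # CDATA/RCDATA content passes through; RCDATA entity
                # references are not expanded in this sketch.
                yield ("text", doc[m.end():close])
            yield ("end_marked", status)
            pos = close + 3
```

A browser that does not care about marked sections can drop the `start_marked` and `end_marked` events and keep only the text. Seeding `pes` with, say, `{"inanity": "IGNORE"}` before the parse is one way it could honour a reader's preferences.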
If the analyser handles most of this internally (storing entity
values, etc.), these suggestions would increase the complexity of the
user interface only _very_ slightly, by introducing functions to
look up the values and types of entities and by introducing events for
the beginning and end of marked sections (the browser could simply
ignore these, since the analyser will know how to do the right thing
with their contents). Some browsers might even want to allow users to
set their own parameter entities (they could do so at the beginning of
the parse) for certain standard types of marked sections:
<![%nudity;[
<p>Here is a picture of Newt's butt <img src="newtbutt.gif"></p>
]]>
<![%inanity;[
<p>Here are the lyrics to all of the songs from
<cite>Barney</cite>.</p>
]]>
(I'd be more concerned with protecting my daughters from the second
one.)
<!ENTITY % inanity "IGNORE">
David
--
David Megginson Department of English, University of Ottawa,
dmeggins@aix1.uottawa.ca Ottawa, Ontario, CANADA K1N 6N5
ak117@freenet.carleton.ca Phone: (613) 562-5800 ext.1203
WWW: http://www.uottawa.ca/~dmeggins FAX: (613) 562-5990
Received on Wednesday, 24 January 1996 07:15:46 UTC