- From: David Megginson <dmeggins@aix1.uottawa.ca>
- Date: Wed, 24 Jan 1996 07:15:59 -0500
- To: www-html@w3.org
- Cc: connolly@w3.org
I have just read Dan Connolly's report, and I think that it is quite well done. He is absolutely right to take a pragmatic and tolerant approach to parsing SGML-based documents, and I think that his analyser will provide information at exactly the level of abstraction required by parser designers. I would, however, like to make a few suggestions:

1) It seems unnecessary to ban the DTD subset altogether, since this is the logical place to declare entities. Why not allow the subset, but limit its contents to <!ENTITY...> and <!NOTATION...> declarations? I realise that there is a danger in allowing authors to define parameter entities in the subset, since those can affect the structure of the DTD, but browsers are free to ignore such fiddling. As a compromise, you could allow only internal entities or (optionally) external data entities with the URL as the system identifier. In fact, it would not even be necessary to return any of the DTD subset information directly to the caller -- instead, you could simply store it and feed it out when queried (i.e. lookup_general_entity("foo");).

2) Marked sections might be too useful to leave out: all you need to do is return events for the start and the end of a marked section, and keep track yourself of the special cases when you are parsing CDATA, RCDATA, or an ignored section. If you allow parameter entities to be declared in the DTD subset, then you will be able to look them up yourself and decide on the type of the marked section.

If the analyser handles most of this internally (storing entity values, etc.), these suggestions would increase the complexity of the user interface only _very_ slightly, by introducing functions to look up the values and types of entities and by introducing events for the beginning and end of marked sections (the browser could simply ignore these, since the analyser will know how to do the right thing with their contents).
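To make the two suggestions concrete, here is a rough Python sketch of what I have in mind. It is not taken from the report: apart from lookup_general_entity, every name in it (Analyser, read_subset, events, the event tuples) is made up for illustration. It only handles internal entities with double-quoted literals, treats an undeclared parameter entity as INCLUDE in the tolerant spirit of the report, and ignores nested marked sections.

```python
import re

# Hypothetical sketch: only lookup_general_entity comes from the message above;
# all other names and the event format are assumptions for illustration.
ENTITY_DECL = re.compile(r'<!ENTITY\s+(%\s+)?([\w.-]+)\s+"([^"]*)"\s*>')
MARKED_OPEN = re.compile(r'<!\[\s*(%?[\w.-]+;?)\s*\[')


class Analyser:
    def __init__(self, parameter_entities=None):
        # Parameter entities preset by the user before the parse begins.
        self.parameter_entities = dict(parameter_entities or {})
        self.general_entities = {}

    def read_subset(self, subset):
        """Store internal entity declarations; nothing is handed to the caller."""
        for is_param, name, value in ENTITY_DECL.findall(subset):
            table = self.parameter_entities if is_param else self.general_entities
            # In SGML the first declaration of an entity is the binding one,
            # so presets made at the start of the parse take precedence.
            table.setdefault(name, value)

    def lookup_general_entity(self, name):
        return self.general_entities.get(name)

    def lookup_parameter_entity(self, name):
        return self.parameter_entities.get(name)

    def _status(self, keyword):
        # Resolve a "%name;" reference to its stored replacement text.
        # Assumption: an undeclared parameter entity is treated as INCLUDE.
        if keyword.startswith('%'):
            name = keyword[1:].rstrip(';')
            keyword = self.parameter_entities.get(name, 'INCLUDE')
        return keyword.upper()

    def events(self, text):
        """Yield ('data', s), ('marked_start', status), ('marked_end', status)."""
        pos = 0
        while True:
            m = MARKED_OPEN.search(text, pos)
            if m is None:
                if pos < len(text):
                    yield ('data', text[pos:])
                return
            if m.start() > pos:
                yield ('data', text[pos:m.start()])
            end = text.find(']]>', m.end())
            if end < 0:  # unterminated section: tolerate it and run to the end
                end = len(text)
            status = self._status(m.group(1))
            yield ('marked_start', status)
            if status != 'IGNORE':  # the analyser drops ignored content itself
                yield ('data', text[m.end():end])
            yield ('marked_end', status)
            pos = end + 3
```

A caller would build the analyser, feed it the subset with read_subset(), and then loop over events(); a browser that does not care about marked sections can skip the marked_start and marked_end events and will still never see the content of an ignored section.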
Some browsers might even want to allow users to set their own parameter entities (they could do so at the beginning of the parse) for certain standard types of marked sections:

  <![%nudity;[
  <p>Here is a picture of Newt's butt <img src="newtbutt.gif"></p>
  ]]>

  <![%inanity;[
  <p>Here are the lyrics to all of the songs from <cite>Barney</cite>.</p>
  ]]>

(I'd be more concerned with protecting my daughters from the second one).

  <!ENTITY % inanity "IGNORE">

David

--
David Megginson                 Department of English, University of Ottawa,
dmeggins@aix1.uottawa.ca        Ottawa, Ontario, CANADA  K1N 6N5
ak117@freenet.carleton.ca       Phone: (613) 562-5800 ext.1203
WWW: http://www.uottawa.ca/~dmeggins   FAX: (613) 562-5990
Received on Wednesday, 24 January 1996 07:15:46 UTC