- From: Daniel W. Connolly <connolly@beach.w3.org>
- Date: Wed, 24 Jan 1996 11:36:32 -0500
- To: dmeggins@aix1.uottawa.ca
- Cc: www-html@w3.org
In message <199601241215.HAA17799@baeda.english.uottawa.ca>, David Megginson writes:
> I think that his
>analyser will provide information at exactly the level of abstraction
>required by parser designers.

I've had some feedback on the API. It's going to change a little bit,
but not much. Mostly, it needs to provide lossless parsing. Case folding
is optional, and whitespace trimming will become optional. This allows
for what I call "structured stream editing" -- changing all the links in
a document, for example (without changing anything else).

>1) It seems unnecessary to ban the DTD subset altogether,

On the other hand, it's unnecessary to support a DTD subset in the
document entity, since it can always be accommodated in a separate entity.
This, combined with the "Keep it simple" principle (aka Occam's razor)
and the fact that the deployed base of HTML user agents doesn't grok this
today, convinced me that this is not an issue to tackle just yet.

> since this
> is the logical place to declare entities. Why not allow the subset,
> but limit its contents to <!ENTITY..> and <!NOTATION...> declarations?

Note that this is in the "future work" section. I just updated that
section, and a few related sections. Have a look:

======
http://www.w3.org/pub/WWW/MarkUp/SGML/
$Date: 1996/01/24 16:34:03 $

Marked Sections

Support for marked sections is an integral part of a strategy for
interoperability among HTML user agents supporting different HTML
dialects[HTMLDIALECT]. It has other valuable applications, and it is a
straightforward addition to the lexical analyzer in this report.

Internationalization

Support for character encodings and coded character sets other than
ASCII is a requirement for production use. Support for the X Windows
compound text encoding (related to ISO-2022) and the UTF-8 or perhaps
UCS-2 encoding of Unicode (ISO-10646), with extensibility for other
character encodings, seems most desirable.

Internal declaration subset support

Internal declaration subsets are not expected to become a part of HTML.
But the technology in this report is applicable to other SGML
applications, and internal declaration subsets are a straightforward
addition to this lexical analyzer. Relevant mechanisms include:

  - General entity declarations with URIs as system identifiers
  - General entity declarations as "macros"
  - Parameter entity declarations for "switches" and "hooks"

=================

>2) Marked sections might be too useful to leave out.

Agreed. The only reason that they're not in there yet is that I want to
concentrate on the bugs in existing HTML parsers before I start adding
new stuff. Again, see "future work."

Dan
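
The "structured stream editing" idea in the message above (changing all the links in a document without changing anything else) depends on a tokenizer that loses nothing. What follows is a minimal Python sketch of that idea under stated assumptions, not the API of the lexical analyzer being discussed. The names `tokenize` and `rewrite_links` and the `old.example` / `new.example` URLs are hypothetical, and the regular expressions are a deliberate simplification: they ignore most SGML constructs and would mis-split a tag whose attribute value contains `>`.

```python
import re

# Hypothetical lossless tokenizer: every input character lands in exactly
# one token, so "".join(tokenize(text)) == text for any input.
# Alternatives, in order: comment, tag, run of text, stray "<".
TOKEN = re.compile(r"<!--.*?-->|<[^>]*>|[^<]+|<", re.DOTALL)

# Simplified match for a quoted href attribute inside a start tag.
# Group 1 keeps the original name, spacing and "=", group 2 the quote.
HREF = re.compile(r"""(\bhref\s*=\s*)(["'])(.*?)\2""",
                  re.IGNORECASE | re.DOTALL)


def tokenize(text):
    """Split text into tokens without folding case or trimming whitespace."""
    return TOKEN.findall(text)


def rewrite_links(text, old_prefix, new_prefix):
    """Replace old_prefix with new_prefix in href values; keep all else."""
    def fix(match):
        url = match.group(3)
        if url.startswith(old_prefix):
            url = new_prefix + url[len(old_prefix):]
        # Reassemble with the original attribute spelling and quote char.
        return match.group(1) + match.group(2) + url + match.group(2)

    out = []
    for tok in tokenize(text):
        is_start_tag = (tok.startswith("<") and tok.endswith(">")
                        and not tok.startswith(("<!", "</")))
        out.append(HREF.sub(fix, tok) if is_start_tag else tok)
    return "".join(out)


if __name__ == "__main__":
    doc = '<P>See <A  HREF="http://old.example/a.html">this</A>\n'
    print(rewrite_links(doc, "http://old.example/", "http://new.example/"),
          end="")
```

Because no token is case-folded or whitespace-trimmed, the output differs from the input only inside the rewritten attribute values; comments, declarations, and odd spacing pass through untouched.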
Received on Wednesday, 24 January 1996 11:36:37 UTC