- From: Murray Altheim <murray@spyglass.com>
- Date: Tue, 19 Mar 1996 18:58:54 -0400
- To: "C. M. Sperberg-McQueen" <cmsmcq@uic.edu>
- Cc: www-html@w3.org
C. M. Sperberg-McQueen <cmsmcq@uic.edu> writes:
>Excuse me if someone explained this while I was not paying attention,
>but why are we talking about adding an INSERT tag with the semantics 'go
>find this file or document, and insert it here', when SGML already has
>the mechanisms needed for this, in the form of entity references? Why
>not just start writing, requesting, or demanding HTTP servers that
>actually understand and process references to external entities
>as defined by ISO 8879?

A very good question. I've raised this several times in the past few
months. Amanda Walker had talked about writing up a draft for allowing
various SGML features in HTML. Dan and I have also conversed on this
topic. Terry Allen continues to advocate SGML as a solution. General
response to this topic is sort of like talking into the wind.

The Web has a strong 'not-invented-here' problem when it comes to SGML.
Almost without fail, every feature currently demanded has been solved in
various ways within a ten-year-old specification: SGML. And not much
would have to be changed in HTML to make these things work; we'd simply
allow existing SGML features that are currently proscribed in HTML.

The problem is, as you mentioned, that a lot of software would have to be
rewritten. Since entity management potentially happens at both server
and client, that could mean both client and server rewrites. At the
beginning stages it could be simplified. See below.

This would really be the big change: not using HTML as the base language
of the Web. We'd use SGML (MIME type "text/sgml; level=1|2|3|4"),
allowing the DOCTYPE of the document to determine the DTD, just as in
SGML. That DOCTYPE could simply specify a dialect of HTML for the current
majority of web documents. "Level" describes a migration path, where for
example:

  + Level 1 is HTML as today, only in an SGML 'wrapper'. Documents would
    be required to be valid (what does this mean?)
  + Level 2 would handle HTML with minimal SGML features (text literal
    entity declarations, marked sections for conditional documents,
    minimal SGML additions, etc.)
  + Level 3 would specify a larger character entity list, CDATA marked
    sections, etc.
  + Level 4 would be full SGML compliance, allowing SUBDOC, LPDs, etc.
    according to ISO 8879.

>To illustrate, for those not yet conversant with all of SGML: for
>this ... [explanation of SGML entities...]

>If it is desired (as some have proposed) that the external entity
>be parsed as a completely independent object, the required variation
>in the syntax is again already provided by SGML: just declare
>the external entity as a SUBDOC (i.e. a free-standing document, to
>be parsed on its own, not as part of the current document).
>
>  <!ENTITY myfile SYSTEM "/usr/me/public_html/file.html" SUBDOC>
>
>N.B. not all SGML software supports the SUBDOC feature, just as
>not all SGML software understands URLs, which are not after all
>defined by ISO. That shouldn't make too much difference, I think:
>we are talking about a change to HTTP and/or HTML, and that means
>rewriting at least some software.

As you know, SUBDOC engenders a number of rather difficult problems
(recursion problems for one). But yes, it does solve the problem. We'd
simply add an application requirement that the SGML system understand
URLs.

>Is there an advantage to inventing a new notation for inclusion
>of documents and document fragments, rather than using the
>existing notation? Or is it just not widely known that notation
>for such inclusion already exists and need only be adopted, instead
>of being invented?

I don't know if this information is widely known or not. My feeling is
that some people think that many of the more 'advanced' SGML features use
a syntax that is a little too arcane for 'common HTML authors' (whatever
that means).
But it seems amazing to me sometimes that we keep trying to reinvent the
wheel when a solution is already at hand and written in an ISO standard.

Here are some of my rough notes with HTML-to-SGML evolution mapped as
levels. Level 1 is HTML 2.0 conformant (RFC 1866). Nothing solid here,
just ideas.

  + Include full corpus of ISO 8879:1986 (SGML) entity sets (Appendix D
    in Goldfarb). These are already standardized and widely available:

      _Entity_Set____________________________  _Level_
      Added Latin 1                               1
      RFC 1866's 'recommended' additions          1?
      Added Latin 2                               2
      Diacritical Marks                           2
      Publishing                                  2
      General Technical                           2
      Greek Letters                               2
      Greek Symbols                               2
      Monotoniko Greek                            3
      Alternative Greek Symbols                   3
      Numeric and Special Graphic                 3
      Added Math Symbols: Arrow Relations         3
      Added Math Symbols: Binary Operators        3
      Added Math Symbols: Delimiters              3
      Added Math Symbols: Negated Relations       3
      Added Math Symbols: Ordinary                3
      Added Math Symbols: Relations               3
      Box and Line Drawing                        3
      Russian Cyrillic                            3
      Non-Russian Cyrillic                        3

    Levels as shown above. A level n conforming application would
    support all entity references up to that level.

  + Marked sections
      Level 1: none
      Level 2: allowing INCLUDE, IGNORE
      Level 3: allowing CDATA (non-SGML data)
      Level 4: allowing RCDATA (replaceable character data: entity
               references are replaced.)

  + #DEFAULT entity declaration
      Level 1: none
      Level 2: allowed as text literal
      Level 3: allowing PUBLIC/SYSTEM entity references

  + Declaration subsets (with limitations based on 'level' as above)
      Level 1: none
      Level 2: text literals only
      Level 3: PUBLIC & SYSTEM entity references
      Level 4: SUBDOC allowed

  + Remove/increase quantity limits on SGML (go with dynamically
    allocated buffers, or advocate using catalogs to declare SGML
    declarations with limits beyond the implicit 8879 declaration to
    handle specific document needs.)
      Level 1: same as HTML (RFC 1866)
      Level 2: expanded to larger limits
      Level 3: dynamically allocated or using catalogs with

  + Change SGML declaration charset (as per i18n draft) to be ISO 10646.
    This is already proposed.
      Level 1: same as HTML (RFC 1866).
      Level 2: same as i18n

  + Allow NOTATION TeX, etc.
      Level 1: none
      Level 2: allowed

  + Link Process Definitions, particularly for ICADD support.
    (Stylesheet support could conceivably be handled via LPDs. There are
    certainly better ways...)
      Level 1: none
      Level 2: none
      Level 3: allowed

  + Processing instructions (these are typically system-dependent, so I
    lean away from their use on the web.)
      Level 1: none
      Level 2: none
      Level 3: allowed (with restrictions)
      Level 4: allowed (no restrictions)

The fact that from my own testing, our own (Stonehand) HTML viewer can
support most of this list says to me that it can be done. Performance is
not a big issue once the DTD is compiled. The question is whether or not
large vendors are willing and able to implement this type of radical
departure. For browser engines not based on an SGML engine, this may be
next to impossible.

Those who've been hanging around here awhile have seen a similar list
before. I'm still willing to deal with this, but not if nobody wants it.
Last time this was brought up, Dan Connolly, Terry Allen, I and a few
others talked a bit between ourselves but there certainly wasn't any
overwhelming consensus of support.

The current push continues in the direction of further complexity of the
HTML language and browsers. The next version of HTML only gets thicker
and farther away from being authored or read by humans.

As above, I think the solution to many of the current language needs
comes down to using an SGML MIME type, where the specific SGML
application is determined by DOCTYPE. Otherwise, we are simply coming up
with another syntactic alternative to features that exist in any SGML
application except HTML, where these existing features have been
disallowed.
But with this, many (or even most) of the current document assumptions
made by current HTML browsers go out the window -- we would need
SGML-compliant browsers. This would certainly violate Amanda Walker's
"Minimum User Astonishment" design principle.

Murray

```````````````````````````````````````````````````````````````````````````````
     Murray Altheim, Program Manager
     Spyglass, Inc., Cambridge, Massachusetts
     email: <mailto:murray@spyglass.com>
     http:  <http://www.stonehand.com/murray/murray.htm>
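
The rule stated in the notes above, "A level n conforming application
would support all entity references up to that level", can be sketched
in a few lines of Python. This is only an illustration, not part of the
proposal: the names ENTITY_SET_LEVELS, MAX_LEVEL and
supported_entity_sets are hypothetical, the levels are transcribed from
the entity-set table in the message, and the entry marked "1?" there is
assumed to be level 1.

```python
# Entity-set levels transcribed from the table in the message.
# All names in this sketch are hypothetical, chosen for illustration.
ENTITY_SET_LEVELS = {
    "Added Latin 1": 1,
    # Listed as "1?" in the message; assumed to be level 1 here.
    "RFC 1866's 'recommended' additions": 1,
    "Added Latin 2": 2,
    "Diacritical Marks": 2,
    "Publishing": 2,
    "General Technical": 2,
    "Greek Letters": 2,
    "Greek Symbols": 2,
    "Monotoniko Greek": 3,
    "Alternative Greek Symbols": 3,
    "Numeric and Special Graphic": 3,
    "Added Math Symbols: Arrow Relations": 3,
    "Added Math Symbols: Binary Operators": 3,
    "Added Math Symbols: Delimiters": 3,
    "Added Math Symbols: Negated Relations": 3,
    "Added Math Symbols: Ordinary": 3,
    "Added Math Symbols: Relations": 3,
    "Box and Line Drawing": 3,
    "Russian Cyrillic": 3,
    "Non-Russian Cyrillic": 3,
}

MAX_LEVEL = 4  # the message describes levels 1 through 4


def supported_entity_sets(level):
    """Return the entity sets an application conforming at `level` supports.

    An application at level n supports every set whose level is <= n.
    """
    if not 1 <= level <= MAX_LEVEL:
        raise ValueError("level must be between 1 and %d" % MAX_LEVEL)
    return [name for name, required in ENTITY_SET_LEVELS.items()
            if required <= level]


if __name__ == "__main__":
    for level in range(1, MAX_LEVEL + 1):
        print(level, len(supported_entity_sets(level)))
```

Since the table assigns no entity set to level 4, levels 3 and 4 return
the same list in this sketch.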
Received on Tuesday, 19 March 1996 19:08:14 UTC