- From: Daniel W. Connolly <connolly@beach.w3.org>
- Date: Tue, 30 Jul 1996 14:41:01 -0400
- To: holstege@kset.com
- cc: www-html@w3.org
In message <199607301629.JAA05748@athena>, Mary Holstege writes:
>
> I don't understand the terrible resistance to allowing (encouraging) HTML
> files to contain SGML prologues and using the power implied by the
> existence of that to achieve useful results. Most of my serious HTML
> *already* has a <!DOCTYPE section; I just have to run everything through
> SPAM before I put it out. The standard HTML DTD can contain some of the
> popular notations; if you want to do anything funky, you have to embed
> some funky syntax. OK. And the problem with this is...?

Nothing. You're free to use SPAM at your site to preprocess your
documents, just like other folks use cpp or m4 or perl or..., and some
other folks use server-side includes features built into a server.

But you seem to be suggesting that the syntax of HTML as transferred
over the wire should be allowed to include an internal declaration
subset. You'll have to be more explicit for me to evaluate your
suggestion.

> Why is a concept that comes from SGML always presumed "too hard" but some
> random half-baked hack considered "easy enough for the masses"?

Easy: free access to the documentation and source code.

The cost of just getting the documentation for SGML is about $100.
That's MUCH harder than clicking on a link to NCSA's
server-side-includes documentation.

I have little sympathy for folks who don't do their homework (i.e.
folks who don't read the SGML materials that _are_ available on the
web, which see: http://www.w3.org/pub/WWW/MarkUp/SGML/) but I have even
less sympathy for folks who create something as obtuse and contrived as
SGML, and then hoard access to it.

> Why is "<!--#include" easy enough for the masses to understand, but
> "<!ENTITY foo SYSTEM" is too hard?

First, note that <!--#include is _not_ HTML syntax: it's NCSA httpd
server-side-includes syntax, which has since been supported by lots of
other stuff.
And I don't think mass understanding had anything to do with it: it's
simply that <!--#include is supported by widely available tools, and
<!ENTITY is not.

I wish it were the other way around. In fact, Elliot Kimber
demonstrated on comp.text.sgml how to set up a CGI script to process
<!ENTITY stuff using spam. But it seems to be a day late and a dollar
short.

Folks who want to change the landscape are encouraged to hack! For
example, make sp (the backbone of stuff like spam) and hack it into an
Apache module, write some documentation with examples, and see if it
takes off.

> Why is long distance naming in "<A NAME=foo>...<A HREF="#foo">" easy
> enough for the masses to master but that in "<!ENTITY foo...>...&foo;"
> too hard?

Hang on: the choice is between:

    <a href="http://foo.com">

and:

    <!entity foo system "http://foo.com" NDATA>
    ...
    <a href=foo>

In this case, the object in question has a perfectly good name:
http://foo.com. The name foo serves no purpose but to introduce errors
etc. (If there were several references to foo.com in the document, the
foo might serve as a shorthand, and that might be valuable. But it's
not valuable enough to complicate the simple case.)

The question regarding "<!ENTITY foo...>...&foo;" is simple: to do it
or not to do it (in the client). So far, none of the implementors has
seen enough benefit to justify the cost. Given that it can be done on
the server side (and often more efficiently), I tend to agree.

I don't like <!ENTITY...> as a mechanism for doing compound documents.
I like typed links much better. It's like the difference between
python/perl/Java style import vs. C/C++ #include: one's a text pasting
exercise, and the other is a structural construct.

> Why is "// <!-- ... // -->" easy enough for the masses to understand
> but "<![ CDATA [...]]>" too hard?

I can't begin to defend the //<!-- script syntax. But given the state
of affairs, how would you convince information providers to begin to
use <![ CDATA [ ... ]]> when it won't work on "70%" of their consumers'
desktops, while //<!-- will?

> Why do we have to put up with people inventing
> "<!--XXX IFDEF FOO-->...<!--XXX ENDIF-->" but refusing to encourage
> "<![ %FOO; [ ...]]>" which does the job just as well, and can be
> processed by standard tools?

You don't have to "put up with" anything. But you don't have to whine
either. Write some code. Write a draft. See, for example:

    http://www.w3.org/pub/WWW/TR/WD-doctypes

Note that SGML marked sections can express #if/#endif nicely, but
#elsif is very awkward.

> I think it's time to fish or cut bait: if HTML is to be an SGML
> application, use the features of SGML that are required to make it
> workable.

Why the hypothetical? HTML is an SGML application. Check RFC1866. And
there is overwhelming evidence that it is "workable." I think AltaVista
advertises some 30 million pages.

> There is much I would have changed about SGML if I had been its
> inventor, but the fact is that it is here,

Granted.

> it has solutions to a lot of these problems,

This has been alleged over and over, but the conjecture is rarely
backed by sound arguments, code, specs, etc.

> and if HTML is an SGML application a lot of nice tools can be used to
> handle it. Tracking changes from version to version of HTML with these
> tools becomes a matter of dropping in a new DTD instead of hacking up
> the tool to understand the significance of some new semantics embedded
> in comments or some special handling required for the FOOBAR element.

This is just FUD. The change from HTML 2.0 to 3.2 to Cougar is "just
dropping in a new DTD".

There's nothing "special" about the script/comment syntax as far as
SGML is concerned:

    <script><!-- script --></script>

is just an element with content "<!-- script -->". Clearly, in order to
interpret the content of the script element, you have to understand the
script language syntax. And JavaScript happens to define <!-- as a
comment.
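The point about script content can be illustrated with a small sketch. It is not from the original message, and it assumes Python's standard html.parser module, which is an HTML parser rather than an SGML one; it does, however, treat script content as raw character data, much like CDATA declared content. The names ScriptContentCollector and script_content are hypothetical.

```python
from html.parser import HTMLParser


class ScriptContentCollector(HTMLParser):
    """Record what the parser reports as character data and as comments."""

    def __init__(self):
        super().__init__()
        self.data_chunks = []  # text handed to handle_data
        self.comments = []     # text handed to handle_comment

    def handle_data(self, data):
        self.data_chunks.append(data)

    def handle_comment(self, data):
        self.comments.append(data)


def script_content(markup):
    """Return (character data, markup comments) found in the given markup."""
    parser = ScriptContentCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.data_chunks), parser.comments


# Inside <script>, the "<!-- ... -->" run is reported as element content,
# not as a markup comment; interpreting it is left to the script language.
text, comments = script_content("<script><!-- script --></script>")
print(repr(text), comments)
```

Under these assumptions the parser hands the whole "<!-- script -->" string to the data handler and reports no comment, which is the behaviour described above: the markup layer sees plain content, and only the script interpreter gives <!-- a meaning.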
> It is very clear to me that we cannot go much further without putting
> (allowing, defaulting, supporting) the SGML prologue into HTML.

I disagree. For my argument, please see:

    http://www.w3.org/pub/WWW/TR/WD-doctypes

> In particular:
> NOTATION could be used quite nicely for both SCRIPT and MATH
> (NOTATION=TeX, anyone?) It would allow for direct experimentation with
> other scripting notations.

There's nothing about NOTATION that facilitates this experimentation.
You can do it with MIME types in CDATA attributes just as well as SGML
notations.

> Parameter ENTITYs (particularly if you support URL SYSTEM identifiers)
> allows you to very neatly encapsulate common boilerplate or decorations
> and ease maintenance.

Again: are you suggesting this as a local server-side feature, or an
extension of the over-the-wire HTML standard?

If it's just a question of maintenance, using SPAM and entities makes a
lot of sense for local document management.

> While we're at it, can't we at least have a sentence somewhere official
> encouraging support of processing instruction syntax instead of random
> comment hackery? Please?

What would such a sentence say? Would you care to write the draft?

Dan
Received on Tuesday, 30 July 1996 14:35:47 UTC