Re: Cougar DTD: Do not use CDATA declared content for SCRIPT

Daniel W. Connolly (connolly@w3.org)
Tue, 30 Jul 1996 00:50:11 -0400


Message-Id: <199607300450.AAA21234@anansi.w3.org>
To: Joe English <joe@trystero.art.com>
cc: www-html@w3.org, "Paul Prescod" <papresco@calum.csclub.uwaterloo.ca>
Subject: Re: Cougar DTD: Do not use CDATA declared content for SCRIPT 
In-reply-to: Your message of "Mon, 29 Jul 1996 19:54:38 PDT."
             <9607300254.AA22380@trystero.art.com> 
Date: Tue, 30 Jul 1996 00:50:11 -0400
From: "Daniel W. Connolly" <connolly@w3.org>


I'm afraid I'm lost in this thread. It started with a pretty
clear suggestion by Joe: please don't use the evil CDATA.

Unfortunately, the responses were a mixture of argument-by-assertion
and misinformation about SGML.

I haven't found anything that I want to forward to the editors of the
draft. I'm not likely to follow this thread any more, so folks might
want to start new, more focused threads.

If folks have questions about SGML and/or its relationship to HTML,
by all means: ask them in this forum (or on USENET). But change the
subject line or something so that your questions don't muddy the
waters so much. And please take advantage of the online resources
(validation services, etc.) to do a little research before posting!
Start at: http://www.w3.org/pub/WWW/MarkUp/SGML/.

Same goes for advocacy/editorializing about W3C: if you would,
please keep it separate from the technical arguments.


Anyway... Here are some of the arguments, as I see them:

Regarding the "inline scripting is broken. Just Say No." argument: As
a minimalist, I tend to agree. LINKed scripts are a necessary part of
the solution -- so they'll be in the spec -- and they seem sufficient
at first glance.

But under the "keep the simple things simple, and make the tricky
things possible" maxim, I can't in good conscience require folks to
maintain a separate file just to do client-side validation of a form
field.

Regarding mixing languages: HTML was not the first SGML application to
have a need for foreign notations inline with the document. (I should
cite some evidence to back that, but I'm too lazy just now. So take it
with a grain of salt.)

So why don't we use SGML's NOTATION feature? Example:
	<!doctype html public "..." [
	<!notation tcl system "application/safe-tcl">
	]>
	<head><script notation=tcl>proc foo {... } </script>
	</head>

For the same reason that the HREF attribute isn't an ENTITY attribute:
it requires folks to make up a name for the entity/notation, and
declare it far from its use. And it opens the "what's the syntax
of an SGML prologue?" can of worms.

An alternative would be to define some NOTATIONS in the HTML DTD, and
only refer to them in the instances. But then the HTML DTD becomes a
centralized list of script languages -- it would need to be modified
every time a new scripting language was deployed.

Regarding "just use the data: URL scheme": that will work; i.e.  it
will be specified to work (given that the UA groks the data: scheme --
which is currently headed for Proposed IETF Standard, I hear from
uri@bunip.com), though I doubt many folks will go to the trouble to
properly encode their scripts that way.
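
To give a feel for the encoding burden, here is a minimal sketch in
Python; it is an illustration, not anything from the spec. It assumes
the URL has the general shape data:<mediatype>,<percent-escaped data>.
The media type application/x-javascript, the helper name
script_to_data_url, and the sample script are all made up for the
example.

```python
from urllib.parse import quote, unquote

def script_to_data_url(script, mediatype="application/x-javascript"):
    """Percent-escape a script so it can sit in an href as a data: URL.

    Assumption: the URL has the shape data:<mediatype>,<escaped data>.
    The helper name and the default media type are hypothetical.
    """
    # safe="" escapes everything except letters, digits and "_.-~", so
    # quotes, spaces, angle brackets and newlines all become %XX escapes.
    return "data:" + mediatype + "," + quote(script, safe="")

script = 'if (f.age.value == "") alert("Age?");'
url = script_to_data_url(script)
print(url)

# Decoding the part after the first comma gives the script back.
print(unquote(url.split(",", 1)[1]) == script)
```

Even a one-line validation check turns into a long run of %XX escapes,
which is why hand-encoding scripts this way looks unattractive.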

Regarding <script SRC>: just use <link rel=script href="...">
instead.

Regarding marked sections: I have always liked this idea. However,
information providers would be faced with an interesting dilemma
if we recommended them in the spec: The following doesn't work
on the installed base of browsers that _does_ support javascript:

	<script>
	<![ CDATA [
		stuff
	]]>
	</script>

The symptoms of "doesn't work" are: the browser does an in-your-face
javascript syntax error dialog. (would that they had done the same for
HTML markup errors... even as an option that's off by default... ;-)

So even if the next release of browsers supported that syntax, it
seems unlikely that information providers would use it, given that
many of their consumers (the ones with the current crop of browsers)
would get in-your-face error dialogs.

[General note: Solutions to the problem of graceful deployment of new
features in a distributed system continue to evade me... or maybe I do
have solutions, but I'm just not willing to fight hard enough to get
them adopted.]


That leaves the following candidate, elegant (;-) as it may be:

	<script>
	<!--
	script... if you want two hyphens, the
	script language has to have some escape
	mechanism like -\- or something

	Also watch out for ETAGO: "</" + "xyz>"

	Don't forget to hide the SGML end-comment markup-look-alike
	from the javascript parser using C++ style comments:
	// -->
	</script>
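
The two gotchas in that example can be checked mechanically. The
following is a rough sketch in Python, not a validator: it flags any
line of a script body that contains two adjacent hyphens, or "</"
followed by a letter (the case where ETAGO would be recognized). The
name find_gotchas and the message strings are hypothetical, and the
sketch assumes name start characters are just the ASCII letters.

```python
import re

# ETAGO ("</") only counts when a name start character (a letter) follows.
ETAGO_IN_CONTEXT = re.compile(r"</(?=[A-Za-z])")

def find_gotchas(script_body):
    """Return (line number, message) pairs for risky spots in a script body.

    script_body is the text between the opening "<!--" line and the
    closing "// -->" line. Function name and messages are hypothetical.
    """
    problems = []
    for lineno, line in enumerate(script_body.splitlines(), start=1):
        if "--" in line:
            # Two hyphens would close the SGML comment early.
            problems.append((lineno, 'contains "--"'))
        if ETAGO_IN_CONTEXT.search(line):
            # "</" plus a letter looks like an end tag to an SGML parser.
            problems.append((lineno, 'contains "</" followed by a letter'))
    return problems

body = 'i--;\ndocument.write("</b>");\ndocument.write("</" + "b>");'
for lineno, message in find_gotchas(body):
    print(lineno, message)
```

The third line of the sample body passes because the "</" is followed
by a quote character rather than a letter, which is the same trick as
the "</" + "xyz>" line in the example above.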

Folks have made claims about the reliability of this mechanism.
Is the claim that people won't adhere to the spec when they write
the markup, or that implementations which follow the spec will
fail to interoperate?

If you're worried about the reliability of people, I share your
concern, but the burden is now on you to provide an alternative (see
above regarding answers along the lines of "don't do that.")
Otherwise, we'll do our best to make the scripting spec clear on the
potential gotchas that people must avoid, and press on.

If, on the other hand, you think the implementations won't get
along, I'd like to see the argument. I don't see any problems.



Those are the arguments as I see them, and the conclusions that the
ERB seems to have reached. You should see more details on this
in working drafts released Real Soon Now.

I can already see folks launching into appeals, making the same
arguments with the same evidence, but with stronger rhetoric. That
won't help. Unlike an IETF working group (e.g. the HTML Working
Group), the forum where the decisions about HTML 3.2 and Cougar are
made is not an open mailing list.

I (and some others) monitor this group to watch for evidence,
examples, ideas, etc., not for consensus or decision making (though
it's useful if there is consensus in this forum). That reminds me: I
don't recommend you try to get the last word in this forum. It doesn't
count for anything. If you make a clear, well-thought-out argument,
and a whole bunch of people disagree but don't provide sound arguments
and counterproposals, then there's nothing to be gained by shouting
them all down.

If you have evidence, examples, or reasoning that's really novel and
different from what I've outlined above, it's welcome and I'll forward
it to the editorial review board. And as time permits, I'll let you
know what the ERB decides. But usually, you'll just have to watch
the DTD and specs change.

Appeals based on the same evidence and logic will likely be ignored,
probably without comment.

Dan

In message <9607300254.AA22380@trystero.art.com>, Joe English writes:
>
>"David Perrell" <davidp@earthlink.net> wrote:
>
>> I read in another message that--except in marked sections--a parser is
>> expected to end an element when it encounters the corresponding ETAGO.
>> That's not strictly true. With CDATA declared content, the recognition
>> of ETAGO is further constrained to occur only when immediately followed
>> by an SGML name start character.
>
>
>Sort of, but not quite.
>
>ETAGO is a _delimiter-in-context_, which means that it is only 
>recognized *at all* when it is followed by a name start character
>(or, in some cases, a GRPO delimiter).  This is true for all cases
>where ETAGO is recognized, not just in CDATA declared content.