Re: HTML comments in <title> elements - valid or not? from Dan Connolly on 1999-11-09 (www-html@w3.org from November 1999)

From: Dan Connolly <connolly@w3.org>
Date: Tue, 09 Nov 1999 00:17:39 -0600
To: Arjun Ray <aray@q2.net>
CC: www-html@w3.org
Message-ID: <3827BC83.8F68E650@w3.org>
Arjun Ray wrote:
> 
> On Mon, 8 Nov 1999, Dan Connolly wrote:
> > Arjun Ray wrote:
> 
> > > Yet another place where the 4.0 spec's "friendly prose" fails to state
> > > the exact requirements.
> >
> > What's not exact about it? Comments are markup[1].
> 
> Yes, and the content model of TITLE is (#PCDATA).

I'm not sure what your point is. If you're assuming that all documents
that validate per the DTD conform to the spec, that's just not the
case. The DTD doesn't express all the constraints.

>  Clause 4 "Definitions"
> of ISO 8879 (see p.277 in the Handbook):
>
> : 4.228 parsed character data: Zero or more characters that occur in a
> : context in which text is parsed and markup is recognized.  They are
> : classified as data characters because they were not recognized as
> : markup during parsing.
> :
> : 4.229 PCDATA: Parsed character data.
> 
> The issue is "a context in which text is parsed and markup is recognized".
> The operative concept here is *recognition* of markup.  Simply because
> something looks like markup doesn't make it so.

No, but matching the relevant productions in the SGML grammar does.

>  In some ways, this is a
> problem with SGML itself, but either the spec's normative reference to ISO
> 8879 counts for something, or it doesn't.

You may have a point[6Oct] about whether the constraints in the HTML
spec like "don't include an internal subset" conflict with the SGML
conformance clause. But there's so much in the SGML spec that seems like
it was designed to be tested in court rather than by software... sigh...

[6Oct] http://lists.w3.org/Archives/Public/www-html/1999Oct/0034.html

> > Perhaps we should have added a NOTE about why this restriction is there:
> > it's there because older HTML implementations treated <!--...---> as
> > character data, and I think some versions of the HTML spec declared
> > the TITLE element as CDATA.
> 
> It might have been better to specify RCDATA declared content.

Perhaps. But the whole RCDATA/CDATA/PCDATA thing was confusing a lot
of people, and having just the one RCDATA exception for TITLE seemed
like more trouble than it was worth.

But, as evidenced by deployment of the <SCRIPT>...</SCRIPT> syntax,
implementors weren't really scared of specially hacked parsing
modes after all :-{
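To make the three-way distinction concrete, here is a rough sketch of how
the same TITLE content would be read under each kind of content. It is a
toy, not an SGML parser: the function name title_text is made up for
illustration, the comment regex ignores the full SGML comment-declaration
syntax, and Python's html.unescape stands in for entity resolution.

```python
import html
import re

def title_text(raw, kind):
    """Return the character data a reader would see for TITLE content.

    kind is "PCDATA", "RCDATA", or "CDATA" (hypothetical switch; a real
    parser takes this from the DTD).
    """
    if kind == "PCDATA":
        # Markup is recognized: comments are stripped, entities resolved.
        without_comments = re.sub(r"<!--.*?-->", "", raw, flags=re.S)
        return html.unescape(without_comments)
    if kind == "RCDATA":
        # Only references are recognized; "<!--" is just data.
        return html.unescape(raw)
    if kind == "CDATA":
        # Nothing is recognized before the end tag; text is taken as-is.
        return raw
    raise ValueError("unknown content kind: " + kind)

sample = "News<!-- draft --> &amp; more"
for kind in ("PCDATA", "RCDATA", "CDATA"):
    print(kind, repr(title_text(sample, kind)))
```

Only in the PCDATA case does the comment disappear from the title, which
is the behavior older implementations did not agree on.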


> > Let's see if I can find the original IETF html-wg discussion of CDATA
> > vs. PCDATA for TITLE... nope; but
> 
> > [...] reviewing the changes to html.dtd[2], I see that TITLE was
> > RCDATA for a while,
> 
> AFAIK, the original spec had (#PCDATA).
> 
>   <URL:http://lists.w3.org/Archives/Public/www-talk/1992JulAug/0020.html>

Wow... 15 Jul 92; that's before I started using RCS. pre-history ;-)

> So when did it change?
> 
> > then changed to %title-content which could be either CDATA or PCDATA
> > in v1.8, date: 1994/04/09 01:02:10.
> >
> > [2] http://www.w3.org/MarkUp/html-spec/html.dtd
> > (hm... the ,v file isn't available via HTTP. bummer. see:
> > http://www.w3.org/MarkUp/html-spec/ChangeLog )
> 
> The Changelog goes back to only v.1.7.2.1, dated 1994/04/01.

Look again; that's a branch. Branches are listed after the earliest
versions. It goes back to

revision 1.2
date: 1992/12/03 02:04:29

I lost the original ,v file when I moved from Convex to W3C, and I only
found copies back to v1.2.

I just copied the whole ,v file (as reconstructed) to:
http://www.w3.org/MarkUp/html-spec/html.dtd,v

>  The v1.8
> entry just says:
> 
> | * Revamped HTML, HEAD, elements in light of feature test entities

There's a lot of detail hidden behind that changelog entry.

> > So if you can find html-wg archives from around there (we have them
> > somewhere at W3C, I think) you'll probably find it discussed.
> 
> I have my own copy of the html-wg list.  I suppose I'll have to slog
> through megabytes of it...
> 
> But 1994/04/01 is too early for the html-wg anyway.  The welcoming letter
> (from Stu Weibel) is dated 1994/07/29:

Good point... maybe it was in www-html or www-talk...

surf surf... here's a little gem that I seem to have taken only half to
heart:

	"My new motto is: just describe it; don't prescribe it."
	-- yours truly, Fri, 04 Dec 92 13:11:32 CST 
	http://lists.w3.org/Archives/Public/www-talk/1992NovDec/0155.html

Ah... here's a relevant bit:

"6. I changed XMP and LISTING back to RCDATA. I was messing with
the MidasWWW browser, and I couldn't figure out how, when I'm
dumping the SGML out of the data structures into a file, to
tell whether I should change '<'s to "&lt;" or not. If we avoid
CDATA, we can use entities everywhere, and processing is simpler.
How's that sound?"
	-- Fri, 04 Dec 92 13:11:32 CST 
	http://lists.w3.org/Archives/Public/www-talk/1992NovDec/0155.html
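
The serialization problem in that quote can be sketched roughly as
follows. This is only an illustration under assumptions: the
DECLARED_CONTENT table and the function names are hypothetical and do not
reflect any particular version of html.dtd. The point is that a CDATA
element forces the writer to special-case its output, while RCDATA and
PCDATA elements can share one escaping rule.

```python
# Hypothetical table for illustration only; not copied from any html.dtd.
DECLARED_CONTENT = {"XMP": "CDATA", "LISTING": "CDATA", "TITLE": "RCDATA"}

def escape_text(text):
    """Escape markup-significant characters; '&' first to avoid double escaping."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

def serialize(tag, text):
    """Write one element whose content is plain text."""
    kind = DECLARED_CONTENT.get(tag, "PCDATA")
    # CDATA content must be written raw, since entities are not recognized
    # there; everything else can be escaped uniformly.
    body = text if kind == "CDATA" else escape_text(text)
    return "<%s>%s</%s>" % (tag, body, tag)

print(serialize("TITLE", "a < b"))
print(serialize("XMP", "a < b"))
```

If no element is declared CDATA, the lookup goes away and serialize can
always call escape_text, which is the simplification being argued for.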


Ah... another relevant bit:

"But if you're going to go to that trouble, you might as well
use mixed content. That's why I changed my mind about using RCDATA
for XMP and LISTING elements."
	-- Tue, 01 Dec 92 11:41:34 CST 
	http://lists.w3.org/Archives/Public/www-talk/1992NovDec/0136.html



-- 
Dan Connolly, W3C
http://www.w3.org/People/Connolly/
Received on Tuesday, 9 November 1999 01:17:58 UTC