Re: HTML 3.2 Content Models

Joe English (joe@trystero.art.com)
Fri, 10 May 1996 10:37:00 PDT


Message-Id: <9605101737.AA24984@trystero.art.com>
To: www-html@w3.org
Subject: Re: HTML 3.2 Content Models
In-Reply-To: <199605101607.CAA25952@oznet02.ozemail.com.au>
Date: Fri, 10 May 1996 10:37:00 PDT
From: Joe English <joe@trystero.art.com>


"Dianne Gorman" <dkgsoft@oznet02.ozemail.com.au> wrote:

> 2.  <!ELEMENT (DIR|MENU) - -  (LI)* -(%block)>
> How can you exclude %block from the LI element?
>
> In, for example:
> <!ELEMENT PRE - - (%text)* -(%pre.exclusion)>
> <!ENTITY % pre.exclusion "IMG|BIG|SMALL|SUB|SUP|FONT">
> it is clear that the PRE element can contain all the elements in %text
> except those in %pre.exclusion.
>
> But DIR and MENU can contain LI elements. There are no %block
> elements to exclude, so (LI)* -(%block) is surely meaningless?


Exclusion exceptions apply recursively to all descendants of an
element.  (The same is true for the PRE content model above --
for example, even though an <A> can normally contain an <IMG>,
an <A> inside a <PRE> can _not_: <pre><a><img></a></pre> is illegal
in Wilbur.)
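
To make the recursion concrete, here is a minimal sketch of how a
validator might carry exclusions down the tree.  The CONTENT table is
a simplified assumption for illustration (it is not the real Wilbur
content models), and check() is a hypothetical helper, not part of any
SGML tool:

```python
# Toy content models: element -> (allowed children, excluded descendants).
# Simplified assumption for illustration; not the actual Wilbur DTD.
CONTENT = {
    "PRE": ({"A"}, {"IMG", "BIG", "SMALL", "SUB", "SUP", "FONT"}),
    "A": ({"IMG"}, set()),
    "IMG": (set(), set()),
}


def check(tree, inherited=frozenset()):
    """Return error strings for a (name, children) tree.

    Exclusions declared on an element stay in force for every
    descendant, not just for the element's direct children.
    """
    name, children = tree
    allowed, excluded = CONTENT[name]
    active = inherited | excluded  # exclusions accumulate going down
    errors = []
    for child in children:
        child_name = child[0]
        if child_name in active:
            errors.append(f"{child_name} excluded inside {name}")
        elif child_name not in allowed:
            errors.append(f"{child_name} not allowed in {name}")
        errors.extend(check(child, active))
    return errors


# <a><img></a> on its own is fine; the same thing inside <pre> is not.
print(check(("A", [("IMG", [])])))
print(check(("PRE", [("A", [("IMG", [])])])))
```

The same mechanism is what makes (LI)* -(%block) meaningful: the
exclusion does nothing at the DIR/MENU level itself, but it follows
the parser down into each LI.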

> 1. Forms
>
> HTML Features at a Glance at
> http://www.w3.org/pub/WWW/MarkUp/Wilbur/ states " INPUT, SELECT and
> TEXTAREA are only allowed within FORM elements. "  But according to
> the DTD, these are merely text-level tags which are in no way
> confined to the FORM element:
>
> <!ELEMENT FORM - - %body.content -(FORM)>
> <!ENTITY % text "#PCDATA | %font | %phrase | %special | %form">
> <!ENTITY % form "INPUT | SELECT | TEXTAREA">
> The HTML 2.0 DTD has:
> <!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
> Is there a logical reason for INPUT|SELECT|TEXTAREA being moved to
> %text in the HTML 3.2 DTD?


HTML 2 used an inclusion exception to enforce the rule that
%form; elements could only appear inside a <FORM> (or rather,
it used an inclusion exception to _allow_ them inside <FORM>,
and enforced the rule by not putting them in any other
content models.)

The problem with that approach is that SGML inclusion exceptions
are broken.  (There was a thread on this subject in comp.text.sgml
a few weeks ago, in fact... Robin Cover has archived some of the
articles at

	<URL: http://www.sil.org/sgml/topics.html#inclusion >

Robin summarizes: "The dominant [current] wisdom seems to be this: 'use
them very sparingly, provisionally, selectively, or not at all.'")


The problems with using inclusion exceptions for %form; elements
are that they can appear _anywhere_ -- even in places where they don't
make any sense -- and that they interact badly with record-ends
("newlines").   Run the following through SGMLS or Webtech's
validator for a demonstration:


--Cut here--
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<head>
<title>Problems with inclusion exceptions</title>
</head>
<body>
<form>
<pre>
An INPUT element appears
at the beginning of the next line:
<input type=checkbox name=i1>
But the parser moves it to the _end_
of the previous line!
</pre>
<ul>
<li>Included elements can appear anywhere</li>
<li>even in places where they don't make sense</li>
<input type=checkbox name=i2>
<li>like <em>between</em> two list items!</li>
<li>This is even more problematic with complex elements like TABLES.</li>
</ul>
</form>
</body>
--Cut here--

With the 3.2 DTD, the problem with the first INPUT element doesn't
occur, and the parser will complain that the second one is out of
place.


At any rate, INPUT, SELECT, and TEXTAREA are _still_ only allowed
as descendants of a FORM element; it's just that this rule is
not enforced by the DTD.   In SGML terms this is called an
"application convention".  (Another example of an HTML application
convention that isn't enforced by the DTD is the rule that
HREF attribute values are legal URLs.)

An advantage of doing it this way instead of with inclusions is
that an SGML parser can test the hard condition ("do these
elements appear in a 'sensible' place with respect to their
neighboring elements?") and the application only has to test the
easy one ("are they somewhere inside a FORM element?").
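
As a rough illustration of how small the application-side test is,
here is a sketch using Python's standard html.parser module.  Note
the assumptions: html.parser is a tag scanner, not an SGML parser (it
does not infer omitted tags or consult a DTD), and the names
FormFieldChecker and fields_outside_form are made up for this
example:

```python
from html.parser import HTMLParser

# html.parser reports tag names in lower case.
FORM_FIELDS = {"input", "select", "textarea"}


class FormFieldChecker(HTMLParser):
    """Flag INPUT/SELECT/TEXTAREA start tags seen outside any FORM."""

    def __init__(self):
        super().__init__()
        self.form_depth = 0
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.form_depth += 1
        elif tag in FORM_FIELDS and self.form_depth == 0:
            line, _ = self.getpos()
            self.errors.append(f"line {line}: <{tag}> outside any FORM")

    def handle_endtag(self, tag):
        if tag == "form" and self.form_depth > 0:
            self.form_depth -= 1


def fields_outside_form(text):
    """Return one message per form field that has no FORM ancestor."""
    checker = FormFieldChecker()
    checker.feed(text)
    checker.close()
    return checker.errors


print(fields_outside_form("<p><input name=a></p>"))
print(fields_outside_form("<form><p><input name=a></p></form>"))
```

This only answers the easy question; whether the element sits in a
sensible place relative to its neighbors is still left to the SGML
parser and the DTD.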


> 3.  Murray Altheim has confirmed that PLAINTEXT is outside HTML (it
> is in the DTD but has no parent element).  Can a DTD for HTML
> contain elements that are outside HTML (I don't think the fact that
> PLAINTEXT is severely deprecated is relevant to this question)?

Yes; it's legal to have a declaration for an element type that
isn't used anywhere else in the DTD.  (It's normally an error to
use such an element in a document instance, of course, since it
won't be legal in any content model.)

One reason that you might want to do this is to allow multiple
"top-level" elements for a document type: for example, with the
Wilbur DTD you could say:

    <!DOCTYPE PLAINTEXT PUBLIC "-//W3C//DTD HTML 3.2//EN">
    <PLAINTEXT>
    ...
    </PLAINTEXT>

although I suspect this is more likely an oversight on the part
of the DTD authors than an intentional feature...

As an aside, it's also legal (in a DTD) to use an element type name that
_isn't_ declared anywhere.  (Again, it's illegal to use such an element
type in a document _instance_.)

This makes it easier to create "subset" DTDs.  For example, one
could (picking an element at random :-) comment out the declarations:

<![ IGNORE [
    <!ELEMENT FONT - - (%text)*     -- local change to font -->
    <!ATTLIST FONT
	size    CDATA   #IMPLIED
	color   CDATA   #IMPLIED
	>
]]>

in a local copy of the 3.2 DTD, and -- without changing anything
else -- use it to verify that documents don't contain any FONT
elements.
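
The effect can be sketched with a toy model.  Everything below is an
assumption for illustration: DECLARED stands in for the element
declarations left in the local DTD copy, and undeclared_elements() is
a hypothetical helper, not how SGMLS or any real parser is
implemented:

```python
# Element types that still have a declaration in the local "subset" DTD.
# The content models (the sets) may keep mentioning FONT even though
# FONT itself is no longer declared; that is legal in the DTD.
DECLARED = {
    "P": {"EM", "FONT"},
    "EM": {"FONT"},
    # FONT deliberately absent: its declaration sits in the IGNOREd section.
}


def undeclared_elements(tree):
    """List element types used in a (name, children) tree but not declared."""
    name, children = tree
    found = [] if name in DECLARED else [name]
    for child in children:
        found.extend(undeclared_elements(child))
    return found


# A document without FONT passes; one that uses FONT is reported.
print(undeclared_elements(("P", [("EM", [])])))
print(undeclared_elements(("P", [("EM", [("FONT", [])])])))
```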




--Joe English

  joe@art.com