SGML, HTML, and whitespace [was: 24 Oct Release of HTML 4.0 spec]

Thanks for the detailed change report! It makes
reviewing the changes MUCH more efficient!

Now the bad news...

I notice many errors at the boundary between HTML and
other specifications. It's important that each time
the HTML spec says "as per XXX, blah blah blah" the
XXX spec really says blah blah blah.

I think that adding section numbers to such XXX references
(and in the process, checking the HTML 4 spec against
the XXX spec for consistency) would be cost-effective
in improving the quality of the HTML 4 spec. Please
do it if you can find the time before 31 Oct; otherwise,
maybe before the HTML 4 REC. For example,
I suggest the spec read:
	"SGML rules for and line
	breaks (cf. [ISO8879] section 7.6.1) ..."

Specifically...

Ian B. Jacobs wrote:
> > http://www.w3.org/MarkUp/Group/9710/WD-html40-971024/

> SGML tutorial
> 
>    * Added section about white space and layout

This still needs work:

"A complete discussion of SGML parsing, e.g. the
mapping of a sequence of characters to a sequence of tags and data,
is left to the SGML standard. This section is only a summary. "

That text appears in 3.1, but applies to all of section 3,
esp. elements, attributes, ...

Please subordinate the sections "HTML syntax " and
"How to read the HTML DTD" under "Introduction to SGML"


"SGML (and HTML) rules for white space characters and line breaks
allow authors to write legible documents with white space and extra
lines that will not be rendered by a user agent. "

That just totally muddies the waters. There are SGML rules
about ignored record start/record end characters (cf. ISO8879,
section 7.6.1), and then there are HTML rules (suggestions,
actually) about collapsing whitespace during rendering.
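
To make the two layers concrete, here is a rough sketch in Python.
The function names are hypothetical, and the first function is a
deliberately simplified stand-in for the record start/record end
rules (it only drops a line break right after a start tag or right
before an end tag); it is not a faithful implementation of ISO8879
section 7.6.1. The second function illustrates the HTML rendering
suggestion, which is a separate step applied to the parsed data.

```python
import re

def drop_sgml_line_breaks(markup: str) -> str:
    """Parser layer (simplified assumption): ignore a line break that
    immediately follows a start tag or immediately precedes an end tag."""
    markup = re.sub(r"(<[A-Za-z][^>]*>)\r?\n", r"\1", markup)
    markup = re.sub(r"\r?\n(</[A-Za-z][^>]*>)", r"\1", markup)
    return markup

def collapse_for_rendering(text: str) -> str:
    """Rendering layer: collapse each run of white space to one space."""
    return re.sub(r"[ \t\r\n\f]+", " ", text)

source = "<P>\nThomas is   watching\nTV.\n</P>"
parsed = drop_sgml_line_breaks(source)    # line breaks at the tag boundaries are gone
rendered = collapse_for_rendering(parsed)  # interior white space runs are collapsed
```

Keeping the two functions apart mirrors the point above: the first
belongs to SGML parsing, the second to HTML rendering.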

The distinction between normative and non-normative information
is crucial, and has been repeatedly stressed by members of
the WG. We MUST NOT lose it in our efforts to make the spec
more readable.

The HTML rules MUST NOT appear in this non-normative section.
Please move them to "10.1 White space" a la:

>    * Removed block/inline section (moved to global.html#block-inline)

And I notice 10.1 still says "must" as in "Thus, the following
two examples must be rendered identically: ". Change it to "should".


More notes as I review this section:

"SGML applications conforming to [ISO8879] are expected to recognize
a number of features that aren't widely supported by HTML user
agents. "

That seems to confuse the term "SGML application" (which means
something more like "SGML profile" than "SGML implementation")
with "SGML system" which means "SGML implementation".

In short, please s/applications/systems/.

Another:

"Document Type Declaration Subset"

Strike this whole subsection, and change the "should"
in "8.2 HTML version information" to "must" per my
earlier message.


"3.3.3 Element definitions"

Change it to "Element Declarations"

And it confuses "element" with "element type." An element
declaration declares an element type; an element is
a particular start-tag/content/end-tag sequence within
the instance, not a class of things.

One way to effect this change without sounding too pedantic
is to *not* refer to elements at all in 3.3.3, but only
to element names, content models, etc. For example,

s/The element being defined is UL/The element name is UL/

Occasionally you have to speak of element types. But
we can still be kinder, gentler without being imprecise,
a la:

s!both the start tag <UL> and the
     end tag </UL> for this element !both the start tag <UL> and the
     end tag </UL> for this type of element !
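
The two substitutions above could be applied mechanically with
something like the following sketch. The helper name and the sample
sentence are hypothetical; only the old and new phrases come from
the substitutions themselves.

```python
# Old/new wording pairs taken from the two s/// suggestions above.
WORDING_FIXES = [
    ("The element being defined is UL", "The element name is UL"),
    ("end tag </UL> for this element", "end tag </UL> for this type of element"),
]

def apply_wording_fixes(text: str) -> str:
    """Replace each old phrase with its suggested wording."""
    for old, new in WORDING_FIXES:
        text = text.replace(old, new)
    return text

# Hypothetical sample text, not a quotation from the spec.
sample = "The element being defined is UL. See the end tag </UL> for this element."
fixed = apply_wording_fixes(sample)
```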



-- 
Dan Connolly, W3C HTML Working Group Chair
http://www.w3.org/People/Connolly/
phone://1/512/310-2971

Received on Saturday, 25 October 1997 12:55:30 UTC