Re: Thoughts on a new charter for HTML

Daniel B. Austin wrote (4:36 PM -0700 5/20/98):

" 	Sectioning and layout are somewhat different than text level style
" for instance,
" the most common abuse of tables on the web is to produce columnar format
" (not for floats!)
" and it is difficult to see how this could be done effectively in CSS or
" something similar.

Columnsets are in DSSSL/XSL. I hope CSS3 incorporates them, too.

Tables are abused to arrange blocks of nontabular material side-by-side
(floats). You can call these nontabular blocks columns, but they're
poor-man's columns at best. Columns (of running copy) are not blocks, but
components of columnsets, through which copy *flows* based on the
constraints of the rendering area (window height?) or other parameters.
Chopping document flows into columns with table markup (or arbitrary DIVs)
gives you columns that cannot respond to varied rendering constraints. They
also defeat the purpose of markup, which is to describe document elements.
Suppose my CSS1 user stylesheet sets the display property of TBODY to
"none", so that I can scan the THEAD or CAPTION elements of your table
before deciding to have the table cells rendered. In a layout table,
everything would be hidden.
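
For concreteness, a minimal sketch of such a user stylesheet. It uses only
the CSS1 display property; that a given browser actually honors it for
table elements is an assumption, not something I can promise:

```css
/* User stylesheet sketch: suppress table bodies by default,
   leaving CAPTION and THEAD visible for a quick scan. */
TBODY { display: none }
```

Since HTML 4.0 makes the TBODY start tag optional, a conforming parser
infers the element even when the author never typed it, so a layout table
with no CAPTION or THEAD would vanish entirely under this rule.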

It is no more appropriate to delimit columns with markup than to delimit
linebreaks with markup - not impossible, but ideally unnecessary (look what
happened to the hard-wrapped section I quoted above: a similar thing happens
when I view your "columns" in fewer vertical pixels than you designed, or
when my font metrics are different than yours; there's clipping, the last
line exits are wrong, and the columns don't balance!). All you should need
to do is specify the desired layout behavior for the document element (wrap
lines, set in n columns) and the stylesheet interpreter should do the rest
based on its knowledge of the rendering constraints or other stylesheet

" Tags that divide the document into sensible parts for layout (and for the
" application of
" style) are better left in HTML; if we take *everything* out of HTML, then
" we are stuck
" supporting 2 very difficult languages for even small devices like cell
" phones (HTML+CSS).
" We need to balance the need for separation with the need for simplicity.

Very difficult? My discussions with implementors support the thesis that a
full CSS implementation for *certifiably clean, nonpresentational markup*
would be much smaller and easier to implement than a "garbage disposal"
structural/presentational HTML error recovery and rendering system with
hard-coded stylesheets.

" 	I agree that all text-level formatting should be removed from HTML
" (eg <B> etc). I said this in my paper presented at the conference;
" But we can't leave
" layout up to the browser vendors: this is what we are doing now, and
" it is a total disaster - not even a simple DIV acts the same in both.
" Should these things be independent of tag boundaries? I don't think so.
" This does not allow us to support intelligent internal organization of
" the document.

You think columnar layout is part of "intelligent internal organization of
the document"? Consider that the print publishing world got along for
centuries with "intelligently organized" documents being delivered in
manuscript form, for the typographer to set into columns if appropriate.

DIV is not a layout device per se, though it may serve as little more than
a handle for CSS styling, in which case your beef is with the quality of
CSS implementations. The reason things are a disaster is that HTML
rendering is generally undefined (for good reason), yet the proposed
definition system (CSS) is thus far poorly implemented. I see no reason to
believe that more committee work to truss up HTML's default rendering will
result in more consistent rendering - we will always have to "leave layout
up to the browser vendors" until a rigorous layout model is defined for
HTML. It's called CSS, and for the first time you can say "Vendor X - your
rendering is not to spec." What's more, adding layout primitives to HTML
(or continuing to abuse tables) would be at cross purposes with stylesheets
and against the direction set in HTML 4.0 toward nonpresentational markup.

" 	Currently most sites don't use CSS. It isn't cost effective, it is
" difficult to learn,
" can't be checked, and only works haphazardly...even the, in your own words,
" 'least-broken implementation', is pretty bad.
" Moving all the problems to CSS from HTML doesn't solve anything.

Style sheets aren't used much, aren't cost effective, and are difficult to
learn because there are no really good CSS1 implementations yet, and, more
importantly, there are a few really bad ones. There's also the matter that
CSS doesn't work well with presentation-oriented HTML - the only kind most
authors know. Cajoling vendors into shipping conformant implementations is
a shorter path to solving Web presentation problems than proposing new HTML
presentation extensions, hoping that they might be better implemented.
Isn't this finished business - stylesheets for rendering, HTML
for...something else?

" >What is worth trying to create, IMO, is a markup language for
" >general-purpose structured hypertext. HTML 4.0 Strict is not bad, though
" >its SGML (rather than XML) profile makes it hard to support as designed,
" >particularly in view of HTML's current use as a hand-hacked display format,
" >which effectively forbids real parsing. (Already some "web designers" are
" >using HTML 4.0 Strict doctypes, thinking this will somehow provide better
" >pixel control in "4.0" browsers.)
" 	Um...this is more or less the path I proposed here, with some solutions
" that attempt
" to fix the problems you mention, which are indeed hacks and result in rapid
" hair loss
" for those poor souls who hand author their code (like nearly everyone who
" writes HTML).

I'm confused: you say that HTML should be for display and layout, and CSS
for "style", so where's the document in this scheme? Where is the content,
"intelligently organized"? I say something very different, and then you say
that you agree with me.... Let's say the document should be in XML. Using
an HTML namespace. HTML 5.0. If you want to use some other kind of HTML for
display, you can go on using the bones of 3.2+extensions, and you don't
need to hand-author it. There are even printer drivers to crank out this
kind of HTML these days. But these aren't documents; they're pictures of
documents. They're not reusable.

" 	I'm not sure of the use of the term 'profile' here, I think you mean to
" use it in the loose sense.
" (In the technical terminology, HTML is an SGML 'application' not a
" 'profile' - an important distinction.
" I proposed in a previous email that it be made a profile - of XML.)

I'm probably being sloppy. I mean that I think HTML 5.0 should be an XML
application. I've seen "profile" used only in ways that seemed synonymous
with "application." Perhaps you can clarify the distinction for me in
private email? Much obliged.

" Does any user agent actually pay any attention at all to DOCTYPE?

Not the big ones, and it's a fine thing - they'll sniff incoming XML for
recognizable tagnames and do their rendering thing. MIME type/suffix needs
to change, but that's easy. Real XML browsers will actually parse.

" 	HTML should be generated only by machinery, for machinery. Humans
" have no
" business writing HTML.

Humans will stop needing to author in text editors when the languages they
write in are suited to their applications, and browser/editors have time to
develop and support this rightness. The history of HTML's browsers and
editors (and the hairlessness of its paid practitioners) should make it
pretty clear that HTML is ill-suited to its most common applications:
document layout and styling.

	* * *

To business, though. How much does the group agree upon for a new charter?
Am I batting at shadows?

My main objections to Daniel's charter proposal have been strewn throughout
this exchange. To summarize:

* The extensibility problem is described as "a general lack of standardized
results for web pages." I take this to mean that the rendering behavior is
not standardized. This, to my mind, is no problem for HTML to solve at all,
but for stylesheets. Certainly the CSS&FP WG sees it this way.

* Scalability. The problem is again defined as one of suitability to
multiple display devices. CSS's media-specificity addresses this - separate
stylesheets or even documents for separate device classes. Changing HTML to
address these problems will subvert the stylesheet efforts.

* Conformance. Yet again, the problem is cast as "a given HTML document will
display remarkably differently even on the same platform when displayed in
different clients". Yet again, to the extent that this is really a problem,
it is for stylesheets to fix.
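
To illustrate the media-specificity mentioned under Scalability, here is a
rough sketch using the CSS2 @media rule. The media types (screen, print,
handheld) are from CSS2; the selectors and values are purely illustrative
assumptions on my part:

```css
/* Sketch: one document, different rules per device class. */
@media screen {
  BODY { font-size: 12pt; margin: 2em }
}
@media print {
  BODY { font-size: 10pt; margin: 0 }
}
@media handheld {
  /* Small devices: tighter layout, skip images. */
  BODY { font-size: small; margin: 0 }
  IMG  { display: none }
}
```

The markup stays the same for every device; only the stylesheet rules
selected by the user agent differ.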

	* * *

Now that "HTML/CSS flow objects" have been hijacked as the standard
*output* of at least one XSL implementation, and now that everybody admits
that no major browsers are really going to pay attention to SGML DTDs for
HTML (i.e., they won't parse according to doctype, but will instead just
sniff for tags and render *output*), we're left only with documents in
hypothetical XML DTDs as general-purpose document *inputs*, or source. And
this is the hole I hope HTML 5 can fill. HTML 5 instances might be
converted to some lower/hybrid/hacked form of HTML for display until
stylesheets are better implemented, but HTML 5 itself will capture document
data in a maximally reusable, minimally formatted way, and in a more
parse-friendly syntax than HTML 4.0 Strict (XML, not SGML).

Todd Fahrner

The printed page transcends space and time. The printed page, the
infinitude of books, must be transcended. THE ELECTRO-LIBRARY.
	- El Lissitzky, 1923

Received on Wednesday, 20 May 1998 23:38:57 UTC