Re: [OT] XHTML and form elements from David Woolley on 2003-11-08 (www-html@w3.org from November 2003)

From: David Woolley <david@djwhome.demon.co.uk>
Date: Sat, 8 Nov 2003 12:35:34 +0000 (GMT)
To: www-html@w3.org
Message-Id: <200311081235.hA8CZY514875@djwhome.demon.co.uk>
This isn't really the place to give tutorials on structured
markup languages, but...
 
> Your code validates. Good. Then I tried a variation of your code: I placed
> everything from <form> to </form> in another table around the entire

What your words say would validate, but I don't think that you are being
precise in your use of words.  As we are talking XHTML, XML rules
apply.  In XML, open and close tags are like left and right parentheses
and, moreover, the tag name in a close tag is completely redundant: a
close tag always closes the most recently opened element, so you cannot
use the name to pick out some other open tag.  If you,
for instance, try

<td><form></td>..<td></form></td>,

and destroy the information about tag name in the close tags, you get
something like:

<td><form></??>..<td></??></??>

Applying the rule that, like parentheses, tags must balance, this would
have to be parsed as:

<td><form></form>..<td>  ABORT, td is not a valid direct descendant of td.

Giving the tag name in the closing tag allows an earlier abort as one
knows that the first </??> must be </form> but you've actually found a
</td>.
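
The parenthesis rule above can be sketched in a few lines of Python.
This is only an illustration under simplifying assumptions, not a real
XML parser: the function name and the regular expression are made up
for the example, and attributes, empty-element tags, comments and
content models (such as td inside td) are all ignored.

```python
import re

# Matches <name ...> or </name ...>.  Empty-element tags such as <br/>
# would be treated as ordinary open tags; that is outside this sketch.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def first_nesting_error(markup):
    """Return None if the tags balance like parentheses, otherwise a
    description of the first mismatch found."""
    stack = []  # names of the elements that are currently open
    for m in TAG.finditer(markup):
        closing, name = m.group(1) == "/", m.group(2)
        if not closing:
            stack.append(name)
        elif not stack:
            return f"</{name}> closes nothing"
        else:
            # A close tag can only close the innermost open element.
            expected = stack.pop()
            if expected != name:
                return f"expected </{expected}> but found </{name}>"
    if stack:
        return f"<{stack[-1]}> is never closed"
    return None

print(first_nesting_error("<td><form></td>..<td></form></td>"))
print(first_nesting_error("<td><form></form></td>"))
```

The first call reports the mismatch at the first close tag, which is
the "earlier abort" that naming the element in the close tag buys you.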

The same rules apply in HTML (as against XHTML) but you are allowed to
miss out some of the tags (you are actually allowed to miss some opening
ones, but I'll only consider closing ones).  The ones you are allowed to
miss are based on element type and not context, but they are chosen so
that it is always possible to know where the tag would have been.  That
means that some elements must always have a closing tag.  When that tag is
found, you know it matches against the corresponding opening one and you 
know that tags must be balanced like parentheses, so you know that any
open elements with optional close tags must have those close tags immediately
before the explicit closing tag.
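
As a rough sketch of that inference, again in Python: the set of
elements below is a small illustrative subset of those whose end tag
HTML 4 lets you omit, and the function name is invented for the
example.  It covers only the case described above (an explicit close
tag forcing the implied ones before it), not close tags implied by a
following open tag.

```python
# Illustrative subset only; the real list is defined by the HTML DTD.
OPTIONAL_END = {"p", "li", "td", "tr", "tbody"}

def implied_end_tags(open_elements, closing_name):
    """open_elements lists the open elements, outermost first.
    Return the end tags that must be implied immediately before the
    explicit </closing_name>, innermost first."""
    if closing_name not in open_elements:
        raise ValueError(f"</{closing_name}> matches no open element")
    implied = []
    # Work outwards from the innermost open element until we reach
    # the element that the explicit close tag belongs to.
    for name in reversed(open_elements):
        if name == closing_name:
            return implied
        if name not in OPTIONAL_END:
            raise ValueError(f"<{name}> needs an explicit </{name}>")
        implied.append(name)

print(implied_end_tags(["table", "tr", "td"], "table"))
```

With form on the stack instead, as in ["td", "form"] followed by an
explicit </td>, the function raises, because form is one of the
elements that must always have its closing tag.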

Unfortunately, some early browsers, particularly Netscape up to and
including version 4, seem not to have interpreted HTML this way, but to
have turned on bold on <b> and turned it off on </b> regardless of
whether they were properly nested.  This sort of behaviour is often
described as
"tag soup" because there is no logic in the placement of the tags.

However, CSS and the document object models that underlie powerful use of
scripting rely on tags being properly nested and modern browsers only work
well when that is true.
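
To make the switch-style behaviour concrete, here is a toy Python
model of it.  It is purely an assumption about what "turn it on, turn
it off" amounts to, not a reconstruction of any browser's code; the
names are invented and only <b> and <i> are recognised.

```python
import re

# Either a <b>/<i> open or close tag, or a run of text.
TOKEN = re.compile(r"<(/?)(b|i)>|([^<]+)")

def soup_render(markup):
    """Treat <b> and <i> as independent on/off switches and return a
    list of (text, active styles) runs, ignoring nesting entirely."""
    active = set()
    runs = []
    for m in TOKEN.finditer(markup):
        if m.group(3) is not None:
            runs.append((m.group(3), frozenset(active)))
        elif m.group(1):
            active.discard(m.group(2))   # close tag: switch off
        else:
            active.add(m.group(2))       # open tag: switch on
    return runs

for text, styles in soup_render("<b>one<i>two</b>three</i>"):
    print(text, sorted(styles))
```

The input is not properly nested, yet the switch model happily yields
bold, then bold italic, then italic.  No tree of elements corresponds
to that, which is why CSS and the document object models cannot work
from it.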

Your basic problem is that you are trying to create two incompatible
structures: the logical structure that HTML was designed to describe
and an artificial one to produce a gridded layout.  The only way of
resolving such a conflict is by breaking things up into the smallest unit
that is common to both structures, but that would mean a form that only
occupies some rows of a table would have to be a form per table cell,
but you want the whole form to submit at once.

Tagged PDF handles this by making the dominant form the layout, then
providing a parallel description (partly embedded, but partly out of
line) that allocates parts of the physical layout to places in the
logical structure.  HTML, on the other hand, takes the position that
the meaning of the document is what matters, not how it appears, and
uses style languages to overlay an appearance on the logical structure.
Vendors realised that the likely buyers of web page design products
weren't actually interested in logical structures, so HTML became
polluted with things to directly control appearance.  About five years
ago, with HTML 4, an attempt was made to remove these, but it has been
largely unsuccessful in commercial use.

> OK, I guess some of us are a lot less familiar with what the specs specify
> versus what the browsers do on the user's end. CSS fixes this, but it seems
> strange to have to "fix" something that was never specified.

There is only one rule here:  HTML does not specify how a document should
be presented to the user, only how its content is structured and the
general nature of the meaning.

> The designer's only medium of communication is the browser window, which

The person responsible for communication aspects should not be concerned
about how browsers will display things.  It is confusing form with 
content that causes this sort of problem in the first place.

I accept that commercial decisions by browser writers mean that the ideal
is difficult to achieve without compromising structure or layout, but
in an ideal world, the information content should be written, in HTML,
by someone who is only concerned about the information and the graphic
designer should not touch the HTML, but work purely in a style language.

You work on governmental sites.  In the past these have actually maintained
good separation: you could usually use them in a text only browser without
being aware of limitations.  This is breaking down in the UK.
Received on Saturday, 8 November 2003 07:35:38 UTC