Re: XHTML 2.0: Suggestion for <addr/> and <blockaddr/> to replace <address/>

On Fri, 5 Dec 2003, Karl Dubost wrote:

> On 2 Dec 2003, at 08:11, Lachlan Hunt wrote:
> >  No, it's not presentational, it's the result of not being able to
> > specify content models that change depending on the parent element.
> > If there were no distinction between inline and block, then you could
> > have things like paragraphs inside span elements
> > (<span><p>...</p></span>), so the distinction is necessary.
>
> I have thought a lot about it. And I unfortunately do not agree.

I'm not sure I understand what you disagree with, but I presume it's the
necessity of distinguishing between inline and block. Calling it that
is basically shorthand for describing certain syntax rules, of course,
but there's an overall idea behind it, and the wording reflects this: some
elements are regarded as "block elements", and this _basically_ reflects
the idea of rendering an element as a rectangular box (which occupies the
entire available width by default). It's fundamentally a presentational
idea that has been converted into a _formally_ structural principle, in
the sense of writing syntax rules accordingly.

But paragraphs inside span elements can hardly be an argument against
removing the distinction between block and inline, since the very
existence of span as distinct from div is caused only by that
distinction. If you remove the distinction, you only need one general
grouping element with no defined semantics, instead of two.

> But I guess the usual confusion is between
>
> - Semantics
> - Structure
> - Presentation

There's even more confusion. Any of those can mean rather different
things. For example, has the "good" old <b> element got semantics? Many
people say it doesn't, and make this the crucial difference between <b>
and <strong>. But I would say that <b> has the very exact semantics of
specifying bold face font. It's purely presentational semantics, but
that's a different issue.

"Structure" often means nothing but formal syntax these days. People talk
about XML as describing "structure", perhaps even thinking that XML per se
defines structures in some other sense. For example, the HTML <table>
element is of course a structure in the formal syntax: it contains <tr>
elements that contain <td> elements, to mention the simplest case. But
that's not all that there is in a table structure. A table, as tabular
data, implies that cells in corresponding positions in different rows
somehow relate to each other as a column. (For various reasons, it's
currently difficult to "work with" columns, but the need surely exists and
reflects the fact that there's more in the structure than the SGML or XML
"structure".)

To put it another way, "structure" could mean just the nesting
rules and other syntax rules. In some cases, it is apparent that this has
little to do with what we intuitively regard as structure. For example,
whether the elements <caption>, <col>, <thead>, <tbody> and <tfoot>
appear in this order or some other inside a <table> element is irrelevant
to a tabular structure as a logical concept but surely relevant to SGML
or XML "structure" (since a DTD must take a position, either by imposing a
fixed order, or allowing any order, or something between these alternatives).
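
To illustrate the point about columns, here is a minimal Python sketch
(my own illustration, using only the standard html.parser module; the
names TableGrid and columns and the sample cells are made up). The markup
nests only rows and cells, so the column relation has to be reconstructed
from cell positions:

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Collect the text of <td>/<th> cells, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def columns(markup):
    """Rebuild columns from cell positions; the syntax only knows rows."""
    parser = TableGrid()
    parser.feed(markup)
    parser.close()
    # Simplification: colspan/rowspan are ignored, and zip() truncates
    # to the shortest row.
    return [list(col) for col in zip(*parser.rows)]


sample = (
    "<table>"
    "<tr><td>A1</td><td>B1</td></tr>"
    "<tr><td>A2</td><td>B2</td></tr>"
    "</table>"
)
print(columns(sample))
```

Nothing in the element nesting says that A1 and A2 belong together; only
the position of each cell within its row does.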

Presentation is theoretically the least confusing of the concepts, but in
practice, people tend to confuse it with semantics. It's so easy to think
that some visual effects _are_ semantical (or structural), instead of
being just some possible ways of expressing semantics (or structure).

> Structure and Presentation are very difficult to distinguish and it
> sometimes seems overkill to have two tags for the same semantics
> when only the structure has changed.

If we consider the distinction between block and inline elements to
consist of formal "structure" only, and specifically of rules for allowed
nesting, then the verdict is obvious: the distinction is to be
judged as reflecting a presentational idea*), unless proven otherwise.
And a proof would consist in explaining in logical terms, without
referring to any nesting rules (since it is those rules that now
call for justification), what it means to be a block element as opposed
to an inline element.

*) Besides, the presentational description of the distinction is purely
visual. What is a rectangular box in speech, or in Braille?

> The structural interpretation IMHO comes from the HTML old times and
> should not matter so much. Then comes the difficulty of understanding
> what is structure and what is semantics. Is a list structure or
> semantics? It depends on who's looking at it.

Well, yes, in a sense. And the distinction between <ol> and <ul> is
essentially presentational. In both cases, the idea surely is that the
list items are in a specific order. The distinction is that for <ol>, the
order is made more explicit. A genuinely structural (i.e., not purely
syntactic) list concept could, for example, contain markup that indicates
that the items are in a priority order, or in time order, or in a random
order.

> You could achieve the same presentation and the same meaning with:
>
> <p>If you need more information write to: <address>Acme Inc. 42, Main
> Street, Douglas City</address>.</p>
>
> <p>If you need more information write to:</p>
> <address>Acme Inc. 42, Main Street, Douglas City</address>

In the latter case, the <address> element is not part of a paragraph, and
there is nothing that suggests that they are in any way semantically
related to each other. Considering a hypothetical search engine that finds
<address> elements and shows them in context, it is obvious in the first
case what the immediate context is, whereas in the latter, there's no
indication.
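
As a rough sketch of such a hypothetical search engine (my own
illustration in Python with the standard html.parser module; the names
AddressContext and address_parents are made up, and the parser is a plain
tag-soup parser that does not enforce any DTD), one could record the
element that directly contains each <address>:

```python
from html.parser import HTMLParser

# Elements with no end tag; a small illustrative subset only.
VOID = {"br", "hr", "img", "col", "input", "meta", "link"}


class AddressContext(HTMLParser):
    """Record, for each <address>, the element that directly contains it."""

    def __init__(self):
        super().__init__()
        self._stack = []     # names of currently open elements
        self.parents = []    # parent element name per <address>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "address":
            self.parents.append(self._stack[-1] if self._stack else None)
        if tag not in VOID:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating sloppy markup.
        if tag in self._stack:
            while self._stack.pop() != tag:
                pass


def address_parents(markup):
    parser = AddressContext()
    parser.feed(markup)
    parser.close()
    return parser.parents


inline = ("<p>If you need more information write to: <address>Acme Inc. "
          "42, Main Street, Douglas City</address>.</p>")
block = ("<p>If you need more information write to:</p>"
         "<address>Acme Inc. 42, Main Street, Douglas City</address>")
print(address_parents(inline))  # ['p']
print(address_parents(block))   # [None]
```

In the first case the markup itself supplies the paragraph as context; in
the second, a tool could only guess from document order that the
preceding paragraph is related.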

But regarding the <address> element in particular, it seems pretty
obvious to me that it was included in HTML mainly for presentational
reasons, to present addresses as blocks. No attempt has been made to
define its internal structure, despite the fact that most addresses are
structured in one way or another.

What's worse, in a sense, is that some common default renderings, like the
use of italics in a widely used browser, are rather unsuitable for
presenting addresses.

The question arises: what's the _use_ of <address> elements? Currently and
in the past, authors have used it for a variety of purposes in presenting
addresses, somewhat oddly, since if you want line breaks, you need to put
them there anyway, and if you want italics, you had better use <i> and not
rely on _some_ browsers rendering <address> in italics, and if you don't
want italics, you need to use CSS to "switch off" the default rendering
(to the extent you _can_). Logically, on the other hand, there are many
_potential_ uses of structural markup of addresses. But a mere <address>
element is not of much use. Just extracting (purported) addresses doesn't
help much, when they might in fact be any type of address written in
any way. So _if_ there is an <address> element, it should have
- a clear definition that says whether it is for the document author's
  contact address (as currently defined in HTML specs) or any
  address information
- a definition that says what types of addresses could be included
  (e.g. to avoid confusion like the old idea that the HTML 2.0 <address>
  element was not for postal addresses!)
- some minimal internal structure that at least makes it possible
  to separate major types of addresses, such as postal, E-mail, and
  Web addresses (and telephone "addresses", if permitted).
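
To make the last point concrete, here is a minimal Python sketch under a
loud assumption: the "type" attribute on <address> is purely hypothetical
(it exists in no HTML or XHTML specification that I know of), as are the
names TypedAddresses and group_addresses and the placeholder addresses.
It only shows that even one minimal distinction would let software
separate the major kinds of addresses:

```python
from collections import defaultdict
from html.parser import HTMLParser


class TypedAddresses(HTMLParser):
    """Group <address> contents by a HYPOTHETICAL type attribute."""

    def __init__(self):
        super().__init__()
        self.by_type = defaultdict(list)
        self._current = None    # type of the <address> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "address":
            # "type" is an assumption for illustration, not a real attribute.
            self._current = dict(attrs).get("type") or "unspecified"
            self.by_type[self._current].append("")

    def handle_endtag(self, tag):
        if tag == "address":
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.by_type[self._current][-1] += data


def group_addresses(markup):
    parser = TypedAddresses()
    parser.feed(markup)
    parser.close()
    return dict(parser.by_type)


sample = (
    '<address type="postal">Acme Inc. 42, Main Street, Douglas City</address>'
    '<address type="email">info@example.com</address>'
    '<address>somewhere</address>'
)
print(group_addresses(sample))
```

Without some such distinction, everything lands in the "unspecified"
group, which is roughly the situation with the current <address> element.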

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Saturday, 6 December 2003 05:19:23 UTC