Re: "Presentational" vs. "Legacy" from Jonny Axelsson on 2000-04-04 (www-html@w3.org from April 2000)

From: Jonny Axelsson <jonny@metastasis.net>
Date: Tue, 04 Apr 2000 22:11:46 +0200
To: www-html@w3.org
Cc: www-html@w3.org
Message-Id: <3.0.6.32.20000404221146.008a2710@mail.linpro.no>
We all agree that BLINK and FONT are bad, but if XHTML is to be a rethink of
HTML, we should go back to the start; a lot of what made sense a decade ago
isn't how it would be done today.

(*) I and B are imperfect markup, but having them is better than not having
    them. They do have a use.
(*) If we imagined that I and B didn't exist, EM might have been created;
    STRONG wouldn't. STRONG is there because B is there (I say this with perfect
    telepathic sense. Correct me if I'm wrong, but I think Dan Connolly is the
    inventor of STRONG, in which case this wasn't the best of his creations).

<offtopic> <? if you want to react to this, do it in a separate message ?>
The original messages also had these two points:
(-)Move CODE, VAR, KBD and SAMP into a separate module. While the above two
   can be considered polite conversation under the barrage of device upload,
   this one is serious and realistic. These four elements are admirably
   clear *for a programmer*, and they also clearly form a separate module. For
   fairness and regularity, it probably should be an optional module. For
   backward compatibility and cowardice it possibly should be an obligatory
   module.
(-)Some *structural* markup is currently as imperfect as I and B, probably
   worse.
</offtopic>

As for structural vs. presentational, it might be clearer to talk about
device-independent and device-dependent markup; these concepts are roughly
equivalent. The crux of your argument is that there is some necessity that
I/B must be italic/boldface and italic/boldface only. It is natural to use
italic/boldface in printed text, but if they are not available or proper,
some other effect (or none) could be used. A CRT could use inverse text, a
teletype underline, the Teletext system another colour and so on. "I" could
have been short for INTENSE, and "B" for BRUTAL, and you could've used "X"
and "Y" for all that I care. The main point is that italic is largely used
to represent stress in spoken language, especially emphasis of course, but
not only. Experiment: Read out loud a text with italics, then read it
without italics. It will be read differently.

If the rules had been just a little more consistent than they are, I would
seriously have suggested <i type="emphasis">, <i type="title">,
<i type="citation"> (many structural tags were made by deconstructing italic;
this scheme would have had the advantage of keeping the "catch-all" I
without a type).
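A sketch of how that scheme could have been styled, assuming the hypothetical type attribute on I (it exists in neither HTML 4 nor XHTML) and CSS2 attribute selectors. In print every variant keeps the conventional italic, while an aural style sheet is free to treat them differently; the stress values are illustrative guesses.

```css
/* Hypothetical: HTML has no type attribute on I; this only sketches the idea. */
i                  { font-style: italic; }   /* untyped I: the "catch-all" */
i[type="emphasis"] { font-style: italic; }
i[type="title"]    { font-style: italic; }
i[type="citation"] { font-style: italic; }

@media aural {
  i                  { stress: 50; }         /* CSS2's default stress level */
  i[type="emphasis"] { stress: 80; }         /* spoken emphasis; the attribute selector wins on specificity */
}
```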

At 15:24 03.04.00 -0700, Tantek Çelik wrote:
>From: Jonny Axelsson <jonny@metastasis.net>
>Date: Mon, Apr 3, 2000, 1:55 PM

>Here are some of my tenets (working assumptions):
I should have seen this coming, as I grabbed two numbering schemes for
myself. I'll renumber the points I reuse like this:

[JA:1] There are relatively clear typographical rules for when to use
I[TALIC] (in languages using italic)
[JA:3] Underline is primarily "poor man's italic" (from the age of the
typewriter), but is also used for special effects (like hypertext)
[JA:B] It is important to discern between representation and presentation.
EM /represents/ an emphasis; it might be /presented/ using an italic font,
or by having "/" on each side of the content.
[JA:D] People are inconsistent coders. No matter how structured XML
becomes, you can't avoid this.
[JA:E] Automated translation to/from XML is desirable, and so is
minimization of information loss in the process.


>[1]. If something is described as "typographic" or "typographical", it is
>likely to be presentational, rather than semantic or structural.
Rarely. Usually typographical rules are there to convey an idea in a
regular way. The look is presentational (like which quotes to use), but the
idea structural (short story titles should be in quotes). Typographical
rules are not standardized (Norwegian typographical rules are similar to, but
different from, English rules, and the further away the language, the more
different the rules) and they are not one-to-one. Still they give valuable
metainformation. And you would want the final (print) result to be
presented according to typographic rules.

>[2]-[6]. [on the rottenness of word processors, and the greatness of the
>HTML4 + CSS combo]
I have no beef with these ones.

>And I'll use these statements in my arguments.
>[A]. It has been clearly established by W3C Recommendations that B/"bold"
>[B]. It has been clearly established by W3C Recommendations that I/"italic"
>[C]. It has been clearly established by W3C Recommendations that U/"underline"
This is the orthodoxy, and for HTML 4.0x the rules. I don't fully agree; I
hope it is clear where I agree and where I disagree.

>[E]. Automatic translation of presentational documents (such as typical word
>processor documents) to/from XML documents is best done using inline styles
>on the spans of text that are styled.
A word-processed document is semi-structural. I'd like to take care of the
"semi", but I know I can cop out with SPAN/CLASSes as needed. 

>> C. Non-HTML documents are semi-structured (as are HTML/XML documents).
>Semi-structured might as well mean unstructured.  This semi-structure is
>typically ascertained by white space and styling, which can only be said to
>be presentational, and certainly not necessarily structural [2].
There *is* structure in those documents, enough that it is worth keeping.

>> D. People are inconsistent coders. No matter how structured XML becomes,
>> you can't avoid this.
>Agreed.  But it is much harder to code "tag soup" when your code must be well
>formed.
Well-formedness is immensely valuable for interoperability, but if you want
your ADDRESS to have the same meaning as my ADDRESS, more than
well-formedness is needed. Almost dredging up another age-old discussion:
structural elements with no consensus on their meaning are worse than the
"presentational" elements.


>> I is used to represent a half-dozen meanings [JA:1], one of which is
>> emphasis.
>This is backwards.  "italic" is one way of styling emphasis.

Call it reverse engineering if you want. When in a normal text a word or a
phrase is italicized, it is so for a reason. Some people overuse italic,
but if they are in a publishing company or in a similar role, house rules
will encourage them to stick to the standards, or editors may proof the
formatting.

>A better approach is to avoid presentational media-dependent tags, and to add
>new semantic tags instead, e.g. use <shiptitle> in your DTD for the above
>example, and then style them as appropriate for the audience, e.g.

It is the best alternative, and also, in the general context, the least
realistic one (a shipping company might manage it). It would be nice to have
coding like <person class="politician"><firstname>...</firstname>
<lastname>...</lastname></person> in free text, but I don't know if it is
possible.


>> Even the catch-all is useful, and often at the limit of what
>> authors can handle (if they don't understand when to use italic, they won't
>> understand how to use any other markup) [1CD].
>But where does it end?  Do we replicate all presentational styling as markup?
>Do you propose the FONT tag mess all over again?
Some nineteenth-century texts used gothic and roman typefaces for different
kinds of texts, and <g> and <r> might have been suggested if HTML had been
defined then. Otherwise typefaces have no semantic value, and neither does
big/small (which I'm happy to see dropped from the XHTML 1.1 proposal). 

>Yes, and authors should only ever use a single exclamation point (!), but
>there are certainly plenty of examples of people using double exclamation
>points (!!) or more.  The reality is that there are more than just two levels
>of "em"phasis (none or some), and allowing EM EM acknowledges that.

And it gives me the opportunity to "stylesheet away" this kind of
overemphasizing. Now, if there were a way to remove superfluous exclamation
marks using CSS...
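A sketch of that "stylesheeting away", assuming a reader's own style sheet: a CSS2 descendant selector can make EM EM render exactly like a single EM. The specific rules are illustrative choices, not defaults from any specification; the !important flags are there because in CSS2 a user's !important rules take precedence over the author's.

```css
/* Illustrative user style sheet: flatten EM EM to a single level of emphasis. */
em    { font-style: italic !important; font-weight: normal !important; }
em em { font-style: italic !important; font-weight: normal !important; } /* same as plain EM */

/* Alternative, for readers who do want the extra level shown: */
/* em em { font-weight: bold; }                                  */
```

Exclamation marks, being text rather than markup, stay out of reach of any such rule.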

>HTML4 *by itself* is very poor at representing even what simple ten-year old
HTML4 by itself (with class and span) is very good at representing, but
poor at presenting.

At 17:55 03.04.00 -0400, Jelks Cabaniss wrote:
>Jonny Axelsson wrote:
>> BI(TT) are in a different category. My thought about this here:
>> <http://lists.w3.org/Archives/Public/www-html/2000Feb/0250.html>
>
>I read that when you first posted it and was mystified.  I just re-read it,
>with a similar reaction.  Your summary:
>>	... Of all HTML elements, I and B are the only truly universal ones.
>is astounding.  Terms such as Italic, Bold, Underline, Strike-through, and
>Font apply to _visual_ media, such as a printed page or your PC's screen; to
>braille and audio devices they are *meaningless*.

Non-visual presentation benefits from clear and well understood rules too.

Old typewriters couldn't represent italic (they could, however, represent
bold by overtyping; fortunately this was rarely done), so they used
_underline_ instead. [JA:3] Your argument says that since typewriters can't
represent italic, italic shouldn't be used. But the mapping italic <-->
underline is unambiguous, as may be any other mapping, like italic <-->
female voice.
Received on Tuesday, 4 April 2000 16:17:25 UTC