Re: Cleaning House

Murray Maloney wrote:

> I do agree that sometimes I want to highlight text because convention
> suggests that it is good form to do so and my aim is to subtly signal
> the reader that this phrase is different somehow, and at other times
> I want to catch the reader's attention more overtly.

Hurray! :)

>> Now its very existence suggests that <em> has some purpose beyond
>> <i>; and the early discussion from www-talk I quoted demonstrated
>> that this difference between stress emphasis and other uses of
>> italic and bold was recognized by the correspondents. So I don't
>> think associating <em> with the stress emphasis is unreasonable.
> No, not unreasonable on the face of it, but misleading. I don't think
> that most people think of <em> that way.

I'm not sure. I'd /guess/ the number of people who think of <em> at all
is quite small (most people who aren't markup geeks seem to either use
WYSIWYG tools that generate <em> automatically or hand-code <i>). Again,
I'd /guess/ people who hand-code <em> are likely to have been influenced
by the web standards movement. How many of them actually understand the
distinction drawn by thought-leaders in the web standards movement
between ambiguous expressions of stress emphasis (bold and italic,
sometimes) and stress emphasis in the abstract is more questionable.

> They have been told to use <em> instead of <i>, not just with 
> emphatic phrases, but because <em> is somehow 'more semantic' than 
> <i> and therefore its users are more virtuous. The problem is now
> people who use <em> are just as likely to be using it virtuously as
> they are to be trying to avoid hell.

I suspect the fundamental problem has been that the web standards
movement has a clearer line on avoiding <i> than on what to replace <i>
with for non-stress-emphasis uses of italic, like ship names and book
titles. Always propose workable replacements across the entire range of
common use-cases!

> I think that the point I was trying to make was that it was
> misleading to claim that <i> could only be rendered with an italic
> typeface.

Can't remember now who claimed that, but apologies if I did or seemed
to, because it's wrong.

> What has come up through this discussion, but has not much been
> followed up is my suggestion that CLASS attributes (or pick your
> fave) could be used to provide layers of useful semantics onto
> primitive elements like <i>, <b> and <span>.

Well, this is part of the HTML5 draft, under consideration by W3C in the
form of GRDDL, the basis of the microformats movement, and currently
being compared to the option of using role or coming up with new
elements in this and other threads. What additional discussion would you
like that isn't being had?
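
For what it's worth, here's a rough Python sketch of what "layering semantics onto primitive elements" with class might look like from a consumer's side. It isn't taken from the HTML5 draft, GRDDL, or any microformat: the PROFILE table, its class names ("ship-name", "book-title", "stress"), and the classify() helper are all made up for illustration. It only assumes the standard library's html.parser.

```python
from html.parser import HTMLParser

# Hypothetical profile: these class names and meanings are illustrative
# only and are not defined by any specification.
PROFILE = {
    "ship-name": "name of a ship",
    "book-title": "title of a work",
    "stress": "stress emphasis",
}


class ClassSemantics(HTMLParser):
    """Collect (tag, meaning, text) for each <i>, <b> and <span> element."""

    TRACKED = {"i", "b", "span"}

    def __init__(self):
        super().__init__()
        self.stack = []   # open tracked elements: [tag, meaning, text parts]
        self.found = []   # completed (tag, meaning, text) triples

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            classes = (dict(attrs).get("class") or "").split()
            # First class name the profile knows about wins; None if none do.
            meaning = next((PROFILE[c] for c in classes if c in PROFILE), None)
            self.stack.append([tag, meaning, []])

    def handle_data(self, data):
        # Text belongs to every tracked element that is currently open.
        for entry in self.stack:
            entry[2].append(data)

    def handle_endtag(self, tag):
        if tag in self.TRACKED and self.stack and self.stack[-1][0] == tag:
            open_tag, meaning, parts = self.stack.pop()
            self.found.append((open_tag, meaning, "".join(parts)))


def classify(html):
    """Return the tracked elements of an HTML fragment with their meanings."""
    parser = ClassSemantics()
    parser.feed(html)
    parser.close()
    return parser.found
```

A bare <i> comes back with a meaning of None, which is rather the point: without an agreed profile, the consumer knows only that the author wanted italics.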

> I find that the presence of <em> offers me no advantage in
> understanding the reason that some phrase was marked up. Context and
> precedence serve as much better guides because the subtlety of <em>
> is lost on most authors.

But context and precedence can't easily be used by machines like screen
readers.

> What I am saying is "Stop picking on <b> and <i>, accusing them of
> being non-semantic, when <em> and <strong> are barely semantic and
> certainly not in a way that is proving especially useful to anyone."

Okay, even if we disagree about whether <i> is non-semantic, we agree
that <em> /is/ "semantic" even if only barely. We also agree that the
distinction isn't "proving especially useful to anyone". My personal
approach is less to pick on <i>, or indeed to stop picking on it, and
more to try to fill the markup void left by <i>'s attempted replacement
with <em>.
Also, I'm beginning to think it might help if we had an element like
<stress> because it's a less confusing name, at least in English.

>>> I am qualified to say that you can redefine <b> to red and <i> to
>>> green and aural and Braille readers can ignore or re-map them
>>> too.
>> Of course. Although because <i> can be used without stress and <em>
>> is often misused without stress, many aural and braille remappings
>> will be erroneous. I doubt most authors know about such remappings
>> though.
> Let's give authors more and better choices then. Let's give them a
> way to say what they mean. And if they want to say <i>, then let's
> let them.

Let me see if I follow this. Do you want HTML-next to say that <i> may
be used when you want to distinguish text with italics in print, italics
in braille, and stress in voice? Given that braille standards are
stricter than print conventions and that most authors won't have read
the braille standards, how would they know whether the piece of text
they are distinguishing /should/ be italic in braille? Since they
presumably wouldn't and would use <i> anyhow, surely that would actually
frustrate their attempt "to say what they mean"? And what about users
who have visual impairments such that they can read ordinary text
but find italic text difficult to read? This is a trivial case, but it
seems to be a microcosm of how markup geared for visual presentation
fails to communicate what authors mean across multiple media unless they
have unusual expertise.
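
To make the remapping worry concrete, here's a toy Python sketch. The TAG_TO_SPEECH table is a hypothetical voice remapping keyed on tag name alone, and the sample triples (including the "author intent" labels) are invented for illustration, not drawn from any real user agent or corpus.

```python
# Hypothetical remapping a voice browser might apply, keyed on tag name only.
TAG_TO_SPEECH = {"em": "stressed", "i": "stressed"}


def speak(tag, text):
    """Return (text, prosody), looking only at the tag name."""
    return text, TAG_TO_SPEECH.get(tag, "plain")


def mismatches(samples):
    """Find text that would be stressed although the author meant otherwise.

    samples is a list of (tag, text, author_intent) triples, where
    author_intent is "stress" or some other label such as "ship name".
    """
    wrong = []
    for tag, text, intent in samples:
        _, prosody = speak(tag, text)
        if prosody == "stressed" and intent != "stress":
            wrong.append(text)
    return wrong
```

Fed an <i> around a ship name, the remapping stresses it anyway, because nothing in the markup tells it otherwise.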

> And if they want to say <span class="stressEmphasis"> then let's let
> them do that too.
> And no, I do not want to standardize CLASS names. I want a reliable
> way to define profiles and have them understood, along with links to
> related resources.

What way would you suggest for them to be understood by user agents?

> And that may not scale perfectly, and yes there will be collisions,
> but it can work quite well in controlled settings.

What controls would you suggest?

>> Maybe (since you were there) you'd care to recollect for us /why/ 
>> <em> was introduced in the first place, seeing as <i> was already 
>> allowed to fall back to non-italic representations?
> Ha! Ha! Is your intent to draw attention to the fact that I am a 
> dinosaur? :-) I am actually just a bit younger than Tim Berners-Lee
> and older than Bill Gates.

Hehe. :) No, actually, I just like taking a historical approach to
things and find it illuminating to see debates from 1993 replayed in
2007. Unfortunately, the documentary sources are rather patchy,
especially since much of the www-talk archive was lost. And even if we
had fuller archives, anecdotes would make an interesting supplement.

> I cannot speak to the moment that a decision was made to include <i>
> and <em>.

Oh well.

> I can only say that as an author and as a manager of writing
> departments, I want to ensure that authors can write easily and
> freely, and come back to encode with semantic tags or attributes
> later. It is a question of convenience.

Can't one already do this with something like RTF, ODF, or PDF? Isn't
this just a browser support problem?

Also, wouldn't "encod[ing] with semantic tags or attributes later" be
impossible with much of the user-generated content that is a major
use-case of modern HTML?

> I think that there should be a full set of document publishing
> primitives.

I'm not personally persuaded that these primitives should be typographic
in nature, but nonetheless would you care to propose at least the
members of this set (beyond <i>, <b>, and <indent>) so people would have
a clearer idea what we're talking about here?

> With these in place, the rate of semantic pollution may decrease over
> time.

PDF is an open publishing format which can be enhanced with semantic
tags. As far as I know, only one PDF production tool implements support
for those tags. Very few PDF documents seem to be tagged. Far more PDF
documents consist only of scanned images. /Partly/ as a consequence, PDF
has an even worse reputation for accessibility than HTML or Word
documents. I realize the cultural position of PDF is somewhat different
to the culture around HTML. Nonetheless, the example of PDF perhaps
suggests that we'd risk ending up with an even more tunnel-visioned
"presentational" web, even less of which could be accessibly navigated
or represented.

I think having lots of documents that are mostly accessible is a more
useful goal than having a few documents that are totally accessible. In
so far as semantic markup intersects with accessibility, the same goes
for unpolluted semantics. I suspect something similar is true of
semantic markup used for data mining.

Benjamin Hawkes-Lewis

Received on Monday, 7 May 2007 22:47:18 UTC