Re: [whatwg] Various HTML element feedback from Jukka K. Korpela on 2012-06-06 (public-whatwg-archive@w3.org from June 2012)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Wed, 06 Jun 2012 07:39:40 +0300
To: whatwg@lists.whatwg.org
Message-ID: <4FCEDF0C.4040208@cs.tut.fi>

2012-06-06 2:53, Ian Hickson wrote:

>> I have rather been optimistic about future developments for markup
>> elements that have been defined exactly enough to warrant meaningful
>> semantics-based processing. For example, most of the uses mentioned in
>> current text imply that <var> element contents should be kept intact in
>> automatic language translation.
>
> That continues to be the case, so I don't know why you conclude that using
> it is now pointless.

It is worse than pointless, if the definition of <var> covers "a term 
used as a placeholder in prose". Such expressions should definitely not 
be kept intact in automatic language translation.

The definition of <var> is so broad that it is questionable whether 
*anything* useful can be assumed in automated processing. If it were 
defined more technically, without that placeholder idea, we could fairly 
certainly say that the content should be treated as a technical notation 
that should be left untranslated (as such notations are normally 
international), ignored in spelling checks, treated as equivalent to 
unknown nouns in syntax analysis of human language text, etc.
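
As a rough illustration of the processing such a narrower definition would 
allow, here is a minimal Python sketch. It assumes that <var> content is 
always technical notation. The names VarAwareSegmenter and segment are 
invented for this example and come from no existing tool. The sketch splits 
markup into text segments and flags each one as translatable or not, so that 
a translator or spelling checker could skip the protected parts.

```python
from html.parser import HTMLParser


class VarAwareSegmenter(HTMLParser):
    """Collect (text, translatable) pairs; text inside <var> is protected.

    Hypothetical helper: it assumes <var> always marks technical notation.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.segments = []
        self._var_depth = 0  # > 0 while inside one or more <var> elements

    def handle_starttag(self, tag, attrs):
        if tag == "var":
            self._var_depth += 1

    def handle_endtag(self, tag):
        if tag == "var" and self._var_depth > 0:
            self._var_depth -= 1

    def handle_data(self, data):
        if data:
            self.segments.append((data, self._var_depth == 0))


def segment(markup):
    """Return the list of (text, translatable) segments for a markup string."""
    parser = VarAwareSegmenter()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return parser.segments
```

With the current, broader definition such a tool would be unsafe, since a 
<var> element might just as well contain a prose placeholder that does need 
translation.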

>> So why not simply define <i> recommended and describe <var>,<cite>,
>> <em>, and <dfn> as deprecated but supported alternatives?
>
> What benefit does empty deprecation have?

Declaring some features as "obsolete" is effectively deprecation; I just 
used the term "deprecate" as per HTML 4.01 because I find it more 
descriptive. Anyway, defining those elements as deprecated/obsolete 
would be no less and no more "empty" than the current statements about 
obsolete status. Validators/checkers would issue messages (hopefully 
just warnings) about them, and tutorials would probably describe them as 
secondary if at all.

Reducing alternatives, from five to one in this case, makes the 
recommendations simpler and helps authors because they need not spend 
time in making choices between the elements. Such choices can be tough, 
if you try to play by the declared "semantics", especially if it is 
vague (to a normal reader of a spec).

My point is: either make elements like <var>, <cite>, <em>, <dfn>, <i> 
defined so that the differences can be utilized in automatic processing, 
or just bundle them together, to <i>.

> It's not like we can ever remove
> these elements altogether.

Oh, in 20 or 30 years, I think browsers could drop support for some of them.

> What harm do they cause?

Unnecessary complication to the language, artificial "semantics" that do 
not actually define meanings, and confusion among those authors who try 
to take semantics and specifications seriously. Oh, and pointless 
variation in markup and added complexity of styling.

> If we have to keep them, we are better served by embracing them and giving
> them renewed purpose and vigour, rather than being ashamed of them.

I think this summarizes well the idea behind some of the most contrived 
"semantic" definitions. It was a brave attempt, but it failed. No normal 
author will ever get your idea of the new meaning for <b> and <i>, for 
example.

And since, for example, the <font> markup needs to be supported for a 
long time, how come *it* has not got a new, semantic definition?

If <var>, <cite>, <em>, <dfn> were obsoleted/deprecated in favor of 
<i>, they would still need to be defined in the spec, of course. But the 
definition could simply state that they are outdated elements that 
should not be used by authors and should be treated by browsers as 
equivalent to <i>.
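
To show how little such equivalence would demand of software, here is a 
small Python sketch of a tool that rewrites the four elements to <i>. The 
names LEGACY_ITALIC, ItalicNormalizer and normalize are made up for this 
illustration. The sketch silently drops comments, doctypes and processing 
instructions, so it is not a complete serializer.

```python
from html import escape
from html.parser import HTMLParser

# Elements that the proposal would treat as equivalent to <i>.
LEGACY_ITALIC = {"var", "cite", "em", "dfn"}


class ItalicNormalizer(HTMLParser):
    """Re-serialize markup, renaming the legacy elements to <i>."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _open_tag(self, tag, attrs, closer=">"):
        parts = ["i" if tag in LEGACY_ITALIC else tag]
        for name, value in attrs:
            if value is None:
                parts.append(name)  # attribute without a value
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        return "<" + " ".join(parts) + closer

    def handle_starttag(self, tag, attrs):
        self.out.append(self._open_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._open_tag(tag, attrs, closer=" />"))

    def handle_endtag(self, tag):
        self.out.append("</" + ("i" if tag in LEGACY_ITALIC else tag) + ">")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def normalize(markup):
    """Return markup with <var>, <cite>, <em>, <dfn> rewritten as <i>."""
    parser = ItalicNormalizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```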

>> This would make authoring simpler without any real cost. There’s
>> little reason to tell authors to use “semantic markup” if we don’t
>> think it has real effect on anything.
>
> It does have an effect. It has many effects. It makes maintenance easier,
> it makes it easier to transition from project to project, it makes it
> easier to work on other people's markup, it makes it significantly easier
> to dramatically change a site's appearance, it makes it easier to create
> apply custom tools to extract information from the documents, it makes it
> easier for search engines to guess at author intent, it makes it easier
> for the documents to be repurposed for other media, it makes it easier for
> documents to be "remixed", it makes it easier for JavaScript libraries to
> be used and mixed...

I've often seen such arguments, even in situations where it is 
strikingly obvious that they don't apply. The argumentation sounds like 
a matter of faith or principle rather than practical considerations.

Many of the arguments relate to authoring style, coding principles, and 
organization of work, rather than something that belongs to a general 
specification. For example, the ease of working on other people's markup 
in a collaborative environment depends on a large number of factors, 
including the overall structures, appearance of markup (lower vs. upper 
case, use of quotes, omission of omissible tags, indentations, empty 
lines), principles of choosing id and class names, use of comments, etc. 
General specifications cannot and need not handle such issues. And, say, 
the use of <b> vs. <strong>, given their current definitions, is quite 
comparable to regulating the use of class attributes.

The other major part of the argumentation refers to assumed automatic 
processing. This is mostly just assumptions, or wishes, often presented 
as if they were facts. But they *could* be turned to reality, in part. This 
is just the reason why I have asked for semantic clarifications. No one 
can reasonably base automatic processing on definitions like those for 
<var>, <b>, etc. now.

Let legacy be legacy, instead of trying to convert it to "semantics". 
The semantics of physical markup is the visual appearance. It is best to 
describe it simply and openly (and accurately - for example, what <i> 
really means in legacy markup, and will mean in browsers in the 
foreseeable future, is italic *or* oblique *or* algorithmically slanted 
font).

>> What is _compelling_ about markup for misspellings?
>
> It's a feature that is necessary in text editors, for which we previously
> did not have a good solution.

I would not call it a solution to say that the <b> markup, which 
actually means bold face to any existing relevant software, should be 
used for specialized meanings. How could anyone, or any software, 
reading markup guess whether <b> means "misspelling", or "Chinese name", 
or some entirely different "unarticulated, though explicitly rendered, 
non-textual annotation"? Such things can be resolved via classes, to 
some extent, but then the artificial "semantic" definition for <b> is 
pointless.

Yucca
Received on Wednesday, 6 June 2012 04:40:14 UTC