Re: Objection to HTMLWG ISSUE-144 Change Proposal #2 (keep u non-conforming) from Ian Hickson on 2011-04-04 (www-archive@w3.org from April 2011)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 4 Apr 2011 05:51:20 +0000 (UTC)
To: www-archive@w3.org
Message-ID: <Pine.LNX.4.64.1104031720450.25791@ps20323.dreamhostps.com>

On Fri, 1 Apr 2011, Aryeh Gregor wrote:
>
> The primary use-case for <u> is presentational markup, such as in 
> WYSIWYG editors, or when your client tells you "I want that text 
> underlined over there" and you aren't being paid to give him a lecture 
> on the importance of media independence.

This use case is already handled by CSS.

Underlining should not be encouraged since it conflicts with the 
de-facto hyperlink affordance. Therefore we should not have an element 
whose entire purpose is this presentation, even if we should have 
presentational markup, which we should not for reasons discussed 
elsewhere.


> The claim that if we make <u> conforming we should also make <font> 
> <big> etc. etc. etc. conforming does not address the extensive rebuttal 
> of this argument that I wrote for the other change proposal (beginning 
> "<u> should not be invalid just because . . .").  I won't repeat it 
> here, but to summarize the most important points: the length savings of 
> <u> are greater than for most of the listed elements

<u> as proposed has essentially the same meaning as <i>, and there is no 
length saving between the "u" and "i".


> <u> is much more similar to <b> and <i> than to the non-conforming 
> elements listed

Indeed it is so similar that it is an unnecessary addition. The use case 
of "stylistically offset" is already entirely handled by <i>.


> and maybe we should make other presentational markup valid, but that can 
> be dealt with in a separate bug/issue and isn't relevant here.

This is a frequently claimed position, but making decisions like this on a 
case-by-case basis seriously damages language design. We have to take a 
holistic approach to the language or we will be forced to create a 
"compromised by committee" language that is internally inconsistent. 
Either we should decide we are making a presentational language, or we 
should decide we are making a media-independent, semantic-focused 
language. We cannot in good faith do both.


> The fact is, <b> is presentational markup too.

This is not a fact. As currently defined, the <b> element is 
media-independent: it is not just "bold"; it has a definition that applies 
to multiple media, in a way that lets an author clearly distinguish when 
this element should be used vs other elements such as <i>, <strong>, 
<dfn>, et al.


> the specification says that <b> is to be used for "spans of text whose 
> typical typographic presentation is boldened", so it's defined solely in 
> terms of whether you want it to look bold.

This statement is provably false. It isn't defined "solely in terms of 
whether you want it to look bold", since the mention of bold is literally 
only one of 3 non-normative examples of the actual definition, which is "a 
span of text to be stylistically offset from the normal prose without 
conveying any extra importance" (as contrasted with <strong>, which 
conveys importance, and <i>, which conveys text that is stylistically 
offset but without the "no importance" rule).


> Yes, you can quibble that "b { font-weight: normal; font-variant: 
> small-caps }" would be correct according to the official definition, but 
> that's a purely academic point, because nobody in his right mind is 
> going to do that, ever.

That's false. I've done it quite often, exactly as per the spec's 
definition.
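
As a minimal sketch of that kind of use (the sample sentence is made up 
for illustration, and the style rule is the one quoted above), the element 
keeps its meaning of stylistically offset text while the rendering is 
something other than bold:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Small-caps b sketch</title>
<style>
  /* <b> still marks text offset from the normal prose;
     only the visual rendering differs from the default bold. */
  b { font-weight: normal; font-variant: small-caps; }
</style>
</head>
<body>
<!-- Illustrative content: an article lede set off from the body text. -->
<p><b>The first few words</b> of this article are offset from the rest
of the prose without being any more important than it.</p>
</body>
</html>
```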


> Just because there might be someone out there who's using 
> <b> (or <i> or <sub> or <sup> or <small> or anything like that) for an 
> effect other than the normal one, and the spec technically allows this, 
> doesn't mean that it's somehow less presentational in the real world.

What matters as to whether it's presentational is what the spec defines it 
as.


> The fact that the definition allows room to use <b> other than for 
> bolding things is not meaningful -- it's playing word games in an 
> attempt to make it not look like it's presentational

No, the spec's definitions are not "word games". If they were, then it 
wouldn't matter what the spec said anyway. The spec's authoring 
conformance definitions and its semantic definitions are both normative 
descriptions of best practices for authoring, not games.


> because of a long-standing crusade in the web standards community 
> against presentational markup.

It's not an arbitrary crusade; presentational markup has hurt 
accessibility, maintainability, performance, device-independence, machine 
processing, et al, a great deal. Furthermore it's an issue where authors 
have shown an inability to deal with subtlety.


> Presentational markup is not bad per se.  Some typographical effects are 
> commonly required but have no particular meaning.  Sometimes authors 
> just want some text to be bold or italic or underlined, and don't want 
> to have to reason about *why* they want it in some abstract fashion.

With bold and italics, there is less harm caused by such thinking than by 
underlining, and the elements can be defined in such a way that the 
majority of uses will be conforming even if the author doesn't really know 
why they desire that presentation.

However, underlining is not a common typographic effect except for 
indicating hyperlinks, and for this reason <u> is quite different than <b> 
and <i>. Thus the parallel in the above argument is inappropriate.

To put it another way: misuse of <i> and <b> usually does no more harm 
than removing semantic information from the markup -- e.g. using <i> 
instead of <em> does no more harm than simply not having used either. But 
using <u> can cause practical usability difficulties.


> WYSIWYG editors are the only way that almost anyone edits any rich-text 
> format, including HTML, because presentation does not require reasoning 
> about anything you can't see before your eyes. Everyone can understand 
> the difference between "this makes things bold" and "this makes things 
> italic".  But would *you* be able to tell when you should use something 
> that "represents stress emphasis of its contents" vs. "represents strong 
> importance for its contents", if you didn't already know one was <em> 
> and one was <strong>?

Yes, of course. Stress emphasis and importance are very distinct concepts.

That the state of the art in editor development is currently unable to 
provide the full gamut of HTML semantics to users of editors is not a 
reason to add <u>, since the state of the art is at least more advanced 
than that (if not so widely deployed). (It _is_ a reason to add <font 
style=""> in a manner only allowed for WYSIWYG editors, which we did many 
years ago and then, for various reasons, removed again, also many years 
ago.)


> So the real use-case here is presentation, and that's a completely valid 
> use-case.  Without <u>, we have to use <span style="text-decoration: 
> underline"> or <span class=u> or something like that.

Or just <i> and a simple style rule in the CSS.
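
A rough sketch of what that could look like, assuming a hypothetical class 
name "offset" and made-up sample text:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Underlined i sketch</title>
<style>
  /* "offset" is a hypothetical class name chosen for this example.
     The markup says "stylistically offset"; the CSS decides that this
     is shown as an underline rather than in italics. */
  i.offset { font-style: normal; text-decoration: underline; }
</style>
</head>
<body>
<p>The client asked for <i class="offset">this phrase</i> to be
underlined.</p>
</body>
</html>
```

If the underline later proves confusing next to hyperlinks, only the style 
rule has to change, not the markup.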


> Since underlining is one of the commonest stylistic effects available, 
> there's no reason to declare an existing, fully-functioning shortcut 
> verboten.

Underlining is an archaic stylistic effect in general, and is not one to 
encourage on the Web.

The first hit for the word "underlining" on Google:

   http://englishplus.com/grammar/00000111.htm

...explicitly indicates that italics and underlining are equivalent. This 
argues strongly that <i> should be used for those semantics and that CSS 
should be used to control the style.

The first hit for "fashionable typography" on Google:

   http://stewartdesign.com/2009/04/basic-tips-for-more-fashionable-typography/

...is a diatribe on how underlining is antiquated and should only be used 
for hyperlinks.


> On the other hand, if <b> and <i> are actually semantic and not
> presentational, then so is <u>.

Whether they are semantic or presentational depends entirely on their 
definition. If we add <u>, we could define it either way. That's a given. 
We could define <u> as meaning "text spoken by a baby", or as representing 
"names of fruit".


> The proposed text closely follows the pattern set by <b> and <i>'s 
> current definitions.

This is merely evidence that <u> adds nothing that isn't handled by those 
elements.


> Even if it were the case that "a span of text to be stylistically offset 
> from the normal prose" is already represented by <i>, there's no reason 
> given why we can't have two elements with the same semantics.

There's no benefit to doing so either, and there are multiple negatives; 
for example it leads to arguments about which element is appropriate (this 
is why we dropped <acronym>, which was redundant with <abbr>), and in the 
case of <u>, it has default styles that are considered "antiquated" and 
confusing for users.


> Confusing authors is an important concern here, but it's one that speaks 
> in favor of making <u> conforming.  Authors are all familiar with word 
> processors and other formatting systems in which bold, italic, and 
> underline are all prominently available right next to each other.  
> Allowing only bold and italics, but not underlining, is sure to be 
> extremely confusing to authors.

All are already allowed, as are "blue" and "small-caps" and "sans-serif", 
and so forth, via CSS.


> As far as underlines being confusing on the web because of links, I 
> agree with that, but authors still want them.  This is clear when you go 
> to any WYSIWYG web editor, since in my experience, they all have 
> underline buttons.

They also have "text color" buttons, but we don't support that. In 
general, just because a feature is commonly provided doesn't mean it's 
widely used, just because a feature is widely used doesn't mean it's 
widely desired, and just because it's widely desired doesn't mean it's 
good practice. Authoring conformance criteria should reflect best 
practice, not existing practice. (It's implementation conformance criteria 
that should match existing practice.)


> Nobody has provided any reasoning to suggest that making <u> invalid 
> will discourage the use of *underlining* -- if we look at how major web 
> applications are written, it seems much more likely that people will 
> just switch to <span style="text-decoration:bold"> or <span class=u>.

It's hard to provide numbers either way, but in general the assumption 
behind having authoring conformance criteria at all is that they are a way 
to convey best practices to authors. Authors can naturally ignore these, 
either by using other mechanisms like style="" or class="" in manners that 
are equally media-specific, or indeed by using <u> despite it not being 
conforming. If the fundamental assumption that authoring conformance 
criteria can influence author behaviour for the better is false, then our 
whole approach to how we write HTML should change. IMHO, such a decision 
is out of scope of this issue.


> I know that when the issue of moving MediaWiki to future HTML versions 
> came up once a few years ago, then-lead developer Brion Vibber said 
> something to the effect of "If some future version of HTML is stupid 
> enough to ban <b>, we'll just automatically rewrite it to <span 
> style='font-weight: bold'> to keep the validator happy."

This is a natural problem in a situation where a presentational format (in 
this case the MediaWiki format) is mapped to a semantic format (in this 
case HTML). If we want to make it possible to use HTML as a media-specific 
presentational language, then we have much work to do beyond just adding 
<u>. Again, this is not something we should do piecemeal; this kind of 
overarching change in design direction should be made in conjunction with 
its implications, otherwise we end up with a highly inconsistent, 
compromised-by-committee design.


> Authors who care about validation are not deterred by having to write 
> longer markup to achieve the same effect.

This is a massive generalisation. While there are indeed such authors, and 
such authors cannot be taught about best practices solely through 
validators and conformance requirements, there are also those for whom 
conformance requirements are used as a stepping stone to a fuller 
understanding of the basic design of the language.


> So ruling out <u> will only discourage the use of the <u> element 
> itself, not the use of underlining.

It is, as they say, a start.


> Given that underlining will happen anyway, there's no reason to make 
> markup more bloated so that web pages are harder to read.

One could easily say "given that pages will be made inaccessible anyway, 
there's no reason to make the language unintuitive in an effort to make 
authors write more accessible content". There _is_ a reason, indeed many 
reasons, which have been discussed at length.


> "The best practice (for accessibility, maintainability, and semantic 
> analysis) is widely recognised to be the separation of semantics and 
> styles, which argues against presentational markup such as in this 
> proposal."
> 
> No evidence or reasoning is provided to back this statement up.

Accessibility: semantics are easier to map to media-specific presentations 
(e.g. speech synthesis) than are media-specific styles (e.g. visual 
styles) because to map a media-specific style to another medium's styles 
one has to first determine the meaning of the styles, which is an unsolved 
artificial intelligence problem in computer science. For example, does the 
underline indicate importance, which should be mapped to a more deliberate 
speech pattern, or is it merely an aesthetic effect, which should not map 
to anything? Does it indicate a link, which should be clearly denoted 
(e.g. with audio icons), or does it indicate a stress emphasis, which 
should merely be mapped to a slightly altered voice? Given the state of 
the art, separating semantic markup from styles is therefore the best 
practice for accessibility.

Maintainability: Should the author (or the author's employer/client) 
decide that actually underlining all the headings was a mistake and they 
should instead be italics, the change can be trivially implemented if the 
markup is semantic rather than stylistic: simply change headings to be 
italics rather than underlined. If, instead, a stylistic element is used 
within the pages each time an underline is required, the author is going 
to have to go through every part of every page changing just the 
underlines that correspond to headings. This would take orders of 
magnitude more time. Given this, separating semantic markup from styles is 
therefore the best practice for maintainability.
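
To illustrate, here is a sketch with made-up headings. The underline lives 
in one style rule, so switching every heading to italics is a one-line 
change; the comments show the stylistic alternative that would have to be 
hunted down heading by heading:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Heading style sketch</title>
<style>
  /* Current decision: headings are underlined. */
  h2 { text-decoration: underline; }

  /* To switch to italics, replace the rule above with:
     h2 { text-decoration: none; font-style: italic; }
     No markup in any page needs to be touched. */
</style>
</head>
<body>
<!-- Illustrative content only. -->
<h2>Introduction</h2>
<p>Some text.</p>

<h2>Pricing</h2>
<p>Some more text.</p>

<!-- The stylistic alternative would be
       <h2><u>Introduction</u></h2>
     repeated in every page, mixed in with every other use of <u>,
     each of which has to be checked by hand when the decision changes. -->
</body>
</html>
```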

Semantic analysis: As with accessibility, the ability for a computer to 
distinguish underline when used for a proper name mark, when used to 
indicate a hyperlink, when used to indicate emphasis, when used to 
indicate italics in a manuscript, when used to indicate a spelling error, 
and so forth, requires artificial intelligence at the cutting edge of 
natural language research (or beyond). To allow semantic analysis to be 
performed by those who do not have access to the latest and greatest 
research, and indeed to enable semantic analysis to be done at all in many 
cases given the state of this research, the input markup must include at 
least basic hints as to the meaning implied by the presentation. As such, 
separating semantic markup from styles is therefore the best practice for 
enabling semantic analysis.

(Note that the above are specifically problems with the <u> element!)

These principles (and others that don't necessarily apply specifically to 
the case of the <u> element, such as performance) have long been 
recognised. The Web Standards Project, for instance, has been saying this 
since before 2001:

 "Each layer of a Web document was designed as part of a whole framework 
 to achieve this balance. This is why the separation of structural HTML 
 or XML from the presentation of a document is so important"
 -- http://archive.webstandards.org/mission.html

Wikipedia:

 "Separation of presentation and content (or "separate content from 
 presentation", a special case of the form and content principle) is a 
 common idiom, a design philosophy, and a methodology applied in the 
 context of various publishing technology disciplines, including 
 information retrieval, template processing, web design, web development, 
 word processing, desktop publishing, and model-driven development."
 -- http://en.wikipedia.org/wiki/Separation_of_presentation_and_content

As far back as 1998, this was being explained in tutorials:

 "One of the nifty little concepts that HTML inherited from its rich 
 daddy SGML, is the idea that document structure and document presentation 
 should be separate."
 -- http://www.webreference.com/html/tutorial5/1.html

Even people who would probably agree with the proposal to add <u> to the 
language agree with the principles laid out above:

 "It is absolutely a best practice to separate your content, presentation, 
 and behavior layers as much as possible."
 -- http://jeffcroft.com/blog/2007/aug/09/myth-content-and-presentation-separation/

The best practice (for accessibility, maintainability, and semantic 
analysis) is widely recognised to be the separation of semantics and 
styles, which argues against presentational markup such as in this 
proposal.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 4 April 2011 05:51:51 UTC