Re: Objection to HTMLWG ISSUE-144 Change Proposal #2 (keep u non-conforming)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 4 Apr 2011 05:51:20 +0000 (UTC)
> On Fri, 1 Apr 2011, Aryeh Gregor wrote:

...
> <u> as proposed has essentially the same meaning as <i>, and there is no 
> length saving between the "u" and "i".

It is good to have more than one short element in the palette. 

>> <u> is much more similar to <b> and <i> than to the non-conforming 
>> elements listed
> 
> Indeed it is so similar that it is an unnecessary addition. The use case 
> of "stylistically offset" is already entirely handled by <i>.

There is no problem in having synonyms. 

>> and maybe we should make other presentational markup valid, but that can 
>> be dealt with in a separate bug/issue and isn't relevant here.
> 
> This is a frequently claimed position, but making decisions like this on a 
> case-by-case basis is seriously damaged language design. We have to take a 
> holistic approach to the language or we will be forced to create a 
> "compromised by committee" language that is internally inconsistent. 
> Either we should decide we are making a presentational language, or we 
> should decide we are making a media-independent, semantic-focused 
> language. We cannot in good faith do both.

The kind of media independence you seem to have in mind is the XML-like 
media independence, where everything has a default, generic stylesheet, 
and is equally useless in all media.

>> The fact is, <b> is presentational markup too.
> 
> This is not a fact. As currently defined, the <b> element is 
> media-independent: it is not just "bold" it is a definition that applies 
> to multiple media in a way that an author can clearly distinguish when 
> this element should be used vs other elements such as <i>, <strong>, 
> <dfn>, et al.

We are not very convinced by this. And the I18N WG has already 
written a note saying that one should avoid <b>, despite what HTML5 
says.

...
>> Even if it were the case that "a span of text to be stylistically offset 
>> from the normal prose" is already represented by <i>, there's no reason 
>> given why we can't have two elements with the same semantics.
> 
> There's no benefit to doing so either, and there are multiple negatives; 
> for example it leads to arguments about which element is appropriate (this 
> is why we dropped <acronym>, which was redundant with <abbr>), and in the 
> case of <u>, it has default styles that are considered "antiquated" and 
> confusing for users.

Speaking of antiquated: there are media in which underlining is 
useful, such as colour-constrained media.

> Accessibility: semantics are easier to map to media-specific presentations 
> (e.g. speech synthesis) than are media-specific styles (e.g. visual 
> styles) because to map a media-specific style to another medium's styles 
> one has to first determine the meaning of the styles, which is an unsolved 
> computer science artificial intelligence problem.

You have said the greatest danger is confusion with links. In which 
medium is there a danger that <u> can be confused with links? Certainly 
not in screen readers, at least.

> For example, does the 
> underline indicate importance, which should be mapped to a more deliberate 
> speech pattern, or is it merely an aesthetic effect, which should not map 
> to anything?

As you said: this is up to the definition. Clearly it should be defined 
as aesthetic. (And aesthetics can subsequently be used to express 
semantics.)

> Does it indicate a link, which should be clearly denoted 
> (e.g. with audio icons),

This is an irrelevant problem for <u>. ATs/UAs that present the 
page as sound have no problem with <u>.

> or does it indicate a stress emphasis, which 
> should merely be mapped to a slightly altered voice? Given the state of 
> the art, separating semantic markup from styles is therefore the best 
> practice for accessibility.

> Maintainability: Should the author (or the author's employer/client) 
> decide that actually underlining all the headings was a mistake and they 
> should instead be italics, the change can be trivially implemented if the 
> markup is semantic rather than stylistic: simply change headings to be 
> italics rather than underlined. If, instead, a stylistic element is used 
> within the pages each time an underline is required, the author is going 
> to have to go through every part of every page changing just the 
> underlines that correspond to headings. This would take orders of 
> magnitude more time. Given this, separating semantic markup from styles is 
> therefore the best practice for maintainability.

This is a highly artificial example. Whether or not <u> is part of the 
language will not affect in the slightest how authors style 
headings.

The practical value of <u> is that it can be used instead of <span>. 
Indeed, it is a semantic synonym of <span>, but a stylistic variation. 
The author is of course free to style it without an underline.

> Semantic analysis: As with accessibility, the ability for a computer to 
> distinguish underline when used for a proper name mark, when used to 
> indicate a hyperlink, when used to indicate emphasis, when used to 
> indicate italics in a manuscript, when used to indicate a spelling error, 
> and so forth, requires artificial intelligence at the cutting edge of 
> natural language research (or beyond).

This is also an artificial analysis: the author uses <u>, and so the 
computer knows what <u> means, because it is defined in HTML5.

> To allow semantic analysis to be 
> performed by those who do not have access to the latest and greatest 
> research, and indeed to enable semantic analysis to be done at all in many 
> cases given the state of this research, the input markup must include at 
> least basic hints as to the meaning implied by the presentation. As such, 
> separating semantic markup from styles is therefore the best practice for 
> enabling semantic analysis.
> 
> (Note that the above are specifically problems with the <u> element!)

Absolutely not. It sounds as if you are describing what happens when you 
find a piece of paper with lots of underlining that you want to analyse. 
If <u> is defined as nearly the same as <span>, then a computer knows that.

> These principles (and others that don't necessarily apply specifically to 
> the case of the <u> element, such as performance) have long been 
> recognised. The Web Standards Project, for instance, has been saying this 
> since before 2001:
> 
>  "Each layer of a Web document was designed as part of a whole framework 
>  to achieve this balance. This is why the separation of structural HTML 
>  or XML from the presentation of a document is so important"
>  -- http://archive.webstandards.org/mission.html

Separation of structure and presentation is many different things, 
including the fact that an element can be restyled.

> Wikipedia:
> 
>  "Separation of presentation and content (or "separate content from 
>  presentation", a special case of the form and content principle) is a 
>  common idiom, a design philosophy, and a methodology applied in the 
>  context of various publishing technology disciplines, including 
>  information retrieval, template processing, web design, web development, 
>  word processing, desktop publishing, and model-driven development."
>  -- http://en.wikipedia.org/wiki/Separation_of_presentation_and_content
> 
> As far back as 1998, this was being explained in tutorials:
> 
>  "One of the nifty little concepts that HTML inherited from its rich 
>  daddy SGML, is the idea that document structure and document presentation 
>  should be separate."
>  -- http://www.webreference.com/html/tutorial5/1.html

And thanks to CSS, <u> can be restyled.

> Even people who would probably agree with the proposal to add <u> to the 
> language agree with the principles laid out above:
> 
>  "It is absolutely a best practice to separate your content, presentation, 
>  and behavior layers as much as possible."
>  -- http://jeffcroft.com/blog/2007/aug/09/myth-content-and-presentation-separation/
> 
> The best practice (for accessibility, maintainability, and semantic 
> analysis) is widely recognised to be the separation of semantics and 
> styles, which argues against presentational markup such as in this 
> proposal.

However, your main argument against it is that it can be mistaken for 
links. Back when the tutorials you cite were written, image links were 
also presented with a blue border. Today there are no blue borders, and 
yet we have little trouble finding clickable images.
-- 
leif halvard silli

Received on Tuesday, 5 April 2011 02:56:35 UTC