[whatwg] sic element from Ian Hickson on 2012-01-23 (public-whatwg-archive@w3.org from January 2012)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 23 Jan 2012 23:18:28 +0000 (UTC)
Message-ID: <Pine.LNX.4.64.1201232013130.16982@ps20323.dreamhostps.com>
On Sat, 30 Jul 2011, Jukka K. Korpela wrote:
> 30.07.2011 01:39, Ian Hickson wrote:
> > On Sat, 30 Jul 2011, Jukka K. Korpela wrote:
> > > 29.07.2011 23:56, Ian Hickson wrote:
> > > > > 
> > > > > Anyway, aren't you saying that <u> says "this text is annotated 
> > > > > but no annotation is given"? In that case, saying that <u> draws 
> > > > > attention to its content might be more appropriate.
> > > > 
> > > > The physical line is an annotation. It's just not articulated.
> > > 
> > > I think you are using the word "annotation" to mean something 
> > > different from "a note added by way of comment or explanation" 
> > > (http://www.merriam-webster.com/dictionary/annotation). What would 
> > > an annotation be without a note - without a comment or explanation?
> > 
> > I was using it in the sense used in Wikipedia:
> > 
> >     "An annotation is a note that is made while reading any form of text.
> >     This can be as simple as underlined or highlighted passages."
> >      -- http://en.wikipedia.org/wiki/Annotation
> > 
> > If you have a word that better conveys this meaning, I'd be happy to 
> > use that instead.
> 
> I don't think Wikipedia can be relied on as indicative of meanings of 
> words. International specifications should use English words in their 
> meanings as described in dictionaries (or use them as technical terms 
> for which exact definitions are given).

I don't see any reason to consider "dictionaries" as any more 
authoritative than Wikipedia. However, if you want to go down this line of 
argumentation: the spec is written in Hixie English, the normative 
definition of which agrees with Wikipedia on the definition of 
"Annotation", so I think the word is fine.


> Since <u> as such lacks any comment or explanation, I think the wording 
> "draws attention to" describes the intended meaning.

I think "draws attention to" is a different thing altogether. There's no 
reason to assume that an annotation should be drawing attention to what it 
annotates. On the contrary, it could be something that is only visible if 
you go looking for it. For example, the tooltip given by a title="" 
attribute is also a kind of annotation, and it definitely doesn't draw any 
attention to the content it annotates.

With <u>, it would be perfectly fine for the underline to be hidden by 
default and only shown, say, on :hover. This shouldn't be non-conforming 
(which it might arguably be to some extent if we used the definition 
"draws attention to").

Again, though, I'm happy to consider alternative wordings if you have 
something better.



> > > There's nothing to be gained from adopting this new semantics for 
> > > <u>.
> > 
> > Sure there is. Device independence, for one.
> 
> I mentioned that there is no obvious, intuitively understandable way of 
> rendering <u>. This includes all devices. Maybe it's device independent, 
> but only in the sense that there is no natural implementation for any 
> device - but of course visual browsers will keep underlining it, but 
> this just means that the real meaning is "underline".

Braille readers would mark it with the italics sign, probably. Speech 
synthesis likely wouldn't do anything by default.


> > > Presentational markup may convey useful information, for example 
> > > that a quotation from printed matter contains an underlined word.
> > 
> > HTML is the wrong language for this kind of thing.
> 
> Such things have been possible in HTML since the beginning. What's the 
> tangible benefit of telling authors that they must not do so?

Moving authors from a visual presentation mindset to a semantic mindset 
increases the likelihood that content will be accessible across multiple 
media, it reduces the maintenance cost, raises the ease of site-wide style 
changes, increases caching ability... do I really need to still explain 
this in 2012? 


> > > Underlining also has specialized usage e.g. in some transliteration 
> > > systems.
> > 
> > Could you elaborate on this?
> 
> For example, a few transliteration systems use underlining of letters or 
> letter pairs, see http://transliteration.eki.ee/pdf/Arabic_2.2.pdf (and 
> before saying that combining underline characters be used instead, 
> please try to see whether it actually works and how convenient it is).

I couldn't find what you are referring to in the document above.


> > Inflexion changes, pauses, changes in timbre, pitch, speed, and other 
> > aspects of one's voice, are the ways all meaning is conveyed in aural 
> > conversations.
> 
> All meaning? Hardly. Sounds matter, too. But the point is that no such 
> method is intuitively recognized as meaning what you want <u> to mean.

Correct. In speech one typically does not convey "this word is misspelt". 
Similarly, in writing one typically does not convey exasperation. 
Different media have different characteristics. The point is that HTML 
allows you to mark up all these characteristics, augmented with the class 
attribute if you need it to control the styling more carefully. It doesn't 
generally prioritise one medium over another.


> Besides, people who use speech browsers on a daily basis (as opposed to 
> developers who use them for testing) have told me that nuances are 
> inevitably lost - the speech rate is high (it takes time to learn to 
> listen to it, but it's more or less necessary). So even if there is some 
> subtle change, different from those already in established use (for a 
> user base), it hardly gets noticed at all, still less understood as 
> meaning something specific.

Yes, current speech synthesis is unfortunately not as developed as the 
visual medium. It makes sense, given the relative population sizes.


> > Well sure, just like some people will keep treating <blockquote> as 
> > meaning "indent" and just like how some people will persist in failing 
> > to understand how scripting APIs work.
> 
> Undoubtedly, and <blockquote> is effectively a lost cause. But the <u> 
> issue (as well as <i> and <b>) is about creating new problems.

I disagree.


> > I think you are confused as to the goals here. The presentational 
> > markup that was <u>, <i>, <b>, <font>, <small>, etc, is gone.
> 
> It hasn't gone anywhere and won't go anywhere, and specs cannot change 
> this.

I was referring to its existence in the spec.


> > However, there are certain use cases that did not yet have elements 
> > yet were important enough to warrant us supporting them.
> 
> Was this based on an analysis of needs for semantic markup, or rather on 
> trying to assign meanings that would be somehow compatible with the 
> default rendering implications? I think the latter.

You are incorrect. <u>, for instance, was only added after rather 
compelling use cases were presented. If your thesis were true, then it 
would have been added long before then. It would have been easy to come up 
with a fake meaning if that had been the goal.


> > By reusing existing elements, we are able to support them without 
> > having to wait for new elements to be implemented.
> 
> Several new elements have been added without such concerns.

Again, you are incorrect. The concerns were very much present.


> Any new semantic elements can be reasonably well used by authors if they 
> add a few CSS rules and use the document.createElement() trick to make 
> even IE style unknown elements.

Were it only so.


> > By doing so in a way that closely matches how those elements were 
> > actually used in practice (at least to the same extent as other 
> > elements have been correctly used in practice) we can not only have 
> > older UAs support these new elements automatically, but we can do so 
> > in a way that does not introduce an undue volume of invalid pages.
> 
> Either <u> means the same as before, or it means something else.

The world is more subtle than this. There is the meaning the specs assign 
an element. There is the meaning authors assign it intentionally. There 
are the meanings that are compatible with what the authors did. With <u>, 
many of the actual uses of the element can be seen as uses of both the old 
presentational meaning and the new media-independent meaning without 
conflict. This is what I was referring to above.


> One does not need to be particularly pessimistic to predict that pages 
> using <!doctype> won't generally meet even the syntactic requirements, 
> still less the semantic definitions. This is something we can well live 
> with, and the requirements on user agents and error processing are 
> supposed to keep things that way. So why worry about some pages not 
> being "valid" in the simple sense of using physical markup that has been 
> defined in HTML (with requirements on browsers to keep supporting it) 
> but declared forbidden?

Design aesthetics.


On Tue, 2 Aug 2011, Henri Sivonen wrote:
> On Fri, 2011-07-29 at 22:39 +0000, Ian Hickson wrote:
> > > 
> > > Presentational markup may convey useful information, for example 
> > > that a quotation from printed matter contains an underlined word.
> > 
> > HTML is the wrong language for this kind of thing.
> 
> I disagree. From time to time, people want to take printed matter and 
> publish it on the Web. In practice, the formats available are PDF and 
> HTML. HTML works more nicely in browsers and for practical purposes 
> works generally better when the person taking printed matter to the Web 
> decides that the exact line breaks and the exact font aren't of 
> importance. They may still consider it of importance to preserve bold, 
> italic and underline and maybe even delegate that preservation to OCR 
> software that has no clue about semantics. (Yes, bold, italic and 
> underline are qualitatively different from line breaks and the exact 
> font even if you could broadly categorize them all as presentational 
> matters.)

This is not a high-priority use case, IMHO. We should prioritise the 
opposite direction, online matter being printed. To prioritise printed 
matter going online would be to look backwards, not forwards.


> > I think you are confused as to the goals here. The presentational 
> > markup that was <u>, <i>, <b>, <font>, <small>, etc, is gone.
> 
> I think the reason why Jukka and others seem to be confused about your 
> goals is that your goals here are literally incredible from the point of 
> view of other people. Even though you've told me f2f what you believe 
> and I want to trust that you are sincere in your belief, I still have a 
> really hard time believing that you believe what you say you believe 
> about the definitions of <b>, <i> and <u>. When after discussing this 
> with you f2f, I still find your position incredible, I think it's not at 
> all strange if other people when reading the spec text interpret your 
> goals inaccurately because your goals don't seem like plausible goals to 
> them.

Well, I respect your inability to believe me, but with all due respect, I 
think that's more your problem than mine. :-)

I don't see what's so hard to believe about a desire to make a language 
medium-neutral. There's nothing magical about the visual medium that makes 
it more important to the Web than any other medium, IMHO.


> If the word "presentational" carries too much negative baggage, I 
> suggest defining <b>, <i> and <u> as typographic elements on visual 
> media (and distinctive elements on other media) and adjusting the 
> rhetoric that HTML is a semantic markup language to HTML being a mildly 
> semantic markup language that also has common phrase-level typographic 
> features.

The whole point is to have a media-independent language. It doesn't matter 
what you _call_ it.

I would no more think we need an element for "bolder" than I would think 
we need an element for "louder" in speech synthesis or an element for 
"bigger hand gestures" in sign-language interpretation (not that I'm aware 
of a sign-language HTML UA, but there's no fundamental reason one couldn't 
exist in the future). When you start from the fundamental position that 
these media are no more important than each other, it is really hard to 
see why we would ever introduce "phrase-level typographic features".


On Wed, 3 Aug 2011, Silvia Pfeiffer wrote:
> 
> I don't see why we need to throw out the baby with the bathwater. In my 
> mind, HTML5 is good both for semantic markup (i.e. application 
> development) and for content presentation (i.e. document publication). 
> Some elements serve one purpose better than the other (such as <u>, <b>, 
> <i> being mostly presentational), others serve both purposes equally 
> (like <ul>, <ol>). It's been a mix from the start and both a blessing 
> and a curse. Trying to ignore that history will only give us confused 
> users, not better markup.

I think we can quite confidently say that we're not ignoring the history, 
given how much discussion it has received...


On Sun, 7 Aug 2011, Jukka K. Korpela wrote:
> 
> This isn't about suggesting, this is about reproducing aspects of printed
> material that may be essential.

If they are essential, then it is *even more important* to mark them up in 
a medium-independent manner, or "essential" meaning will be lost. At the 
point where we're talking about reproducing "essential" aspects, we've 
crossed into the land where accessibility is a concern.


> It is comparable to making a distinction between lowercase and 
> uppercase, which may be purely presentational or may carry essential 
> information.

Case is not a medium-specific feature, though. Just like the contemporary 
definitions of <u>, <b>, <i>, <em>, etc, it is a media-independent 
feature. In speech, for example, capitalisation is typically expressed via 
different emphasis (very similar to what <em> expresses in speech 
conversation when capitalisation isn't appropriate, in fact). In some 
cases it is purely presentational (e.g. if one's headings are in 
all-caps); for those cases IMHO it would be better (more correct) to use 
normal mixed caps in the markup and CSS to change the formatting.
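
As a rough sketch of that last case (the h1 selector is only an 
illustration; any heading selector would do), the markup would keep 
normal mixed caps and the style sheet would do the rest:

```css
/* The markup keeps normal mixed caps, e.g. <h1>Release notes</h1>;
   the all-caps rendering is applied purely as presentation. */
h1 {
  text-transform: uppercase;
}
```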


> The case distinction can be made by the simple choice of letters at the 
> character level, or it may be delegated to CSS if it is regarded as 
> purely presentational. For bolding etc., the character-level alternative 
> does not exist or it is highly impractical [...]

It depends why you are bolding, but generally, I would say it isn't 
impractical at all, let alone "highly" impractical.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 23 January 2012 15:18:28 UTC