[whatwg] sic element from Jukka K. Korpela on 2011-07-30 (public-whatwg-archive@w3.org from July 2011)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Sat, 30 Jul 2011 16:58:19 +0300
Message-ID: <4E340DFB.8010206@cs.tut.fi>
30.07.2011 01:39, Ian Hickson wrote:

> On Sat, 30 Jul 2011, Jukka K. Korpela wrote:
>> 29.07.2011 23:56, Ian Hickson wrote:
>>>>
>>>> Anyway, aren't you saying that <u> says "this text is annotated but
>>>> no annotation is given"? In that case, saying that <u> draws
>>>> attention to its content might be more appropriate.
>>>
>>> The physical line is an annotation. It's just not articulated.
>>
>> I think you are using the word "annotation" to mean something different
>> from "a note added by way of comment or explanation"
>> (http://www.merriam-webster.com/dictionary/annotation). What would an
>> annotation be without a note - without a comment or explanation?
>
> I was using it in the sense used in Wikipedia:
>
>     "An annotation is a note that is made while reading any form of text.
>     This can be as simple as underlined or highlighted passages."
>      -- http://en.wikipedia.org/wiki/Annotation
>
> If you have a word that better conveys this meaning, I'd be happy to
> use that instead.

I don't think Wikipedia can be relied on as indicative of meanings of 
words. International specifications should use English words in their 
meanings as described in dictionaries (or use them as technical terms 
for which exact definitions are given).

Since <u> as such lacks any comment or explanation, I think the wording 
"draws attention to" describes the intended meaning.

>> There's nothing to be gained from adopting this new semantics for <u>.
>
> Sure there is. Device independence, for one.

I mentioned that there is no obvious, intuitively understandable way of 
rendering <u>. This applies to all devices. Maybe it's device independent, 
but only in the sense that there is no natural implementation for any 
device. Visual browsers will of course keep underlining it, which 
just means that the real meaning is "underline".

>> Presentational markup may convey useful information, for example that a
>> quotation from printed matter contains an underlined word.
>
> HTML is the wrong language for this kind of thing.

Such things have been possible in HTML since the beginning. What's the 
tangible benefit of telling authors that they must not do so?

>> Underlining also has specialized usage e.g. in some transliteration
>> systems.
>
> Could you elaborate on this?

For example, a few transliteration systems use underlining of letters or 
letter pairs; see http://transliteration.eki.ee/pdf/Arabic_2.2.pdf
(and before saying that combining underline characters should be used 
instead, please check whether that actually works and how convenient it is).

> Inflexion changes, pauses, changes in timbre, pitch, speed, and other
> aspects of one's voice, are the ways all meaning is conveyed in aural
> conversations.

All meaning? Hardly. Sounds matter, too. But the point is that no such 
method is intuitively recognized as meaning what you want <u> to mean. 
Besides, people who use speech browsers on a daily basis (as opposed to 
developers who use them for testing) have told me that nuances are 
inevitably lost - the speech rate is high (it takes time to learn to 
listen to it, but it's more or less necessary). So even if there is some 
subtle change, different from those already in established use (for a 
user base), it hardly gets noticed at all, still less understood as 
meaning something specific.

> Well sure, just like some people will keep treating <blockquote> as
> meaning "indent" and just like how some people will persist in failing to
> understand how scripting APIs work.

Undoubtedly, and <blockquote> is effectively a lost cause. But the <u> 
issue (as well as <i> and <b>) is about creating new problems.

> I think you are confused as to the goals here. The presentational markup
> that was <u>,<i>,<b>,<font>,<small>, etc, is gone.

It hasn't gone anywhere and won't go anywhere, and specs cannot change 
this. The reason why <font> has become less common has little to do with 
its "deprecation" - rather, authors have moved to using CSS because it's 
more powerful and more convenient to use in most cases. The simple 
elements like <u> have clumsy CSS counterparts.

> However, there are
> certain use cases that did not yet have elements yet were important enough
> to warrant us supporting them.

Was this based on an analysis of needs for semantic markup, or rather on 
trying to assign meanings that would be somehow compatible with the 
default rendering implications? I think the latter. And this means that 
some semantic needs were more or less _invented_ rather than recognized.

> By reusing existing elements, we are able
> to support them without having to wait for new elements to be implemented.

Several new elements have been added without such concerns. Any new 
semantic elements can be used reasonably well by authors if they add a 
few CSS rules and use the document.createElement() trick to make even IE 
style unknown elements.
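
As an illustration of the createElement() trick, here is a minimal sketch 
in TypeScript. It relies on the known behavior of old IE versions (8 and 
earlier), which apply CSS to an unknown element only if 
document.createElement() has been called with that element name before 
the markup is parsed. The helper name primeUnknownElements, the DocLike 
interface, and the element names in the usage note are hypothetical, 
chosen only for this example; DocLike stands in for the browser's 
document object so that the sketch can also run outside a browser.

```typescript
// Minimal stand-in for the browser `document`; only createElement is needed.
// (Assumption for this sketch: the real `document` also satisfies this shape.)
interface DocLike {
  createElement(name: string): unknown;
}

// Hypothetical helper: call createElement once per unknown element name.
// The created nodes are discarded; in old IE the call itself is what makes
// the parser recognize the name so that CSS rules can apply to it.
function primeUnknownElements(doc: DocLike, names: string[]): string[] {
  const primed: string[] = [];
  for (const name of names) {
    doc.createElement(name);
    primed.push(name);
  }
  return primed;
}
```

In a page, this would be called from a script in the document head, for 
example primeUnknownElements(document, ["sic"]), together with a CSS rule 
for the new element such as sic { text-decoration: underline; }.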

> By doing so in a way that closely matches how those elements were actually
> used in practice (at least to the same extent as other elements have been
> correctly used in practice) we can not only have older UAs support these
> new elements automatically, but we can do so in a way that does not
> introduce an undue volume of invalid pages.

Either <u> means the same as before, or it means something else. You 
seem to be saying that it means almost the same, but is the difference 
between physical markup and semantic markup really that small?

Your last statement suggests that you really want to call existing usage 
of <b> "valid" without really caring about how it is used (as 
physical markup or something else). This whole "validity" issue is rather 
artificial, since the vast majority of web pages aren't valid even under 
the fairly trivial (pure-syntax, and even just the DTD-definable part 
thereof) meaning of "valid".

One does not need to be particularly pessimistic to predict that pages 
using <!doctype> won't generally meet even the syntactic requirements, 
still less the semantic definitions. This is something we can well live 
with, and the requirements on user agents and error processing are 
supposed to keep things that way. So why worry about some pages not 
being "valid" in the simple sense of using physical markup that has been 
defined in HTML (with requirements on browsers to keep supporting it) 
but declared forbidden?

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/
Received on Saturday, 30 July 2011 06:58:19 UTC