Re: HTML5 script start tag should select appropriate content model according to src

On Apr 23, 2007, at 23:22, Patrick H. Lauke wrote:

> Henri Sivonen wrote:
>
>> It's the same thing on other visual media, including screens, when  
>> the semantics are presented by italicizing. It's not like J.  
>> Random reader views source to see if a given run of text was  
>> marked up as <i>, <em>, <cite>, <dfn> or <var>.
>
> ...
>
>> If the UA doesn't present the distinctions to the reader, marking up
>> semantics is useless as far as the human reader is concerned.
>
> So, it's a shortcoming in user agent support.

Only if you assume that people reading from screens need more  
disambiguation than people reading from paper.

> Moving beyond the visual, screen readers for instance can  
> (depending on settings) differentiate between <i> and <em>, and  
> treat them differently (the latter resulting in a change of volume  
> and/or inflection of the spoken output).

I don't have personal experience to comment on that, but I wasn't  
surprised about T.V. Raman having the same rule for both <i> and <em>:
http://lists.w3.org/Archives/Public/public-html/2007JanMar/0668.html

After all, there are a lot of notable creation tools that map italics  
to <em>. (See below.)

>> It isn't particularly useful to try to make moral right/wrong  
>> arguments about the behavior of Web authors on the aggregate. To  
>> get the masses to do something, there need to be good incentives.  
>> There's no point in bearing the cost of marking everything up  
>> diligently if there isn't a payoff that is reasonable compared to  
>> the cost.
>> Honestly, I can't make the case to my mother why she should bother  
>> to mark up anything as <cite> instead of just pressing command-i  
>> in Dreamweaver.
>
> The masses will use authoring tools/environments. As long as those  
> tools offer access to <i>, but not to the more semantic  
> alternatives, it's obviously futile to expect the masses to use  
> more appropriate markup.

Dreamweaver MX, by default, maps command-i to <em>. I guess it makes  
the output "more semantic" to some. Of course, as far as markup  
consumers are concerned, it makes sense to treat <em> as an alias for  
<i>.

> The payoff is the usual chicken and egg conundrum: tools to further  
> extract and manipulate semantic data can be built right here, right  
> now, but until a sizeable amount of web content out there is  
> actually semantic, they won't be built...and vice versa.

Yes, but some of the theoretical use cases just aren't realistic for  
productization anyway, and in some cases heuristics would work more  
often.

> This is the argument you hear from AT manufacturers when they say  
> that their tools rely on heuristics for many things, rather than  
> structural markup.

The key question is whether the heuristics work well enough and what  
the marginal benefit of more authoring effort would be. I don't know  
if they work well enough. But if they do, what's the problem?

>> It isn't the same. Headings are more common than e.g. taxonomical  
>> names and are related to things like intra-document navigation  
>> using outlines, etc. Therefore, it is quite reasonable to include  
>> markup for headings but leave markup for taxonomical names on the  
>> other side of the cutoff.
>
> Hmmm...sounds like we may need a markup language that is  
> extensible, as the cutoff point may be different for different  
> audiences/purposes.

HTML5 allows the class attribute to be used for communicating  
granular semantics within special-interest communities.

>> No, don't ask them. See what they actually do. In the latter case,  
>> there is actually an HTML element (<dfn>), so the usage frequency
>> could be measured.
>
> Is <dfn> readily and clearly available in authoring environments?

In OpenOffice.org Writer/Web it is, for example. Not as conveniently  
as command-i, though, of course.

Anyway, <dfn> has been available in the HTML spec for years so  
technical writers who see its value could use it if they cared to.

According to a survey of several billion pages done at Google in  
September, <i> is used on 178 times as many pages as <dfn>, and <em>  
is used on 80 times as many pages as <dfn>. Curiously, <dfn> is used  
on more pages than <var>, even though e.g. Nvu makes only the latter  
available in the UI as far as I can tell. But the most interesting  
statistic is that <dfn> is used on fewer pages than <zeroboard>  
(don't ask), <st1:place> (no clue) and <ilayer>!

>> Even though the editor of the spec may mine this mailing list for  
>> feedback from time to time and even though Lachy and I are now  
>> engaging in this thread, posting to the WHATWG list is still a  
>> better way to get heard.
>
> The fundamental discussions around whether or not <i>, <b>, <sub>,  
> <sup> etc are presentational or not have been going around for  
> years...not just in this particular thread.

Since 1993 if not 1992. From the 1993 IIIR draft:

          This text contains an <em>emphasized</em> word.
          <strong>Don't assume</strong> that it will be italic!
          It was made using the <CODE>EM</CODE> element. A citation is
          typically italic and has no formal necessary structure:
          <cite>Moby Dick</cite> is a book title.

http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Tuesday, 24 April 2007 08:34:27 UTC