
Re: HTML5 script start tag should select appropriate content model according to src

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 23 Apr 2007 22:52:14 +0300
Message-Id: <FFB0099E-6EBF-4E53-9254-9D6D67C2375A@iki.fi>
Cc: XHTML-Liste <www-html@w3.org>
To: Patrick H. Lauke <redux@splintered.co.uk>

On Apr 23, 2007, at 20:06, Patrick H. Lauke wrote:

> Henri Sivonen wrote:

> The use of italics to denote certain "special" things (names of  
> ships etc) comes from a print tradition. In print, there is no  
> other way to "mark" something up than to use some visual,  
> presentational signal. So yes, on paper, italics denote that  
> there's something special going on with those words.

It's the same thing on other visual media, including screens, when  
the semantics are presented by italicizing. It's not like J. Random  
reader views source to see if a given run of text was marked up as  
<i>, <em>, <cite>, <dfn> or <var>.

> Now, in machine-parseable languages like HTML (whichever number),  
> you don't have to rely on a purely visual way to denote what  
> something is.

If the UA doesn't present the distinctions to the reader, marking up  
semantics is useless as far as the human reader is concerned.

> You have a far more unambiguous way to denote things with markup.  
> However, the necessary markup is not present in HTML at the moment,  
> so content authors (mostly coming from print tradition) use the  
> thing that is closest to their experience...the <i> element. Again,  
> this does not make them right.

It isn't particularly useful to try to make moral right/wrong  
arguments about the behavior of Web authors on the aggregate. To get  
the masses to do something, there need to be good incentives. There's no  
point in bearing the cost of marking everything up diligently if  
there isn't a payoff that is reasonable compared to the cost.

Honestly, I can't make the case to my mother why she should bother to  
mark up anything as <cite> instead of just pressing command-i in  

> Imagine if HTML didn't contain H1-H6 elements...would you argue  
> that <font size="+3"> carries meaning because bigger text is a  
> heading? Same argument here!

It isn't the same. Headings are more common than e.g. taxonomical  
names and are related to things like intra-document navigation using  
outlines, etc. Therefore, it is quite reasonable to include markup  
for headings but leave markup for taxonomical names on the other side  
of the cutoff.

>> Semantics in and of themselves are not interesting unless they  
>> address problems posed by real use cases.
> Automatic aggregation of content, possibility of tools such as  
> screen readers and similar assistive technology to understand the  
> different semantics and provide their users with better  
> information, etc.


>> If you've got all conceivable media covered, what would you use  
>> the semantics for?
> Because your "all conceivable media" still doesn't cover user  
> choice and user control over the content.

Right. My bad. Let's try again: if the spec gives reasonable default  
presentation for a given element for all conceivable media, it isn't  
necessary to nail down the exact semantics of the element further  
than saying that it is for stuff for which the default presentations  
are acceptable. This may preclude theoretically interesting  
processing such as "extract all biological taxonomical names", but on  
the scale of the Web, it isn't feasible for a general purpose spec to  
cater for such a specialized use case. Moreover, if you are doing  
data mining for let's say Google Biologist, chances are that  
heuristic methods that do not rely on the cooperation of authors will  
work better for Web content in general.

>> Do you have realistic data mining use cases in mind where the  
>> content producers would have the incentive to help the data miner  
>> and not lie?
> Leave your little "they just want to use it to boost their search  
> engine ranking" dig out. Think of a library/archive resource that  
> wants to offer smart access to its contents to users.

I've pondered this stuff as my job in an archival organization (the  
National Archives of Finland). The reality of what kind of stuff  
archives have to ingest and the theory that semantic markup advocates  
tell don't match at all.

>> To sprinkle disguising semantic pixie dust to soothe the concerns  
>> of anti-presentationalists, I guess.
> Ask a biologist if they'd rather say "just make it italic" or "this  
> is an animal genus", or whether a technical writer would rather say  
> "this is italic" or "this is the defining instance of this  
> term"...you simply assume that all authors don't give a damn about  
> semantics, without proof.

No, don't ask them. See what they actually do. In the latter case,  
there is actually an HTML element (<dfn>), so the usage frequency could  
be measured.
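
As a rough illustration of what such a measurement might look like, here is a minimal sketch that counts start tags of the elements usually rendered in italics. It assumes the pages have already been fetched as strings; the names TagCounter, count_italic_tags and the sample markup are hypothetical and only serve the example.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that visual UAs typically render in italics by default.
ITALIC_BY_DEFAULT = {"i", "em", "cite", "dfn", "var"}


class TagCounter(HTMLParser):
    """Counts start tags of the elements listed in ITALIC_BY_DEFAULT."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in ITALIC_BY_DEFAULT:
            self.counts[tag] += 1


def count_italic_tags(html_text):
    """Return a Counter of italic-by-default start tags in html_text."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


# Hypothetical sample; a real survey would feed in a crawl of actual pages.
sample = "<p><i>Homo sapiens</i> is a <dfn>species</dfn>; see <I>Nature</I>.</p>"
counts = count_italic_tags(sample)
```

Summing these counters over a large sample of real pages would give the relative frequency of <dfn> against <i>, which is the kind of evidence I mean.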

> For him, a generic span with italics styling via CSS would be most  
> appropriate.

Why on earth would <span> plus CSS be any better than <i>?

>> How do you expect the spec to have been shaped to your liking  
>> without you participating in the process on the WHATWG list?
> The usual "if you don't like it, join the list" gambit.

Please note the context to which that was a reply.

> When shaping a supposed standard, should the standards body  
> (official or not) look at the community at large, and gather  
> requirements there, or should the community make sure that it's  
> involved in the standards process?

Yes, the spec development group should look at the community at  
large. The WHATWG has especially strived to do so. (Curiously, when  
discussing the move to the new HTML WG, it has been suggested that  
this shouldn't be done due to patent concerns!)

Even though the editor of the spec may mine this mailing list for  
feedback from time to time and even though Lachy and I are now  
engaging in this thread, posting to the WHATWG list is still a better  
way to get heard.

Henri Sivonen
Received on Monday, 23 April 2007 19:52:27 UTC
