- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 23 Apr 2007 22:52:14 +0300
- To: Patrick H. Lauke <redux@splintered.co.uk>
- Cc: XHTML-Liste <www-html@w3.org>
On Apr 23, 2007, at 20:06, Patrick H. Lauke wrote:

> Henri Sivonen wrote:
> The use of italics to denote certain "special" things (names of
> ships etc) comes from a print tradition. In print, there is no
> other way to "mark" something up than to use some visual,
> presentational signal. So yes, on paper, italics denote that
> there's something special going on with those words.

It's the same thing on other visual media, including screens, when the semantics are presented by italicizing. It's not like J. Random reader views source to see if a given run of text was marked up as <i>, <em>, <cite>, <dfn> or <var>.

> Now, in machine-parseable languages like HTML (whichever number),
> you don't have to rely on a purely visual way to denote what
> something is.

If the UA doesn't present the distinctions to the reader, marking up semantics is useless as far as the human reader is concerned.

> You have a far more unambiguous way to denote things with markup.
> However, the necessary markup is not present in HTML at the moment,
> so content authors (mostly coming from print tradition) use the
> thing that is closest to their experience...the <i> element. Again,
> this does not make them right.

It isn't particularly useful to try to make moral right/wrong arguments about the behavior of Web authors in the aggregate. To get the masses to do something, there need to be good incentives. There's no point in bearing the cost of marking everything up diligently if there isn't a payoff that is reasonable compared to the cost. Honestly, I can't make the case to my mother why she should bother to mark up anything as <cite> instead of just pressing command-i in Dreamweaver.

> Imagine if HTML didn't contain H1-H6 elements...would you argue
> that <font size="+3"> carries meaning because bigger text is a
> heading? Same argument here!

It isn't the same. Headings are more common than e.g. taxonomical names and are related to things like intra-document navigation using outlines, etc.
Therefore, it is quite reasonable to include markup for headings but leave markup for taxonomical names on the other side of the cutoff.

>> Semantics in and of themselves are not interesting unless they
>> address problems posed by real use cases.
>
> Automatic aggregation of content, possibility of tools such as
> screen readers and similar assistive technology to understand the
> different semantics and provide their users with better
> information, etc.

http://lists.w3.org/Archives/Public/public-html/2007JanMar/0644.html
http://lists.w3.org/Archives/Public/public-html/2007JanMar/0668.html

>> If you've got all conceivable media covered, what would you use
>> the semantics for?
>
> Because your "all conceivable media" still doesn't cover user
> choice and user control over the content.

Right. My bad. Let's try again: if the spec gives a reasonable default presentation for a given element for all conceivable media, it isn't necessary to nail down the exact semantics of the element further than saying that it is for stuff for which the default presentations are acceptable. This may preclude theoretically interesting processing such as "extract all biological taxonomical names", but on the scale of the Web, it isn't feasible for a general purpose spec to cater for such a specialized use case. Moreover, if you are doing data mining for, let's say, Google Biologist, chances are that heuristic methods that do not rely on the cooperation of authors will work better for Web content in general.

>> Do you have realistic data mining use cases in mind where the
>> content producers would have the incentive to help the data miner
>> and not lie?
>
> Leave your little "they just want to use it to boost their search
> engine ranking" dig out. Think of a library/archive resource that
> wants to offer smart access to its contents to users.

I've pondered this stuff as my job in an archival organization (the National Archives of Finland).
The reality of what kind of stuff archives have to ingest and the theory that semantic markup advocates put forward don't match at all.

>> To sprinkle disguising semantic pixie dust to soothe the concerns
>> of anti-presentationalists, I guess.
>
> Ask a biologist if they'd rather say "just make it italic" or "this
> is an animal genus", or whether a technical writer would rather say
> "this is italic" or "this is the defining instance of this
> term"...you simply assume that all authors don't give a damn about
> semantics, without proof.

No, don't ask them. See what they actually do. In the latter case, there is actually an HTML element (<dfn>), so the usage frequency could be measured.

> For him, a generic span with italics styling via CSS would be most
> appropriate.

Why on earth would <span> plus CSS be any better than <i>?

>> How do you expect the spec to have been shaped to your liking
>> without you participating in the process on the WHATWG list?
>
> The usual "if you don't like it, join the list" gambit.

Please note the context to which that was a reply.

> When shaping a supposed standard, should the standards body
> (official or not) look at the community at large, and gather
> requirements there, or should the community make sure that it's
> involved in the standards process?

Yes, the spec development group should look at the community at large. The WHATWG has especially strived to do so. (Curiously, when discussing the move to the new HTML WG, it has been suggested that this shouldn't be done due to patent concerns!) Even though the editor of the spec may mine this mailing list for feedback from time to time and even though Lachy and I are now engaging in this thread, posting to the WHATWG list is still a better way to get heard.

--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 23 April 2007 19:52:27 UTC