Re: XHTML 2.0 - dfn : Content model and usability (PR#7832)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 27 Mar 2006 20:21:20 +0000 (UTC)
To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
Cc: "'www-html@w3.org'" <www-html@w3.org>
Message-ID: <Pine.LNX.4.62.0603271858020.315@dhalsim.dreamhost.com>

On Sat, 25 Mar 2006, Jukka K. Korpela wrote:
> > 
> > According to my studies it's used in around 0.1% of the Web's pages. 
> > One in every thousand pages isn't bad, given how few pages could be 
> > expected to be defining terms. In particular, it's used more than 
> > <ins>, <del>, <var>, <samp>, <bdo>, etc.
> 
> I can't argue with your statistics - the Google analysis 
> http://code.google.com/webstats/2005-12/elements.html does not cover the 
> <dfn> element.

The 0.1% number comes from the data that also produced the Google 
analysis.


> Assuming that the figure 0.1% is representative, is it small or large as 
> compared with the expected frequency of pages that actually contain 
> definitions of terms?

I don't have data on that frequency. I would expect that it is low, 
though.


> After all, what matters - for purposes like developing browsers and 
> search engines - is the probability that you can actually locate 
> defining occurrences by looking at markup for them (at present, <dfn> 
> and <dt>). Even if you get a large amount of information that way, is it 
> enough if it is just a small fraction of pages that actually define 
> things?

Most of the Web is presentational. You can't use _any_ of HTML's semantics 
to unambiguously get data out of the Web in the manner you describe.


On Sat, 25 Mar 2006, Jukka K. Korpela wrote:
> > > 
> > > How is the reader expected to know whether italics is used in 
> > > printed matter to indicate a defining occurrence, or to emphasize, 
> > > or to indicate
> > 
> > The reality is that, in general, they do,
> 
> I'm afraid that's wishful thinking. Anything that can be understood in 
> two or more ways will be understood in the wrongest way.

This seems like a reason to provide a way to unambiguously mark such spans 
of text, rather than requiring authors to use one element for all these 
cases. That way, at least there is a way to disambiguate if necessary.


> If browsers used _different_ default styling for <dfn>, <cite>, and 
> <var>, the message would be much clearer, and authors might have been 
> more interested in using such markup.

Authors can set different styles in a stylesheet.


> > <dfn> etc., give the potential for machine processing
> 
> But it has not been used.

It's been used on millions of pages.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 27 March 2006 20:21:32 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 27 March 2012 18:16:05 GMT