Current HTML ruby markup usage

Given the recent work on adding ruby markup to HTML5, I thought I'd look 
at how it's currently used, mainly in the hope of confirming the data 
Hixie based the design on. I've put some examples up at 
<>, extracted 
from all the sites using ruby elements from a random sample of 130K 
pages from

I've not looked in any detail, but some simple observations:

* Ruby is used on about 0.3% of the 3286 .jp pages in my sample. (For 
comparison, <acronym>, <sub> and <csaction> are each used on about 0.3% 
of all pages, not just .jp ones. I only have 315 .cn pages, so I don't 
have enough data to compare those.)

* End tags are often omitted. (Corollary: this is not XHTML.)

* <ruby>, <rb>, <rt> and <rp> are frequently used. <rbc> and <rtc> don't 
come up at all, but my sample is too small to know how rare they are.

* Ruby elements are sometimes used accidentally. (See - <rb></br>)

* In this sample, <ruby> is always present whenever other ruby elements 
are used intentionally.

* The 'lang' attribute is not always to be trusted. (See - <span lang="EN-US" style="font-family: 
新細明體"><font size="4">僅</font></span>)
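
For anyone wanting to repeat this kind of count on their own sample, here 
is a minimal sketch in Python. It is not the script used for the numbers 
above; the names RUBY_TAGS, RubyTagCounter, ruby_tags_in and 
fraction_using_ruby are made up for illustration, and it assumes each page 
is already available as a decoded string. It only looks at start tags, 
since end tags are often omitted.

```python
from html.parser import HTMLParser

# Ruby-related element names (hypothetical constant, for illustration only).
RUBY_TAGS = {"ruby", "rb", "rt", "rp", "rbc", "rtc"}


class RubyTagCounter(HTMLParser):
    """Counts start tags of ruby-related elements.

    End tags are ignored on purpose, because real pages often omit them.
    HTMLParser lowercases tag names before calling handle_starttag.
    """

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        if tag in RUBY_TAGS:
            self.counts[tag] = self.counts.get(tag, 0) + 1


def ruby_tags_in(html):
    """Return a dict mapping ruby-related tag names to start-tag counts."""
    parser = RubyTagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def fraction_using_ruby(pages):
    """Fraction of pages containing at least one <ruby> start tag."""
    pages = list(pages)
    if not pages:
        return 0.0
    hits = sum(1 for page in pages if "ruby" in ruby_tags_in(page))
    return hits / len(pages)
```

A page containing only something like <rb></br> yields a count for rb but 
none for ruby, so it would not be counted as a page using ruby. That is one 
crude way to separate accidental uses from intentional ones, on the 
assumption that intentional uses always include <ruby>.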

Philip Taylor

Received on Monday, 26 May 2008 23:40:37 UTC