
maincontent extension - data on use of id=main and id=content in web pages

From: Steve Faulkner <faulkner.steve@gmail.com>
Date: Wed, 17 Oct 2012 01:47:00 +0200
Message-ID: <CA+ri+VndtQWCav0UZt7pp4YUT6GO4CwBh7J1wHMnc=FBGYy42w@mail.gmail.com>
To: HTMLWG WG <public-html@w3.org>
Hi all,

In the process of developing the <maincontent> element spec [1] I looked at
data from a number of sources [3] on how frequently id values are used to
indicate the main content area of a web page.

I also used data [2] that I gathered in April 2012, based on a URL list of
the top 10,000 most popular web sites.

In preparing the data [2] I took a subset of the total usable HTML documents
(approx 8900 pages: the home pages for sites in the top 10,000 URL list)
by searching for use of the HTML5 doctype (approx 1545 pages). I figured
that documents using the HTML5 doctype would provide the freshest code.
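
As a rough illustration of the subsetting step, here is a minimal Python
sketch. It is not the script used to prepare [2]; the function names and the
regular expression are assumptions, and it treats only the short form
<!DOCTYPE html> as the HTML5 doctype.

```python
import re

# Assumption: a page "uses the HTML5 doctype" when the document starts with
# <!DOCTYPE html> (case-insensitive). The legacy-compat variant
# (<!DOCTYPE html SYSTEM "about:legacy-compat">) is deliberately not matched.
HTML5_DOCTYPE = re.compile(r"^\s*<!doctype\s+html\s*>", re.IGNORECASE)


def uses_html5_doctype(html: str) -> bool:
    """Return True if the markup begins with the short HTML5 doctype."""
    # Drop a leading byte order mark, if any, before matching.
    return HTML5_DOCTYPE.match(html.lstrip("\ufeff")) is not None


def html5_subset(pages: dict[str, str]) -> dict[str, str]:
    """Keep only the pages (URL -> markup) that use the HTML5 doctype."""
    return {url: html for url, html in pages.items() if uses_html5_doctype(html)}
```

Here `pages` is a hypothetical mapping from home page URL to fetched markup.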


What is apparent from the home page data in the sample:
* Use of a descriptive id value to identify the main content area of a
web page is common
(id="main"|id="content"|id="maincontent"|id="content-main"|id="main-content"
used on 39% of the pages in the sample [2]).

* There is a strong correlation between use of role='main' and id values
of 'content' or 'main' or permutations: on the 101 pages that used
role='main', in 77% of cases it was on an element with such an id value.
* There is a strong correlation between id values of 'content' or 'main'
or permutations and the targets of 'skip to content'/'skip to main
content' links: on the 67 pages that used such links, 78% of skip link
targets were elements with such an id value.
* There appears to be a strong correlation between the content areas
identified with id values of 'content' or 'main' or permutations and
what the spec describes as appropriate content to be contained within a
<maincontent> element [1]:

"The maincontent element represents
<http://dev.w3.org/html5/spec/rendering.html#represents> the main
content section of the body of a document or application. The main content
section consists of content that is directly related to or expands upon the
central topic of a document or central functionality of an application.
...
The main content section of a document includes content that is unique to
that document and excludes content that is repeated across a set of
documents such as site navigation links, copyright information, site logos
and banners and search forms (unless the document or applications main
function is that of a search form)."
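
The id and role figures above could be tallied along the following lines.
This is a sketch under assumptions, not the method used for [2]: the class
and function names are hypothetical, id matching is done case-insensitively
against the five values listed above, and only Python's standard html.parser
is used.

```python
from html.parser import HTMLParser

# The five id values named in the findings above.
MAIN_IDS = {"main", "content", "maincontent", "content-main", "main-content"}


class MainContentScanner(HTMLParser):
    """Record main-content id values and role=main usage in one page."""

    def __init__(self):
        super().__init__()
        self.has_main_id = False       # any element with one of MAIN_IDS
        self.role_main_total = 0       # elements carrying role=main
        self.role_main_on_main_id = 0  # ...that also have one of MAIN_IDS

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Valueless attributes arrive as None, hence the "or ''".
        # Assumption: ids are compared case-insensitively.
        id_value = (attr.get("id") or "").strip().lower()
        role_tokens = (attr.get("role") or "").lower().split()
        id_matches = id_value in MAIN_IDS
        if id_matches:
            self.has_main_id = True
        if "main" in role_tokens:
            self.role_main_total += 1
            if id_matches:
                self.role_main_on_main_id += 1


def scan(html):
    """Parse one page and return the populated scanner."""
    scanner = MainContentScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


def share_with_main_id(pages):
    """Fraction of pages containing at least one main-content id value."""
    if not pages:
        return 0.0
    return sum(1 for html in pages if scan(html).has_main_id) / len(pages)
```

Dividing role_main_on_main_id by role_main_total across the pages that use
role=main would give a figure comparable to the 77% quoted above.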

I have prepared approx 440 sample pages [4] from the same URL set, with CSS
added to outline and identify container elements with id values of
'content' and/or 'main' and role=main. These samples can be used to
visually assess how closely the spec text matches the reality of element
usage with the stated id values.

The first link in each list item links to the original page; the second
link, prefixed with "copy", is the same page with the CSS added:
[4] http://www.html5accessibility.com/tests/HTML5-main-content/
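
For anyone wanting to produce similar annotated copies, a minimal Python
sketch follows. The selectors, colours and function name are assumptions
for illustration only; they are not the CSS actually used on the sample
pages.

```python
import re

# Hypothetical outline rules: solid red for the id values, dashed blue for
# role=main, so that an element with both shows up distinctly.
OUTLINE_CSS = """
[id="main"], [id="content"] { outline: 3px solid red; }
[role="main"] { outline: 3px dashed blue; }
"""

HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def add_outline_css(html: str) -> str:
    """Return a copy of the page with the outline style sheet added."""
    style = "<style>" + OUTLINE_CSS + "</style>"
    match = HEAD_CLOSE.search(html)
    if match is None:
        # No explicit </head>: append the style at the end instead, so the
        # doctype stays first in the document.
        return html + style
    return html[:match.start()] + style + html[match.start():]
```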



[1] https://dvcs.w3.org/hg/html-extensions/raw-file/tip/maincontent/index.html

[2] http://www.paciellogroup.com/blog/2012/04/html5-accessibility-chops-data-for-the-masses/

[3] http://triin.net/2006/06/12/CSS#figure-34
    http://westciv.typepad.com/dog_or_higher/2005/11/real_world_sema.html
    http://dev.opera.com/articles/view/mama-common-attributes/#id
-- 
with regards

Steve Faulkner
Technical Director - TPG

www.paciellogroup.com | www.HTML5accessibility.com |
www.twitter.com/stevefaulkner
HTML5: Techniques for providing useful text alternatives -
dev.w3.org/html5/alt-techniques/
Web Accessibility Toolbar - www.paciellogroup.com/resources/wat-ie-about.html
Received on Tuesday, 16 October 2012 23:48:08 UTC
