HTML5 Standard pieces of information should be automation-friendly

Hi! The following has been added to the wiki at http://www.w3.org/WAI/UA/work/wiki/HTML5_review_by_UAWG_notes#Standard_pieces_of_information_should_be_automation-friendly:


    Standard pieces of information should be automation-friendly

As a general rule, when it's common for pieces of content to serve a conventional purpose, the markup should allow this purpose to be identified in an unambiguous and automation-friendly way. This allows user agents and assistive technology to make use of it to provide users with advanced navigation and customization features. HTML5 does this in many areas, many more in fact than HTML4, but there are still common purposes that are not yet addressed.

For example, in order to let people find and navigate content using a wider range of mechanisms, markup languages could provide a way to add tags (also called keywords) to any, or almost any, content element. Currently keyword tagging is supported at the HTML document level, but not at smaller granularities. Many web sites already support tagging of images, posts, or articles, and HTML5 puts a lot of effort into facilitating aggregation of such content. The new feature would therefore also give content a standardized way to expose information that is currently handled in an ad hoc or site-specific fashion, allowing user agents and assistive technology to make use of it in novel ways.

*Use case:* Nadia uses a screen reader to explore a blogger's web site. The page shows the first paragraph of each recent post, each in an HTML5 article element, and each includes the title, poster, posting date, keyword tags, and a link to the full article. Because scanning the entire page searching for articles that match her criteria is much more difficult and time consuming for her than for most users, Nadia uses a browser add-in that lets her navigate to the most recent article before or after a given date. Each article's title and publication date are marked up in automation-friendly ways, as <article><header><h1>The Very First Rule of Life</h1></header><p><time pubdate datetime="2009-10-09T14:28-08:00">..., allowing user agents and accessibility aids to use this information for navigation, filtering, color-coding, etc. However, HTML5 defines no standard way to identify the content that represents the poster's identity or keyword tags, so the author has to put those in as plain text. Because of this, her add-in cannot let her quickly navigate to or between articles that are posted by a particular person or that are tagged with specific keywords.
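
To make the date-based navigation concrete, here is a minimal sketch of what such an add-in could do with the existing markup. It is only an illustration: the class and function names are invented for this example, the second article in the sample page is made-up data, and a real add-in would work on the browser's DOM rather than on an HTML string.

```python
from datetime import datetime
from html.parser import HTMLParser


class ArticleDateCollector(HTMLParser):
    """Collect the title and publication date of each <article> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.articles = []    # one {"title": str, "published": datetime or None} per article
        self._current = None  # article currently being read
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            self._current = {"title": "", "published": None}
        elif self._current is not None and tag == "h1":
            self._in_h1 = True
        elif self._current is not None and tag == "time" and "pubdate" in attrs:
            value = attrs.get("datetime")
            if value:
                self._current["published"] = datetime.fromisoformat(value)

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False
        elif tag == "article" and self._current is not None:
            self.articles.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._in_h1 and self._current is not None:
            self._current["title"] += data


def first_article_after(html, moment):
    """Return the earliest article published after `moment`, or None."""
    collector = ArticleDateCollector()
    collector.feed(html)
    later = [a for a in collector.articles
             if a["published"] is not None and a["published"] > moment]
    return min(later, key=lambda a: a["published"], default=None)


# Sample page: the first article follows the example above, the second is invented.
PAGE = """
<article><header><h1>The Very First Rule of Life</h1></header>
<p><time pubdate datetime="2009-10-09T14:28-08:00">October 9</time></p></article>
<article><header><h1>A Later Post</h1></header>
<p><time pubdate datetime="2009-11-02T09:00-08:00">November 2</time></p></article>
"""

target = datetime.fromisoformat("2009-10-15T00:00+00:00")
match = first_article_after(PAGE, target)
```

The date passed in needs a time zone offset so that it can be compared with the offsets in the datetime attributes. Nothing comparable can be written for the poster or the keyword tags, because there is no markup to look for.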

*Use case:* Marge is browsing a web page with a hundred images, but she wants to find the one of a sailboat. While many users can make a quick visual scan of each screenful of content, paging down until they find the correct image, this is much more difficult for her, so she activates an add-in for her browser, selects the check box for "Images", and a list box is populated with the keywords for all the images on the page. She types the first few characters of "boats" to move the focus to this entry, presses Space to select it, and presses Enter, at which point the extension temporarily hides all the content except the two images that have the "boats" keyword.
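
As a rough sketch of the logic behind such an add-in, assuming images carry the space-separated keywords attribute proposed in the recommendation later in this message (the attribute does not exist in HTML5, and the helper names and two of the sample images are invented for illustration):

```python
from html.parser import HTMLParser


class ImageKeywordCollector(HTMLParser):
    """Record (src, keyword set) for every <img>, reading a hypothetical keywords attribute."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            tokens = set((attrs.get("keywords") or "").split())
            self.images.append((attrs.get("src"), tokens))


def keyword_choices(images):
    """Entries for the list box: every keyword used on the page, sorted."""
    return sorted({kw for _, tokens in images for kw in tokens})


def images_tagged(images, keyword):
    """Sources of the images to keep visible for the selected keyword."""
    return [src for src, tokens in images if keyword in tokens]


# Sample markup; only the flag image comes from this message, the others are made up.
PAGE = """
<img keywords="boats harbor" src="marina.png" alt="Boats in a marina">
<img keywords="flags swan union-jack" src="western-australia.png" alt="Flag of Western Australia">
<img keywords="boats sunset" src="regatta.png" alt="Regatta at sunset">
"""

collector = ImageKeywordCollector()
collector.feed(PAGE)
choices = keyword_choices(collector.images)
visible = images_tagged(collector.images, "boats")
```

Here `choices` would fill the list box and `visible` names the images left on screen once "boats" is selected.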

*Use case:* Randy is easily distracted, so he uses a browser add-in to filter each page, hiding information that it can recognize as not relevant to his current task. Using standard markup it can, for example, hide all content on a page except articles that meet certain criteria.

*Use case:* In order to help him understand and navigate a collection of HTML documents, Joshua runs a tool that generates and presents him with an index and table of contents. If the author has no way to recommend keywords and phrases for portions of content, the tool can only provide a simplistic guess at the primary topic of each section, but if the author can explicitly associate key words or phrases with headings, tables, and even paragraphs, the tool can do a much better job. This can also give tools such as search engines data and hints to work with.

*Recommendation:* HTML5 should allow a keywords or tags attribute to be added to all or nearly all elements. Its value could be a set of space-separated tokens. Although this precludes including spaces in the keywords or key phrases, it would allow the value to be processed in standard ways already used by other attributes. For example:

    * <img *keywords="flags swan union-jack"* src="western-australia.png" alt="Flag of Western Australia">.

    * <article *keywords="computers apple news"*...>
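
To illustrate how little processing a space-separated value needs, the following sketch builds the kind of keyword index described in the indexing use case above. The keywords attribute is the proposal made here, not an existing HTML5 feature, and the class name, the element ids, and the sample markup are assumptions for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class KeywordIndexer(HTMLParser):
    """Map each keyword token to the elements that carry it (hypothetical attribute)."""

    def __init__(self):
        super().__init__()
        self.index = defaultdict(list)  # keyword -> [(tag name, id or None), ...]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # str.split() with no argument is all that a space-separated token list needs.
        for token in (attrs.get("keywords") or "").split():
            self.index[token].append((tag, attrs.get("id")))


# Illustrative markup: the keyword values come from the examples above, the ids are invented.
PAGE = """
<article id="post-1" keywords="computers apple news">
  <img id="flag" keywords="flags swan union-jack" src="western-australia.png"
       alt="Flag of Western Australia">
</article>
<h2 id="orchards" keywords="apple farming">Orchards</h2>
"""

indexer = KeywordIndexer()
indexer.feed(PAGE)
apple_entries = indexer.index["apple"]
```

A key phrase has to be written as a single token, as in union-jack, which is the limitation noted in the recommendation.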

Alternatively, a keywords element could be defined that could be associated with another element, either by referencing an element ID or by surrounding content with which it's associated.

*Recommendation:* HTML5 should define a text-level author element that would mark its content as naming the author of the associated content. The typical use would be in cases like <article><header><h1>The Very First Rule of Life</h1></header><p><time pubdate datetime="2009-10-09T14:28-08:00"><p>Posted by: *<author>*<a href="/users/gwashington">George Washington</a>*</author>*...
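
With such an element in place, finding articles by a particular poster becomes straightforward. The sketch below assumes the proposed author element (which HTML5 does not define); the helper names and the second post in the sample page are invented for illustration.

```python
from html.parser import HTMLParser


class AuthorCollector(HTMLParser):
    """Collect title and author per <article>, reading a hypothetical <author> element."""

    def __init__(self):
        super().__init__()
        self.posts = []     # one {"title": str, "author": str} per article
        self._post = None   # article currently being read
        self._field = None  # "title" or "author" while inside <h1> or <author>

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._post = {"title": "", "author": ""}
        elif self._post is not None and tag == "h1":
            self._field = "title"
        elif self._post is not None and tag == "author":
            self._field = "author"

    def handle_endtag(self, tag):
        if tag in ("h1", "author"):
            self._field = None
        elif tag == "article" and self._post is not None:
            self.posts.append(self._post)
            self._post = None

    def handle_data(self, data):
        # Text inside nested elements such as <a> is still collected.
        if self._post is not None and self._field:
            self._post[self._field] += data


def titles_by(html, name):
    """Titles of all articles whose <author> content matches `name`."""
    collector = AuthorCollector()
    collector.feed(html)
    return [p["title"].strip() for p in collector.posts if p["author"].strip() == name]


# Sample page: the first post follows the example above, the second is made up.
PAGE = """
<article><header><h1>The Very First Rule of Life</h1></header>
<p>Posted by: <author><a href="/users/gwashington">George Washington</a></author></p></article>
<article><header><h1>Another Post</h1></header>
<p>Posted by: <author><a href="/users/mwashington">Martha Washington</a></author></p></article>
"""

found = titles_by(PAGE, "George Washington")
```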

Received on Thursday, 28 July 2011 15:19:02 UTC