HTML5 Structural elements, ARIA and the user experience from Ian Sharpe on 2013-04-04 (w3c-wai-ig@w3.org from April to June 2013)

From: Ian Sharpe <themanxsharpy@gmail.com>
Date: Thu, 4 Apr 2013 10:55:39 +0100
To: "'Steve Faulkner'" <faulkner.steve@gmail.com>, "'Ian Hickson'" <ian@hixie.ch>
Cc: "'Steve Green'" <steve.green@testpartners.co.uk>, 'Léonie Watson' <tink@tink.co.uk>, <w3c-wai-ig@w3.org>
Message-ID: <A2DC5627C3C74191B125C5BBE1B6C298@sharpyPC>
Inspired by the recent thread concerning the use of markup to identify a
page's main content, I thought it might be helpful to consider an example
explaining how the HTML5 structural elements, ARIA and semantic markup
might be used to present content in a more effective way to improve the
current user experience for somebody using a screen reader. I am very aware
that blind people are not the only ones who experience problems using the
web but addressing issues for screen reader users is also likely to improve
accessibility for many other groups as well. It's also the area in which I
personally have the most experience and, for the purposes of this email, I want
to focus on one use case. 
 
Note that I am specifically thinking about the user experience here, not
accessibility although the two are clearly related. In particular, I'm
interested to understand how the existing guidelines and markup in
conjunction with user agents and assistive technology might be used to
provide the user experience described below.
 
I'm not particularly concerned with whether user agents or assistive
technology actually implement this functionality today, just what are
people's views on the viability, achievability and reasonableness of the
desired user experience if all current guidelines and recommendations were
adopted / implemented appropriately.
 
I did look for an example as you will see below but struggled to find
anything which I felt met these criteria.
 
This email is a little longer than I initially intended so please bear with
me as I hope this will be a useful exercise.
 
Consider a web page which displays a blog post and associated comments,
social bookmarks and advertising such as:
 
http://www.netmagazine.com/features/truth-about-structuring-html5-page
 
The irony of using this particular site here is not lost on me but it was
the first search result I came across when googling for an example using:
 
"how to mark up a blog using the new html5 structural elements"
 
Note that I'm not specifically interested in the content of this post here,
just the structure. In this context it seemed a good example so I've used it
but it made me smile anyway. 
 
Apart from maybe the absence of a section containing social bookmarks before
the start of the blog post, I hope we can generally agree that the logical
structure of this blog post is not unusual and found widely across the web.
 
As a screen reader user myself, these kinds of pages can be pretty cumbersome
to use. They typically involve having to skip over the header, site
navigation, page navigation, search section as well as through a section
containing social networking links inside iframes before I reach the start
of the article. 
 
In this sense, this blog is actually a very bad example as I can go directly
to the start of the article by hitting the move to next heading hotkey which
takes me to the first heading on the page and the content follows
immediately after. 
 
However, usually, this would involve upwards of 5 hotkey presses and
sometimes many more. The following will hopefully clarify this as I describe
my own experience using a screen reader to read another blog post at:
 
http://odetocode.com/blogs/scott/archive/2013/02/18/starting-with-angularjs.aspx
 
The first time I visited the page I hit the hotkey to move to the first
heading and went straight to the title of the blog post. So far so good.
 
I hit the same key again hoping to find the introduction or beginning of a
subsection and was taken directly to the comments section. 
 
OK, I've missed the post and my confidence in how the blog post itself is
structured has been dented a little. I go back to the first heading again
and start arrowing down to try and find the start of the article. 
 
It takes me another 18 down arrow presses until I find the start of the
article. My screen reader does provide a skip blocks of links key which I
could have used instead of arrowing down but I have found that using the
down arrow is more reliable and typically quicker because I don't have to
skip back and forth to make sure I'm at the start and haven't missed
anything.
 
So far that's taken me 22 hotkey presses, knowledge of how the content might
be presented, as well as 5 years' experience using a screen reader to help me
get to where I want to be.
 
Having read an article, I am then often interested to read some of the
comments relating to the post. I'm generally not interested in who made the
comment, when it was made, their Gravatar, website link etc. at this stage
and just want to read the comments. I may want to find out who made the
comment and when after reading the comment but not before.
 
This particular blog is actually not too bad in this regard as the author,
their Gravatar and the date are all contained in a single block so I can
use the move to the next block hotkey to quickly skip this information and
read the comment. 
 
This is not the case in the first example however which involves several
more keypresses to find the next comment after reading each comment.
 
In both cases, I need to continuously use hotkeys to skip over the author,
Gravatar, date etc. which is very tedious, particularly if there are many
comments and especially if the structure from the first example above is
used. 
 
I could use continual reading mode to avoid the keypresses but it takes ages
to get through the comments using this method and one soon loses the will to
live. 
 
So while it is clearly possible to access the information contained in these
blog posts, the experience is not what I would call particularly user
friendly.
 
In an Ideal World
 
In this particular scenario, what I would like to happen and feel isn't too
unreasonable or unrealistic to expect could be achieved, is as follows:
 
1. I want to be able to move directly to the start of the blog post when the
page loads and for my screen reader to start reading the post from the
beginning of the article.
 
2. I want to be able to identify and skip any obtrusive adverts or
irrelevant content and be confident that I'm not missing important content
using a single hotkey.
 
3. I want to be able to quickly skip through and hear the comments
associated with the post without having to hit hotkeys many times or listen
through lots of information in which I am not interested.
 
4. I want to be able to perform the above tasks in a reliable and consistent
way, giving me confidence that I have not missed any important information.
 
The first example above goes some way to meeting these objectives using an
appropriate heading structure which works very well.
 
Obviously in itself, this can't achieve point 4 above but if all blogs used
a similar structure, my life, and I suspect many others' as well, would be
significantly simplified.
 
It does not however address point 3 above as there is no obvious or quick
way to find and read each comment. 
 
The second blog post is more difficult to read and navigate due to the lack
of any structure within the post itself, but goes some way to making it a
little less tedious to read through comments.
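
To make the heading point concrete, here is a rough sketch of the kind of
heading outline that lets a "next heading" hotkey reach the title, the
subsections and the comments in turn. The heading text is placeholder content
and the per-comment headings are an assumption on my part, not something taken
from either blog:

```html
<!-- Hypothetical heading outline; all titles are placeholders. -->
<h1>Title of the blog post</h1>
<p>Opening paragraph of the post...</p>

<h2>First subsection of the post</h2>
<p>Body text of the subsection...</p>

<h2>Comments</h2>
<h3>Comment 1</h3>
<p>Text of the first comment...</p>
<h3>Comment 2</h3>
<p>Text of the second comment...</p>
```

With an outline like this, one key press per heading would step through the
post and then through each comment.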
 
The new HTML5 structural elements
 
I read through the W3C wiki on the new HTML5 structural elements:
 
http://www.w3.org/wiki/HTML_structural_elements
 
This uses the blog post example to show how these new elements could be used
to more appropriately reflect the semantic nature of a blog post rather than
using HTML4 elements.
 
Based on this article, one possible implementation of the logical structure
of the second example above might be:
 
<section role="main">
<article id="blog-post">
<section id="social-stuff"></section>
<section id="article-contents"></section>
<section id="comments">
 <section id="comment-block">
  <!-- comment author, website, email, Gravatar etc. -->
  <section id="comment-meta"></section>
  <section id="comment"></section>
 </section>
</section>
</article>
</section>
 
However, I cannot currently see how this alone, or any variation using the
new HTML5 elements for that matter, is going to make any real difference in
terms of my user experience.
 
I will still need to hit hotkeys to skip over the social section, find the
comments section and then skip blocks containing information in which I am
not immediately interested in order to read comments. Yes the number of
hotkeys I need to press would be significantly reduced, which is helpful, but
it's still not great. 
 
ARIA
 
One possible enhancement could be to apply the generic role="region" to each
comment which would make it possible for me to use the next landmark hotkey
to quickly skip to the next comment.
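
As a minimal sketch of that idea, assuming each comment is given a label (the
aria-label values here are made up for illustration). As far as I understand
the ARIA specification, a region generally needs an accessible name such as
aria-label or aria-labelledby before it is exposed as a landmark:

```html
<section id="comments">
  <!-- The aria-label values are hypothetical placeholders. -->
  <section role="region" aria-label="Comment 1">
    <p>Text of the first comment...</p>
  </section>
  <section role="region" aria-label="Comment 2">
    <p>Text of the second comment...</p>
  </section>
</section>
```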
 
While this is helpful, it still falls short of my objectives:
 
1) It relies on the user knowing about landmarks and using them as the
primary means for navigation on the assumption that the page author has
implemented them sensibly. In the long term, I suspect screen reader users
will start to use landmarks to navigate web pages, but currently headings
are still the primary mechanism used by screen reader users for navigating a
page.
 
2) It still doesn't make it possible to enable me to just listen to the blog
in continuous reading mode without having to listen to a lot of unnecessary
information. The markup just doesn't contain enough semantic information.
 
The solution
 
One solution might be to structure the markup in such a way that the blog
post content and all comments flow after each other in the page markup.
Styling could be used to position the less interesting header and social
content visually where it makes sense for non-screen reader users, but out
of the main flow.
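
A rough sketch of that approach, assuming flexbox is available and using
made-up class names. The CSS order property changes only where each block is
drawn; the markup order, which is what a screen reader generally follows, puts
the post and comments first:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Source order sketch</title>
<style>
  /* Visual order differs from source order; class names are hypothetical. */
  body { display: flex; flex-direction: column; }
  .site-header { order: 1; }
  .social      { order: 2; }
  .post        { order: 3; }
  .comments    { order: 4; }
</style>
</head>
<body>
  <!-- Source order: the post and its comments come first in the markup. -->
  <article class="post">Blog post content...</article>
  <section class="comments">Comments...</section>
  <header class="site-header">Site header and navigation...</header>
  <div class="social">Social bookmarks...</div>
</body>
</html>
```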
 
While this should work in theory, I'm less convinced of the viability of
this approach and there's still the issue of how a screen reader user would
then be able to find out who wrote a particular comment and when, if they
did want to know this information having just read a comment.
 
Another approach might be to build upon the first example above by wrapping
each comment in an ARIA region to identify each comment for example. In
conjunction with the use of headings to give structure, this approach would
seem to provide the closest user experience to that stated above. But it's
still not there.
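
One way this combination might look, as a sketch only: each comment is a
labelled region with its own heading, and the author details sit after the
comment text so they can be read afterwards if wanted. The ids and the
placement of the details in a footer are assumptions for illustration:

```html
<section id="comments">
  <h2>Comments</h2>
  <!-- The ids are hypothetical; each region is named by its own heading. -->
  <section role="region" aria-labelledby="c1-heading">
    <h3 id="c1-heading">Comment 1</h3>
    <p>Text of the first comment...</p>
    <!-- Author details follow the text rather than precede it. -->
    <footer>Posted by (author name) on (date)</footer>
  </section>
</section>
```

Both the next heading and next landmark hotkeys would then land on each
comment, with the comment text read before the details of who wrote it.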
 
Apart from these ideas, I'm struggling to think of a way of achieving my
desired user experience using existing techniques.
 
I'd be very interested to hear other people's thoughts on any of the above
or suggestions on how I might be able to achieve my stated user experience
using any combination of HTML, ARIA, user agent and assistive technology in
a sensible, reliable and consistent way.
 
In particular, do people feel the objectives outlined above are unreasonable
in this specific use case? Am I expecting too much? 
 
Actually, the simplest way I can think of currently to achieve this is based
on comments made on another thread but would require a modification to ARIA
or some other specification.
 
In short, provide a mechanism to enable web authors to identify content as
important.
 
Assistive technology could then provide a mechanism to either read through
the important content on a web page or skip to the next block of important
content.
 
This could be used with ARIA landmarks in a similar way to the use of
important in CSS for example, i.e.:
 
<section role="region!important">
 
I obviously do appreciate that "important" is a rather subjective term and
what is important to me is not necessarily going to be what other people
might think of as being important to them. I did spend a little time
thinking about whether I should even use this term on that basis.
 
However, I decided to stay with important as I think that for the most part,
authors of content should be able to make a reasonable stab at what they
feel their users are most likely to be interested in reading which should
cover the majority of visitors. 
 
A slightly less reliable, yet possibly more realistic, approach might be for
either user agents or assistive technology to establish some rules which
could potentially go some way to automate the significance of the content
contained within a section. For example, if the section contained several
discrete small pieces of information (one or two words maybe) then it is
likely that the content provides social bookmarks or header information for
a comment etc and so could be deemed "less important".
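
To illustrate the sort of rule meant here, the following is a very rough
sketch written as a script in the page, although in practice it would live in
the user agent or assistive technology. The thresholds, the function names and
the data-less-important attribute are all made up for this example:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>"Less important" section heuristic (sketch)</title>
</head>
<body>
<section id="social">
  <a href="#">Share</a> <a href="#">Tweet</a> <a href="#">Like</a>
</section>
<section id="post">
  <p>A longer paragraph of running text that a reader would want to hear in full.</p>
</section>
<script>
  // Hypothetical rule: a section counts as "less important" when it holds
  // several pieces of text and every piece is at most MAX_WORDS words long.
  var MAX_WORDS = 2;   // "one or two words maybe"
  var MIN_PIECES = 3;  // "several"; this threshold is an assumption

  // Collect the non-empty text nodes inside a section.
  function textPieces(section) {
    var walker = document.createTreeWalker(section, NodeFilter.SHOW_TEXT);
    var pieces = [];
    var node;
    while ((node = walker.nextNode())) {
      var text = node.nodeValue.trim();
      if (text) { pieces.push(text); }
    }
    return pieces;
  }

  // Apply the rule described above to one section.
  function isLessImportant(section) {
    var pieces = textPieces(section);
    if (pieces.length < MIN_PIECES) { return false; }
    return pieces.every(function (text) {
      return text.split(/\s+/).length <= MAX_WORDS;
    });
  }

  var sections = document.querySelectorAll("section");
  for (var i = 0; i < sections.length; i++) {
    // data-less-important is a made-up attribute, used only for this demo.
    sections[i].setAttribute("data-less-important",
                             String(isLessImportant(sections[i])));
  }
</script>
</body>
</html>
```

In this sketch the social section, which holds three one-word links, would be
flagged and the post section would not.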
 
A feature could be provided which enabled these "less important" sections to
be ignored when using continuous reading mode.
 
The user could then decide whether they wanted this feature enabled or not
and would still obviously be able to review the content manually.   
 
Clearly this isn't going to be as reliable as an author designating content
to be "important" but maybe with improvements to content analysis this might
be entirely viable. I believe there are already systems which claim to be
able to identify the "main" content on a page today. 
 
Finally, I've used what I believe to be a very common yet simple use case
above of a blog post for the purpose of this discussion. However, it's worth
pointing out that the user experience I describe above is far worse when
reading through online forums or mail archives which while accessible, can
be unbelievably tedious and time consuming to use with a screen reader
alone. I'm sure many people are already aware of this but it doesn't appear
to be often said, certainly not in terms of the user experience anyway.
 
Any thoughts welcome.
 
Cheers
Ian
Received on Thursday, 4 April 2013 09:56:15 UTC