Real world evaluation process from RichardWarren on 2012-02-26 (public-wai-evaltf@w3.org from February 2012)

From: RichardWarren <richard.warren@userite.com>
Date: Sun, 26 Feb 2012 19:49:30 -0000
To: "Eval TF" <public-wai-evaltf@w3.org>
Message-ID: <BBB3DDD8043B420B972EF7CA0ADBBB39@DaddyPC>
Dear TF

At Thursday's conference Eric asked how many of us do evaluations for real. Along with others I contributed a brief summary, but I think it is worth providing a fuller explanation. So, for those of you who do not do it for real, here is the basic method used by Userite to evaluate real web sites for real clients. <G> It is a bit long, so those of you who do testing for real can ignore this mail, but any comments are, as always, welcome.

The following describes just the normal testing process. Some sites might require a different approach. We are looking for accessibility issues (non-compliance with WCAG etc.). Our evaluation test is aimed at the needs of the website’s practitioners (engineer, content author and stylist). We start with the assumption that these people (or person) have a good working knowledge of the site’s structure and content. If we find frequently occurring errors we only need to identify a few instances and leave it to the responsible person to ensure that the issue is addressed throughout the entire site.

We normally use two people, one to do the overall testing and one to do the screen-reader test and proofread the final test report. First we set up a computer with two screens. On one screen we have Microsoft Word with our accessibility score card. This card has three columns: 1) WCAG reference number, 2) Guideline/Success criterion text, 3) Comments. As we work through the evaluation we add comments to the comments column by inserting endnotes. These are organised automatically by Word and provide the basic information we need when writing up the test report. The comments column thus only contains the links to the endnotes. Each comment (endnote) should start with the relevant page title. As we work through the evaluation the comments column fills up. If a particular success criterion or guideline gets four comments we have to decide if this is a global (common) problem, in which case we stop looking for this issue. If it only affects some areas/departments we might make a note of one area where it is successfully managed to use as an example of good practice. Either way we do not spend time trying to deliver a page by page analysis of the whole site.

During the testing process we have in mind four conceptual levels of scoring: Pass, Fail, Near and Not Applicable. If we are not sure about an issue we will use the word Near in the comments column so we can discuss it later and decide whether to consider it a Pass or Fail. In addition there is an extra section at the end of the score card for things that are not specifically WCAG but can affect disabled people (spelling, grammar, language, metadata, very large images (MB), large images in headings or links, poor quality HTML).
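
The score card and its four-comment rule can be pictured as a small data structure. What follows is a minimal Python sketch only; the real card is a Word document with endnotes, and the names ScoreCard, add_comment, needs_decision and GLOBAL_THRESHOLD are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Assumption: four comments on one criterion triggers the
# "is this a global problem?" decision described in the text.
GLOBAL_THRESHOLD = 4

GRADES = {"Pass", "Fail", "Near", "Not Applicable"}


@dataclass
class ScoreCard:
    # Maps a WCAG reference such as "2.4.4" to its list of comments (endnotes).
    comments: dict = field(default_factory=lambda: defaultdict(list))
    # Maps a WCAG reference to its current grade.
    grades: dict = field(default_factory=dict)

    def add_comment(self, ref, page_title, note, grade=None):
        """Record one endnote; each comment starts with the page title."""
        self.comments[ref].append(f"{page_title}: {note}")
        if grade is not None:
            if grade not in GRADES:
                raise ValueError(f"unknown grade: {grade}")
            self.grades[ref] = grade

    def needs_decision(self, ref):
        """True once a criterion has enough comments to decide global vs local."""
        return len(self.comments.get(ref, [])) >= GLOBAL_THRESHOLD

    def unresolved(self):
        """Criteria still marked Near, to be discussed and set to Pass or Fail."""
        return sorted(ref for ref, grade in self.grades.items() if grade == "Near")
```

A tester would call add_comment for each finding, stop hunting for an issue once needs_decision returns True for its criterion, and use unresolved at report time to list the Near grades that still need discussion.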

On the second screen we have the Firefox browser with the Jon Gunderson Accessibility toolbar plug-in. We also have other browsers available to use later.


**Step one is to visit the site and obtain a very brief overview to decide what the site appears to be trying to do (sell something, inform groups etc.). We write our initial impression at the top of the score card. Later we will compare that to what the client thinks it is doing (sometimes quite different !!!). We will visit the Home Page, Site Map (ref 2.4.5 a & c), Contact/About, Products/Services landing page/s. We check if the site is using a standard CMS (Joomla, WordPress, Spearhead etc.); if so we will use a simplified test process. This step should take no more than five minutes.


**Step two is the detailed look at the site. Here we are looking for a range of issues. There is no particular WCAG order, we are just seeing what we find as we follow particular paths through the site (in other words we mimic what we think is a typical user and see what we find). There will be some random searching to see if there is anything interesting but we will mainly try to use the site for its intended purpose. If we find a payment section we go through to the payment point; if they have a bespoke payment system (i.e. not PayPal, WorldPay, Webmoney) we use a dummy Visa number and check the error message for accessibility. We also keep an eye out for unusual behaviour such as flashing images (2.3.1), instructions that say things like “click on left” or “click on green button” (1.3.3) (1.4.1 b), automatic redirection (2.2.2 c) and updating (2.2.2 b).

Initially we use the accessibility toolbar to check each page as it loads to see if it has structured headings (1.3.1 a & b), (2.4.1 b) , (2.4.6 – headings) (4.1.2B) and proper links text (2.4.4). 
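
The heading check in the paragraph above is done by eye with the toolbar. As a rough illustration of the same idea, here is a minimal Python sketch that lists the heading levels on a page and notes two common warning signs. The function name heading_problems is hypothetical, and the two rules it applies (first heading should be h1, no skipped levels) are heuristics chosen for the example, not WCAG requirements.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects the level of every h1-h6 start tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names before calling this method.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return human-readable notes about the heading outline of one page."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    if not levels:
        return ["no headings found"]
    problems = []
    if levels[0] != 1:
        problems.append(f"first heading is h{levels[0]}, not h1")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"h{prev} is followed by h{cur} (level skipped)")
    return problems
```

An empty list means the outline looks plausible; it does not show that the heading text itself is meaningful, which still needs a human reader.
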
Then we are looking out for examples of the following: 
1. Consistent navigation (2.4.5 b & d) (2.4.3) (3.2.3) 
2. Do sensible page titles appear above the browser window (2.4.2) 
3. Colour contrast (as a user – we will do the science later) (1.4.3) 
4. Identifying link text in content areas (1.4.1a) 
5. Use of colour in general (1.4.1 b) 
6. Can text be enlarged properly (Ctrl +) (1.4.4) 
7. Do the navigation links (including menus) take us where we expect to go (2.4.4) (3.2.4) 
8. If rollover buttons – do they change size (3.2.1 a), if pull-down menus do they delay when focus is lost, are the submenus duplicated on the relevant section pages.(?wcag)
9. Is the language used appropriate for the target audience (we might use a Gunning Fog Index later) and are supporting (explanatory) images provided (?wcag)
10. Does the multimedia work, can we stop and start it? Can we stop animations (2.2.2), audio (1.4.2)?
11. Do pop-up windows appear (3.2.1 b) can they be closed easily from within the pop-up
12. If we find a form - do form labels tie into the relevant input box (click on label – does cursor go to correct box – or any box?) (3.3.2 a & b). If fieldset obvious note it (3.3.2 c). Is the form submit button clear (3.2.2 a). Are there radio buttons (3.2.2 b), (1.1 e) 
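
Item 9 mentions the Gunning Fog Index. The standard formula is 0.4 x (average words per sentence + percentage of complex words), where a complex word has three or more syllables. Below is a minimal Python sketch under simplifying assumptions: syllables are estimated by counting vowel groups, and the usual exclusions (proper nouns, compound words, common suffixes) are ignored, so the result is only indicative. The function names are hypothetical.

```python
import re


def count_syllables(word):
    """Crude estimate: count runs of vowels, ignoring one trailing silent 'e'."""
    word = word.lower()
    if word.endswith("e") and not word.endswith("le"):
        word = word[:-1]
    groups = re.findall(r"[aeiouy]+", word)
    return max(1, len(groups))


def gunning_fog(text):
    """Gunning Fog index: 0.4 * (words per sentence + percent complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")
    # Assumption: any word estimated at three or more syllables is "complex".
    complex_words = [w for w in words if count_syllables(w) >= 3]
    average_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (average_sentence_length + percent_complex)
```

The score is best used to compare pages with each other or against the expected reading level of the target audience, not as a pass/fail figure.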

We then check that we can navigate the site and complete one example of each type of processes using just the keyboard. 
13. If no “Skip to Content” is visible does it appear on first Tab press (2.4.1)
14. Is each link obvious (do rollover buttons work, does link text change on focus)? (2.4.7) If focus is not clear does the link target – bottom left of browser – indicate useful target file name?
15. Is the Tab order logical and easy to follow visually. (2.4.3)
16. Check that each navigation bar (top, side and footer), the site map and any accessibility tools such as text resize are keyboard friendly (2.1)
17. Can pop-ups be closed, are there any keyboard traps (2.1.1 a) (2.1.2)


**Step three is more technical and detailed. We now have a good idea of how the site is structured and where we might find particular areas of interest. We will often be looking at the code on a page so we can check a number of issues at the same time. 
We start with the Home Page. 
Use the accessibility toolbar to : 
1. Disable the style sheet – does page still make sense (1.3.2 b) . Check colour contrast without and with style sheet (1.4.3).
2. Disable table layout – does it make sense (1.3.2 a) 
3. Check navigation bars have titles (we will look at code as well later) (1.3.1 d) and use list format (1.3.1 d)
4. Hide images by “show text equivalents”. Is the page still as informative, do the images have RELEVANT alt attributes (1.1 a) (1.1 b)? Are any images of text (esp. section headings, promotions) (1.4.5)?
5. Hide background images. Is the colour contrast etc. still good, (1.4.3) Are there still “decorative” images showing (1.1 c) . 
6. Check if the page uses frames. If so check for title and relationship, also highlight for comment later (1.1 g) 
7. Validate the HTML code (copy & paste code samples in endnotes if relevant) (4.1.1 a), if not HTML 5 is there a DOCTYPE (4.1.1 b)
8. Validate the CSS code (find a:link, visited, hover, focus, active – are they in the right order – or even used. Do they use proportional dimensions – not pixels) (1.4.4)
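
For the colour contrast checks in items 1 and 5 (1.4.3), the "science" is the WCAG 2.0 contrast ratio: (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker colours. Level AA asks for at least 4.5:1 for normal text and 3:1 for large text. Here is a minimal Python sketch of that calculation; the function names are hypothetical and colours are assumed to be given as six-digit hex values.

```python
def relative_luminance(hex_colour):
    """Relative luminance of an sRGB colour such as '#1a2b3c' (WCAG 2.0 definition)."""
    hex_colour = hex_colour.lstrip("#")
    if len(hex_colour) != 6:
        raise ValueError("expected a colour like '#1a2b3c'")
    channels = [int(hex_colour[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve for each channel.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    red, green, blue = linear
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


def contrast_ratio(foreground, background):
    """Contrast ratio between two colours, from 1 (identical) upwards."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground, background, large_text=False):
    """True if the pair meets the WCAG 2.0 level AA threshold for 1.4.3."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

Black on white gives a ratio of about 21, the maximum. Grey #767676 on white comes out at roughly 4.54 and passes for normal text, while the very slightly lighter #777777 comes out at roughly 4.48 and does not, which is why an eyeball check alone is not enough for borderline colours.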

Now view the source code to check: 
9. Is meta data (description) suitable (wcag?) is there accesskey code (2.1.1 b)
10. If no “skip links” identified above have they been coded incorrectly (missing anchor etc.) 
11. Are tables used for layout. If so do they use any data table code (th, tbody, summary) 
12. Have heading codes been used for non-semantic elements (4.1.2 b). 
13. Do coded headings include images (can cause screen reader to crash if large, so note location for screen reader test)
14. Does the code include deprecated elements or styling (4.1.1 c) Is the code efficient (commented and no duplication - no <span><span> etc.) (4.1.1 c). Is the language set (3.1.1)
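
Two of the source checks above lend themselves to a quick script: whether the page language is set (item 14, 3.1.1) and whether any headings contain images (item 13). This is a minimal Python sketch using the standard library parser; the names SourceChecker and check_source are hypothetical, and it does not judge image size, so flagged headings still need a look by hand.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SourceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.page_lang = None       # value of <html lang="...">, if any
        self.heading_images = []    # (heading tag, img src) pairs
        self._open_heading = None   # heading element currently being parsed

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            self.page_lang = attributes.get("lang")
        elif tag in HEADING_TAGS:
            self._open_heading = tag
        elif tag == "img" and self._open_heading:
            self.heading_images.append((self._open_heading, attributes.get("src")))

    def handle_endtag(self, tag):
        if tag == self._open_heading:
            self._open_heading = None


def check_source(html):
    """Return the declared page language and any images found inside headings."""
    checker = SourceChecker()
    checker.feed(html)
    return {"lang": checker.page_lang, "heading_images": checker.heading_images}
```

A result with "lang" set to None means no language was declared on the html element; each entry in "heading_images" is a location worth noting for the screen reader test.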

Now repeat tests 1 to 14 on a selection of pages. The actual number of pages depends upon the size and structure of the site but will include main landing pages, any key function pages and a random selection of pages within the site. If an error appears consistently we will not bother to continue checking it on every page we visit. Also, if the site appears to use a template we do not repeat checks on the template area. If the site uses a variety of templates each template is tested at least once. We may check a common error again occasionally just to be certain it is a consistent error, or if we suspect that a particular page has implemented the technology correctly and could be used as an example of good practice. We do not worry about validating every page’s code because we will do a site-wide check later, but we do validate the landing pages. For large sites we use the site map to identify and test a selection from each department/section.
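
The page selection described above (landing pages, key function pages, each template at least once, plus a random selection) is a judgment call made by the tester. Purely as an illustration, here is a minimal Python sketch of that selection logic. It assumes we already have a mapping from each URL to the template it uses; that mapping, the function name choose_sample and its parameters are all hypothetical.

```python
import random


def choose_sample(pages, landing_pages, key_pages, random_count, seed=None):
    """Pick pages to test: all landing and key pages, one page per template,
    plus a random selection from whatever is left.

    `pages` maps each URL to the name of the template it uses.
    """
    rng = random.Random(seed)
    sample = []
    for url in list(landing_pages) + list(key_pages):
        if url not in sample:
            sample.append(url)
    # Make sure every template is represented at least once.
    covered = {pages[url] for url in sample if url in pages}
    for url, template in pages.items():
        if template not in covered:
            sample.append(url)
            covered.add(template)
    # Top up with a random selection from the pages not yet chosen.
    remaining = [url for url in pages if url not in sample]
    sample.extend(rng.sample(remaining, min(random_count, len(remaining))))
    return sample
```

Passing a fixed seed makes the random part repeatable, which helps if the same sample needs to be re-tested after the client has made fixes.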

We check our progress and if required do a search for pages that contain elements not scored so far such as data tables, forms, forums, animations and multimedia to make sure that these elements are checked. 
15. Data tables use heading cells and have a summary (1.3.1 c) 
16. Forms use the fieldset element if not very short (3.3.2), provide feedback in text form (3.3.1 a) and identify where errors occur (try to submit wrong and null data to check) (3.3.1 b)
17. Forum topics are coded as headings, (1.3.1 a)
18. Animations have text alternative (1.1 f) and do not flash fast (
19. Video and audio have transcripts or text alternative, video with audio has captions (1.1 f) (1.2.1) (1.2.2) (1.2.3) (1.2.4) (1.2.5)
20. Have foreign words been used, and if so is the proper language set (3.1.2)
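
For item 15, a script can at least report which tables have heading cells and a summary attribute, leaving the tester to decide whether each one is a data table or a layout table. This is a minimal Python sketch with hypothetical names (TableAuditor, audit_tables); it handles nested tables by attributing each th to the innermost open table.

```python
from html.parser import HTMLParser


class TableAuditor(HTMLParser):
    """Records, for each <table>, whether it has <th> cells and a summary."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one dict per table, in document order
        self._stack = []   # indexes of currently open tables (handles nesting)

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            summary = dict(attrs).get("summary")
            self.tables.append({"has_th": False, "has_summary": bool(summary)})
            self._stack.append(len(self.tables) - 1)
        elif tag == "th" and self._stack:
            # A header cell belongs to the innermost open table.
            self.tables[self._stack[-1]]["has_th"] = True

    def handle_endtag(self, tag):
        if tag == "table" and self._stack:
            self._stack.pop()


def audit_tables(html):
    """Return one {'has_th': ..., 'has_summary': ...} dict per table on the page."""
    auditor = TableAuditor()
    auditor.feed(html)
    return auditor.tables
```

The same report is useful for item 11 of the source checks in reverse: a layout table that shows has_th or has_summary as True is using data table code it should not have.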

We finish this set of tests by running an automated validation from WDG that gives us a 100 page analysis; for large sites we might run this for each subsection. Copies of results are kept. 


**Step 4 uses a number of different standard browsers to check that the site works well in all. We include testing with Internet Explorer, Google Chrome, Opera and Safari. We will check the same pages as in step 1 plus at least one process or information retrieval and a few random pages with each browser. We check print preview on a few major pages, and run a quick check with an iPad and mobile phone.

We conduct a slightly more in-depth check with a screen reader (JAWS or Supernova) using a different tester (usually a blind tester, but it can be done with a sighted tester). The important thing is that the tester has not been involved in the evaluation so far. S/he must come at it fresh. If we already know that the site is not friendly to screen readers (no headings, ambiguous links, poor HTML etc.) the test will be brief. Otherwise we check that we can navigate the site and complete tasks using just the screen reader. Comments etc. are passed back to the original tester to incorporate in the score card. 


**Step 5 reviews the score card, checks that everything has been covered and prepares the report. If it is a simple test report then the endnotes are rationalised (combined to be one endnote per success criterion). Any Near grades are discussed and converted to Pass or Fail. Pass, Fail or NA scores are put in the comments column. An executive summary is written. The report is proofread etc. and sent to the client.

I hope that is useful Eric !!!

Regards
Richard
www.userite.com
Received on Sunday, 26 February 2012 19:50:02 UTC