Summary Report - Comments on WCAG Documentation - 10 Sep 03

Comments on Draft WCAG 2.0 Implementation Testing Framework:

Congratulations - this is a very good initiative towards categorizing some important ideas. However, as it currently stands the document seems to be a loose collection of ideas, without much coherent structure. A possible organization might be: (1) WCAG testing process information, (2) WCAG technical testing information, and (3) additional WCAG testing details. Comments in support of (1) and (2) are covered under "General Comments" below, and comments in support of (3) are covered under "More specific comments" below. Perhaps all that should be in this document is WCAG-specific information, with references to other resources if they can be used (just a thought)?

General Comments (accessibility or W3C specific):

  1. What is the purpose and intent of this document? Is it designed to promote implementation testing, or to provide additional WCAG-specific conformance test information not mentioned elsewhere? Or both? This isn't made clear at the beginning of the document. Without a clear statement of purpose and intent, the draft as currently written seems to have a mixed identity: a hodgepodge of test-related documents.
  2. The content of this document should follow existing QA documentation as much as possible. The two QA documents that may be applicable are: (1) stable parts of the Operational Guidelines (in particular, the QA Process Document), and (2) the Test Guidelines (under development).
  3. What happened to the WCAG test suite process document being developed by Phill Jenkins and Tim Boland at the March 2003 WCAG meeting? This document, developed from a QA template, may answer a number of the questions raised. Why not use this W3C test suite process document, and if needed the WCAG WG can maintain a "delta" on top of it to address WCAG-specific issues, since the nature of WCAG testing may be different from previous testing efforts (e.g., testing guidelines rather than markup validity per se)?
  4. There is a generic accessibility testing technical document that was sent to the WCAG WG in March, as well as to the AUWG. The AUWG felt it was a good starting point but needed tailoring to the specific requirements of authoring tools testing. Perhaps the same is true for WCAG testing? In any case, much of the content of that document seems to overlap with content mentioned in this document.

More specific comments (things possibly WCAG-specific):

  1. Are the WCAG techniques documents mentioned normative in a testing sense (or informative)? This doesn't seem to be clearly addressed.
  2. Is it worthwhile dealing with different categories of web content and having different tests apply to each? In the latest ATAG working draft different categories of authoring tools are mentioned, with possibly different tests for each. Perhaps a resolution of this issue could be stated in this document if appropriate.
  3. It seems to me that WCAG implementation testing needs to be clearly defined in this document, in concert with any QA definitions, as opposed to testing conformance to the WCAG draft itself. Also the classification and organization of WCAG tests needs to be clearly explained, as well as how an offering would state its features for testing (by filling out a form, for example). WCAG testing may have particular requirements in these areas.
  4. It is difficult for me to understand the purpose of the two tiers of implementation testers mentioned. What is the difference? Why not just one kind of implementation tester?
  5. I think that the importance of objective and testable conformance requirements (to the maximum extent possible) needs to be stated. It is recognized that certain testing activities of the W3C (such as voice browser, authoring tools, and possibly WCAG testing) may be somewhat subjective in nature; given the nature of the WCAG draft, it may be unavoidable to include some subjective tests, but such subjectivity should be kept to a minimum if possible.
  6. Usability testing principles (relating to accessibility) may play a role in certain aspects of WCAG testing (just as with authoring tools testing). It may be appropriate to draw upon usability testing resources in that event and to mention them in this document.
  7. I think that it is important to define test purpose, test environment, test procedure, and expected test results in a systematic fashion for each WCAG test (see the first sketch following this list). I think that it is also important to define which tests can be automated and which need to be "manually evaluated". NOTE: Much of this is covered in the above-mentioned testing technical document, but any WCAG-specific issues should be mentioned.
  8. Should tests be aimed at designers (proper design practices being followed) or at users (leading the user to do the "right" thing)? Perhaps there should be testing twice, at the beginning and at the end of the "process" (that is, test both the user and the designer). It may be appropriate to mention a resolution/clarification of this issue in this document.
  9. For the CSS WG, for example, a "requirement" for exiting the CR stage is that at least two implementations are able to provide support for the mentioned functionality. Would a WCAG analogy be that at least two evaluation and repair tools (e.g., Bobby, A-Prompt) need to generate the same results (e.g., finding the same problems with inaccessible code) for some particular content (see the second sketch following this list)? If so, it may be appropriate to mention.
  10. A discussion item occurred previously in the AUWG as to whether it is appropriate to deliberately generate broken HTML, invalid CSS, non-well-formed XML, etc., in test content as part of accessibility testing, to see whether all errors could be caught by an authoring tool, or instead to keep the principle of requiring validated code as a prerequisite to any accessibility testing. Perhaps a resolution of this item could be mentioned in this document if appropriate.
  11. What is the relation of web content to a web resource, in the W3C sense (for example, to IRIs)? This is not mentioned in this document.
  12. The audience for the tests (users of the tests) needs to be specified, as well as possible uses of the test results (legal, administrative (policy), etc.).
  13. The importance of "testing the tester" is not mentioned in this document. Perhaps it should be?
  14. Is there any implied extensibility of WCAG testing to content other than "web content" (assuming an agreed-upon definition exists)? If so, it may be appropriate to mention.
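
As a first sketch (referenced from item 7 above), the following illustrates one possible systematic per-test record. This is purely hypothetical: the field names and the example content are my own illustration, not anything defined by WCAG or the QA documentation.

    # Hypothetical sketch only: field names are illustrative assumptions,
    # not drawn from any WCAG or QA document.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class WCAGTestCase:
        checkpoint: str        # which checkpoint the test targets, e.g. "1.1"
        purpose: str           # what the test is intended to demonstrate
        environment: str       # required setup (user agent, assistive technology, ...)
        procedure: List[str]   # ordered steps the tester performs
        expected_result: str   # the observable outcome that constitutes a pass
        automated: bool        # True if a tool can run it; False if manual evaluation is needed

    # Invented example record, for illustration only:
    example = WCAGTestCase(
        checkpoint="1.1",
        purpose="Verify that non-text content has a text equivalent",
        environment="Any HTML user agent",
        procedure=["Load the test page", "Inspect each img element"],
        expected_result="Every img element carries a non-empty alt attribute",
        automated=True,
    )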
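
As a second sketch (referenced from item 9 above), the "two implementations agree" analogy could be read as comparing the findings of two tools on the same content. This is again hypothetical: Bobby and A-Prompt expose no such API, and the function names are invented.

    # Hypothetical sketch of the "two tools agree" criterion from item 9.
    # run_tool_a / run_tool_b stand in for any two evaluation and repair
    # tools (e.g., Bobby, A-Prompt); neither actually exposes this API.
    from typing import Callable, Iterable

    def findings_agree(run_tool_a: Callable[[str], Iterable[str]],
                       run_tool_b: Callable[[str], Iterable[str]],
                       content: str) -> bool:
        """Return True if both tools report the same set of problems."""
        problems_a = set(run_tool_a(content))  # e.g., {"missing alt on img", ...}
        problems_b = set(run_tool_b(content))
        return problems_a == problems_b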

Comments on the WCAG Working Draft of 24 June 2003:

Again, congratulations - this is an excellent step forward in a WCAG working draft. Comments here are divided into general comments on the entire document and specific comments on each checkpoint.

General comments on entire draft (NOTE: The term "success criteria" is used throughout to refer to one criterion or multiple criteria.)

  1. Does the repeated use of the adjective "required" before each success criteria (which I like) imply that there will be "optional" success criteria?
  2. The kind of accessibility desired is not stated explicitly. Example: Success criteria for Checkpoint 4.1 "accessibility feature". What kind of accessibility? Visual? Auditory? Tactile?
  3. What is the difference between the use of "non-normative" (#4, top layer of "how to read document", as well as Appendices A through C) and the use of "informative" (as used in the checkpoint sections - for example, the definitions for Checkpoint 2.1)? Perhaps one or the other term could be used, but not both, or at least indicate that the terms are equivalent, if they are meant to be? This is related to the item below, but maybe on a higher level, relating to how the entire draft is viewed.
  4. Terminology within the draft seems not to be entirely consistent. EXAMPLE: "site", "resource", "application" are all used in section 3, and it is unclear whether the context and semantics of each use are entirely different. I think that it is important to come up with common definitions and use common terms that mean the same thing for the same concept.
  5. For the definitions under each checkpoint, are they consistent with terminology in the WAI glossary? W3C glossary? Other glossaries? I know there is W3C glossary work currently. The checkpoints are just goals, right?
  6. Is there a logical ordering to the checkpoints as presented? In the ATAG WD the checkpoints are presented in the natural order in which one would step through them - is that true here? How should one "process" the checkpoints (perhaps this could go into the "how to read this document" section)?
  7. It might be good to insert quantifiers like "all" and to use "MUST" where appropriate in every success criteria for testability (EXAMPLE: most success criteria language in the draft does not have such quantifiers). Thus, words like "must do" and "required" should be used if appropriate to specify minimum requirements. NOTE: It is possible to test "should" statements, but they should be considered as options.
  8. The relation between checkpoint definitions (goals?), success criteria, and text in boxes for each checkpoint seems not to be adequately explained; this may hinder usability of the draft. Also, the format of the checkpoints seems not to be entirely consistent (EXAMPLE: Checkpoint 1.2 has a "best practices" section, whereas Checkpoint 1.3 does not). In general, for the checkpoint formats, it seems that implementation techniques, informational statements, and conformance requirements are mixed in together, which makes the objectivity of the entire success criteria statement difficult to determine. It may be good to use only conformance requirement language (or to specify options where applicable) in success criteria language (EXAMPLE: In the success criteria for Checkpoint 4.2, the second sentence seems informational).
  9. I think a goal should be that the number of success criteria per checkpoint is minimized (EXAMPLE: the success criteria for Checkpoint 1.2 have six items).
  10. Some of the success criteria seem to be duplicates of one another, and some statements in the success criteria appear contradictory or don't appear to belong in the same success criteria. EXAMPLES: The success criteria for 4.1 and 4.3 seem to overlap, and the six items in the success criteria for Checkpoint 1.2 seem to have little relationship with each other or are partial duplicates (#5 points to Checkpoint 1.1 for conformance). It may be desirable to ensure that the semantics of each success criteria are unique as much as possible, and that each sentence in a success criteria statement contributes to, and is necessary for, the meaning of that statement in a consistent fashion. Also, I think that the success criteria formats for each checkpoint should be similar to the maximum extent possible (EXAMPLE: in almost any two success criteria from different checkpoints, the sentence structure is different).
  11. It was difficult for me to determine what is required (minimum requirements) by reading each success criteria (see the item above). Thus there is a general concern with the objectivity and testability of the success criteria as stated (EXAMPLES: "easy to find" in the success criteria for Checkpoint 3.4, and "significant" in the success criteria for Checkpoint 1.2). It may be good to use words like "must do" and "required" to specify minimum requirements.

Specific Comments On Particular Parts of Document (NOTE: These are very detailed comments from my perspective, and mainly apply to evaluating objectivity and testability of Section 3 and specific success criteria language.)

  1. Section 3 - In the "conformance" subsection of 3, it may need to be defined what it means to "meet" or "satisfy" a checkpoint. For the "Core" group under "Conformance", "all" may need to be included; should "all" be included for the "Extended" subheading as well? What if not all Core checkpoints are passed but all Extended checkpoints are passed (Case 5 under "Conformance Claims")? Is that possible? Is "Core" a strict subset of "Extended"? Are there dependencies between the checkpoints? Do the checkpoints overlap in any way? These questions may need to be explicitly answered in the draft (a sketch of one possible interpretation follows this list).
  2. Section 3 - For the "Editorial Note" questions/issues, there may be some helpful AUWG information to address these (for example, in the first two bullets addressing "conformance claims"). QA documentation may help here as well.
  3. Section 3 - There seems to be confusion in the terminology of the four guidelines under "Overview of Design Principles" (example: "perceivable" vs. "understandable"); also, for Guidelines 1, 2, and 3, there are circular definitions (example: "operable" is defined in terms of itself). Is there any overlap or ambiguity here?
  4. Success Criteria for Checkpoint 1.1 - Please insert "all" before 1 and 2, and replace "has" by "must have" if appropriate. Is there a third success criteria (empty circle after the first two items), or is the third item informational?
  5. Success Criteria for Checkpoint 1.2 - There may be a question as to the objective testability of "perceived" and "to the extent possible" (#1), as well as "significant" (#2) and "is primarily" (#4). What is the relationship/ordering among the six points of the 1.2 success criteria?
  6. Success Criteria for Checkpoint 1.3 - It may be difficult to objectively test a "without" or "not" (testing a negation). The meaning of "emphasis" may be unclear. Is it being said here that semantics does not depend upon presentation? This criteria may need more definitions of terms. There may be some implementation techniques mixed in here.
  7. Success Criteria for Checkpoint 1.4 - What version of Unicode is being referenced? There is no reference to any Unicode conformance requirements. "All" should be placed in front in this criteria language if appropriate.
  8. Success Criteria for Checkpoint 1.5 - The word "all" may be needed in this success criteria, as well as more definitions of terms. There may be a question as to the objective testability of "from each other"; does this phrase refer to position in the document or to the kind of element?
  9. Success Criteria for Checkpoint 1.6 - What happens if the background image is important to the meaning of the text?
  10. Success Criteria for Checkpoint 2.1 - It may be difficult to objectively test the word "concisely".
  11. Success Criteria for Checkpoint 2.2 - Please replace "is" by "must be" if appropriate. Also the last two bullets read differently (have a different format) from the first three bullets.
  12. Success Criteria for Checkpoint 2.3 - It may be difficult to objectively test "not designed to", because the intent of the designer is being queried. It also may be difficult to objectively test a negative, as well as to objectively test "as close as". The relation between a and b seems unclear. For b, in spite of the warning, does the user go to the page anyway, with content that does not flicker substituted for the flickering content? Or is the user warned and thus does not go to the desired page, but is directed to a different (alternate) page with the same content without flicker? There seems to be an inconsistency between the first part of b, which says flicker is unavoidable, and the second part of b, which says that content without flicker is provided. The definition of "page" may need to be clarified and may be subjective; is "site" or "resource" meant in this success criteria?
  13. Success Criteria for Checkpoint 2.4 - The term "words" may be English-centric, and testing "perceived" pages may be subjective. There may need to be a definition of "display orders". Are all of a, b, and c required to be provided simultaneously, or just some of the items at different times?
  14. Success Criteria for Checkpoint 2.5 - What kind of error is meant in this success criteria? What constitutes an error? There are many different kinds of errors, and what is an error to one might not be an error (or might be only a warning) to another. Is every occurrence of every type of "error" meant? Please substitute "must be" for "is" if appropriate. What kind of feedback (notification, system shutdown, aborted operation, or warning) must be provided? Must the feedback be provided in every instance?
  15. Success Criteria for Checkpoint 3.1 - Seemingly inconsistent terms (passages, content, fragment) with the appearance of some overlap in meaning are used; also "document" (yet another different term) is used in the "best practices" sentence. I think that there needs to be a consistent way of stating terms that mean the same thing in a success criteria. Please insert "all" in front of "passages" if appropriate. Please provide a definition of a "passage" if appropriate. Please replace "are" with "must be" if appropriate. It seems unclear to whom the identification is directed (to the user, perhaps?).
  16. Success Criteria for Checkpoint 3.2 - It may be difficult to test "do not appear unambiguously" objectively. Please use "all" and "must be" if appropriate. Is there one unabridged dictionary for a language? It seems there may be several dictionaries for a language with "full (definition?)" content. This may have been a subject of discussion at the W3C glossary meeting in March 2003. Also, "site" is used here, but "page" is used elsewhere; are the two terms being used interchangeably to mean the same thing? Would "resource" be a better term? How does one objectively measure the availability of a glossary on a site (this depends upon the definitions of "site" and "availability")?
  17. Success Criteria for Checkpoint 3.3 - Is all content on a site or resource meant by "the content"? Which content? There may be an issue with the objective testability of "has been reviewed". Reviewed for what? There may be an issue with the objective testability of "taking into account" and "apply as appropriate". Must all of a through g be applied simultaneously? If not, which ones apply to which content in an objective sense? Definitions of the terms in a through g may be needed in this success criteria. Who is "they" in g?
  18. Success Criteria for Checkpoint 3.4 - A definition of "key" may be needed. The objective testability of "are generally found" may be an issue. Please use "all" and "must" where appropriate. What is the context of the locations? The relationship between 1, 2, and 3 seems unclear; is either 1 or 2 or 3 true at a particular time, but not all three simultaneously? Do 1, 2, and 3 represent a partitioning of predictability? There may be questions as to the objective testability of "extreme" in 3, and the objective testability of "easy to find" in 3a.
  19. Success Criteria for Checkpoint 4.1 - #1: The objective testability of "are avoided" may be an issue (avoided in what sense?). There may need to be unambiguous definitions of "feature" and "accessibility feature". The objective testability of "are used" (in several places) may be of concern. Please use "All markup" and "must" as appropriate. What is the relationship between "markup", "element", "content", and "document" (all terms used in this draft - do they mean the same thing)? What is the definition of "markup" vs. "element"? Items #2 and #3 of this checkpoint's success criteria seem to overlap with the success criteria of Checkpoint 4.3.
  20. Success Criteria for Checkpoint 4.2 - What is the definition of "web resource" vs. the definitions of "site" or "page"? Please use "must include" if appropriate. What does "its" refer to? The objective testability of "as intended" may be an issue. There seems to be an inconsistency between sentences #1 and #2; also, sentence #2 seems informational, not normative.
  21. Success Criteria for Checkpoint 4.3 - Please replace "the.." with "all technologies or combinations of technologies" if appropriate. In what context are the technologies chosen (by site, resource, application, page)? For a, there may need to be a definition of W3C device independence (DI) support re: the W3C DI activity. For b, there may need to be a definition of "accessibility feature". Is it required to do all of a through e simultaneously? For c, definitions may be needed of "publicly", "interfaces", and "interoperability". A distinction concerning interoperability testing vs. conformance testing may need to be made. Is an interface a technique in this success criteria? For d, is this a requirement for one or all operating system accessibility features (definition?)? What if there are none? There may be an issue with the testability of "supported by assistive technologies" (definition?); what if there is no support? The relation between 1 and 2 of this success criteria seems unclear. Why is 2 associated with this success criteria? There may need to be a definition of "custom user interface". Does a custom user interface represent a technique in this success criteria? The items under 1 seem ambiguous and not necessarily accessibility-related.
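
As a sketch referenced from item 1 above, here is one possible reading of the Core/Extended conformance logic. Whether "all" is required, and whether Core is a subset of Extended, are exactly the open questions, so the rules below are an assumed interpretation of mine, not the draft's definition.

    # Hypothetical reading of the Core/Extended conformance logic questioned
    # in item 1 above; the set relations assumed here are my own, not the draft's.

    def conformance_level(passed: set, core: set, extended: set) -> str:
        if core <= passed and extended <= passed:
            return "Core + Extended"   # every checkpoint in both groups satisfied
        if core <= passed:
            return "Core"              # all Core checkpoints satisfied
        if extended <= passed:
            # Case 5: all Extended but not all Core -- the draft does not
            # say whether this combination is claimable.
            return "undefined by draft"
        return "no claim"

    # Invented example: Core = {"1.1", "1.2"}, Extended = {"4.3"}.
    print(conformance_level({"1.1", "4.3"}, {"1.1", "1.2"}, {"4.3"}))  # -> "undefined by draft"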