
Re: Task testing structure

From: Shawn Lauriat <lauriat@google.com>
Date: Tue, 28 Apr 2020 09:09:06 -0400
Message-ID: <CAGQw2hnZE_=0u8nF7R=j5ZRT07t=dOd-CVwdphxCZiQDKPozxQ@mail.gmail.com>
To: jake abma <jake.abma@gmail.com>
Cc: John Foliot <john.foliot@deque.com>, WCAG <w3c-wai-gl@w3.org>, Silver TF <public-silver@w3.org>
Jake,

I very much share your concerns about complexity, especially where the
concept of tasks meets raw tests. Thank you for the detailed write-up of
what you worked through! It definitely sounds similar to what I've
inched closer to for how I think things can work. I have a few questions
about things, but need to digest what you've written a bit more before I
know how to ask them.

Thanks,

Shawn

On Tue, Apr 28, 2020 at 4:35 AM jake abma <jake.abma@gmail.com> wrote:

> Hi Shawn / all,
>
> After experimenting the last two weeks with "Headings", based on John's
> pages and Bruce's approach, I came to the same kind of conclusions as you
> did and understand your approach, but I see obstacles ahead with
> combining too much, as this creates way too much complexity, takes too much
> time to test, and will lack completeness.
>
> Some findings based on the Headings experiment:
>
> - Last week I also had a call with Wilco about using and extending ACT
> rules as this seems most appropriate for methods
> - Reached the conclusion that we need a low / medium / high severity-style
> scoring, as not all fails are as impactful as others
> - Next to the fails and severity you also have the criticality for the
> task being performed, which should probably be judged somehow.
> - Adding the functional areas directly into your testing scoring makes
> testing undesirably long / complex
>
> So when I read your proposal I see similarities in conclusion and the way
> you've tried to fix them, but also so many possible variations and outcomes
> that in general this will possibly not be feasible.
>
> Take your "For ACT rules, Link has accessible name
> <https://act-rules.github.io/rules/c487ae> applies
> <https://act-rules.github.io/rules/c487ae#applicability> to any HTML...
> etc." and lay it next to the heading examples from John: there are so many
> ways to fill them in that you'll have a test / decision tree that is pretty
> complicated to work through. Even more so when you realize that John's
> examples do not cover all possible combinations and are not technology
> agnostic. Add to this mix all the different ways they 'might' be
> implemented in a page, as multiple teams might work on the same feature,
> and you'll end up with testing results so complex... we need to make it
> easier, step by step.
>
> The same goes for task based testing. This is a wasps' nest: setting it in
> stone for every site is not something I see as feasible for us to document.
>
> What I do see is:
>
> - All should be based on pass / fail, as we had
> - ACT seems most appropriate to extend for testing (it is hard to write
> the tests!)
> - We need an adjectival scoring for the passes / fails (3 levels
> preferred, as more will create confusion, disagreement, and complexity we
> probably don't want)
> - A scoring for criticality seems necessary, but it needs to be clearly
> scoped or it will get pushback because of the question "whose
> criticality?". This should probably be the task based scenario.
> - Scoring tasks on the same level as pass / fail will probably not work;
> I suggest placing the tasks in the scope and, within that scope, doing your
> pass / fail (with severity and criticality)
> - Scoping of WCAG conformance should be opened up to allow more than one
> option, like: web page, instance based, process or task completion,
> component, feature ...
>
> In short this comes down to:
>
> - Create scope (multiple possible, like 5 screens, 2 tasks, 3 widgets)
> - Do the pass / fail based on ACT rules
> - Judge severity (Low, Medium, High impact)
> - Severity should be added to ACT rules
> - Add criticality based on scope
> - Criticality might be guided by test work like John's Heading pages
> - Provide scoring based on total issues, severity of issues and
> criticality for scope
> - Scoring can be as easy as severity and criticality divided by total
> score (low = 1, medium = 2, high = 3?!)
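> The arithmetic in that last bullet can be sketched as follows. This is a
> minimal illustration of the idea, not a settled proposal: the weights, the
> "worst case" normalization, and every name are assumptions for discussion.

```python
# Hypothetical sketch of the scoring idea above: each failed ACT-style
# check carries a severity and a criticality (low = 1, medium = 2,
# high = 3), and the score for a scope is derived from their sum
# relative to the worst possible total. All names are illustrative.

WEIGHTS = {"low": 1, "medium": 2, "high": 3}

def issue_weight(severity: str, criticality: str) -> int:
    """Combined weight of one failed check."""
    return WEIGHTS[severity] + WEIGHTS[criticality]

def scope_score(issues: list[tuple[str, str]], total_checks: int) -> float:
    """Return a 0-100 score for a scope.

    `issues` is a list of (severity, criticality) pairs for failed
    checks; `total_checks` is how many checks ran in the scope. The
    worst case is every check failing at high severity and high
    criticality (weight 6 each).
    """
    if total_checks == 0:
        return 100.0
    worst = total_checks * (WEIGHTS["high"] * 2)
    actual = sum(issue_weight(s, c) for s, c in issues)
    return round(100 * (1 - actual / worst), 1)
```

> For example, one low/low issue out of ten checks still scores high, while
> a single high/high failure in a one-check scope scores zero.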
>
> The result is not a perfect something, but we need to acknowledge our
> strengths and weaknesses. This is a simple approach: it doesn't take a lot
> more time than we're used to, is for the most part already known by WCAG
> testers, and gives so much more insight into WCAG testing compared to what
> we have now.
>
> I have it all documented, but here is the spreadsheet for the headings
> experiment:
>
>
> https://docs.google.com/spreadsheets/d/1fAbtL_A6Xhbs5mKpuLuWPYu2QmKwIPdw6GoE6C5K3w0/edit?usp=sharing
>
> Cheers,
> Jake
>
>
>
> On Mon, Apr 27, 2020 at 9:56 PM John Foliot <john.foliot@deque.com> wrote:
>
>> Hi Shawn,
>>
>> I think there is a fundamental issue here, however, that perhaps is being
>> overlooked: not all websites are "task-based", and/or not all sites are
>> *exclusively* task-based. And is "playing a game" not also a task?
>>
>> In your previous straw-man example, a decision had been taken that the
>> "pizza game" isn't part of the task of ordering a pizza, so you then argue
>> it's out of scope. But Accessibility Human Rights legislation doesn't work
>> that way: if it's posted on-line for the general population, then it *MUST*
>> be in scope for being accessible as well, and so while I can support the
>> idea of task-based testing within Silver, I fall significantly short of
>> allowing conformance scoping to have the ability to pick and choose what
>> they think is critical for all users, and what content they think "disabled
>> users" don't need (or want).
>>
>> And while I'll note that VPATS have a notion of "supporting with
>> exceptions", and accept that we're going to see something similar with the
>> Silver scoring methods, I personally will strenuously oppose selective
>> scoping at the page or site level. There is a world of difference in saying
>> "*We've got an 85% Accessibility score, INCLUDING our non-accessible
>> game*" versus "*We got to 85% conformance by REMOVING the game from our
>> test scope*".
>>
>> JF
>>
>> On Mon, Apr 27, 2020 at 2:25 PM Shawn Lauriat <lauriat@google.com> wrote:
>>
>>> ...now has to go to court and explain why they thought that game wasn't
>>>> important for disabled people?
>>>
>>>
>>> Exactly. And now the court and those involved have clear documentation
>>> for how the pizza place considered accessibility and can then look at the
>>> resulting impact to users. Everyone today with WCAG 2.x's conformance model
>>> has that same ability to just declare a path or a page as not a part of
>>> what needs to conform for a given conformance claim, so I don't think this
>>> concept introduces any new ways for people to get things wrong there.
>>>
>>> By giving people the ability to define their own scope as groups of user
>>> journeys, and users a way to identify gaps in that scope that affect them,
>>> I think transparent task-based conformance can better support both sides of
>>> that while also offering a structure for test results to have a better
>>> chance of expressing the resulting experience for users trying to do things.
>>>
>>> -Shawn
>>>
>>> On Mon, Apr 27, 2020 at 2:27 PM John Foliot <john.foliot@deque.com>
>>> wrote:
>>>
>>>> Hi Shawn,
>>>>
>>>> > Following that example of the pizza place site: they may have left a
>>>> pizza game out of their scope of conformance, judging it not a part of
>>>> their core offering of pizza. If someone has a problem with the pizza game
>>>> and raises that as an issue preventing them from getting pizza, everyone
>>>> then has the clear scope that left the game out, and whoever [was] involved
>>>> in the decision...
>>>>
>>>> ...now has to go to court and explain why they thought that game wasn't
>>>> important for disabled people?
>>>>
>>>> *Re: Scoping*
>>>> The ability to cherry-pick what is and isn't out of scope is a
>>>> dangerous precedent/concept, and will have (I fear) detrimental effects for
>>>> persons with disabilities. Why wouldn't a "pizza game" be of interest to
>>>> disabled users as well? Why shouldn't they also get to play along? Because
>>>> making the pizza game accessible is too hard? - wrong answer...  (I recall
>>>> our colleague and friend Victor Tsaran once saying to me - and I paraphrase
>>>> - that today it's relatively easy to make sites 'accessible', but he could
>>>> hardly wait for the day when they were also "fun" - this from back when he
>>>> was still at Yahoo!, and they included an Easter Egg on the Yahoo! site:
>>>> https://youtu.be/xXNkP2jU7Pg)
>>>>
>>>> Selective Accessibility MUST be avoided, not encouraged, and I fear
>>>> your use-case is an example of why we shouldn't be leaving scoping to the
>>>> content owners (and also demonstrates how easy it will be for uninformed
>>>> content creators to miss the forest, because we've got them looking at -
>>>> and selecting - specific trees...) Using the same logic, I could also argue
>>>> that content in an <aside> isn't really critical to the main content (which
>>>> MUST be accessible) - that's why it is an aside - and so any content in an
>>>> <aside> is then out of scope? Slippery slope ahead.
>>>>
>>>> JF
>>>>
>>>> On Mon, Apr 27, 2020 at 12:36 PM Shawn Lauriat <lauriat@google.com>
>>>> wrote:
>>>>
>>>>> Many good questions!
>>>>>
>>>>> I'm kinda liking aspects of this approach (ACT Rules format for
>>>>>> testing flows), but (of course) I have a critical question: *how do
>>>>>> we score something like this?*
>>>>>
>>>>>
>>>>> Honestly, I'd like to think through that as a separate thing to figure
>>>>> out from the topic of scoping and task definition, though still heavily
>>>>> related. We could end up with any number of scoring systems using the same
>>>>> scoping and task definition. Trying to figure them out at the same time
>>>>> just introduces too many variables for me.
>>>>>
>>>>> As I described it to Jeanne recently, I have this kind of thought
>>>>> about how we could define scope and tasks, I have a clear-ish sense of how
>>>>> we can build up tests for methods, but still have only murky ideas on how
>>>>> we can get the two to meet in the middle. We've certainly made some good
>>>>> progress on that, but we still definitely have further to go.
>>>>>
>>>>> Open question: is this a correct interpretation? Does all critical
>>>>>> path testing need to start from a common starting point?
>>>>>
>>>>>
>>>>> A really good question, and one that honestly depends on the site or
>>>>> app (etc.). For a pizza site, you can link directly to the contact page.
>>>>> For an app like Google Docs, you can't really link directly to text
>>>>> substitution preferences, so that'd need to come from a more common start
>>>>> point. We should help walk people through how to define and include this in
>>>>> scope, definitely, as the accessibility of a thing doesn't really matter if
>>>>> you can't get access to it in the first place.
>>>>>
>>>>> Additionally, how do we ensure that all critical path testing is
>>>>>> scoped by any given site? (the current scoping proposal leaves it to the
>>>>>> site-owner to scope their conformance claims, so leaving out complex or
>>>>>> critical flows that are non-conformant could be easily overcome by simply
>>>>>> leaving those flows out of the testing scope).
>>>>>
>>>>>
>>>>> I don't think we need to. If we have a clear definition of how to
>>>>> scope, and a way for people to transparently declare that scope, we can
>>>>> leave the "is this right?" part to those who need to decide it. Following
>>>>> that example of the pizza place site: they may have left a pizza game out
>>>>> of their scope of conformance, judging it not a part of their core offering
>>>>> of pizza. If someone has a problem with the pizza game and raises that as
>>>>> an issue preventing them from getting pizza, everyone then has the clear
>>>>> scope that left the game out, and whoever was involved in the decision as to
>>>>> whether the scope should include the game can make it in an informed way.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Shawn
>>>>>
>>>>> On Mon, Apr 27, 2020 at 12:20 PM John Foliot <john.foliot@deque.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Shawn
>>>>>>
>>>>>> I'm kinda liking aspects of this approach (ACT Rules format for
>>>>>> testing flows), but (of course) I have a critical question: *how do
>>>>>> we score something like this*?
>>>>>>
>>>>>> Each site(1) is going to have "critical paths", but few sites will be
>>>>>> sharing the same critical paths. Additionally, some paths or tasks (find
>>>>>> hours of operation) are significantly easier to do than others (update my
>>>>>> emergency contact information on my company's HR intranet), especially if
>>>>>> it pre-supposes that *all* paths start at a site's "homepage" (and/or the
>>>>>> outcome or solution to Success Criterion 2.4.5 Multiple Ways - i.e. a
>>>>>> sitemap page or search results page).
>>>>>>
>>>>>> No matter which, it seems to me that testing a critical path needs to
>>>>>> start *somewhere*, and for a scalable and repeatable testing regime, about
>>>>>> the only thing all sites have in common is a 'homepage', which is something
>>>>>> your example already suggests:
>>>>>>
>>>>>>
>>>>>>    1. Load the pizza restaurant's site
>>>>>>       1. Possible inputs: found via search engine, hit a bookmark
>>>>>>       link, selected from browser's history, etc.
>>>>>>       2. *Main page loads* with focus at the top of the screen
>>>>>>
>>>>>> Open question: is this a correct interpretation? Does all critical
>>>>>> path testing need to start from a common starting point?
>>>>>>
>>>>>> Additionally, how do we ensure that *all *critical path testing is
>>>>>> scoped by any given site? (the current scoping proposal leaves it to the
>>>>>> site-owner to scope their conformance claims, so leaving out complex or
>>>>>> critical flows that are non-conformant could be easily overcome by simply
>>>>>> leaving those flows out of the testing scope).
>>>>>>
>>>>>> JF
>>>>>>
>>>>>>
>>>>>> (1: "site" being shorthand for 'online digital activity or presence'
>>>>>> - as we need to take XR and other emergent tech into account as well)
>>>>>>
>>>>>> On Mon, Apr 27, 2020 at 10:28 AM Shawn Lauriat <lauriat@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> From an email I sent to some ACT folks a little while ago, where I
>>>>>>> had tried expressing my thoughts on how we could use the same kind of
>>>>>>> structure that ACT has, but as a way of essentially expressing overall
>>>>>>> scope as a set of user journeys for task based testing. Hoping this can
>>>>>>> help for tomorrow's conversation to have an example written out:
>>>>>>>
>>>>>>> For ACT rules, Link has accessible name
>>>>>>> <https://act-rules.github.io/rules/c487ae> applies
>>>>>>> <https://act-rules.github.io/rules/c487ae#applicability> to any
>>>>>>> HTML element with the semantic role
>>>>>>> <https://act-rules.github.io/rules/c487ae#semantic-role> of link that
>>>>>>> is included in the accessibility tree
>>>>>>> <https://act-rules.github.io/rules/c487ae#included-in-the-accessibility-tree>
>>>>>>> . Link in context is <https://act-rules.github.io/rules/5effbb>
>>>>>>> descriptive <https://act-rules.github.io/rules/5effbb> essentially
>>>>>>> applies to any element that passes Link has accessible name
>>>>>>> <https://act-rules.github.io/rules/c487ae>. In other words:
>>>>>>>
>>>>>>>    1. For each thing exposed in the accessibility tree as a link
>>>>>>>       1. Go through Link has accessible name
>>>>>>>       <https://act-rules.github.io/rules/c487ae> steps
>>>>>>>       2. For each link that fails, note result
>>>>>>>       3. For each link that passes
>>>>>>>          1. Go through Link in context is
>>>>>>>          <https://act-rules.github.io/rules/5effbb>descriptive
>>>>>>>          <https://act-rules.github.io/rules/5effbb> steps
>>>>>>>          2. For each link that fails, note result
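>>>>>>> The chained walk above can be sketched as code. The two predicates
>>>>>>> here merely stand in for the real ACT rules (c487ae and 5effbb),
>>>>>>> which are of course far more involved:

```python
# Hypothetical sketch of chaining two ACT-style checks: only links that
# pass "Link has accessible name" are tested against "Link in context
# is descriptive". The predicates are stand-ins, not real ACT logic.

def run_link_checks(links, has_name, is_descriptive):
    """Return a {link: result} map for the two chained checks.

    `links`: ids of elements exposed as links in the accessibility tree.
    `has_name`, `is_descriptive`: predicates standing in for the two
    ACT rules (c487ae and 5effbb respectively).
    """
    results = {}
    for link in links:
        if not has_name(link):
            results[link] = "fail: no accessible name"
        elif not is_descriptive(link):
            results[link] = "fail: not descriptive in context"
        else:
            results[link] = "pass"
    return results
```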
>>>>>>>
>>>>>>> For tasks, even if simply in Education & Outreach type
>>>>>>> documentation, we could walk people through the process of defining tasks
>>>>>>> and the steps within each task similar to how the ACT Rules Format
>>>>>>> <https://www.w3.org/TR/act-rules-format/> describes composite rules
>>>>>>> and the atomic rules within each composite.
>>>>>>>
>>>>>>> The scope of a pizza restaurant's site could then have the
>>>>>>> definition of a collection of tasks, the level at which they could/would
>>>>>>> measure overall conformance:
>>>>>>>
>>>>>>>    1. Choose what kind of pizza to order from the available options
>>>>>>>    2. Find out the hours of operation
>>>>>>>    3. Find out how to get to the restaurant to dine in
>>>>>>>    4. Contact the restaurant to order delivery
>>>>>>>
>>>>>>> Each task could consist of atomic actions, typically defined by
>>>>>>> design, development, and testing activities. For task 2. Find out the hours
>>>>>>> of operation, that could look like:
>>>>>>>
>>>>>>>    1. Load the pizza restaurant's site
>>>>>>>       1. Possible inputs: found via search engine, hit a bookmark
>>>>>>>       link, selected from browser's history, etc.
>>>>>>>       2. Main page loads with focus at the top of the screen
>>>>>>>    2. Navigate to contact page (composite, describes one possible
>>>>>>>    path)
>>>>>>>       1. Move focus to site navigation menu
>>>>>>>       2. Open navigation menu
>>>>>>>       3. Move focus to "Contact us" link
>>>>>>>       4. Activate link
>>>>>>>    3. Navigate to text containing the hours of operation (composite)
>>>>>>>       1. Find "Hours of operation" section
>>>>>>>       2. Read contents of "Hours of operation" section
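>>>>>>> As a hypothetical data sketch, the task above could be represented
>>>>>>> as a composite of atomic steps, mirroring how the ACT Rules Format
>>>>>>> composes atomic rules into composite rules. The field names are
>>>>>>> illustrative only, not a proposed schema:

```python
# Illustrative structure for a task defined as nested steps, echoing
# the composite/atomic split in the ACT Rules Format. Not a schema
# proposal; every field name is an assumption.

from dataclasses import dataclass, field

@dataclass
class Step:
    description: str
    substeps: list["Step"] = field(default_factory=list)

@dataclass
class Task:
    name: str
    steps: list[Step]

hours_task = Task(
    name="Find out the hours of operation",
    steps=[
        Step("Load the pizza restaurant's site", [
            Step("Main page loads with focus at the top of the screen"),
        ]),
        Step("Navigate to contact page", [
            Step("Move focus to site navigation menu"),
            Step("Open navigation menu"),
            Step('Move focus to "Contact us" link'),
            Step("Activate link"),
        ]),
        Step("Navigate to text containing the hours of operation", [
            Step('Find "Hours of operation" section'),
            Step('Read contents of "Hours of operation" section'),
        ]),
    ],
)
```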
>>>>>>>
>>>>>>> Within the steps of each atomic task bit, we could then run through
>>>>>>> the applicability checks for each ACT-type Rule. So Link has
>>>>>>> accessible name <https://act-rules.github.io/rules/c487ae> would
>>>>>>> apply to all links within the path, but not to a random link in the footer
>>>>>>> that has a label that doesn't imply any relation to hours or contact
>>>>>>> information.
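>>>>>>> That path-scoped applicability could be sketched as a simple
>>>>>>> filter over a rule's page-wide applicability (all names here are
>>>>>>> assumed for illustration):

```python
# Sketch of scoping an ACT-style rule to a task: the rule is applied
# only to elements that lie on the task's path, so a page-wide rule
# skips, e.g., an unrelated footer link. Names are illustrative.

def applicable_in_task(elements, on_task_path, rule_applies):
    """Elements a rule applies to, restricted to the task's path.

    `elements`: all candidate elements on the page.
    `on_task_path`: True if the element is part of the task path.
    `rule_applies`: the rule's page-wide applicability predicate.
    """
    return [e for e in elements if on_task_path(e) and rule_applies(e)]
```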
>>>>>>>
>>>>>>> I have thoughts about how each of these could work and how we would
>>>>>>> define applicability of rules and such based on the tasks, but I think it
>>>>>>> would make sense to just start with this higher-level question of whether
>>>>>>> we could (or should) have some kind of structured task definition similar
>>>>>>> to ACT's current structured rule definition.
>>>>>>>
>>>>>>> -Shawn
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> *John Foliot* | Principal Accessibility Strategist | W3C AC
>>>>>> Representative
>>>>>> Deque Systems - Accessibility for Good
>>>>>> deque.com
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>> --
>>>> *John Foliot* | Principal Accessibility Strategist | W3C AC
>>>> Representative
>>>> Deque Systems - Accessibility for Good
>>>> deque.com
>>>>
>>>>
>>>>
>>
>> --
>> *​John Foliot* | Principal Accessibility Strategist | W3C AC
>> Representative
>> Deque Systems - Accessibility for Good
>> deque.com
>>
>>
>>
Received on Tuesday, 28 April 2020 13:09:34 UTC

This archive was generated by hypermail 2.4.0 : Thursday, 24 March 2022 20:31:47 UTC