Re: agenda for Silver meeting of 10 July 2020 from Rachael Montgomery on 2020-07-14 (public-silver@w3.org from July 2020)

From: Rachael Montgomery <rachael@accessiblecommunity.org>
Date: Tue, 14 Jul 2020 06:24:00 -0400
To: Jeanne Spellman <jspellman@spellmanconsulting.com>, jake abma <jake.abma@gmail.com>
Cc: Silver Task Force <public-silver@w3.org>
Message-ID: <39fe7376-8441-453b-86f6-2b6268594fa5@Spark>
Jake,

Thank you very much for the thoughtful review. I will add clarifications before today’s meeting where I can and capture outstanding issues where I cannot.

I would recommend that others stop at the History & Notes section; I used it to retain previous, no longer completely relevant work so that we could go back to it during discussions.

I think there is a lot here we can discuss so thank you again.

Rachael
On Jul 14, 2020, 5:44 AM -0400, jake abma <jake.abma@gmail.com>, wrote:
> Hi Rachael / all,
>
> Here are some initial comments on the proposal.
> Please excuse me if some comments are a bit blunt in wording and need some explanation from my side; they are just questions that popped up while going through your great work.
>
>
> 1. Adjectival Rating
> ·      Agreed
> 2. Disclaimer
> ·      Agreed
> 3. Ideas Incorporated
> ·      Mostly agreed
> ·      Path/task based conformance
> o   How is this incorporated? I don't see it yet.
> ·      Incorporate usability testing
> o   How is this incorporated? I don't see it yet.
> ·      Current small SC (language in page) need to be balanced with current large SC (text alternatives)
> o   What do you mean by this?
> ·      Substantial conformance
> o   What do you mean by this?
> 4. Declaring Scope
> ·      Agree IF my conclusion is correct
> Conformance is defined for paths
> ·      “Path - A single view or the complete series of views needed to complete a task from end-to-end. “
> ·      “View - All content visually and programmatically available without an interaction equivalent to loading a new page”
> o   So, elements on a page are in scope although not related / needed for the Task? In other words, we, kind of, replace 'Web Page' with 'view' as we're
> 5. Documentation Hierarchy
> ·      Not agreed
> Functional Categories
> ·      High level grouping of functional needs
> o   Do we mean here the EN 301 549 kind of grouping?
> Guidelines (Functional Outcome)
> ·      Score each guideline based on tests between 1 and 5
> o   A guideline is not a functional outcome. There may be 2, 3, 4 or even more functional outcomes in a guideline, each with its own functional needs, its own testing, and multiple methods that close gaps but are not always in line with each other, which makes scoring much, much more complex. See:
> o   https://docs.google.com/spreadsheets/d/1yTpbJFsaadIbCigV15YK6bWB4xk_MoOgREjGMHHMVgQ/edit#gid=1983238719
> o   https://docs.google.com/spreadsheets/d/1iCJfyMtcsSq7GHmwnc4aTNguadRfGDa0H8FBZMaJpcQ/edit#gid=2091469352
> ·      Guidelines have a many:many relationship with functional categories
> o   Isn’t it a One-to-Many? One guideline can be the habitat of lots of functional needs.
> o   And IF there is more than one functional outcome, isn’t it One-to-Many for the outcomes: One functional outcome can be the habitat of lots of functional needs. (all under one guideline)
> Tests
> ·      3 types of tests
> o   Level 1: Tests that would fit within the 2.x structure (automated, manual, page based)
> §  Do we now replace page based with ‘view’? And when explaining this to an external person, does it mean the same in our new approach?
> o   Level 2a: Tests that require context to evaluate or are harder to meet
> §  We have context in the 2.x structure; what is the difference here?
> §  “Harder to meet”: what does that mean here?
> §  Why is this another type of test?
> o   Level 2b: Usability or AT testing
> §  One test type for usability OR AT: aren’t those two completely different kinds of tests?
> §  Usability is not adjectival rating. What kind of usability tests are we talking about here? Benchmarking?
> §  AT testing is not a goal but a means; it helps but is not needed
> Tests have a many:many relationship with Guidelines
> ·      What do you mean by this?
> 6. Scoring Process
> ·      2. Run all level 1 tests for all views within a path
> o   Automated AND manual (and not page but VIEW based…)
> o   I think we’re done here as manual contains all other suggested tests IF we demand them
> o   Run all tests, what does that mean? Where are the tests? In the Methods? What IF the method is not present OR if multiple methods are present to solve an issue? Which one to choose?
> ·      4. Note the % tests passed for each view (total passed/total in view)
> o   This might be simple for some but hard for others, maybe impossible…
> o   Think of a very large / dynamic page with lots of text in all kinds of different places (maybe 100, 200, 300… text nodes). Do we want to count them so we can say how many pass contrast?
> ·      5. Note tests that are not applicable
> o   What do you mean here? Many methods, many functional outcomes, not complete by definition, why mention all tests NOT applicable?
> ·      Average all the tests for a guideline for an overall %
> o   This is exactly a BIG challenge, as we need normalization from different perspectives. Like ESSENTIAL METHODS? SUPPLEMENTAL METHODS? COMBINATIONS? BONUS METHODS? AND… personalization kind of methods… see also again: https://docs.google.com/spreadsheets/d/1yTpbJFsaadIbCigV15YK6bWB4xk_MoOgREjGMHHMVgQ/edit#gid=1983238719
> ·      9. If average score = 3, run level 2a and/or 2b tests
> o   First of all, I do not see the need to run level 1 tests before level 2; there is no clear rationale
> o   I don’t see this working in practice yet; we need a large, elaborated example with lots of data, relationships, normalization, etc. of scoring before we can make such a call
> ·      10, 11, 12
> o   For the next steps, too, I don’t see this working in practice yet; we need a large, elaborated example with lots of data, relationships, normalization, etc. of scoring before we can make such a call
> o   The example provided in the spreadsheet does not work in practice, as it is not granular and mature enough and cannot be used per element checked.
> o   The example: https://docs.google.com/spreadsheets/d/1Ctg489tMunn6Yfqc2x-S24WGBz6TDHuyGXBk7Y_PqJI/edit is also an all-or-nothing statement, and this is not how views/pages are constructed. We need a more granular, realistic approach, and this shows the difficulty of the scoring:
> §  Here are two elaborated examples with challenges…
> §  https://docs.google.com/spreadsheets/d/1yTpbJFsaadIbCigV15YK6bWB4xk_MoOgREjGMHHMVgQ/edit#gid=1217251850
> §  https://docs.google.com/spreadsheets/d/1iCJfyMtcsSq7GHmwnc4aTNguadRfGDa0H8FBZMaJpcQ/edit#gid=633158340
> 7. Conformance
> ·      We have a lot to discuss before I can get to this point
> ·      First solve scoring in more details
> 8. Functional Categories
> ·      We need the next level of Functional outcomes and the granularity of how to apply methods to them
> ·      In other words, a clearer view on the === user need / functional outcome / guideline === structure
> ·      See: https://docs.google.com/spreadsheets/d/1yTpbJFsaadIbCigV15YK6bWB4xk_MoOgREjGMHHMVgQ/edit#gid=1983238719
> 9. Sample Page
> ·      Not sure what this slide explains. Do we have an example of the page AND scoring in detail?
> 10. History & Notes
> 11. Notes from Testing
> In order to get a consistent % passed, the tests will need to be more granular than current SC and clearly define what is counted as an “item” against the %
> ·      ACT tests will be very important to this approach
> o   What do you mean here? 2.x tests? And are they not granular enough? Or do you mean adding an adjectival rating score?
> o   ACT tests ARE 2.x, so do you want them to contain adjectival rating? OR more ACT tests? They are set up to be as objective as possible (though that is not always achievable)
> o   Would like to see examples of what you mean, hard to imagine from the theoretical text
> Tests will also need to be outcome based but specific so they can be mapped to guidelines
> ·      If we can organize the structure so that tests only map to a single guideline (vs a many-to-many relationship), this will be simpler and easier
> o   Tests for guidelines or methods or functional outcomes???
> o   Tests can be replacements for others / one test for a method may be enough so other tests are not needed anymore…!
> Functional categories should be distinct and applicable
> ·      I’ve combined Mobility and Motor
> ·      Is Independence its own category?
> ·      Need to create the guidelines and tests and then can finalize the categories
> o   We don’t use functional categories for testing, they are present in the background and you can / may filter on them, but do not play a part in testing
> 12. Data Model
> ·      Can you explain?
> 13. Adjectival Ratings – Example
> ·      This example is an all-or-nothing example with lots of gaps to be filled. I have worked on it a lot and can share results; in its current form it can’t be used for testing
> ·      A more granular example is already available, and lots of issues pop up once you start to test; it will be a starting point for discussion on how to fill in the gaps…
> ·      https://docs.google.com/spreadsheets/d/1iCJfyMtcsSq7GHmwnc4aTNguadRfGDa0H8FBZMaJpcQ/edit#gid=633158340
> ·      https://docs.google.com/spreadsheets/d/1yTpbJFsaadIbCigV15YK6bWB4xk_MoOgREjGMHHMVgQ/edit#gid=1217251850
> 14. Documentation Hierarchy
> ·      Guideline/Functional Outcome (Scoring is handled at this level)
> o   Not sure after lots of tests… The methods contain the score, and there are flavors to them that make normalization necessary.
> ·      1. Tests that do not require a judgement call (Yes/No)
> o   IF we have / apply adjectival rating, will we / do we want a baseline?
> o   If we want a baseline, we ALWAYS have a kind of yes/no to start with
> ·      2. Tests with easy judgement call (A/B)
> o   Example: Should the image be alt=“” or alt=“[some text]”?
> o   The example given here is not easy… I well remember a discussion with Jon Avila in which he said he wanted alt text where I had left it empty: https://a11yportal.com/
> ·      5. Usability testing and testing with AT
> o   Example: Do JAWS and NVDA users understand the alternative language when completing tasks?
> §  Already mentioned this above:
> §  One test type for usability OR AT: aren’t those two completely different kinds of tests?
> §  Usability is not adjectival rating. What kind of usability tests are we talking about here? Benchmarking?
> §  AT testing is not a goal but a means; it helps but is not needed
> 15. Structure
> ·      (this needs a clear, elaborated example, not a theory of some solutions we can think of; the devil is in the details…)
> ·      Is this an example of a guideline?
> ·      If so, there is no absolute relationship and order needed for the 1, 2, 3, 4, and 5s.
> ·      Visual and programmatic need to be separated and tested, for clear reasons; see my test work (can explain)
> ·      Headings are not necessary per se; what if you use LABELS or Legends or other ways to structure…?
> ·      Landmarks are not needed and are also not technology agnostic
> ·      Nr 5.:
> o   Headings help users with limited cognition quickly orient to content and complete tasks
> o   Headings help screen reader users quickly navigate content
> §  What does this mean in this slide? Seems like an explanation of benefits, not test related…
> 16. Alternative Text
> ·      Have not worked on them
> 17. Visual Contrast/Affordances
> ·      This is not mature enough to make any judgement call yet
> ·      See my work at: https://docs.google.com/spreadsheets/d/1yTpbJFsaadIbCigV15YK6bWB4xk_MoOgREjGMHHMVgQ/edit#gid=1983238719
> ·      Have lots of findings for testing / scoring…
> 18. Clear Written Content
> ·      Have not worked on them
> 19. 20. 21. 22.
> ·      Lots of questions and gaps to solve before moving to this stage!
>
> > On Fri, 10 Jul 2020 at 17:59, Jeanne Spellman <jspellman@spellmanconsulting.com> wrote:
> > > agenda+ Amend Representative Sampling proposal with language for transparency
> > > agenda+ Survey results of Conformance Scope
> > > agenda+ Rachael's proposal on Scoring
> > > == Links ==
> > > Minutes from 7 July meeting
> > > Results of Survey on Conformance Scope
> > > Slidedeck on Adjectival Rating Proposal and accompanying Spreadsheet
> > > == Conference Call info ==
> > > https://www.w3.org/2017/08/telecon-info_silver-fri
> > > IRC for minutes and notes is at irc.w3.org on channel #silver.
Received on Tuesday, 14 July 2020 10:24:35 UTC