Re: Summary and Minutes of Silver Virtual Meeting Tuesday Part 1 from Korn, Peter on 2020-03-10 (public-silver@w3.org from March 2020)

From: Korn, Peter <pkorn@lab126.com>
Date: Tue, 10 Mar 2020 20:53:15 +0000
To: Jeanne Spellman <jspellman@spellmanconsulting.com>, Silver Task Force <public-silver@w3.org>
Message-ID: <F8FA9573-367F-4E94-82C0-E2E56EAB7AAC@amazon.com>
Jeanne, all,

One addition I would make to the meeting summary – that in our discussion of “friction”, we also explored how it applies beyond cognitive/language complexity issues.  We discussed the example of a page with a small amount of text under two headings, one of which should be (but isn’t coded to be) an H2.  This is a technical violation of the structure criterion, but in terms of real difficulty for a screen reader user, it more likely adds some friction that doesn’t actually block their effective use of the page, though it may briefly confuse them.  We also discussed another example, counter to the statement that lack of captions in a video is an absolute blocker: Mel Brooks’ Silent Movie, which across 87 minutes has only a single piece of dialog – the word “no” uttered (in French) by the famous mime Marcel Marceau.  This is an absolute violation of the caption requirement for accessible media, yet watching the movie uncaptioned seems more like a bit of friction than an actual blocker to a Deaf/Hard of Hearing viewer’s ability to enjoy it (and fully appreciate its meaning).

The takeaway for me from this portion of our discussion is that the concept of friction may be a way to harmonize our existing approaches to success criteria in WCAG 2.x with both the more nuanced situation in COGA and the discussion around task-based analysis for understanding impact.  If any technical failures of WCAG 2.x SCs create at most a small amount of friction in completing tasks, and likewise no more than a broadly similar amount of friction is created along COGA lines, then we might see “equal weighting” of COGA and “traditional” accessibility criteria.  And if such a “small amount of increased friction” can in turn align with an adjective<https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643> (e.g. “acceptable” or “very good”), or, if we prefer, a letter grade (“B-” or “B+”), or, if numbers must be used, a score (e.g. “85 out of 100”), then this might all align nicely.


Regards,

Peter
--
Peter Korn | Director, Accessibility | Amazon Lab126
pkorn@amazon.com

From: Jeanne Spellman <jspellman@spellmanconsulting.com>
Date: Tuesday, March 10, 2020 at 11:44 AM
To: Silver Task Force <public-silver@w3.org>
Subject: [EXTERNAL]Summary and Minutes of Silver Virtual Meeting Tuesday Part 1
Resent-From: <public-silver@w3.org>
Resent-Date: Tuesday, March 10, 2020 at 11:42 AM





== Summary ==

Homework Assignment: We reviewed the homework assignment to test the Scoring Example against a real website.  Two people responded.  One wrote a proposal for an adjectival scoring mechanism<https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643>.  We discussed pros and cons of using this mechanism.  We need to test it against real websites.  The other person wrote Testing Headings<http://john.foliot.ca/demos/HeadingsTestOne.html> web pages to illustrate specific examples in HTML of code that technically passed the guideline for Headings, but failed it for users with specific disabilities.  This led to a detailed discussion and multiple verbal proposals of how scoring could be adapted to solve this problem.  Please send written proposals so we can follow up with these ideas.
·         Having a minimum: if a tool flags a failure, the content can't pass
·         Adding Functional Requirements to each Guideline
·         Taking the Testing Headings examples and turning them into Methods and linking Adjectival Scoring to specific tests in the new Methods.
·         Assistive technology can change its behavior at any moment, same as browsers. Methods must stick to whether the code is correct or not.

Minimums: We discussed the Testing Headings examples for what was critical for which disabilities.  This led into a complex discussion of criticality and how to ensure that we account for critical needs.  Some points made:
·         Agreement that we need to treat disabilities equally, but that some failures are more harmful than others.
·         We discussed the pros and cons of using adjectives or numbers for scoring.
·         We agreed to update functional user categories to the latest from the EU, but do want to be able to add categories for more granular breakdown for "limited cognition" and to add vestibular disorders.
·         If we don't take functionality into account, and we come up with a %, that % will mean different things to different groups.
·         We don't want to discriminate against one or more disability user groups. Whatever we come up with needs to be simple and understandable (but we aren't in that phase yet).
·         Some members want to include weighting some guidelines more than others when accumulating a total score; others are strongly opposed to any mechanism that isn't well tested to ensure that it does not discriminate against some disability groups.
·         One verbal proposal is to allow different disability groups to identify "show stopper" Guidelines.
·         Concerns that numeric scores in a granular scoring system would require thousands of data points to be valid and to average out the weaknesses of the tools.
·         The challenge is not that one functional need does or does not have a critical item. The challenge is where 2 or more functional needs have a critical item that is in conflict.
·         The same guideline may be critical to one user group, but helpful to another user group, like captions are critical to hearing disabilities and helpful to cognitive disabilities.
·         How we measure things is more important than how we weight them. A lot of the problems in terms of coverage in wcag 2.x is how things are scoped and measured.
·         Let's start off even, and once we have better coverage of guidelines. Maybe later we can weight the guidelines when we have more. Let's start with "what is a reasonable thing to ask content authors to do".
·         Friction. Every place where the language is harder to puzzle out is friction. The accumulation of friction can take a site from great to struggling to impossible. Friction could be dealt with quite granularly with good measurement, where low scores across the board start adding up (or rather, not adding up!)
·         We could subtract data by user groups.
·         Concerns that people with cognitive disabilities are disadvantaged when weighting exists.  We need to test each proposal with data from real websites.
·         Adding more cognitive guidelines, which will be possible in WCAG 3, could even out the disadvantage that COGA experiences today.
·         Some solutions for one disability cause problems for others: Too much visual contrast is bad for some groups.  Large headings can trigger anxiety or be more difficult to read for screen magnifier users.

More detailed written proposals are needed that can be tested with real websites.

Should we use IETF standard RFC 2119<https://tools.ietf.org/html/rfc2119>?  RFC 2119 defines specific meanings of MUST, SHOULD, MUST NOT, SHOULD NOT, etc.  It is used by technical standards in many standards organizations.  It is proposed that WCAG has advanced from being a guideline to a technical standard and could benefit from the precise language of RFC 2119. It's unambiguous and clear. Failing means you didn't meet the requirement. It's about having clearly defined requirements with explicit language. Measurement and scoring should be unambiguous.

Comments:
·         WCAG 2.x doesn't use RFC 2119 because it is a guideline and W3C recommends keeping RFC 2119 use for technical specs.
·         The ARIA specification is an example that uses RFC2119.
·         I think as it stands now, the Silver structure has both normative and informative content. Anything that is a normative requirement is a must. A should or may would not be a normative requirement.  We would need to be very careful using that language in informative documents. We've had charter issues where the language has strayed into informative documents.
·         I see value and intellectual rigor and honesty in being explicit. We call these guidelines, they became standards. The dirty secret is that no site can be perfect. If we take on this kind of language we need to be clear that we aren't going to say "must".
·         In terms of scoring, engineers need to have black and white decisions. Everything we do is based on binary decisions. In the must we have declared what that bright line is. Bright lines make measuring easier.
·         Elements interact with each other. WCAG has them broken down into different elements. Bringing the needs of people with cognitive disabilities into the mix adds a layer of complexity and conflicts.
·         I'm concerned about internationalization and not comfortable saying we need to put ourselves in the shoes of legislators.
·         Concerns about the accessibility of the RFC 2119 capitalization of MUST, as capitalization is not well identified by some assistive technologies.

Agree that more specific proposals are needed.

== Minutes ==

https://www.w3.org/2020/03/10-silver-minutes.html


=== Text of Minutes ===





                               - DRAFT -

                       Silver Virtual F2F Tuesday

10 Mar 2020

Attendees

   Present
          jeanne, sajkaj, ChrisLoiselle, Laura, Jennie, kirkwood,
          Lauriat, Lucy, alastairc, Makoto, Chuck, JF, stevelee,
          KimD, AndyS, PeterKorn, mattg, Rachael, Detlev

   Regrets

   Chair
          Shawn, jeanne

   Scribe
          ChrisLoiselle, Chuck

Contents

     * Topics
         1. Review of homework: what insights did people gain
            from it?
         2. Conformance and Minimums
         3. Should we use IETF standard RFC 2119
     * Summary of Action Items
     * Summary of Resolutions
     __________________________________________________________



   <ChrisLoiselle> Scribe: ChrisLoiselle

   Bruce Bailey: The directions for the conformance exercise for
   headings or visual contrast were misunderstood. Tallying for
   qualitative assessment may not work. I have written up some
   information that I'd like to share

   <bruce_bailey>
   https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643&range=A1

   This is the spreadsheet that Jeanne shared before; the Google
   Sheet is named "Sample Scoring example"

   <AndyS> AndyS present+

   Bruce Bailey: Ratings / Scores are Outstanding / 4, Very Good /
   3, Acceptable / 2, Unacceptable / 1

   <Zakim> bruce_bailey, you wanted to talk about conformance work

   Reviewing the headings (use of headings) homework

   <kirkwood> well done Bruce!

   PeterKorn: Mirrors what we've been doing within Amazon. I
   really like the potential for this to work with a scoring
   rubric, i.e. for this product release, these items are very
   good, these items are acceptable, etc. This is good.

   JF: This is getting better in terms of granularity. Where would
   we integrate EN 301 549 within the subsections? Where would we
   get into the 7 functional requirements?

   Jeanne: They will be in a different location. We may merge this
   into scoring.

   Lucy (to Bruce B.): Can you explain the scoring a bit more?

   Bruce: Not skipping a heading level is Outstanding / 4 or
   Very Good / 3.

   Bruce Bailey: If you skip levels, at very best you are at
   Acceptable.
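   Bruce's heading rule can be sketched as a simple check. This is
   a minimal illustration, assuming the heading levels of a page
   have already been extracted in document order; the function
   names are hypothetical and not part of Bruce's spreadsheet. It
   only caps the rating at Acceptable when a level is skipped;
   choosing between Outstanding and Very Good is left to the
   evaluator.

```python
from __future__ import annotations

# Hypothetical sketch of the "skipped heading level" rule from the minutes.
# Scale from Bruce Bailey's example: Outstanding/4, Very Good/3,
# Acceptable/2, Unacceptable/1.
RATINGS = {"Outstanding": 4, "Very Good": 3, "Acceptable": 2, "Unacceptable": 1}


def skips_level(levels: list[int]) -> bool:
    """True if any heading is more than one level deeper than the heading
    before it (e.g. an h2 followed directly by an h5). Moving back up
    (h4 -> h2) is not treated as a skip."""
    return any(curr - prev > 1 for prev, curr in zip(levels, levels[1:]))


def best_possible_rating(levels: list[int]) -> str:
    """Highest adjective this one rule allows for a page.

    Assumption: a page with no skipped levels may still be rated either
    Outstanding or Very Good; that distinction is left to the evaluator.
    """
    return "Acceptable" if skips_level(levels) else "Outstanding"


for page in ([1, 2, 3, 2, 3], [2, 5]):
    rating = best_possible_rating(page)
    print(page, rating, RATINGS[rating])
```

   The h2-to-h5 case raised later in the meeting would be capped at
   Acceptable by this rule; whether it should fall further is the
   open criticality question.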



   Lucy: Acceptable, to me, seems to mean you've met every point.
   Actually, Very Good means you've hit every point.

   Bruce Bailey: I also looked at Clear Language and Visual
   Contrast of Text and went through the same rating

   Step 1 is to assign a rating to a web page in a website, then
   assign the website a number as well, so the numbers (mean, mode,
   etc.) would give a person a score for WCAG 2.1.
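   The aggregation Bruce describes (rate each page, then summarize
   the page ratings with statistics such as the mean or mode) might
   look like the following minimal sketch. It assumes the same
   four-point scale; the page paths and ratings are invented for
   illustration, and mapping the mean back to the nearest adjective
   is an assumption, not something proposed in the meeting.

```python
from statistics import mean, mode

# Bruce Bailey's four-point adjectival scale, as listed in the minutes.
SCALE = {"Outstanding": 4, "Very Good": 3, "Acceptable": 2, "Unacceptable": 1}
ADJECTIVE = {score: name for name, score in SCALE.items()}

# Hypothetical page-level ratings for one guideline on one website.
page_ratings = {
    "/home": "Outstanding",
    "/products": "Very Good",
    "/checkout": "Acceptable",
    "/help": "Very Good",
}

scores = [SCALE[rating] for rating in page_ratings.values()]
site_mean = mean(scores)  # arithmetic mean of the page scores
site_mode = mode(scores)  # most frequent page score

# Assumption: express the mean as the nearest adjective on the scale.
# Python's round() rounds exact halves to the even integer.
nearest = ADJECTIVE[round(site_mean)]

print(f"mean = {site_mean:.2f} ({nearest})")
print(f"mode = {site_mode} ({ADJECTIVE[site_mode]})")
```

   The mean and mode can disagree when ratings are spread out, which
   is one reason to look at more than a single summary number.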



   3 out of 4 guidelines are rated as Outstanding...

   A rubric that works for assigning silver, bronze, gold ratings
   to websites could be used as well as the rating scores.

   Shawn L: Opens to JF for comments on his work

   <JF> http://john.foliot.ca/demos/HeadingsTestOne.html

   JF: Shares a Testing Headings Sample Page. Talks about heading
   structure being used properly. Each heading has a class.



Review of homework: what insights did people gain from it?



   The entire document is made up of headings: 18 heading 1s.

   My question is: is the score going to be the same as in the
   previous example shared?

   The pages John shares are:
   Testing and Scoring Headings - Master Page,
   http://john.foliot.ca/demos/HeadingsTestOne.html
   Testing Headings - Test 1,
   http://john.foliot.ca/demos/HeadingsTestTwo.html
   Testing Headings - Test 2,
   http://john.foliot.ca/demos/HeadingsTestThree.html
   Testing Headings - Test 3,
   http://john.foliot.ca/demos/HeadingsTestFour.html

   At the end of each page, the negative impact on functional
   requirements is listed in a table



   PeterKorn: These detailed examples are fantastic. Comment: some
   of the examples are easy to find with a programmatic tool.

   If a tool could have found it, and you didn't address it, it is
   not acceptable.

   <jeanne> +1 Peter

   JF: Tools aren't going to catch all examples. I wouldn't
   outright fail for all users. Failing for some users is valid.
   The Functional Requirements are key to a scoring rubric.

   Shawn L: We can talk about this in Minimums

   <david-macdonald> interesting that JAWS and NVDA announce the
   level without aria-level for <div role="heading"
   class="level_2" ...



   Lucy G: I love the examples, John. I see Bruce's, where a tally
   needs to add up to 100 points. Everything would be weighted and
   have weights within it.

   JF: A more granular score helps content creators as well.

   Lucy G: I think we are on the correct path.

   Jeanne: If we look at Bruce's example and drill down into a more
   granular approach, would that help?

   JF: An explicit definition of semantic headings would be useful.
   ... The structure vs. the visual presentation helps the
   cognitive disability user group.



   <Chuck> +1 to Jeanne's idea

   Jeanne: What if we took each of the 4 examples JF has, turned
   them into the correct thing, and turned those into methods? What
   if we then referenced the individual methods within Bruce's
   example, so people know what they need to review for HTML, and
   scored that way?

   JF: How many methods could be constructed using the ACT rules
   format?

   <Lauriat> +1

   Jeanne: Exactly, using ACT in methods. Scoring could reference
   those.

   Referencing ACT tests would work well in methods.

   David McD: JAWS reads those headings properly. I'm wondering what
   the accessibility supported score would be if the semantic code
   is not totally correct?



   Normative and methods comment: Looking at Bruce's examples,
   column B could be methods.

   Techniques would be technology specific and non-normative

   <JF> +1 to Lucy

   Lucy G: Assistive technology can change its behavior at any
   moment, same as browsers. Methods must stick to whether the code
   is correct or not.

   <Chuck> +1 to lucy

   <jeanne> +1 Lucy

   <laura> +1

   +1 to Lucy (Chris without scribing)

   Shawn L: To group: Let us shift gears to the next agenda item

   Jeanne: Scoring examples topic: Testing real websites is best.
   It changed how I was approaching things. Test against a real
   website.



Conformance and Minimums



   <sajkaj> +1 to Jeanne

   Shawn L: Criticality is important. How does one express it, a
   fundamental and critical issue of accessibility, in a score?

   Very poor, acceptable, etc. Within Silver, do we want to be
   the ones drawing the line on critical issues?

   PeterKorn: Looking at JF's second example, if the page only had
   two headings, one a heading 1 and one a heading 2... for usage
   without vision, the hierarchy structure would be a fail, but
   would the impact to the user be significant?

   <JF> exactly

   Is it not usable without vision?

   We need to think about the overall functionality and impact of
   the fail, and how we view the site?



   JF: Peter, I agree. I built the examples so that the structure
   created for screen readers was OK. If I used unstyled divs, the
   structure would be off visually for sighted users (COGA). The
   impact on different user groups needs to be looked at: visual
   users vs. non-visual users / screen reader users.

   <Zakim> bruce_bailey, you wanted to say that i like adjectival
   rating over tally because critical aspects could be in
   "average" and above

   I.e. h2 to h5: still usable? Or a fail? How is it rated /
   scored, as opposed to pass or fail?

   Bruce Bailey: Critical aspects are at Acceptable or above in my
   example.

   Charles Hall: I wanted to add to the criticality and severity
   comments. If I'm evaluating a rubric against a subset vs. an
   author of the website, I, the author, have the right to scope
   my page through a task flow



   <JF> Not sure if we've arrived at consensus to Charles' point

   <JF> regarding scoping

   Chris Loiselle to Charles: If I'm writing that wrong, please
   add in your comments. Sorry!

   <jeanne> +1 John - The consensus on scope was a logical subset,
   not a task

   PeterKorn: Scoring with numbers is a very easy way to get lost
   in what 85% means. An adjective-based approach may lead to
   progress on scoring more quickly.



   <PeterKorn> (to respond)

   Lucy G: If the adjectival route is the way we are going, numbers
   will also help in the end, i.e. a full point can be broken down
   as well. Each requirement has its own scoring.

   When it comes to critical criteria, that is when we can
   weight them.

   Think of it as containers. Language would be its own container.
   Is language more critical than headings?

   <Chuck> we are at time to change scribes, and someone else will
   need to monitor the participants list for raised hands, as I
   will not be able to scribe and watch that list.



   AndyS: Page is structured. Ads are present.

   @Chuck. I can continue to scribe after a five minute break.

   <CharlesHall> that was CharlesHall that mentioned the <aside>
   as part of the scope

   I will return very soon.

   <Chuck> scribe: Chuck

   <sajkaj> I can

   Jeanne: Any followup on what Andy said?

   <Zakim> JF, you wanted to respond to Peter's comment about
   numbers



   JF: Responding to Peter's comments on numbers. Appreciating all
   the issues with what a number means, my understanding is...
   ... This exercise is about getting a number. We currently have
   100% or zero (in WCAG 2.x)
   ... A number can be misleading at times. If anybody has used the
   Chrome tool (Lighthouse), at the end of the process...
   ... Chrome added a score. We don't know where that score came
   from. If we start from the premise that you never get 100, that
   percentage becomes an incentive for doing better (72%, let's try
   to get to 85%).
   ... Peter, as much as numbers can be a rat hole, they're a
   critical part of what we are trying to do in Silver.



   <ChrisLoiselle> Chuck, I can scribe again.

   <ChrisLoiselle> Scribe: ChrisLoiselle

   <Chuck> PK: I wasn't here for all of the Silver discussions...
   my understanding is that the high order bit is to move away from
   pass-fail perfection, to "mildly or largely"... numbers aren't
   the goal.

   <Chuck> PK: The goal is to get away from pass/fail. I like the
   rubric. We can aggregate the rubric evaluations. Some things are
   acceptable, some things are good... or almost everything is
   great but some are good...

   PeterKorn: The goal was to move away from pass / fail. I like
   the rubric. If we have a way to collect up the rubric
   evaluations and most are very good, the product will be very
   good.

   <Chuck> PK: Whether or not we assign numbers, we need to think
   about severity, the people, and the impacts. I like the
   adjectival approach. I don't know what 87% means. I do know what
   acceptable and very good mean.

   <CharlesHall> to PeterKorn’s point, a qualitative metric can be
   converted to a quantitative score based on the number of
   adjectival categories

   <Chuck> JF: I agree with that statement; in a regulatory
   environment we need to hit a bar. "Intellectually honest" means
   that there isn't something that is 100%.

   <Chuck> Shawn: I think you are agreeing on different points.
   Let's return to the queue.



   <Chuck> detlev: I was wondering... you mentioned the 9 user
   accessibility needs. In the rating, is there a plan to issue
   different types of results for different impacts...?

   <Chuck> detlev: Also, why is there a mismatch with the user
   accessibility needs in the EU implementation (you are missing
   limited reach)...

   <Chuck> detlev: there are two categories for people with hearing
   problems (hard of hearing and no hearing). Is there a conscious
   decision to drop the difference between hard of hearing and no
   hearing?

   <Chuck> Jeanne: It may be that I took an old version of the EU
   directive. Can you send me a link to the current one?

   <Chuck> Detlev: I was wondering why users with limited reach
   were left out. Someone explained... I'm not convinced there's a
   good reason to leave it out.

   <CharlesHall> we have discussed adding functional needs to the
   EN standard, like “intersectional”

   <Chuck> Jeanne: Great. We did have a discussion of adding
   things, for limited cognition, vestibular. We never discussed
   dropping any, so I may have had an old version.



   <Chuck> detlev: Is there an intention in all the scoring to
   differentiate by user group? At one point it was a no-no. At
   some point we decided we didn't want to differentiate, but not
   sure if that has changed.

   <Chuck> Shawn: We've been thinking about it more along the lines
   of what John demonstrated: for different guidelines, looking at
   the effects on different user needs given the task or scope of
   testing.

   <Chuck> Shawn: We don't have a fleshed-out illustration of
   that, but we have been considering it. We want to make sure that
   we are not inadvertently leaving user needs out.

   <Chuck> Shawn: For example, if it works great for limited
   cognition but completely leaves out users who use screen
   readers, we want the ability to highlight that, and vice versa.

   <Chuck> Shawn: Like if everything is semantically accurate, but
   visually not.

   <Zakim> jeanne, you wanted to talk about testing criticality
   and severity



   <Chuck> Jeanne: I'm glad we are having a conversation about
   criticality. I know there are a lot of... people who feel it's
   very important. But one of the things we agreed on in Silver is
   that we'll be data driven and research based.

   <Chuck> Jeanne: Whenever we tried to score real websites
   against criticality, we consistently couldn't find a way to do
   that which didn't penalize people with some disabilities.

   <Chuck> Jeanne: We have to find a way to stop penalizing people
   with some sorts of disabilities. We could not find a way to
   test it that didn't structurally disadvantage people with low
   vision and cognitive issues.

   <KimD> +1 to Jeanne

   <Chuck> Jeanne: It's great to talk about in theory, but if you
   want to propose it, you need to show research that demonstrates
   everyone is speaking equally.

   <Chuck> lucy: Speaking to numbers, we need to offer something
   that leaders can relate to. If we offer something confusing,
   they will tune us out and do other tasks.

   <AndyS> Comment: IMO the only way to treat all disabilities
   equally involves customization and personalization, so that
   individual needs are accommodated *as needed*

   <JF> +1 to Lucy

   <Chuck> lucy: We have to have those numbers. A lawyer is not
   going to understand what it means to have "this or that" level.
   They want a number and a way to improve that number.



   <Chuck> Rachael: I have 3 things to keep in mind. They have been
   brought up before, maybe not in this call. If we don't take
   functionality into account, and we come up with a %, that %
   will mean different things to different groups.

   <Chuck> Rachael: 80% may mean one thing to people with seizures
   and another to a blind person.

   <Chuck> Rachael: If you have a group of individuals who fall in
   the category of blind, vs people in COGA, that hierarchy may
   introduce discrimination.

   <Chuck> Rachael: This is such a rich and fantastic discussion,
   but whatever we come up with needs to be understandable and
   simple.

   <Chuck> JF: Addressing a comment from Jeanne. I'm in support of
   all user groups including COGA; I do so in action and words.
   The reality is that if we take that user group into
   consideration, that is one user group.

   <Chuck> JF: I want to recognize severities. In those heading
   examples, if I remove visual structure, a person who has a
   cognitive issue and is blind is doubly disadvantaged.

   <Chuck> JF: If we determine that an issue has a greater impact
   against a group, we can factor that in. That's what I said 6
   months ago. Not all things are created equal. We have to boil
   this down to a score and strategies to improve the score.



   <Chuck> Jeanne: I only disagree with the weighting.

   <Chuck> Jeanne: "this is more severe than that". I don't have
   an issue with your example. At the guideline level I think you
   can do that.

   <Chuck> Lucy: The weighting should be by criteria.

   <Chuck> Lucy: So many disabilities are impacted by this or
   that. I won't pit one against another, but if it's 4 vs 1, we
   have no choice but to weight that.

   <Chuck> Jeanne: I disagree. When you start looking at it as a
   whole, there are too many things that we give guidance for that
   are heavily weighted to blind vs cognitive.

   <Chuck> Jeanne: The way we are set up, including Silver, is that
   we measure things granularly, and we say this is more important
   than that, but when we look at it as a whole...

   <CharlesHall> would a priority or severity not be functional
   need agnostic?

   <Chuck> Jeanne: Granularly, this is a complete blocker, but
   someone with a cognitive issue may be able to work it out. But
   in total the cognitive issues become a blocker, because you
   have to look at it in totality.

   <Chuck> Jeanne: Cognitive loses when we say "this individual
   piece" is more important than "this one".

   <jon_avila> I object to the notion that WCAG is heavily focused
   on Blind and visual impairment. There are many WCAG criteria
   that are aimed at a wide range of users with disabilities

   <Chuck> Jeanne: That alt text is more important than
   captioning. It's the totality of the website, it's all the
   guidelines. If they are getting a lower weight on each
   individual guideline, they get a website that they cannot use.

   <Chuck> Jeanne: Can someone else make the argument better?

   <Chuck> Shawn: It's a complex topic and we should keep it in
   mind as we work through. We all have the same high-level goals
   for conformance.



   <Chuck> detlev: Just to get to Jeanne's argument about critical
   issues penalizing others. Why is that a problem? Why wouldn't
   it be possible to ask all the groups involved to basically
   identify show stopper issues? We know those.

   <Chuck> detlev: Keyboard trap, lack of captions, and so on.
   Maybe show stoppers for cognitive individuals. Basically OK to
   collect those issues and make them critical issues.

   <Chuck> detlev: Can you explain further, Jeanne?

   <Chuck> Jeanne: Let's honor the queue.

   <Zakim> bruce_bailey, you wanted to say numbers can be used to
   measure progress, but i have never seen numbers that were
   comparable from one website to another (or one tool to another)

   <Chuck> Bruce: We've been trying (everyone!) to rate websites
   since the days of Bobby. All of these rules, all the years,
   it's only ever been useful for the developer to make progress,
   and to see that you aren't regressing.

   <Chuck> Bruce: Lots of tools will give you a percentage.

   <Chuck> Bruce: Those only make a difference within one domain.
   You can't compare an 87 on one site to an 86 on another site, or
   even across tools. I don't feel like we will make progress if we
   try to have granular scoring systems.

   <Chuck> Bruce: Unless there are 1000s of data points. Enough
   data points, ones you aren't gathering manually, that eventually
   the weaknesses of the tool average out.

   <PeterKorn> +1 to Bruce

   <Chuck> Bruce: I won't be able to say that one outstanding site
   compares well to another.



   <Chuck> Matt: Returning to yesterday, it seems like there are 2
   different criteria you are trying to assess: how well has the
   author created the content, and what score can you give them,
   vs. how do you make this understandable to the user.

   <david-macdonald> +1 to bruce

   <CharlesHall> to Detlev’s point, the challenge is not that one
   functional need does or does not have a critical item. i think
   the challenge is where 2 or more functional needs have a
   critical item that is in conflict.

   <Chuck> Matt: There's a bucket approach; there's probably going
   to be something where they haven't met requirements, or they met
   them all. There are different methods which have different
   impacts on different groups.

   <Chuck> Matt: That's different from the number score that the
   producer gets. I think that these two different aspects need to
   be viewed separately.

   <Chuck> Shawn: Indeed.

   <Chuck> JF: Two thoughts... Jeanne mentioned one requirement is
   captions. Look at the severity of captions. If you are deaf and
   a video is missing captions, that's critical. If I have a
   cognitive issue and I have captions..

   <Chuck> JF: That helps me. Offering captions to a cognitive
   user offers some benefit, but to a deaf user it is completely
   critical.



   <JF>
   https://docs.google.com/spreadsheets/d/1EXw5W6SuMXk7mrFN33hHVQuCn0DhKysXmUCKNML3SKg/edit#gid=108726882

   <Chuck> JF: How do we make this fair, realizing we can't get
   every individual out there? In an earlier example (6-8 months
   ago), I put forward a draft proposal that suggested that as we
   were calculating scores we used a weighting mechanism, and used
   that as a multiplier.

   <Chuck> JF: We would ultimately have a better... more data
   points. More data points will give us a better score.



   <jeanne> COGA with individual guideline weighting.



   <Chuck> Shawn: Logistically, I'd like to get through the queue

   and then get to the last agenda item.



   <Zakim> alastairc, you wanted to say that how we measure things

   is more important than how we weight them



   <Chuck> Shawn: This discussion has been great in taking in all the things we need to consider.



   <Detlev> @CharlesHall: "challenge is where 2 or more functional

   needs have a critical item that is in conflict" - can you speak

   to that? I'd love to know where solving one showstopper issue

   creates a real problem for another group...



   <Chuck> Alastairc: Not sure weighting is necessarily going to be the answer. A lot of the problems in terms of coverage in WCAG 2.x come from how things are scoped and measured. That has been clear when addressing COGA issues.



   <Chuck> Alastairc: If there is a functional outcome that has an

   equal weighting per guideline, that may be a reasonable way to

   proceed. It's within the guideline to decide what content

   authors need to do.



   <Chuck> Alastairc: Let's start off even; once we have better coverage of guidelines, maybe later we can weight the guidelines. Let's start with "what is a reasonable thing to ask content authors to do".



   <Chuck> PK: Friction comes to mind. Every place where the

   language is harder to puzzle out is friction. The accumulation

   of friction can take a site from great to struggling to

   impossible. The spoons model.



   <Chuck> pk: I think the same concept applies to other disabilities, including what we traditionally thought of as pass/fail. Take the heading example: a page with 2 headings, one of which isn't coded correctly, is a little bit of friction and a formal failure. Does it prevent blind individuals from using the page? Probably not.



   <alastairc> +1 to thinking about friction, which again could be dealt with quite granularly with good measurement, where low scores across the board start adding up (or rather, not adding up!)



   <Chuck> PK: Mel Brooks' Silent Movie has no spoken dialogue, except for one individual who says one word. Is it a problem for a deaf person who wants to watch that movie? I think we can look at a friction-based model.



   <JF> +1 to Peter's point - barriers are based on the functional

   requirements of the different disability types



   <Chuck> pk: ... come up with a notion that there is a little bit of friction for blind folks because some small pages don't have headings coded right, but there's a lot more friction for a cognitive user. We can elevate a site and say that it can be no worse than "good"...



   <JF> +1 to contemplating A,B,C,F scoring



   <Chuck> pk: ...and therefore a site does or doesn't make it. Back to numbers: with grades like A+ or B-, I think there are mechanisms that can be fairly granular, like "we are getting a C- and we want to get a C+". With a scale of 100, we can get caught up arguing whether 86 or 87 is good enough.



   <Zakim> jeanne, you wanted to say to Detlev when each group

   gets a show stopper, then a bonus is given to that group. But

   for COGA, its the overall sum of all the guidelines. So they



   <david-macdonald> +1 Peter had a great concept of "friction"...

   at some point there is too much friction to use.



   <Chuck> Jeanne: To Detlev, on weighting: when each group gets a show stopper, then a bonus is given to that group. For COGA it's the overall sum.



   <Chuck> Jeanne: Each group gets more points for show stoppers,

   and coga gets less. Physical or sensory issues have more

   points. But when coga looks at the overall score... like a

   website gets a 93, but for coga it's not accessible.



   <Chuck> Jeanne: For coga it's more of an overall issue. That's

   why I say that people need to test their proposals across... on

   real websites. We found that it wasn't reflecting the issues

   for coga users correctly.



   <alastairc> Jeanne - couldn't that be addressed by having

   plenty of guidelines for COGA (based on things we can't fit in

   WCAG 2.x), so that without meeting enough of those, you

   wouldn't pass?



   <Detlev> Can I answer to that directly (briefly)?



   <Chuck> Jeanne: It's great to have these proposals. John's proposal of putting the impact at the guideline level and having it trickle down can work. But if we put weighting in, I'll ask for real examples with real websites showing it's fair.



   <Chuck> Jeanne: I think we can do it without weighting and go

   back to an author and say "here's total score, here is how it

   breaks down by disability", I think that can work. I think we

   can avoid the weighting issue.



   <Chuck> detlev: I think you are mixing 2 different things. Nobody disputes that WCAG doesn't have enough for COGA; COGA issues are underrepresented. I don't see a real conflict between the critical issues of different groups.



   <Chuck> detlev: For cognitive folk who are impacted by many different things combined, if the new guidelines and rubrics include them, I would rather think of subtracting points. The cognitive score would show the deficiencies clearly.



   <Chuck> detlev: You'd still have a way of showing critical issues by group.



   <jeanne> I would be interested to see a proposal with real

   data. I would help with testing.



   <Chuck> detlev: There are absolute show stoppers for some users, and we need to show that. Jeanne, you said that these can drown out COGA issues, but I think that it can be reflected properly and benefit COGA users.



   <Chuck> Andy: The cognitive issue is such a complex subject. It overlaps with neurological issues, senses, perceptions. There are so many different varieties: ADD will be different from educational handicaps.



   <jeanne> alastair, that is possible -- again, I would like to

   see a proposal with mockup of the data with proposed

   guidelines.



   <Chuck> Andy: Becomes this big mess of how we divide up, should

   we divide up... becomes a broad spectrum. Points towards the

   ability to customize and personalize as the way to ensure that

   all groups have equal access.



   <alastairc> Jeanne - I think we need to leave

   weighting/criticality until we have better coverage of

   guidelines.



   <Chuck> Andy: In terms of a triangle, the user is one point,

   the author is another point, and the technology is the 3rd

   point. Those 3 points need to work together in a way for every

   user to be accommodated.



   <alastairc> we aren't sure what / how-many new guidelines are

   likely to come from COGA.



   <Chuck> andy: I don't think there's a way for a website to address everyone all the time. I understand what Jeanne is saying in terms of weighting. Weighting vision perception gets the site up to a high level of acceptance, but the things that made that...



   <Lauriat> +1 to Alastair



   <Chuck> andy: ...raising a site's vision number may actually hurt COGA users. High contrast can cause COGA issues. There's a lot of interaction there... how do we really make that into a matrix where you push this up and the other comes down?



   <Chuck> andy: Squeezing a toothpaste tube, one end decreases,

   the other gets larger. I don't know that weighting is the way

   of solving. I think customization is the way to achieve the

   ultimate goal. These are thoughts in my mind.



   <kirkwood> +1 to customization



   <Chuck> Shawn: We are unlikely to come to complete resolution

   to this conversation, we need to finish queue and move on.



   <JF> FWIW, the Personalization TF is looking at "customization" today - but we lack the technology to make it happen today



   <Chuck> Lucy: If any disability is poorly affected by any

   system we come up with, that's our failing. We don't have the

   data and don't know, and if it's still failing cognitive,

   that's our responsibility to address that group.



   <jeanne> Andy, that's why I like that proposal of putting data into each guideline about each disability affected, but then giving a total score for each disability.



   <Chuck> Lucy: We have fixated for so long on what we know, we

   still need to do the research and determine what we don't know.



   <Chuck> lucy: I'll say that I don't know enough, but I do take it into account, and I want to know more.



   <Chuck> Charles: Historic: we are all in general agreement that

   this is complex, which is why it takes a long time to get

   through one point of conversation, but I don't think it's

   impossible to account for something that is critical to one

   functional need in order to achieve a score.



   <Lauriat> @JF: I'd like to know more about the technology

   needed to make that happen, or at least prototype things out to

   figure out how to make that happen.



   <Chuck> Charles: If we have 9 functional need categories and

   one has an issue, then they all do. Historic conversations,

   where there's a conflict. Where there is a challenge is where

   something in one category that is in conflict with another

   issue in another category...



   <Chuck> Charles: headings large and in one color may conflict

   with someone where it could trigger anxiety. It's fine to

   consider this, but we need to be aware of the conflicts.



Should we use IETF standard RFC 2119



   <Chuck> Shawn: With that, we have a lot to think about and work through. Let's move on to whether to use RFC 2119 (MUST/SHOULD/MUST NOT/SHOULD NOT).



   <Detlev> @Charleshall: I think research shows ALL CAPS is

   harder to read, cannot think who would benefit



   <Chuck> Jeanne: W3C and standards organizations around the world rely on this particular RFC (Request for Comments). It's very old and used in technical standards by many standards orgs.



   <Chuck> Jeanne: W3C in the past has said they don't recommend

   it for guideline use, and not used in WCAG. Designed for

   technical specs that require interoperability. More about...

   John may have lots of examples.



   <sajkaj> Think APIs



   <Chuck> Jeanne: Because it's not included, we are interested in

   including it in Silver. This is John's proposal. John...



   <CharlesHall> @Detlev. i agree. was trying to create an example

   on the fly since I couldn’t recall some of the specific

   examples we discussed in the past. particularly with insights

   from Cybelle.



   <Chuck> JF: Part of the issue is that WCAG has moved from being a guideline to being a standard. That's the reality. We have govts that say "you must meet WCAG 2.x AA". Because of that, we need to have these bright and measurable points



   <jon_avila> The ARIA specification is an example that uses

   RFC2119



   <Chuck> JF: ...to meet that requirement. When we talk about the 3 points (user, author, technology), there's a 4th point... legal requirements. RFC 2119 calls out MUST, SHOULD and MAY.



   <Chuck> JF: It's unambiguous and clear. Failing means you

   didn't meet the requirement. It's about having clearly defined

   requirements with explicit language.



   <jeanne> [15]https://tools.ietf.org/html/rfc2119




     [15] https://tools.ietf.org/html/rfc2119




   <Chuck> JF: I want to use it. When we created a guideline, that was guidance. Because we are now at a point where we have measurement and scoring, as part of that requirement we should use very clear language.



   <CharlesHall> +1 to JF on use of an unambiguous standard



   <Chuck> JF: We've got clarity there.



   <Zakim> alastairc, you wanted to say that must = the

   requirement, and should/may wouldn't be suitable for stating

   the requirement.



   <Chuck> Alastairc: I think as it stands now, in the Silver structure (normative and informative), anything that is a normative requirement is a must. A should or may would not be a normative requirement.



   <Chuck> Alastairc: They are very tied together. We would need

   to be very careful using that language in informative

   documents. We've had charter issues, the language has strayed

   into informative documents.



   <Chuck> Alastairc: introduces unnecessary uses. I don't

   disagree or agree, I think our current approach takes it into

   account already.



   <Chuck> DM: The SCs were designed to be testable statements. If the statement is true, you meet the criterion. Every one of those statements is a must.



   <jon_avila> I find it ironic that folks have said no disability should be weighted over others -- yet it was said cognitive disabilities are impacted by the holistic site and thus trump everything in terms of importance



   <Chuck> DM: We do have must statements, we don't have shoulds

   in the normative docs.



   <Chuck> PK: I see value, intellectual rigor and honesty in being explicit. We call these guidelines, but they became standards. The dirty secret is that no site can be perfect. If we take on this kind of language, we need to be clear that we aren't going to say "must"



   <alastairc> jon_avila I think there is just recognition that there has been a gap, and there needs to be work to address the gap.



   <Chuck> PK: when the must is not achievable. I don't have

   strong feeling for or against. But if we use this language, we

   need to be clear that we are creating an impossible "must".



   <Zakim> JF, you wanted to note that there is a real difference

   (in RFC 2119) between MUST and must (case sensitive)



   <jon_avila> For what it's worth - WCAG 2.0 and 2.1 are

   standards according to WCAG itself - "The WCAG 2.0 document is

   designed to meet the needs of those who need a stable,

   referenceable technical standard."



   <Chuck> JF: Alastair... on your concern about language: in RFC 2119, MUST, SHOULD and MAY are always in upper case. MUST and must don't mean the same thing.



   <alastairc> let's not go down that route!



   <Chuck> JF: "must" is conversational. That avoids some of that

   problem.



   <AndyS> That's scary



   <Chuck> JF: As peter noted, our guidelines have been made

   standards. In terms of scoring, engineers need to have black

   and white decisions. Everything we do is based on binary

   decisions. In the must we have declared what that bright line

   is.



   <sajkaj> Methinks JF forgot about analog engineering!



   <david-macdonald> There are no "Must", "should" or



   <Jennie> I would be concerned with upper case and lower case

   differences in meaning, from a cognitive standpoint.



   Chris Loiselle comment on standard:

   [16]https://www.iso.org/standard/58625.html, point to standard



     [16] https://www.iso.org/standard/58625.html




   <Chuck> Andy: I want to mention that all the other standards organizations use this verbiage.



   <david-macdonald> "may" in WCAG 2 or 2.1



   <JF> +1 to bright lines



   <JF> bright lines make measuring easier



   <Chuck> Andy: But if we were going to go there, it's important

   to note that this is a very bright line. Requires additional

   diligence to make sure that "shall" is really understood and

   isn't going to create situations that cannot be absolutely

   achieved.



   <jeanne> I worry that accessibility needs are not oriented

   toward bright lines.



   <Chuck> Andy: With all of the many things we are talking about that interact with each other... WCAG has them broken down into different elements, and these elements interact with each other. Bringing COGA into the mix adds a layer of complexity and conflicts.



   <david-macdonald> There are no instances of "must", "should", "shall", or "may" in WCAG 2 or 2.1



   <Chuck> Andy: If one "shall" conflicts with another "shall",

   we'll get into trouble.



   <jeanne> +1 Andy



   <jon_avila> I agree that use of these RFC 2119 terms will only

   complicate things



   <Chuck> Andy: A bit more ambiguous from other standards from

   other groups. ANSI specs on displays and fonts, their language

   and examples are set in technology of the late 80's and early

   90's.



   <KimD> I'm concerned about internationalization and not comfortable saying we need to put ourselves in the shoes of legislators.



   <Chuck> Andy: We start to get into an ambiguous realm when we discuss how different browsers render fonts differently. I like the idea of adopting this more affirmative use of terminology, but it brings a great deal of complication.



   <alastairc> Is it worth trying this language out in a method?

   That seems to be the most suitable place.



   <Chuck> Lucy: I want to see it applied and see how it works,

   and then when John responded... I say what Peter said... this

   is not a possible thing to accomplish and remain accessible

   itself.



   <Lauriat> @Alastair: No, that would make tech-specific methods

   normative.



   <Chuck> Lucy: I like the idea, in the terms of what we have

   been thinking of all along, I want to see it apply to some

   examples first.



   <alastairc> Um, I'm not sure it will help with the clear

   language.



   <Chuck> Lucy: I can't tell the difference between MUST and

   "must".



   <jon_avila> There are settings in screen readers to communicate

   capitalization of text.



   <JF> <span aria-label="RFC 2119 MUST">MUST</span>



   <Chuck> Shawn: My proposal is to go through the minutes and

   pull out the pros and cons of going with this language and

   keeping the current language.



   <Chuck> Shawn: And then we can use that as a summarization for

   folk who couldn't make it to this call.



   <alastairc> Should a guideline include should/may?



   <Chuck> JF: I pasted some code in IRC to address your concerns.



   <Jennie> Won't the ARIA label only assist those using screen

   readers, but not those with reading challenges with vision?



   <KimD> +1 to Jennie



   <david-macdonald> There are no instances of "must", "should", "shall", or "may" in WCAG 2 or 2.1 success criteria



   <Chuck> JF: <discusses rfc-2119 must>



   <Chuck> Shawn: worth looking into annotations.



   <jon_avila> ARIA labels on non-interactive text don't work well with screen readers.



   <jeanne> +1 Jennie



   <alastairc> JF - would a guideline include should or may?



   <Chuck> Shawn: With that, thank you, everyone, for bringing examples. Super helpful as a part of these complex topics.



   <Chuck> Shawn: Anything else Jeanne?



   <JF> @ Alastair - it could



   <Chuck> Jeanne: Incredibly helpful, great conversations, we'll

   keep working.





   [End of minutes]

     __________________________________________________________
Received on Tuesday, 10 March 2020 20:53:48 UTC