Summary and Minutes of Silver Virtual Meeting Tuesday Part 1

== Summary ==

Homework Assignment: We reviewed the homework assignment to test the 
Scoring Example against a real website.  Two people responded.  One 
wrote a proposal for an adjectival scoring mechanism 
<https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643>.  
We discussed the pros and cons of this mechanism, which still needs to 
be tested against real websites.  The other wrote the Testing Headings 
<http://john.foliot.ca/demos/HeadingsTestOne.html> web pages, which 
illustrate specific HTML examples of code that technically passes the 
Headings guideline but fails for users with specific disabilities.  
This led to a detailed discussion and several verbal proposals for 
adapting scoring to solve this problem.  Please send written proposals 
so we can follow up on these ideas.

  * Set a minimum: if an automated tool finds a failure, the content
    can't pass.
  * Add Functional Requirements to each Guideline.
  * Turn the Testing Headings examples into Methods, and link
    adjectival scoring to specific tests in the new Methods.
  * Assistive technology behavior can change at any moment, as browser
    behavior does, so Methods should stick to whether the code is
    correct or not.
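
To make the adjectival proposal easier to test against real websites, here is a minimal Python sketch; it is an illustration, not an agreed mechanism. It assumes the four-level scale from the Sample Scoring example spreadsheet recorded in the minutes (Outstanding = 4, Very Good = 3, Acceptable = 2, Unacceptable = 1), summarizes one guideline's per-page ratings with the mean and mode mentioned in the discussion, and applies the proposed minimum that a failure found by an automated tool prevents a pass. The function `summarize_guideline` and its inputs are hypothetical.

```python
from statistics import mean, mode

# Four-level adjectival scale from the "Sample Scoring example" spreadsheet.
SCALE = {"Outstanding": 4, "Very Good": 3, "Acceptable": 2, "Unacceptable": 1}
LABELS = {score: label for label, score in SCALE.items()}


def summarize_guideline(page_ratings, tool_failures=0):
    """Summarize one guideline's adjectival ratings across the pages tested.

    Hypothetical helper for illustration only.
    page_ratings: one adjective per page tested, e.g. "Very Good".
    tool_failures: number of failures an automated tool reported.
    """
    scores = [SCALE[rating] for rating in page_ratings]
    return {
        "mean": mean(scores),          # average numeric rating across pages
        "mode": LABELS[mode(scores)],  # most common rating, as an adjective
        # Proposed minimum: if a tool finds a failure, it can't pass.
        "can_pass": tool_failures == 0,
    }


print(summarize_guideline(["Outstanding", "Very Good", "Very Good", "Acceptable"]))
print(summarize_guideline(["Very Good", "Very Good"], tool_failures=1))
```

With the first set of ratings the mean is 3 and the most common rating is Very Good; the second call reports `can_pass` as False because a tool found a failure, whatever the ratings are. Reporting the adjective alongside the number reflects the point raised in the meeting that "Very Good" is easier to interpret than a bare percentage.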

Minimums: We discussed the Testing Headings examples in terms of what 
was critical for which disabilities.  This led to a complex discussion 
of criticality and how to ensure that we account for critical needs.  
Some points made:

  * Agreement that we need to treat disabilities equally, but that some
    failures are more harmful than others.
  * We discussed the pros and cons of using adjectives or numbers for
    scoring.
  * We agreed to update the functional user categories to the latest
    version from the EU, but we also want to be able to add categories:
    a more granular breakdown of "limited cognition", and vestibular
    disorders.
  * If we don't take functionality into account, and we come up with a
    %, that % will mean different things to different groups.
  * We don't want to discriminate against one or more disability user
    groups. Whatever we come up with needs to be simple and
    understandable (but we aren't in that phase yet).
  * Some members want to weight some guidelines more than others when
    accumulating a total score; others strongly oppose any mechanism
    that isn't well tested to ensure that it does not discriminate
    against some disability groups.
  * One verbal proposal is to allow different disability groups to
    identify "show stopper" Guidelines.
  * Concern that numeric scores in a granular scoring system would
    require thousands of data points to be valid, enough for the
    weaknesses of the testing tool to average out.
  * The challenge is not that one functional need does or does not have
    a critical item. The challenge is where 2 or more functional needs
    have a critical item that is in conflict.
  * The same guideline may be critical to one user group but helpful to
    another: for example, captions are critical for people with hearing
    disabilities and helpful for people with cognitive disabilities.
  * How we measure things is more important than how we weight them. A
    lot of the coverage problems in WCAG 2.x come from how things are
    scoped and measured.
  * Let's start with equal weighting; once we have better coverage in
    the guidelines, we may be able to weight them later. Let's start
    with "what is a reasonable thing to ask content authors to do".
  * Friction: every place where the language is harder to puzzle out
    adds friction, and accumulated friction can take a site from great
    to struggling to impossible. Friction could be handled quite
    granularly with good measurement, where low scores across the board
    start adding up (or rather, not adding up!).
  * We could break down the data by user group.
  * Concern that people with cognitive disabilities are disadvantaged
    when weighting is used.  We need to test each proposal with data
    from real websites.
  * Adding more cognitive guidelines, which will be possible in WCAG 3,
    could even out the disadvantage that COGA users experience today.
  * Some solutions for one disability cause problems for others: Too
    much visual contrast is bad for some groups.  Large headings can
    trigger anxiety or be more difficult to read for screen magnifier
    users.
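
The concern that a single percentage "will mean different things to different groups" can be checked by reporting the same unweighted results two ways: one overall score, and a breakdown by functional user category (in the minutes, Jeanne suggests going back to an author with "here's total score, here is how it breaks down by disability"). The Python sketch below is illustrative only: the ratings, the category names, and which categories each guideline affects are made-up assumptions, and every guideline counts equally, in line with the "start off even" point.

```python
# Made-up results for illustration: guideline -> (rating out of 4, categories affected).
# The category mapping is an assumption, not an agreed Silver model.
RESULTS = {
    "Headings": (3, ["without vision", "limited cognition"]),
    "Captions": (4, ["without hearing", "limited cognition"]),
    "Clear Language": (1, ["limited cognition"]),
    "Visual Contrast of Text": (4, ["limited vision"]),
}
MAX_RATING = 4


def overall_percent(results):
    """Unweighted overall score: every guideline counts equally."""
    ratings = [rating for rating, _ in results.values()]
    return 100 * sum(ratings) / (MAX_RATING * len(ratings))


def percent_by_category(results):
    """Regroup the same ratings by each functional category they affect."""
    totals = {}
    for rating, categories in results.values():
        for category in categories:
            earned, possible = totals.get(category, (0, 0))
            totals[category] = (earned + rating, possible + MAX_RATING)
    return {cat: 100 * earned / possible for cat, (earned, possible) in totals.items()}


print(f"overall: {overall_percent(RESULTS):.0f}%")
for category, percent in percent_by_category(RESULTS).items():
    print(f"{category}: {percent:.0f}%")
```

With these made-up numbers the overall score is 75%, while the limited-cognition breakdown is about 67% and the "without hearing" and "limited vision" breakdowns are 100%. That gap is what the per-group breakdown is meant to expose, without applying any weights.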

More detailed written proposals are needed that can be tested with real 
websites.

Should we use IETF standard RFC 2119 
<https://tools.ietf.org/html/rfc2119>?  RFC 2119 defines specific 
meanings of MUST, SHOULD, MUST NOT, SHOULD NOT, etc.  It is used by 
technical standards in many standards organizations.  The proposal is 
that WCAG has advanced from being a guideline to a technical standard 
and could benefit from the precise language of RFC 2119, which is 
unambiguous and clear: failing means you didn't meet the requirement. 
It's about having clearly defined requirements with explicit language, 
so that measurement and scoring are unambiguous.

Comments:

  * WCAG 2.x doesn't use RFC 2119 because it is a guideline, and W3C
    recommends reserving RFC 2119 for technical specs.
  * The ARIA specification is an example that uses RFC 2119.
  * I think as it stands now, the Silver structure has both normative
    and informative content. Anything that is a normative requirement
    is a MUST; a SHOULD or MAY would not be a normative requirement.
    We would need to be very careful using that language in
    informative documents. We've had charter issues when the language
    has strayed into informative documents.
  * I see value, intellectual rigor, and honesty in being explicit. We
    call these guidelines, but they became standards. The dirty secret
    is that no site can be perfect. If we take on this kind of language
    we need to be clear that we aren't going to say "must".
  * In terms of scoring, engineers need to have black and white
    decisions. Everything we do is based on binary decisions. With
    MUST, we have declared where that bright line is. Bright lines make
    measuring easier.
  * Elements interact with each other. WCAG breaks them down into
    different elements. Bringing the needs of people with cognitive
    disabilities into the mix adds a layer of complexity and conflicts.
  * I'm concerned about internationalization and not comfortable saying
    we need to put ourselves in the shoes of legislators.
  * Concern about the accessibility of RFC 2119 capitalization (MUST),
    as it is not well identified by some assistive technologies.

We agreed that more specific proposals are needed.

== Minutes ==

https://www.w3.org/2020/03/10-silver-minutes.html

=== Text of Minutes ===


                                - DRAFT -

                        Silver Virtual F2F Tuesday

10 Mar 2020

Attendees

    Present
           jeanne, sajkaj, ChrisLoiselle, Laura, Jennie, kirkwood,
           Lauriat, Lucy, alastairc, Makoto, Chuck, JF, stevelee,
           KimD, AndyS, PeterKorn, mattg, Rachael, Detlev

    Regrets

    Chair
           Shawn, jeanne

    Scribe
           ChrisLoiselle, Chuck

Contents

      * Topics
          1. Review of homework: what insights did people gain
             from it?
          2. Conformance and Minimums
          3. Should we use IETF standard RFC 2119
      * Summary of Action Items
      * Summary of Resolutions
      __________________________________________________________

    <ChrisLoiselle> Scribe: ChrisLoiselle

    Bruce Bailey: The directions for the conformance exercise for
    headings or visual contrast were misunderstood. Tallying for
    qualitative assessment may not work. I have written up some
    information that I'd like to share.

    <bruce_bailey>
    https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643&range=A1

    This is the spreadsheet Jeanne shared before; the Google Sheet is
    named "Sample Scoring example".

    <AndyS> AndyS present+

    Bruce Bailey: Ratings / scores are Outstanding / 4, Very Good /
    3, Acceptable / 2, Unacceptable / 1.

    <Zakim> bruce_bailey, you wanted to talk about conformance work

    Reviewing the headings (use of headings) homework.

    <kirkwood> well done Bruce!

    PeterKorn: Mirrors what we've been doing within Amazon. I
    really like the potential for this to work with scoring rubric.
    I.e. for this product release, these items are very good, these
    items are acceptable, etc. This is good.

    JF: This is getting better in terms of granularity. Where would
    we integrate EN 301 549 within the subsections? Where would we
    get into the 7 functional requirements?

    Jeanne: They will be in a different location. We may merge this
    into scoring.

    Lucy (to Bruce B.): Can you explain the scoring a bit more?

    Bruce: Not skipping a heading level is Outstanding / 4 or
    Very Good / 3.

    Bruce Bailey: If you skip levels, you are at best
    Acceptable.

    Lucy: To me, Acceptable seems to mean that you've met every
    point. Actually, Very Good means you've hit every point.

    Bruce Bailey: I also looked at Clear Language and Visual
    Contrast of Text and went through the same rating process.

    Step 1 is to assign a rating to a web page in a website, then
    assign the website a number as well, so that statistics such as
    the mean and mode would give a person a score for WCAG 2.1.

    3 out of 4 guidelines are rated as Outstanding...

    A rubric that works for assigning bronze, silver, and gold
    ratings to websites could be used as well as the rating scores.

    Shawn L: Opens the floor to JF for comments on his work.

    <JF> http://john.foliot.ca/demos/HeadingsTestOne.html

    JF: Shares a Testing Headings Sample Page. Talks about heading
    structure being used properly. Each heading has a class.

Review of homework: what insights did people gain from it?

    The entire document is made up of headings: 18 heading 1s.

    My question: is the score going to be the same as for the
    previous example shared?

    The pages John shares are Testing and Scoring Headings - Master
    Page, http://john.foliot.ca/demos/HeadingsTestOne.html

    Testing Headings - Test 1,
    http://john.foliot.ca/demos/HeadingsTestTwo.html

    Testing Headings - Test 2,
    http://john.foliot.ca/demos/HeadingsTestThree.html

    Testing Headings - Test 3,
    http://john.foliot.ca/demos/HeadingsTestFour.html

    At the end of each page, the negative impact on functional
    requirements is listed in a table.

    PeterKorn: These detailed examples are fantastic. Comment: Some
    of the examples are easy to find with a programmatic tool.

    If a tool could have found it, and you didn't do it, it is not
    acceptable.

    <jeanne> +1 Peter

    JF: Tools aren't going to catch all examples. I wouldn't
    outright fail for all users. Failing for some users is valid.
    The Functional Requirements are key to a scoring rubric.

    Shawn L: We can talk about this under Minimums.

    <david-macdonald> interesting that JAWS and NVDA announce the
    level without aria-level for <div role="heading"
    class="level_2" ...

    Lucy G: I love the examples, John. I see Bruce's example, where a
    tally needs to add up to 100 points. Everything would be weighted
    and have weights within it.

    JF: A more granular score helps content creators as well.

    Lucy G: I think we are on the correct path.

    Jeanne: If we look at Bruce's example and drill down into a more
    granular approach, would that help?

    JF: Explicit definition of semantic headings would be useful.
    ... The structure vs. the visual presentation helps cognitive
    disability user group.

    <Chuck> +1 to Jeanne's idea

    Jeanne: What if we took each of the 4 examples JF has, turned
    them into the correct thing, and turned those into methods? We
    could then reference the individual methods within Bruce's
    example, so people know what they need to review for HTML, and
    score that way.

    JF: How many methods could be constructed using the ACT rules
    format?

    <Lauriat> +1

    Jeanne: Exactly, using ACT in methods. Scoring could reference
    those.

    Referencing ACT tests would work well in methods.

    David McD: JAWS reads those headings properly. I'm wondering what
    the accessibility supported score would be if the semantic code is
    not totally correct.

    Normative and methods comment: Looking at Bruce's examples,
    column B could be methods.

    Techniques would be technology-specific and non-normative.

    <JF> +1 to Lucy

    Lucy G: Assistive technology may change its behavior at any
    moment, same as browsers. Methods must stick to: is the code
    correct or not?

    <Chuck> +1 to lucy

    <jeanne> +1 Lucy

    <laura> +1

    +1 to Lucy (Chris without scribing)

    Shawn L (to group): Let us shift gears to the next agenda item.

    Jeanne: Scoring examples topic: Testing real websites is best.
    It changed how I was approaching things. Test against a real
    website.

Conformance and Minimums

    <sajkaj> +1 to Jeanne

    Shawn L: Criticality is important. How does one express it in a
    score, given that it is a fundamental, critical issue of
    accessibility?

    Very poor, acceptable, etc. Within Silver, do we want to be
    the ones drawing the line on critical issues?

    PeterKorn: Looking at JF's second example, if the page only had
    two headings, one a heading 1 and one a heading 2... for usage
    without vision, the hierarchy structure would be a fail, but
    would the impact on the user be significant?

    <JF> exactly

    Is it not usable without vision?

    We need to think about overall functionality, the impact of the
    failure, and how we view the site.

    JF: Peter, I agree. I built the examples so that the structure
    created for screen readers was OK. If I used unstyled divs, the
    structure would be off visually for sighted users (COGA). The
    impact on different user groups needs to be looked at: visual
    users vs. non-visual users / screen reader users.

    <Zakim> bruce_bailey, you wanted to say that i like adjectival
    rating over tally because critical aspects could be in
    "average" and above

    E.g., h2 to h5: still usable? Or a fail? How is it rated /
    scored, as opposed to pass or fail?

    Bruce Bailey: Critical items are rated Acceptable or above in my
    example.

    Charles Hall: I wanted to add to the criticality and severity
    comments. If I'm evaluating a rubric against a subset vs. an
    author of the website, I, the author, have the right to scope my
    page through a task flow.

    <JF> Not sure if we've arrived at consensus to Charles' point

    <JF> regarding scoping

    Chris Loiselle to Charles: If I'm writing that wrong, please
    add in your comments. Sorry!

    <jeanne> +1 John - The consensus on scope was a logical subset,
    not a task

    PeterKorn: Scoring with numbers is a very easy way to get lost
    in what 85% means. An adjectival approach may lead to more
    progress on scoring, more quickly.

    <PeterKorn> (to respond)

    Lucy G: If the adjectival route is the way we are going, numbers
    will help in the end as well. E.g., a full point can be broken
    down too. Each requirement has its own scoring.

    When it comes to a critical criterion, that is when we can
    weight it.

    Think of it as containers. Language would be its own container.
    Is language more critical than headings?

    <Chuck> we are at time to change scribes, and someone else will
    need to monitor the participants list for raised hands, as I
    will not be able to scribe and watch that list.

    AndyS: Page is structured. Ads are present.

    @Chuck. I can continue to scribe after a five minute break.

    <CharlesHall> that was CharlesHall that mentioned the <aside>
    as part of the scope

    I will return very soon.

    <Chuck> scribe: Chuck

    <sajkaj> I can

    Jeanne: Any followup on what Andy said?

    <Zakim> JF, you wanted to respond to Peter's comment about
    numbers

    JF: responding to Peter's comments on number. Appreciating all
    the issues a number means, my understanding is...
    ... This exercise is about getting a number. We currently have
    100% or zero (in wcag 2.x)
    ... A number can be misleading at times. If anybody used the
    chrome tool (lighthouse), at the end of the process...
    ... Chrome added a score. We don't know where that score came
    from. If we start from the premise that you never get 100, that
    percentage becomes an incentive for doing better (72%, let's try
    to get to 85%).
    ... Peter, as much as numbers can be a rat hole, it's a
    critical part of what we are trying to do in silver.

    <ChrisLoiselle> Chuck, I can scribe again.

    <ChrisLoiselle> Scribe: ChrisLoiselle

    <Chuck> PK: I wasn't here for all of the Silver discussions... my
    understanding is that the high order bit is to move away from
    pass-fail perfection, to "mildly or largely"... numbers aren't
    the goal.

    <Chuck> PK: The goal is to get away from pass/fail. I like the
    rubric. We can evaluate the rubric evaluation. Some things are
    acceptable, some things are good... or almost everything is
    great but some are good...

    PeterKorn: The goal was to move away from pass / fail. I like
    the rubric. If we have a way to collect up the rubric
    evaluations, most are very good, the product will be very good.

    <Chuck> PK: Whether or not we assign numbers, we need to think
    on severity, the people, and the impacts. I like adjectival
    approach. I don't know what 87% means. I do know what
    acceptable and very good means.

    <CharlesHall> to PeterKorn’s point, a qualitative metric can be
    converted to a quantitative score based on the number of
    adjectival categories

    <Chuck> JF: I agree with that statement, but in a regulatory
    environment we need to hit a bar. "Intellectually honest" means
    that there isn't something that is 100%.

    <Chuck> Shawn: I think you are agreeing on different points.
    Let's return to queue.

    <Chuck> detlev: I was wondering... you mentioned the 9 user
    accessibility needs. In the rating, there would be a plan to
    issue different types of results for different impacts...?

    <Chuck> detlev: Also, why is there a mismatch between user
    accessibility needs in EU implementation (there you are missing
    limited reach)...

    <Chuck> detlev: you have two categories for people with hearing
    problems (hard of.. and no). Is there a conscious decision to
    drop the difference between hard of and no hearing?

    <Chuck> Jeanne: It may be that I took an old version of the EU
    directive. Can you send me a link to the current?

    <Chuck> Detlev: I was wondering why users with limited reach
    was left out. someone explained... I'm not convinced there's a
    good reason to leave it out.

    <CharlesHall> we have discussed adding functional needs to the
    EN standard, like “intersectional”

    <Chuck> Jeanne: Great. We did have a discussion of adding
    things, for limited cognition, vestibular. We never discussed
    dropping any, so I may have had an old version.

    <Chuck> detlev: Is there an intention in all the scoring to
    differentiate by user group? At one point it was a no-no. At
    some point we decided we didn't want to differentiate, but I'm
    not sure if that has changed.

    <Chuck> Shawn: We've been thinking about it more closely to
    what John demonstrated in that for different guidelines looking
    at the effects on different user needs given the task or scope
    of testing.

    <Chuck> Shawn: We don't have a fleshed out illustration of
    that, but we have been considering. We want to accomplish
    making sure that we are not inadvertently leaving user needs
    out.

    <Chuck> Shawn: For example, if it works great for limited
    cognition but it completely leaves out users who use screen
    readers, we want the ability to highlight that, and vice versa.

    <Chuck> Shawn: Like if everything is semantically accurate, but
    visually not.

    <Zakim> jeanne, you wanted to talk about testing criticality
    and severity

    <Chuck> Jeanne: I'm glad we are having conversation about
    criticality. I know there's a lot of... people who feel it's
    very important. But one of the things we agreed on in Silver is
    we'll be data driven and research based.

    <Chuck> Jeanne: whenever we tried to score real websites
    against criticality we consistently didn't find a way to do
    that in a way that didn't penalize people with some
    disabilities.

    <Chuck> Jeanne: We have to find a way to stop penalizing people
    with some sorts of disabilities. We could not find a way to
    test it that didn't structurally disadvantage people with low
    vision and cognitive issues.

    <KimD> +1 to Jeanne

    <Chuck> Jeanne: It's great to talk about in theory, but if you
    want to propose, you need to show research that demonstrates
    everyone is speaking equally.

    <Chuck> lucy: Speaking to numbers, we need to offer something
    that leaders can relate to. If we offer something confusing,
    they will blank us out and do other tasks.

    <AndyS> Comment: IMO the only way to treat all disabilities
    equally involves customization and personalization, so that
    individual needs are accommodated *as needed*

    <JF> +1 to Lucy

    <Chuck> lucy: We have to have those numbers. A lawyer is not
    going to understand what it means to have "this or that" level.
    They want a number and a way to improve that number.

    <Chuck> Rachael: I have 3 things to keep in mind. They have been
    brought up before, maybe not in this call. If we don't take
    functionality into account, and we come up with a %, that %
    will mean different things to different groups.

    <Chuck> Rachael: 80% may mean one thing to people with seizure
    disorders and another to blind people.

    <Chuck> Rachael: If you have a group of individuals who fall in
    the category of blind, vs people in coga, that hierarchy may
    introduce discrimination.

    <Chuck> Rachael: This is such a rich and fantastic discussion,
    but whatever we come up with needs to be understandable and
    simple.

    <Chuck> JF: Addressing a comment from Jeanne. I'm supportive of
    all user groups including COGA; I show it in action and words.
    The reality is that if we take that user group into
    consideration, that is one user group.

    <Chuck> JF: I want to recognize severities. In those heading
    examples, if I remove the visual structure, a person who has a
    cognitive issue and is blind is doubly disadvantaged.

    <Chuck> JF: If we determine that an impact has a greater impact
    against a group, we can factor that in. That's what I said 6
    months ago. Not all things are created equal. We have to boil
    this down to a score and strategies to improve the score.

    <Chuck> Jeanne: I only disagree with the weighting.

    <Chuck> Jeanne: "this is more severe than that". I don't have
    an issue with your example. At the guideline level I think you
    can do that.

    <Chuck> Lucy: The weighting should be by criteria.

    <Chuck> Lucy: So many disabilities are impacted by this or
    that. I won't pit one against another, but if it's 4 vs 1, we
    have no choice but to weight that.

    <Chuck> Jeanne: I disagree. When you start looking at it as a
    whole, there are too many things that we give guidance for that
    are heavily weighted to blind vs cognitive.

    <Chuck> Jeanne: The way we are set up including Silver is that
    we measure things granularly, and we say this is more important
    than that, but when we look at it as a whole...

    <CharlesHall> would a priority or severity not be functional
    need agnostic?

    <Chuck> Jeanne: Granularly, this is a complete blocker, but
    someone with a cognitive issue may be able to work it out. But
    in total the cognitive issues become a blocker, because you
    have to look at it in totality.

    <Chuck> Jeanne: Cognitive loses when we say "this individual
    piece" is more important than "this one".

    <jon_avila> I object to the notion that WCAG is heavily focused
    on Blind and visual impairment. There are many WCAG criteria
    that are aimed at a wide range of users with disabilities

    <Chuck> Jeanne: That alt text is more important than
    captioning. It's the totality of the website, it's all the
    guidelines. If they are getting a lower weight because of each
    individual guideline, they get a website that they cannot use.

    <Chuck> Jeanne: Can someone else make the argument better?

    <Chuck> Shawn: It's a complex topic and we should keep it in mind
    as we work through it. We all have the same high-level goals for
    conformance.

    <Chuck> detlev: Just to get to Jeanne's argument about critical
    issues, penalizing others. Why is that a problem? Why wouldn't
    it be possible to ask all the groups involved to basically
    identify show stopper issues. We know those.

    <Chuck> detlev: Keyboard trap, lack of captions, and so on. Maybe
    show stoppers for cognitive individuals. It's basically OK to
    collect those issues and make them critical issues.

    <Chuck> detlev: Can you explain further Jeanne?

    <Chuck> Jeanne: Let's honor queue.

    <Zakim> bruce_bailey, you wanted to say numbers can be used to
    measure progress, but i have never seen numbers that were
    comparable from one website to another (or one tool to another)

    <Chuck> Bruce: We've been trying (everyone!) to rate your
    website since days of Bobby. All of these rules, all the years,
    it's only ever useful for the developer to make progress. That
    you aren't regressing.

    <Chuck> Bruce: Lots of tools will give you a percentage.

    <Chuck> Bruce: Those only make a difference on one domain. You
    can't compare an 87 on one site to an 86 on another site. Or
    even cross tools. I don't feel like we will make progress if we
    try to have granular scoring systems.

    <Chuck> Bruce: Unless there are 1000s of data points. Enough
    data points that you aren't doing manually that eventually the
    weaknesses of the tool averages out.

    <PeterKorn> +1 to Bruce

    <Chuck> Bruce: I won't be able to say that one outstanding site
    compares well to another.

    <Chuck> Matt: Returning to yesterday, seems like there are 2
    different criteria you are trying to assess. How well has the
    author created the content, what score can you give them, vs
    how do you make this understandable to the user.

    <david-macdonald> +1 to bruce

    <CharlesHall> to Detlev’s point, the challenge is not that one
    functional need does or does not have a critical item. i think
    the challenge is where 2 or more functional needs have a
    critical item that is in conflict.

    <Chuck> Matt: There's a bucket approach; there's probably going
    to be something ranging from not having met the requirements to
    having met them all. There are different methods which have
    different impacts on different groups.

    <Chuck> Matt: That's different from the number score that the
    producer gets. I think that these two different aspects need to
    be viewed separately.

    <Chuck> Shawn: Indeed.

    <Chuck> JF: Two thoughts... Jeanne mentioned one requirement is
    captions. Look at the severity of captions. If you are
    deaf and a video is missing captions, that's critical. If I
    have a cognitive issue and I have captions...

    <Chuck> JF: That helps me. Offering captions to a cognitive
    user offers some benefit, but to a deaf user they are completely
    critical.

    <JF>
    https://docs.google.com/spreadsheets/d/1EXw5W6SuMXk7mrFN33hHVQuCn0DhKysXmUCKNML3SKg/edit#gid=108726882

    <Chuck> JF: How do we make this fair, realizing we can't get
    every individual out there? In an earlier example (6-8 months
    ago), I put forward a draft proposal, that had suggested that
    as we were calculating scores we used a weighting mechanism,
    and use that as a multiplier.

    <Chuck> JF: We would ultimately have a better... more data
    points. More data points will give us a better score.

    <jeanne> COGA with individual guideline weighting.

    <Chuck> Shawn: Logistically, I'd like to get through the queue
    and then get to the last agenda item.

    <Zakim> alastairc, you wanted to say that how we measure things
    is more important than how we weight them

    <Chuck> Shawn: This discussion has been great in taking in all
    the things we need to consider.

    <Detlev> @CharlesHall: "challenge is where 2 or more functional
    needs have a critical item that is in conflict" - can you speak
    to that? I'd love to know where solving one showstopper issue
    creates a real problem for another group...

    <Chuck> Alastairc: Not sure weighting is necessarily going to
    be the answer. A lot of the coverage problems in WCAG 2.x come
    from how things are scoped and measured. That has been clear
    when addressing COGA issues.

    <Chuck> Alastairc: If there is a functional outcome that has an
    equal weighting per guideline, that may be a reasonable way to
    proceed. It's within the guideline to decide what content
    authors need to do.

    <Chuck> Alastairc: Let's start off even, and once we have
    better coverage in the guidelines, maybe later we can weight the
    guidelines. Let's start with "what is a reasonable thing to ask
    content authors to do".

    <Chuck> PK: Friction comes to mind. Every place where the
    language is harder to puzzle out is friction. The accumulation
    of friction can take a site from great to struggling to
    impossible. The spoons model.

    <Chuck> pk: I think the same concept applies to other
    disabilities. What we thought of traditionally as pass/fail.
    The header example, a page with 2 headers is a little bit of
    friction, a formal failure. Does it prevent blind individuals
    from using the page? Probably not.

    <alastairc> +1 to thinking about friction, which again could be
    dealt with quite granularly with good measurement, where low
    scores across the board start adding up (or rather, not adding
    up!)

    <Chuck> PK: Mel Brooks's Silent Movie has text in it, and only
    one individual speaks, saying one word. Is it a problem for a deaf
    person who wants to watch that movie? I think we can look at a
    friction-based model.

    <JF> +1 to Peter's point - barriers are based on the functional
    requirements of the different disability types

    <Chuck> pk: ... come up with a notion that there is a little
    bit of friction for blind folks because some small pages don't
    have headings right, but there's a lot more friction for a
    cognitive user. We can evaluate a site and say that it can be no
    worse than "good"...

    <JF> +1 to contemplating A,B,C,F scoring

    <Chuck> pk: therefore a site does or doesn't make it. Back to
    numbers. A+, B-, I think there are mechanisms that can be
    fairly granular, like we are getting a C- and we want to get a
    C+. We can get caught up in a scale of 100% and we are arguing
    if 86 or 87 is good enough.

    <Zakim> jeanne, you wanted to say to Detlev when each group
    gets a show stopper, then a bonus is given to that group. But
    for COGA, it's the overall sum of all the guidelines. So they

    <david-macdonald> +1 Peter had a great concept of "friction"...
    at some point there is too much friction to use.

    <Chuck> Jeanne: To Detlev, on weighting: when each group gets a
    show stopper, then a bonus is given to that group. For coga
    it's the overall sum.

    <Chuck> Jeanne: Each group gets more points for show stoppers,
    and coga gets less. Physical or sensory issues have more
    points. But when coga looks at the overall score... like a
    website gets a 93, but for coga it's not accessible.

    <Chuck> Jeanne: For coga it's more of an overall issue. That's
    why I say that people need to test their proposals across... on
    real websites. We found that it wasn't reflecting the issues
    for coga users correctly.

    <alastairc> Jeanne - couldn't that be addressed by having
    plenty of guidelines for COGA (based on things we can't fit in
    WCAG 2.x), so that without meeting enough of those, you
    wouldn't pass?

    <Detlev> Can I answer to that directly (briefly)?

    <Chuck> Jeanne: It's great to have these proposals. John's
    proposal of putting the impact at the guideline level and
    trickling that down can work. But if we put weighting in, I'll
    ask for real examples with real websites that show it's fair.

    <Chuck> Jeanne: I think we can do it without weighting and go
    back to an author and say "here's total score, here is how it
    breaks down by disability", I think that can work. I think we
    can avoid the weighting issue.

    <Chuck> detlev: I think you are mixing 2 different things.
    Nobody disputes that WCAG doesn't have enough for coga. Coga
    issues are underrepresented. I don't see a real conflict with
    critical issues by groups.

    <Chuck> detlev: For cognitive folks who are impacted by many
    different things combined, if the new guidelines and rubrics
    include them, I would rather think of subtracting points. The
    cognitive score would show the deficiencies clearly.

    <Chuck> detlev: You'd still have a way of showing off critical
    issues by groups.

    <jeanne> I would be interested to see a proposal with real
    data. I would help with testing.

    <Chuck> detlev: There are absolute show stoppers for some users,
    and we need to show that. Jeanne you said that these can drown
    out coga issues, but I think that it can be reflected properly
    and benefit coga users.

    <Chuck> Andy: The cognitive issue is such a complex subject. It
    overlaps with neurological issues, senses, and perceptions. There
    are so many different varieties: ADD will be different from
    educational handicaps.

    <jeanne> alastair, that is possible -- again, I would like to
    see a proposal with a mockup of the data with proposed
    guidelines.

    <Chuck> Andy: It becomes this big mess of how we divide up, or
    whether we should divide up... it becomes a broad spectrum. That
    points towards the ability to customize and personalize as the
    way to ensure that all groups have equal access.

    <alastairc> Jeanne - I think we need to leave
    weighting/criticality until we have better coverage of
    guidelines.

    <Chuck> Andy: In terms of a triangle, the user is one point,
    the author is another point, and the technology is the 3rd
    point. Those 3 points need to work together in a way for every
    user to be accommodated.

    <alastairc> we aren't sure what / how-many new guidelines are
    likely to come from COGA.

    <Chuck> andy: I don't think there's a way for a website to
    address everyone all the time. I understand what Jeanne is
    saying in terms of weighting. Weighting vision perception gets
    the site up to a high level of acceptance, but the things that
    made that...

    <Lauriat> +1 to Alastair

    <Chuck> andy: ...site a high vision number may actually hurt coga
    users. High contrast can cause coga issues. There's a lot of
    interaction there... how do we really make that into a matrix
    where, when you push this up, the other comes down?

    <Chuck> andy: It's like squeezing a toothpaste tube: one end
    decreases, the other gets larger. I don't know that weighting is
    the way to solve this. I think customization is the way to achieve
    the ultimate goal. These are thoughts in my mind.

    <kirkwood> +1 to customization

    <Chuck> Shawn: We are unlikely to come to a complete resolution
    of this conversation; we need to finish the queue and move on.

    <JF> FWIW, the Personalization TF is looking at "customization"
    today - but we lack the technology to make it happen today

    <Chuck> Lucy: If any disability is poorly affected by any
    system we come up with, that's our failing. We don't have the
    data, so we don't know, and if it's still failing cognitive
    users, it's our responsibility to address that group.

    <jeanne> Andy, that's why I like that proposal of putting data
    into each guideline about each disability affected, but then
    giving a total score for each disability.

    <Chuck> Lucy: We have fixated for so long on what we know, we
    still need to do the research and determine what we don't know.

    <Chuck> lucy: I'll say that I don't know enough, but I do take
    it into account, and I want to know more.

    <Chuck> Charles: Historic: we are all in general agreement that
    this is complex, which is why it takes a long time to get
    through one point of conversation, but I don't think it's
    impossible to account for something that is critical to one
    functional need in order to achieve a score.

    <Lauriat> @JF: I'd like to know more about the technology
    needed to make that happen, or at least prototype things out to
    figure out how to make that happen.

    <Chuck> Charles: If we have 9 functional need categories and
    one has an issue, then they all do. In historic conversations,
    the challenge has been where something in one category
    conflicts with an issue in another category...

    <Chuck> Charles: Making headings large and in one color may
    conflict with the needs of someone for whom it could trigger
    anxiety. It's fine to consider this, but we need to be aware of
    the conflicts.

Should we use IETF standard RFC 2119

    <Chuck> Shawn: With that, we have a lot to think about and work
    through. Let's move on to whether to use RFC 2119
    (must/should/must not/should not).

    <Detlev> @Charleshall: I think research shows ALL CAPS is
    harder to read; I cannot think of who would benefit.

    <Chuck> Jeanne: W3C and standards organizations around the world
    rely on this particular RFC (Request for Comments). It's very
    old, and used in technical standards by many standards orgs.

    <Chuck> Jeanne: W3C in the past has said they don't recommend
    it for guideline use, and it's not used in WCAG. It was designed
    for technical specs that require interoperability. More about...
    John may have lots of examples.

    <sajkaj> Think APIs

    <Chuck> Jeanne: Because it's not included, we are interested in
    including it in Silver. This is John's proposal. John...

    <CharlesHall> @Detlev. i agree. was trying to create an example
    on the fly since I couldn’t recall some of the specific
    examples we discussed in the past. particularly with insights
    from Cybelle.

    <Chuck> JF: Part of the issue is that WCAG has moved from
    being a guideline to being a standard. That's the reality. We
    have governments that say "you must meet WCAG 2.x AA". Because of
    that, we need to have these bright and measurable points

    <jon_avila> The ARIA specification is an example that uses
    RFC2119

    <Chuck> JF: To meet that requirement. When we talk about the
    users, the 3 points, there's a 4th point... legal requirements.
    RFC 2119 calls out MUST, SHOULD, and MAY.

    <Chuck> JF: It's unambiguous and clear. Failing means you
    didn't meet the requirement. It's about having clearly defined
    requirements with explicit language.

    <jeanne> [15]https://tools.ietf.org/html/rfc2119

      [15] https://tools.ietf.org/html/rfc2119

    <Chuck> JF: I want to use it. When we created a guideline, that
    was guidance. Because we are now at a point where we have
    measurement and scoring... as part of that requirement we
    should use very clear language.

    <CharlesHall> +1 to JF on use of an unambiguous standard

    <Chuck> JF: We've got clarity there.

    <Zakim> alastairc, you wanted to say that must = the
    requirement, and should/may wouldn't be suitable for stating
    the requirement.

    <Chuck> Alastairc: I think as it stands now... silver
    structure... normative and informative... anything that is a
    normative requirement is a must. A should or may would not be
    a normative requirement.

    <Chuck> Alastairc: They are very tied together. We would need
    to be very careful using that language in informative
    documents. We've had charter issues where the language has
    strayed into informative documents.

    <Chuck> Alastairc: introduces unnecessary uses. I don't
    disagree or agree, I think our current approach takes it into
    account already.

    <Chuck> DM: The SC were designed to be testable statements. If
    the statement is true, you meet the criteria. Every one of
    those statements is a must.

    <jon_avila> I find it ironic that folks have said no disability
    should be weighted over others -- yet it was said cognitive
    disabilities are impacted by the wholistic site and thus trump
    everything in terms of importance

    <Chuck> DM: We do have must statements, we don't have shoulds
    in the normative docs.

    <Chuck> PK: I see value and intellectual rigor and honesty in
    being explicit. We call these guidelines, but they became
    standards. The dirty secret is that no site can be perfect. If
    we take on this kind of language we need to be clear that we
    aren't going to say "must"

    <alastairc> jon_avila I think there is just recognition that
    there has been a gap, and there needs to be work to address the
    gap.

    <Chuck> PK: when the must is not achievable. I don't have
    strong feelings for or against. But if we use this language, we
    need to be clear that we are creating an impossible "must".

    <Zakim> JF, you wanted to note that there is a real difference
    (in RFC 2119) between MUST and must (case sensitive)

    <jon_avila> For what it's worth - WCAG 2.0 and 2.1 are
    standards according to WCAG itself - "The WCAG 2.0 document is
    designed to meet the needs of those who need a stable,
    referenceable technical standard."

    <Chuck> JF: Alastair... on the concern about language: MUST,
    SHOULD, and MAY are always in upper case. MUST and must don't
    mean the same thing.

    <alastairc> let's not go down that route!

    <Chuck> JF: "must" is conversational. That avoids some of that
    problem.

    <AndyS> That's scary

    <Chuck> JF: As peter noted, our guidelines have been made
    standards. In terms of scoring, engineers need to have black
    and white decisions. Everything we do is based on binary
    decisions. In the must we have declared what that bright line
    is.

    <sajkaj> Methinks JF forgot about analog engineering!

    <david-macdonald> There are no "Must", "should" or

    <Jennie> I would be concerned with upper case and lower case
    differences in meaning, from a cognitive standpoint.

    Chris Loiselle comment on standard:
    [16]https://www.iso.org/standard/58625.html, point to standard

      [16] https://www.iso.org/standard/58625.html

    <Chuck> Andy: I want to mention, all the other standards
    organizations, they use this verbiage.

    <david-macdonald> "may" in WCAG 2 or 2.1

    <JF> +1 to bright lines

    <JF> bright lines make measuring easier

    <Chuck> Andy: But if we were going to go there, it's important
    to note that this is a very bright line. Requires additional
    diligence to make sure that "shall" is really understood and
    isn't going to create situations that cannot be absolutely
    achieved.

    <jeanne> I worry that accessibility needs are not oriented
    toward bright lines.

    <Chuck> Andy: With all of the many things we are talking about
    that interact with each other... WCAG has them broken down into
    different elements. These elements interact with each other.
    Bringing COGA into the mix adds a layer of complexity and
    conflicts.

    <david-macdonald> There are no instances "Must", "should",
    shall, or "may" in WCAG 2 or 2.1

    <Chuck> Andy: If one "shall" conflicts with another "shall",
    we'll get into trouble.

    <jeanne> +1 Andy

    <jon_avila> I agree that use of these RFC 2119 terms will only
    complicate things

    <Chuck> Andy: Other standards from other groups are a bit more
    ambiguous. ANSI specs on displays and fonts: their language
    and examples are set in the technology of the late '80s and
    early '90s.

    <KimD> I'm concerned about internationalization and not comfortable
    saying we need to put ourselves in the shoes of legislators.

    <Chuck> Andy: We start to get into an ambiguous realm when we
    discuss how different browsers render fonts differently. I like
    the idea of adopting this more affirmative use of terminology,
    but it brings a great deal of complication.

    <alastairc> Is it worth trying this language out in a method?
    That seems to be the most suitable place.

    <Chuck> Lucy: I want to see it applied and see how it works,
    and then when John responded... I say what Peter said... this
    is not a possible thing to accomplish and remain accessible
    itself.

    <Lauriat> @Alastair: No, that would make tech-specific methods
    normative.

    <Chuck> Lucy: I like the idea, in terms of what we have
    been thinking of all along, I want to see it apply to some
    examples first.

    <alastairc> Um, I'm not sure it will help with the clear
    language.

    <Chuck> Lucy: I can't tell the difference between MUST and
    "must".

    <jon_avila> There are settings in screen readers to communicate
    capitalization of text.

    <JF> <span aria-label="RFC 2119 MUST">MUST</span>

    <Chuck> Shawn: My proposal is to go through the minutes and
    pull out the pros and cons of going with this language versus
    keeping the current language.

    <Chuck> Shawn: And then we can use that as a summarization for
    folk who couldn't make it to this call.

    <alastairc> Should a guideline include should/may?

    <Chuck> JF: I pasted some code in IRC to address your concerns.

    <Jennie> Won't the ARIA label only assist those using screen
    readers, but not those with reading challenges with vision?

    <KimD> +1 to Jennie

    <david-macdonald> There are no instances "Must", "should",
    shall, or "may" in WCAG 2 or 2.1 success criteria

    <Chuck> JF: <discusses rfc-2119 must>

    <Chuck> Shawn: worth looking into annotations.

    <jon_avila> ARIA labels on non-interactive text don't work
    well with screen readers.

    <jeanne> +1 Jennie

    <alastairc> JF - would a guideline include should or may?

    <Chuck> Shawn: With that, thank you everyone, and thank you for
    bringing examples. Super helpful as part of these complex topics.

    <Chuck> Shawn: Anything else Jeanne?

    <JF> @ Alastair - it could

    <Chuck> Jeanne: Incredibly helpful, great conversations, we'll
    keep working.


    [End of minutes]
      __________________________________________________________

Received on Tuesday, 10 March 2020 18:42:42 UTC