Summary and Minutes of Silver Virtual Meeting Tuesday Part 1 from Jeanne Spellman on 2020-03-10 (public-silver@w3.org from March 2020)

From: Jeanne Spellman <jspellman@spellmanconsulting.com>
Date: Tue, 10 Mar 2020 14:42:26 -0400
To: Silver Task Force <public-silver@w3.org>
Message-ID: <bf9c2976-b7eb-dde6-36a4-d67c2dae5e51@spellmanconsulting.com>

== Summary ==

Homework Assignment: We reviewed the homework assignment to test the
Scoring Example against a real website. Two people responded. One
wrote a proposal for an adjectival scoring mechanism
<https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643>.
We discussed pros and cons of using this mechanism. We need to test it
against real websites. The other person wrote Testing Headings
<http://john.foliot.ca/demos/HeadingsTestOne.html> web pages to
illustrate specific examples in HTML of code that technically passed the
guideline for Headings, but failed it against specific disabilities.
This led to a detailed discussion and multiple verbal proposals of how
scoring could be adapted to solve this problem. Please send written
proposals so we can follow up with these ideas.

* Having a minimum that if a tool fails it, it can't pass
* Adding Functional Requirements to each Guideline
* Taking the Testing Headings examples and turning them into Methods
and linking Adjectival Scoring to specific tests in the new Methods.
* Assistive technology behavior can change at any moment, the same as
browser behavior. Methods must stick to one question: is the code
correct or not?

Minimums: We discussed the Testing Headings examples for what was
critical for which disabilities. This led into a complex discussion of
criticality and how to ensure that we account for critical needs. Some
points made:

* Agreement that we need to treat disabilities equally, but that some
failures are more harmful than others.
* We discussed the pros and cons of using adjectives or numbers for
scoring.
* We agreed to update functional user categories to the latest from
the EU, but do want to be able to add categories for more granular
breakdown for "limited cognition" and to add vestibular disorders.
* If we don't take functionality into account, and we come up with a
%, that % will mean different things to different groups.
* We don't want to discriminate against one or more disability user
groups. Whatever we come up with needs to be simple and understandable
(but we aren't in that phase yet).
* Some members want to include weighting some guidelines more than
others when accumulating a total score, others are strongly opposed
to any mechanism that isn't well tested to ensure that it does not
discriminate against some disability groups.
* One verbal proposal is to allow different disability groups to
identify "show stopper" Guidelines.
* Concerns that numeric scores in a granular scoring system would
require thousands of data points to be valid and average out the
weakness in the tool.
* The challenge is not that one functional need does or does not have
a critical item. The challenge is where 2 or more functional needs
have a critical item that is in conflict.
* The same guideline may be critical to one user group, but helpful to
another user group, like captions are critical to hearing
disabilities and helpful to cognitive disabilities.
* How we measure things is more important than how we weight them.
Many of the coverage problems in WCAG 2.x come from how things are
scoped and measured.
* Let's start off even. Once we have better coverage of guidelines,
we can consider weighting them. Let's start with "what is a
reasonable thing to ask content authors to do".
* Friction. Every place where the language is harder to puzzle out is
friction. The accumulation of friction can take a site from great to
struggling to impossible. Friction could be dealt with quite
granularity with good measurement, where low scores across the board
start adding up (or rather, not adding up!)
* We could subtract data by user groups.
* Concerns that people with cognitive disabilities are disadvantaged
when weighting exists. We need to test each proposal with data from
real websites.
* Adding more cognitive guidelines that will be possible in WCAG 3
could even out the disadvantage that COGA experiences today.
* Some solutions for one disability cause problems for others: Too
much visual contrast is bad for some groups. Large headings can
trigger anxiety or be more difficult to read for screen magnifier
users.

More detailed written proposals are needed that can be tested with real
websites.

Should we use the IETF standard RFC 2119
<https://tools.ietf.org/html/rfc2119>? RFC 2119 defines specific
meanings of MUST, SHOULD, MUST NOT, SHOULD NOT, etc. It is used by
technical standards in many standards organizations. It is proposed
that WCAG has advanced from being a guideline to a technical standard
and could benefit from the precise language of RFC 2119. It's
unambiguous and clear. Failing means you didn't meet the requirement.
It's about having clearly defined requirements with explicit language.
Measurement and scoring should be unambiguous.

Comments:

* WCAG 2.x doesn't use RFC 2119 because it is a guideline and W3C
recommends keeping RFC 2119 use for technical specs.
* The ARIA specification is an example that uses RFC 2119.
* I think as it stands now, the silver structure has both normative
and informative. Anything that is a normative requirement is a must.
A should or may would not be a normative requirement. We would need
to be very careful using that language in informative documents.
We've had charter issues, the language has strayed into informative
documents.
* I see value and intellectual rigor and honesty in being explicit. We
call these guidelines, they became standards. The dirty secret is
that no site can be perfect. If we take on this kind of language we
need to be clear that we aren't going to say "must".
* In terms of scoring, engineers need to have black and white
decisions. Everything we do is based on binary decisions. In the
must we have declared what that bright line is. Bright lines make
measuring easier.
* Elements interact with each other. WCAG has them broken down into
different elements. Bringing the needs of people with cognitive
disabilities into the mix adds a layer of complexity and conflicts.
* I'm concerned about internationalization and not comfortable saying
we need to put ourselves in the shoes of legislators.
* Concerns around accessibility of using the RFC 2119 capitalization of
MUST as it is not well identified by some assistive technologies.

Agree that more specific proposals are needed.

== Minutes ==

https://www.w3.org/2020/03/10-silver-minutes.html

=== Text of Minutes ===

[1]W3C

[1] http://www.w3.org/

- DRAFT -

Silver Virtual F2F Tuesday

10 Mar 2020

Attendees

Present
jeanne, sajkaj, ChrisLoiselle, Laura, Jennie, kirkwood,
Lauriat, Lucy, alastairc, Makoto, Chuck, JF, stevelee,
KimD, AndyS, PeterKorn, mattg, Rachael, Detlev

Regrets

Chair
Shawn, jeanne

Scribe
ChrisLoiselle, Chuck

Contents

* [2]Topics
1. [3]Review of homework: what insights did people gain
from it?
2. [4]Conformance and Minimums
3. [5]Should we use IETF standard RFC 2119
* [6]Summary of Action Items
* [7]Summary of Resolutions
__________________________________________________________

<ChrisLoiselle> Scribe: ChrisLoiselle

Bruce Bailey: The directions for the conformance exercise for
headings or visual contrast were misunderstood. Tallying for
qualitative assessment may not work. I have written up some
information that I'd like to share

<bruce_bailey>
[8]https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643&range=A1

[8] https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643&range=A1

Spreadsheet that Jeanne shared before, Sample Scoring example
is name of the google sheet

<AndyS> AndyS present+

Bruce Bailey: Ratings / Score is Outstanding / 4 , Very Good /
3 , Acceptable / 2 , Unacceptable / 1

<Zakim> bruce_bailey, you wanted to talk about conformance work

Reviewing the heading , use of headings homework

<kirkwood> well done Bruce!

PeterKorn: Mirrors what we've been doing within Amazon. I
really like the potential for this to work with scoring rubric.
I.e. for this product release, these items are very good, these
items are acceptable, etc. This is good.

JF: This is getting better in terms of granularity. Where would
we integrate en301 within the subsections? Where would we get
into the 7 functional requirements?

Jeanne: They will be in a different location. We may merge this
into scoring.

Lucy: Can you explain the scoring a bit more (to Bruce B.)

Bruce: Not skipping a heading level , is outstanding / 4 or
Very Good / 3.

Bruce Bailey: If you skip levels, you at very best are at
acceptable.
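Bruce's rule of thumb for heading levels could be sketched in code. This is
only an illustration, assuming the heading levels of a page are available as
a list of integers in document order. The helper names `skipped_levels` and
`cap_for_headings` are hypothetical, and treating only downward jumps (for
example h2 followed by h4) as a skip is an assumption, not something the
group decided.

```python
def skipped_levels(levels):
    """Return (previous, current) pairs where a heading jumps down more than one level."""
    skips = []
    for prev, cur in zip(levels, levels[1:]):
        # h2 -> h4 skips h3; moving back up (h4 -> h2) is not counted as a skip here.
        if cur > prev + 1:
            skips.append((prev, cur))
    return skips


def cap_for_headings(levels):
    """Best adjectival rating still available under the rule described above (assumed)."""
    return "Acceptable" if skipped_levels(levels) else "Outstanding"
```

For example, `cap_for_headings([1, 2, 4])` caps the page at "Acceptable"
because h3 is skipped, while `[1, 2, 3, 2, 3]` leaves the top rating open.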

Lucy: Acceptable to me seems that you've met every point.
Actually, Very Good means you've hit every point.

Bruce Bailey: I also looked at Clear Language and Visual
Contrast of Text and went through the same rating

Step 1 is to assign this to a web page in a website, then
assign the website a number as well, so the numbers, mean, mode
etc. would give a person a score for wcag 2.1 .

3 out of 4 guideline is rated as outstanding...

A rubric that works for assigning silver, bronze, gold rating
to websites could be used as well as the rating scores.
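The step from per-page adjectival ratings to a site-level summary might look
like the sketch below. It uses the 4/3/2/1 scale Bruce described; the
function name `summarize` and the choice to report the mean, the mode, and
the adjective nearest to the mean are assumptions for illustration, not an
agreed Silver method.

```python
from statistics import mean, mode

# Adjectival scale from the Sample Scoring spreadsheet discussion.
SCALE = {"Outstanding": 4, "Very Good": 3, "Acceptable": 2, "Unacceptable": 1}
LABELS = {score: label for label, score in SCALE.items()}


def summarize(page_ratings):
    """Summarize per-page adjectival ratings for one guideline (hypothetical helper)."""
    scores = [SCALE[rating] for rating in page_ratings]
    average = mean(scores)
    return {
        "mean": average,                  # numeric average of the page scores
        "mode": LABELS[mode(scores)],     # most common rating across pages
        "nearest": LABELS[round(average)],  # adjective closest to the mean
    }
```

Calling `summarize(["Outstanding", "Very Good", "Very Good", "Acceptable"])`
gives a mean of 3 with "Very Good" as both the most common and the nearest
rating.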

Shawn L: Opens to JF for comments on his work

<JF> [9]http://john.foliot.ca/demos/HeadingsTestOne.html

[9] http://john.foliot.ca/demos/HeadingsTestOne.html

JF: Shares a Testing Headings Sample Page. Talks to heading
structure being used properly. Each heading has a class.

Review of homework: what insights did people gain from it?

The entire document is made up of headings, 18 heading 1's .

My question is the score going to be the same as the previous
example shared?

The pages John shares are Testing and Scoring Headings - Master
Page , [10]http://john.foliot.ca/demos/HeadingsTestOne.html

[10] http://john.foliot.ca/demos/HeadingsTestOne.html

Testing Headings - Test 1 ,
[11]http://john.foliot.ca/demos/HeadingsTestTwo.html

[11] http://john.foliot.ca/demos/HeadingsTestTwo.html

Testing Headings - Test 2 ,
[12]http://john.foliot.ca/demos/HeadingsTestThree.html

[12] http://john.foliot.ca/demos/HeadingsTestThree.html

Testing Headings - Test 3,
[13]http://john.foliot.ca/demos/HeadingsTestFour.html

[13] http://john.foliot.ca/demos/HeadingsTestFour.html

At end of each page, the negative impact on functional
requirements is listed in a table

PeterKorn: These detailed examples are fantastic. Comment: Some
of the examples are easy to find with a programmatic tool.

If a tool could have found it, and you didn't do it, it is not
acceptable.

<jeanne> +1 Peter

JF: Tools aren't going to catch all examples. I wouldn't
outright fail for all users. Failing for some users is valid.
The Functional Requirements are key to a scoring rubric.

Shawn L: We can talk to this in minimums

<david-macdonald> interesting that JAWS and NVDA announce the
level without aria level for <div role="heading"
class="level_2" ...

Lucy G: I love the examples, John. I see Bruce's where a tally
needs to add up to 100 points. Everything would be weighted and
have weights within it.

JF: A more granular score helps content creators as well.

Lucy G: I think we are on the correct path.

Jeanne: If we look at Bruce's example and drill down into more
granular approach, would that help?

JF: Explicit definition of semantic headings would be useful.
... The structure vs. the visual presentation helps cognitive
disability user group.

<Chuck> +1 to Jeanne's idea

Jeanne: What if we took each of the 4 examples JF has and
turned them into the correct thing, and turn those into
methods? If we then reference the individual methods within
Bruce's example to know what they need to review for HTML and
scored that way?

JF: How many methods could be constructed using the ACT rules
format?

<Lauriat> +1

Jeanne: Exactly, using ACT in methods. Scoring could reference
those.

Referencing ACT tests would work well in methods.

David McD: JAWS reads those headings properly, wondering what
accessibility supported score would be if semantic code is not
totally correct?

Normative and methods comment: Looking at Bruce's examples,
column B could be methods.

Techniques would be technology specific and non normative

<JF> +1 to Lucy

Lucy G: Assistive Technology will change mind at the moment,
same as browser. Methods must stick to is code correct or not?

<Chuck> +1 to lucy

<jeanne> +1 Lucy

<laura> +1

+1 to Lucy (Chris without scribing)

Shawn L : To group: Let us shift gears to next agenda item

Jeanne: Scoring examples topic: Testing real websites is best.
It changed how I was approaching things. Test against a real
website.

Conformance and Minimums

<sajkaj> +1 to Jeanne

Shawn L: Criticality is important. How does one express it in a
score and fundamental critical issue of accessibility?

Very poor, acceptable , etc. Within Silver, do we want to be
the ones drawing the line on critical issues?

PeterKorn: Looking at JF's second example, if the page only had
two headings, one a heading 1 and one a heading 2...usage
without vision, hierarchy structure would be a fail, but would
the impact to the user be significant?

<JF> exactly

Is it not usable without vision?

We need to think of overall functionality and impact of the
fail and how we view the site?

JF: Peter , I agree. I built the examples where structure
created for screen readers were ok. If I did unstyled divs,
structure would be off visually for sighted users (COGA). Impact on
different user groups needs to be looked at. Visual users vs.
non visual users / screen reader users.

<Zakim> bruce_bailey, you wanted to say that i like adjectival
rating over tally because critical aspects could be in
"average" and above

I.e. h2 to h5 , still usable ? Or a fail? How is it rated /
scored? As opposed to pass or fail.

Bruce Bailey: Critical are acceptable or above in my example.

Charles Hall: I wanted to add to the criticality severity
comments. If I'm evaluating a rubric against a subset vs. an
author of the website, I the author have the right to scope of
my page through a task flow

<JF> Not sure if we've arrived at consensus to Charles' point

<JF> regarding scoping

Chris Loiselle to Charles: If I'm writing that wrong, please
add in your comments. Sorry!

<jeanne> +1 John - The consensus on scope was a logical subset,
not a task

PeterKorn: Scoring with numbers is a very easy way to get lost
in what does 85 % mean? Adjectivity based approach may lead to
more progress quickly on scoring.

<PeterKorn> (to respond)

Lucy G: If adjectival route is the way we are going, numbers
will also help in the end as well. I.e. full one point can be
broken down as well. Requirement has its own scoring.

When it comes to a critical criteria, that is when we can
weight it.

Think of it as containers. Language would be its own container.
Is language more critical than headings?

<Chuck> we are at time to change scribes, and someone else will
need to monitor the participants list for raised hands, as I
will not be able to scribe and watch that list.

AndyS: Page is structured. Ads are present.

@Chuck. I can continue to scribe after a five minute break.

<CharlesHall> that was CharlesHall that mentioned the <aside>
as part of the scope

I will return very soon.

<Chuck> scribe: Chuck

<sajkaj> I can

Jeanne: Any followup on what Andy said?

<Zakim> JF, you wanted to respond to Peter's comment about
numbers

JF: responding to Peter's comments on number. Appreciating all
the issues a number means, my understanding is...
... This exercise is about getting a number. We currently have
100% or zero (in wcag 2.x)
... A number can be misleading at times. If anybody used the
chrome tool (lighthouse), at the end of the process...
... Chrome added a score. We don't know where that score came
from. If we start from the premise that you never get 100, that
percentile becomes incentive for doing better (72%, let's try
to get to 85%).
... Peter, as much as numbers can be a rat hole, it's a
critical part of what we are trying to do in silver.

<ChrisLoiselle> Chuck, I can scribe again.

<ChrisLoiselle> Scribe: ChrisLoiselle

<Chuck> PK: I wasn't here for all of silver discussions... my
understanding is that the high order bit is to move away from
pass-fail perfection, to "mildly or largely".. numbers isn't
the goal.

<Chuck> PK: The goal is to get away from pass/fail. I like the
rubric. We can evaluate the rubric evaluation. Some things are
acceptable, some things are good.. or almost everything is
great but some are good...

PeterKorn: The goal was to move away from pass / fail. I like
the rubric. If we have a way to collect up the rubric
evaluations, most are very good, the product will be very good.

<Chuck> PK: Whether or not we assign numbers, we need to think
on severity, the people, and the impacts. I like adjectival
approach. I don't know what 87% means. I do know what
acceptable and very good means.

<CharlesHall> to PeterKorn’s point, a qualitative metric can be
converted to a quantitative score based on the number of
adjectival categories

<Chuck> JF: I agree with that statement, in regulatory
environment we need to hit a bar. "Intellectually honest" means
that there isn't something that is 100%.

<Chuck> Shawn: I think you are agreeing on different points.
Let's return to queue.

<Chuck> detlev: I was wondering... you mentioned the 9 user
accessibility needs. In the rating, there would be a plan to
issue different types of results for different impacts...?

<Chuck> detlev: Also, why is there a mismatch between user
accessibility needs in EU implementation (there you are missing
limited reach)...

<Chuck> detlev: you have two categories for people with hearing
problems (hard of.. and no). Is there a conscious decision to
drop the difference between hard of and no hearing?

<Chuck> Jeanne: It may be that I took an old version of the EU
directive. Can you send me a link to the current?

<Chuck> Detlev: I was wondering why users with limited reach
was left out. someone explained... I'm not convinced there's a
good reason to leave it out.

<CharlesHall> we have discussed adding functional needs to the
EN standard, like “intersectional”

<Chuck> Jeanne: Great. We did have a discussion of adding
things, for limited cognition, vestibular. We never discussed
dropping any, so I may have had an old version.

<Chuck> detlev: Is there an intention in all the scoring for
differentiated by user group? At one point it was a no-no. At
some point we decided we didn't want to differentiate, but not
sure if that has changed.

<Chuck> Shawn: We've been thinking about it more closely to
what John demonstrated in that for different guidelines looking
at the effects on different user needs given the task or scope
of testing.

<Chuck> Shawn: We don't have a fleshed out illustration of
that, but we have been considering. We want to accomplish
making sure that we are not inadvertently leaving user needs
out.

<Chuck> Shawn: For example, if it works great for limited
cognition but it completely leaves out users who use screen
readers, we want the ability to highlight that, and vice versa.

<Chuck> Shawn: Like if everything is semantically accurate, but
visually not.

<Zakim> jeanne, you wanted to talk about testing criticality
and severity

<Chuck> Jeanne: I'm glad we are having conversation about
criticality. I know there's a lot of... people who feel it's
very important. But one of the things we agreed on in Silver is
we'll be data driven and research based.

<Chuck> Jeanne: whenever we tried to score real websites
against criticality we consistently didn't find a way to do
that in a way that didn't penalize people with some
disabilities.

<Chuck> Jeanne: We have to find a way to stop penalizing people
with some sorts of disabilities. We could not find a way to
test it that didn't structurally disadvantage people with low
vision and cognitive issues.

<KimD> +1 to Jeanne

<Chuck> Jeanne: It's great to talk about in theory, but if you
want to propose, you need to show research that demonstrates
everyone is speaking equally.

<Chuck> lucy: Speaking to numbers, we need to offer something
that leaders can relate to. If we offer something confusing,
they will blank us out and do other tasks.

<AndyS> Comment: IMO the only way to treat all disabilities
equally involved a customization and personalization, so that
individual needs are accommodated *as needed*

<JF> +1 to Lucy

<Chuck> lucy: We have to have those numbers. A lawyer is not
going to understand what it means to have "this or that" level.
They want a number and a way to improve that number.

<Chuck> Rachael: I have 3 things to keep in mind. Been brought
up before, may be not in in this call. If we don't take
functionality into account, and we come up with a %, that %
will mean different things to different groups.

<Chuck> Rachael: 80% may mean one thing to someone with seizures and
another to someone who is blind.

<Chuck> Rachael: If you have a group of individuals who fall in
the category of blind, vs people in coga, that hierarchy may
introduce discrimination.

<Chuck> Rachael: This is such a rich and fantastic discussion,
but whatever we come up with needs to be understandable and
simple.

<Chuck> JF: Addressing a comment from Jeanne. I'm in support of
all user groups including coga, I do so in action and words.
The reality is that if we take that user group in
consideration, that is one user group.

<Chuck> JF: I want to recognize severities. Those heading
examples, if I remove visual structure, a person who has a
cognitive issue and is blind is doubly disadvantaged.

<Chuck> JF: If we determine that an impact has a greater impact
against a group, we can factor that in. That's what I said 6
months ago. Not all things are created equal. We have to boil
this down to a score and strategies to improve the score.

<Chuck> Jeanne: I only disagree with the weighting.

<Chuck> Jeanne: "this is more severe than that". I don't have
an issue with your example. At the guideline level I think you
can do that.

<Chuck> Lucy: The weighting should be by criteria.

<Chuck> Lucy: So many disabilities are impacted by this or
that. I won't pit one against another, but if it's 4 vs 1, we
have no choice but to weight that.

<Chuck> Jeanne: I disagree. When you start looking at it as a
whole, there are too many things that we give guidance for that
are heavily weighted to blind vs cognitive.

<Chuck> Jeanne: The way we are set up including Silver is that
we measure things granularly, and we say this is more important
than that, but when we look at it as a whole...

<CharlesHall> would a priority or severity not be functional
need agnostic?

<Chuck> Jeanne: granularly, this is a complete blocker, but
someone with a cognitive issue may be able to work it out. But
in total the cognitive issues become a blocker, because you
have to look at it in totality.

<Chuck> Jeanne: Cognitive loses when we say "this individual
piece" is more important than "this one".

<jon_avila> I object to the notion that WCAG is heavily focused
on Blind and visual impairment. There are many WCAG criteria
that are aimed at a wide range of users with disabilities

<Chuck> Jeanne: That alt text is more important than
captioning. It's the totality of the website, it's all the
guidelines. If they are getting a lower weight because of each
individual guideline, they get a website that they cannot use.

<Chuck> Jeanne: Can someone else make the argument better?

<Chuck> Shawn: It's a complex topic and we should keep in mind
as we work through. We all have the same level goals for
conformance.

<Chuck> detlev: Just to get to Jeanne's argument about critical
issues, penalizing others. Why is that a problem? Why wouldn't
it be possible to ask all the groups involved to basically
identify show stopper issues. We know those.

<Chuck> detlev: keyboard trap, lack of captions, so on. Maybe
show stopper for cognitive individuals. Basically ok to collect
those issues and make them critical issues.

<Chuck> detlev: Can you explain further Jeanne?

<Chuck> Jeanne: Let's honor queue.

<Zakim> bruce_bailey, you wanted to say numbers can be used to
measure progress, but i have never seen numbers that were
comparable from one website to another (or one tool to another)

<Chuck> Bruce: We've been trying (everyone!) to rate your
website since days of Bobby. All of these rules, all the years,
it's only ever useful for the developer to make progress. That
you aren't regressing.

<Chuck> Bruce: Lots of tools will give you a percentage.

<Chuck> Bruce: Those only make a difference on one domain. You
can't compare an 87 on one site to an 86 on another site. Or
even cross tools. I don't feel like we will make progress if we
try to have granular scoring systems.

<Chuck> Bruce: Unless there are 1000s of data points. Enough
data points that you aren't doing manually that eventually the
weaknesses of the tool averages out.

<PeterKorn> +1 to Bruce

<Chuck> Bruce: I won't be able to say that one outstanding site
compares well to another.

<Chuck> Matt: Returning to yesterday, seems like there are 2
different criteria you are trying to assess. How well has the
author created the content, what score can you give them, vs
how do you make this understandable to the user.

<david-macdonald> +1 to bruce

<CharlesHall> to Detlev’s point, the challenge is not that one
functional need does or does not have a critical item. i think
the challenge is where 2 or more functional needs have a
critical item that is in conflict.

<Chuck> Matt: There's a bucket approach, there's probably going
to be something that they haven't met requirements to they met
them all. There are different methods which have different
impact on different groups.

<Chuck> Matt: That's different from the number score that the
producer gets. I think that these two different aspects need to
be viewed separately.

<Chuck> Shawn: Indeed.

<Chuck> JF: Two thoughts... Jeanne mentioned one requirement is
captions. If we look at the severity of captions. If you are
deaf and a video is missing captions, that's critical. If I
have a cognitive issue and I have captions..

<Chuck> JF: That helps me. Offering captions to a cognitive
user offers some benefit, but to a deaf user is completely
critical.

<JF>
[14]https://docs.google.com/spreadsheets/d/1EXw5W6SuMXk7mrFN33hHVQuCn0DhKysXmUCKNML3SKg/edit#gid=108726882

[14] https://docs.google.com/spreadsheets/d/1EXw5W6SuMXk7mrFN33hHVQuCn0DhKysXmUCKNML3SKg/edit#gid=108726882

<Chuck> JF: How do we make this fair, realizing we can't get
every individual out there? In an earlier example (6-8 months
ago), I put forward a draft proposal, that had suggested that
as we were calculating scores we used a weighting mechanism,
and use that as a multiplier.

<Chuck> JF: We would ultimately have a better... more data
points. More data points will give us a better score.

<jeanne> COGA with individual guideline weighting.

<Chuck> Shawn: Logistically, I'd like to get through the queue
and then get to the last agenda item.

<Zakim> alastairc, you wanted to say that how we measure things
is more important than how we weight them

<Chuck> Shawn: This discussion has been great in taking in all
the things we need to consider.

<Detlev> @CharlesHall: "challenge is where 2 or more functional
needs have a critical item that is in conflict" - can you speak
to that? I'd love to know where solving one showstopper issue
creates a real problem for another group...

<Chuck> Alastairc: Not sure weighting is necessarily going to
be the answer. A lot of the problems in terms of coverage in
wcag 2.x is how things are scoped and measured. Been clear with
addressing coga issues.

<Chuck> Alastairc: If there is a functional outcome that has an
equal weighting per guideline, that may be a reasonable way to
proceed. It's within the guideline to decide what content
authors need to do.

<Chuck> Alastairc: Let's start off even, and once we have
better coverage of guidelines. Maybe later we can weight the
guidelines when we have more. Let's start with "what is a
reasonable thing to ask content authors to do".

<Chuck> PK: Friction comes to mind. Every place where the
language is harder to puzzle out is friction. The accumulation
of friction can take a site from great to struggling to
impossible. The spoons model.

<Chuck> pk: I think the same concept applies to other
disabilities. What we thought of traditionally as pass/fail.
The header example, a page with 2 headers is a little bit of
friction, a formal failure. Does it prevent blind individuals
from using the page? Probably not.

<alastairc> +1 to thinking about friction, which again could be
dealt with quite granularly with good measurement, where low
scores across the board start adding up (or rather, not adding
up!)

<Chuck> PK: Mel Brooks silent movie has text in it, except for
one individual who has one word. Is it a problem for a deaf
person who wants to watch that movie? I think we can look at a
friction based model.

<JF> +1 to Peter's point - barriers are based on the functional
requirements of the different disability types

<Chuck> pk: ... come up with a notion that there is a little
bit of friction for blind folks because some small pages don't
have headings right, there's a lot more friction for a
cognitive user. We can elevate a site and say that can be no
worse than "good"...

<JF> +1 to contemplating A,B,C,F scoring

<Chuck> pk: therefore a site does or doesn't make it. Back to
numbers. A+, B-, I think there are mechanisms that can be
fairly granular, like we are getting a C- and we want to get a
C+. We can get caught up in a scale of 100% and we are arguing
if 86 or 87 is good enough.

<Zakim> jeanne, you wanted to say to Detlev when each group
gets a show stopper, then a bonus is given to that group. But
for COGA, its the overall sum of all the guidelines. So they

<david-macdonald> +1 Peter had a great concept of "friction"...
at some point there is too much friction to use.

<Chuck> Jeanne: To detlev to weighting, when each group gets a
show stopper, then a bonus is given to that group. For coga
it's the overall.

<Chuck> Jeanne: Each group gets more points for show stoppers,
and coga gets less. Physical or sensory issues have more
points. But when coga looks at the overall score... like a
website gets a 93, but for coga it's not accessible.

<Chuck> Jeanne: For coga it's more of an overall issue. That's
why I say that people need to test their proposals across... on
real websites. We found that it wasn't reflecting the issues
for coga users correctly.

<alastairc> Jeanne - couldn't that be addressed by having
plenty of guidelines for COGA (based on things we can't fit in
WCAG 2.x), so that without meeting enough of those, you
wouldn't pass?

<Detlev> Can I answer to that directly (briefly)?

<Chuck> Jeanne: It's great to have these proposals. John's
proposal of putting the impact at the guideline level and we
trickle that down, that can work. But if we put weighting in,
I'll ask for real examples with real websites and show it's
fair.

<Chuck> Jeanne: I think we can do it without weighting and go
back to an author and say "here's total score, here is how it
breaks down by disability", I think that can work. I think we
can avoid the weighting issue.
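Jeanne's unweighted idea of reporting "here's total score, here is how it
breaks down by disability" might be sketched as follows. This is a
hypothetical illustration: the function name `score_breakdown`, the
(functional need, passed) input format, and the example need labels are all
assumptions, and no weighting is applied.

```python
from collections import defaultdict


def score_breakdown(results):
    """Return (overall, by_need) pass rates from (functional_need, passed) pairs.

    Assumes `results` is non-empty; every test outcome counts equally (no weighting).
    """
    totals = defaultdict(lambda: [0, 0])  # need -> [passed count, total count]
    for need, passed in results:
        totals[need][1] += 1
        if passed:
            totals[need][0] += 1
    passed_all = sum(p for p, _ in totals.values())
    count_all = sum(t for _, t in totals.values())
    by_need = {need: p / t for need, (p, t) in totals.items()}
    return passed_all / count_all, by_need
```

With eight passing tests tagged "without vision" and one pass plus one fail
tagged "limited cognition", the overall rate is 0.9 while the limited
cognition rate is only 0.5, which is the kind of gap a single total would
hide.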

<Chuck> detlev: I think you are mixing 2 different things.
Nobody argues that wcag doesn't have enough for coga. Coga
issues are underrepresented. I don't see a real conflict for
critical issues by groups.

<Chuck> detlev: cognitive folk that are impacted by many
different things combined, if the new guidelines and rubrics
includes them, I would rather think of subtracting points.
Cognitive score would show the deficiencies clearly.

<Chuck> detlev: You'd still have a way of showing off critical
issues by groups.

<jeanne> I would be interested to see a proposal with real
data. I would help with testing.

<Chuck> detlev: There are absolute show stopper for some users,
and we need to show that. Jeanne you said that these can drown
out coga issues, but I think that it can be reflected properly
and benefit coga users.

<Chuck> Andy: The cognitive issue is such a complex subject. It
overlaps with neurological, senses, perceptions. There are so
many different varieties (A.D.D) will be different from
educational handicaps.

<jeanne> alastair, that is possible -- again, I would like to
see a proposal with mockup of the data with proposed
guidelines.

<Chuck> Andy: Becomes this big mess of how we divide up, should
we divide up... becomes a broad spectrum. Points towards the
ability to customize and personalize as the way to ensure that
all groups have equal access.

<alastairc> Jeanne - I think we need to leave
weighting/criticality until we have better coverage of
guidelines.

<Chuck> Andy: In terms of a triangle, the user is one point,
the author is another point, and the technology is the 3rd
point. Those 3 points need to work together in a way for every
user to be accommodated.

<alastairc> we aren't sure what / how-many new guidelines are
likely to come from COGA.

<Chuck> andy: I don't think there's a way for a website to
address everyone all the time. I understand what Jeanne is
saying in terms of weighting. Weight vision perception gets the
site up to a high level of acceptance, but the things that made
that...

<Lauriat> +1 to Alastair

<Chuck> andy: site a vision number may actually hurt coga
issues. high contrast can cause coga issues. There's a lot of
interaction there that can... how do we really make that into a
matrix where you push this up and the other comes down.

<Chuck> andy: Squeezing a toothpaste tube, one end decreases,
the other gets larger. I don't know that weighting is the way
of solving. I think customization is the way to achieve the
ultimate goal. These are thoughts in my mind.

<kirkwood> +1 to customization

<Chuck> Shawn: We are unlikely to come to complete resolution
to this conversation, we need to finish queue and move on.

<JF> FWIW, the Personalization TF is looking at "customization"
today - but we lack the technology to make it happen today

<Chuck> Lucy: If any disability is poorly affected by any
system we come up with, that's our failing. We don't have the
data and don't know, and if it's still failing cognitive,
that's our responsibility to address that group.

<jeanne> Andy, that's why I like that proposal of putting data
into each guideline about each disability affected, but then
giving a total score for each disability.

<Chuck> Lucy: We have fixated for so long on what we know, we
still need to do the research and determine what we don't know.

<Chuck> lucy: I'll say that I don't know enough, but I do take
it into effect, and I want to know more.

<Chuck> Charles: Historic: we are all in general agreement that
this is complex, which is why it takes a long time to get
through one point of conversation, but I don't think it's
impossible to account for something that is critical to one
functional need in order to achieve a score.

<Lauriat> @JF: I'd like to know more about the technology
needed to make that happen, or at least prototype things out to
figure out how to make that happen.

<Chuck> Charles: If we have 9 functional need categories and
one has an issue, then they all do. Historic conversations,
where there's a conflict. Where there is a challenge is where
something in one category that is in conflict with another
issue in another category...

<Chuck> Charles: headings large and in one color may conflict
with someone where it could trigger anxiety. It's fine to
consider this, but we need to be aware of the conflicts.

Should we use IETF standard RFC 2119

<Chuck> Shawn: With that, we have a lot to think about and work
through. Let's move on to whether to use rfc 2119.
Must/should/must not/should not

<Detlev> @Charleshall: I think research shows ALL CAPS is
harder to read, cannot think who would benefit

<Chuck> Jeanne: W3C and standards organizations around the world
rely on this particular RFC (request for comments). Very old,
used in technical standards by many standards orgs.

<Chuck> Jeanne: W3C in the past has said they don't recommend
it for guideline use, and not used in WCAG. Designed for
technical specs that require interoperability. More about...
John may have lots of examples.

<sajkaj> Think APIs

<Chuck> Jeanne: Because it's not included, we are interested in
including it in Silver. This is John's proposal. John...

<CharlesHall> @Detlev. i agree. was trying to create an example
on the fly since I couldn’t recall some of the specific
examples we discussed in the past. particularly with insights
from Cybelle.

<Chuck> JF: The part of the issue is that wcag has moved from
being a guideline to being a standard. That's the reality. We
have govts that say "you must meet wcag 2.x AA". because of
that, we need to have these bright and measurable points

<jon_avila> The ARIA specification is an example that uses
RFC2119

<Chuck> JF: To meet that requirement. When we talk about the
users, the 3 points, there's a 4th point... legal requirements.
RFC 2119 calls out must should and may.

<Chuck> JF: It's unambiguous and clear. Failing means you
didn't meet the requirement. It's about having clearly defined
requirements with explicit language.

<jeanne> [15]https://tools.ietf.org/html/rfc2119

[15] https://tools.ietf.org/html/rfc2119

<Chuck> JF: I want to use it, when we created a guideline, that
was guidance. Because we are at a point where we have
measurement and scoring... as part of that requirement we
should use very clear language.

<CharlesHall> +1 to JF on use of an unambiguous standard

<Chuck> JF: We've got clarity there.

<Zakim> alastairc, you wanted to say that must = the
requirement, and should/may wouldn't be suitable for stating
the requirement.

<Chuck> Alastairc: I think as it stands now... silver
structure... normative and informative... anything that is a
normative requirement is a must. A should or may would not be
normative requirement.

<Chuck> Alastairc: They are very tied together. We would need
to be very careful using that language in informative
documents. We've had charter issues, the language has strayed
into informative documents.

<Chuck> Alastairc: introduces unnecessary uses. I don't
disagree or agree, I think our current approach takes it into
account already.

<Chuck> DM: The sc were designed to be testable statements. If
the statement is true, you meet the criteria. Every one of
those statements is a must.

<jon_avila> I find it ironic that folks have said no disability
should be weighted over others -- yet it was said cognitive
disabilities are impacted by the holistic site and thus trump
everything in terms of importance

<Chuck> DM: We do have must statements, we don't have shoulds
in the normative docs.

<Chuck> PK: I see value and intellectual rigor and honesty in
being explicit. We call these guidelines, they became
standards. The dirty secret is that no site can be perfect. If
we take on this kind of language we need to be clear that we
aren't going to say "must"

<alastairc> jon_avila I think there is just recognition that
there has been a gap, and there needs to be work to address the
gap.

<Chuck> PK: when the must is not achievable. I don't have
strong feeling for or against. But if we use this language, we
need to be clear that we are creating an impossible "must".

<Zakim> JF, you wanted to note that there is a real difference
(in RFC 2119) between MUST and must (case sensitive)

<jon_avila> For what it's worth - WCAG 2.0 and 2.1 are
standards according to WCAG itself - "The WCAG 2.0 document is
designed to meet the needs of those who need a stable,
referenceable technical standard."

<Chuck> JF: Alastair... concern about language. Must should and
may are always in upper case. MUST and must don't equal the
same thing.

<alastairc> let's not go down that route!

<Chuck> JF: "must" is conversational. That avoids some of that
problem.

<AndyS> That's scary

<Chuck> JF: As peter noted, our guidelines have been made
standards. In terms of scoring, engineers need to have black
and white decisions. Everything we do is based on binary
decisions. In the must we have declared what that bright line
is.

<sajkaj> Methinks JF forgot about analog engineering!

<david-macdonald> There are no "Must", "should" or

<Jennie> I would be concerned with upper case and lower case
differences in meaning, from a cognitive standpoint.

Chris Loiselle comment on standard:
[16]https://www.iso.org/standard/58625.html, point to standard

[16] https://www.iso.org/standard/58625.html

<Chuck> Andy: I want to mention, all the other standards
organizations, they use this verbiage.

<david-macdonald> "may" in WCAG 2 or 2.1

<JF> +1 to bright lines

<JF> bright lines make measuring easier

<Chuck> Andy: But if we were going to go there, it's important
to note that this is a very bright line. Requires additional
diligence to make sure that "shall" is really understood and
isn't going to create situations that cannot be absolutely
achieved.

<jeanne> I worry that accessibility needs are not oriented
toward bright lines.

<Chuck> Andy: With all of the many things we are talking about
that interact with each other. WCAG has them broken down in
different elements. These elements interact with each other.
Bring Coga into the mix adds a layer of complexity and
conflicts.

<david-macdonald> There are no instances of "Must", "should",
"shall", or "may" in WCAG 2 or 2.1

<Chuck> Andy: If one "shall" conflicts with another "shall",
we'll get into trouble.

<jeanne> +1 Andy

<jon_avila> I agree that use of these RFC 2119 terms will only
complicate things

<Chuck> Andy: A bit more ambiguous from other standards from
other groups. ANSI specs on displays and fonts, their language
and examples are set in technology of the late 80's and early
90's.

<KimD> I'm concerned about internationalization and not
comfortable saying we need to put ourselves in the shoes of
legislators.

<Chuck> Andy: We start to get into ambiguous realm when we
discuss different browsers render fonts differently. I like the
idea of adopting this more affirmative use of terminology, but
brings a great deal of complication.

<alastairc> Is it worth trying this language out in a method?
That seems to be the most suitable place.

<Chuck> Lucy: I want to see it applied and see how it works,
and then when John responded... I say what Peter said... this
is not a possible thing to accomplish and remain accessible
itself.

<Lauriat> @Alastair: No, that would make tech-specific methods
normative.

<Chuck> Lucy: I like the idea, in the terms of what we have
been thinking of all along, I want to see it apply to some
examples first.

<alastairc> Um, I'm not sure it will help with the clear
language.

<Chuck> Lucy: I can't tell the difference between MUST and
"must".

<jon_avila> There are settings in screen readers to communicate
capitalization of text.

<Chuck> Shawn: My proposal is to go through the minutes and
pull out the pros and cons of going with this language and
keeping the current language.

<Chuck> Shawn: And then we can use that as a summarization for
folk who couldn't make it to this call.

<alastairc> Should a guideline include should/may?

<Chuck> JF: I pasted some code in IRC to address your concerns.

<Jennie> Won't the ARIA label only assist those using screen
readers, but not those with reading challenges with vision?

<KimD> +1 to Jennie

<david-macdonald> There are no instances of "Must", "should",
"shall", or "may" in WCAG 2 or 2.1 success criteria

<Chuck> Shawn: worth looking into annotations.

<jon_avila> ARIA labels on non-interactive text don't work
well with screen readers.

<jeanne> +1 Jennie

<alastairc> JF - would a guideline include should or may?

<Chuck> Shawn: With that, thank you everyone, and bringing
examples. Super helpful as a part of these complex topics.

<Chuck> Shawn: Anything else Jeanne?

<JF> @ Alastair - it could

<Chuck> Jeanne: Incredibly helpful, great conversations, we'll
keep working.

[End of minutes]
__________________________________________________________

Received on Tuesday, 10 March 2020 18:42:42 UTC