- From: Korn, Peter <pkorn@lab126.com>
- Date: Tue, 10 Mar 2020 20:53:15 +0000
- To: Jeanne Spellman <jspellman@spellmanconsulting.com>, Silver Task Force <public-silver@w3.org>
- Message-ID: <F8FA9573-367F-4E94-82C0-E2E56EAB7AAC@amazon.com>
Jeanne, all,

One addition I would make to the meeting summary: in our discussion of "friction", we also explored how it applies beyond cognitive/language complexity issues. We discussed the example of a page with a small amount of text under two headings, one of which should be (but isn't coded to be) an H2. This is a technical violation of the structure criterion, but in terms of actual difficulty for a screen reader user, it is more likely some added friction that doesn't block their effective use of the page, though it may briefly confuse them.

We also discussed another example, counter to the statement that lack of captions in a video is an absolute blocker: Mel Brooks' Silent Movie, which across 87 minutes has only a single piece of dialog, the word "no" uttered (in French) by the famous mime Marcel Marceau. Releasing this movie un-captioned is an absolute violation of the caption requirement for accessible media, yet it too seems more like a bit of friction than an actual blocker for a Deaf/Hard of Hearing viewer's ability to enjoy the movie (and fully appreciate its meaning).

The takeaway for me from this portion of our discussion is that the concept of friction may be a way to harmonize our existing approach to success criteria in WCAG 2.x with both the more nuanced situation in COGA and the discussion around task-based analysis for understanding impact. If the technical failures of WCAG 2.x SCs create at most a small amount of friction in completing tasks, and likewise no more than a broadly similar amount of friction is created along COGA lines, then we might see "equal weighting" of COGA and "traditional" accessibility criteria. And if such a "small amount of increased friction" can in turn align with an adjective<https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643> (e.g. "acceptable" or "very good"), or if we prefer a letter grade ("B-" or "B+"), or if numbers must be used, a score (e.g. "85 out of 100"), then this might all align nicely.

Regards,

Peter
--
Peter Korn | Director, Accessibility | Amazon Lab126
pkorn@amazon.com

From: Jeanne Spellman <jspellman@spellmanconsulting.com>
Date: Tuesday, March 10, 2020 at 11:44 AM
To: Silver Task Force <public-silver@w3.org>
Subject: [EXTERNAL] Summary and Minutes of Silver Virtual Meeting Tuesday Part 1
Resent-From: <public-silver@w3.org>
Resent-Date: Tuesday, March 10, 2020 at 11:42 AM

== Summary ==

Homework Assignment: We reviewed the homework assignment to test the Scoring Example against a real website. Two people responded. One wrote a proposal for an adjectival scoring mechanism<https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643>. We discussed the pros and cons of using this mechanism; we need to test it against real websites. The other person wrote Testing Headings<http://john.foliot.ca/demos/HeadingsTestOne.html> web pages to illustrate specific examples of HTML code that technically passed the guideline for Headings but failed it for users with specific disabilities. This led to a detailed discussion and multiple verbal proposals of how scoring could be adapted to solve this problem. Please send written proposals so we can follow up with these ideas.
· Having a minimum such that if a tool fails it, it can't pass.
· Adding Functional Requirements to each Guideline.
· Taking the Testing Headings examples and turning them into Methods, and linking Adjectival Scoring to specific tests in the new Methods (a sketch follows at the end of this section).
· Assistive technology can change behavior at any moment, the same as browsers. Methods must stick to whether the code is correct or not.

Minimums: We discussed the Testing Headings examples for what was critical for which disabilities. This led into a complex discussion of criticality and how to ensure that we account for critical needs. Some points made:

· Agreement that we need to treat disabilities equally, but that some failures are more harmful than others.
· We discussed the pros and cons of using adjectives or numbers for scoring.
· We agreed to update the functional user categories to the latest from the EU, but we do want to be able to add categories for a more granular breakdown of "limited cognition" and to add vestibular disorders.
· If we don't take functionality into account and we come up with a percentage, that percentage will mean different things to different groups.
· We don't want to discriminate against one or more disability user groups. Whatever we come up with needs to be simple and understandable (but we aren't in that phase yet).
· Some members want to weight some guidelines more than others when accumulating a total score; others are strongly opposed to any mechanism that isn't well tested to ensure that it does not discriminate against some disability groups.
· One verbal proposal is to allow different disability groups to identify "show stopper" Guidelines.
· Concerns that numeric scores in a granular scoring system would require thousands of data points to be valid and to average out the weaknesses of the tool.
· The challenge is not that one functional need does or does not have a critical item. The challenge is where two or more functional needs have critical items that are in conflict.
· The same guideline may be critical to one user group but helpful to another, as captions are critical for hearing disabilities and helpful for cognitive disabilities.
· How we measure things is more important than how we weight them. A lot of the problems with coverage in WCAG 2.x are in how things are scoped and measured.
· Let's start off even, and once we have better coverage of guidelines, maybe we can weight them later. Let's start with "what is a reasonable thing to ask content authors to do".
· Friction: every place where the language is harder to puzzle out is friction. The accumulation of friction can take a site from great, to struggling, to impossible. Friction could be dealt with quite granularly with good measurement, where low scores across the board start adding up (or rather, not adding up!).
· We could subtract points by user group.
· Concerns that people with cognitive disabilities are disadvantaged when weighting exists. We need to test each proposal with data from real websites.
· Adding the additional cognitive guidelines that will be possible in WCAG 3 could even out the disadvantage that COGA experiences today.
· Some solutions for one disability cause problems for others: too much visual contrast is bad for some groups. Large headings can trigger anxiety or be more difficult to read for screen magnifier users.

More detailed written proposals are needed that can be tested with real websites.
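To make the Methods-and-tests idea above concrete, here is a minimal sketch (in TypeScript, assuming a browser DOM) of what one machine-checkable heading test could look like. The function name, the issue format, and the skipped-level rule are illustrative only; nothing here is an agreed Silver design.

  // Sketch of a programmatic heading-structure check, in the spirit of
  // turning the Testing Headings examples into testable Methods.
  type HeadingIssue = { element: string; problem: string };

  function checkHeadingStructure(doc: Document): HeadingIssue[] {
    const issues: HeadingIssue[] = [];
    const headings = Array.from(
      doc.querySelectorAll<HTMLElement>("h1, h2, h3, h4, h5, h6, [role='heading']")
    );

    if (headings.length === 0) {
      return [{ element: "document", problem: "no headings at all" }];
    }

    let previousLevel = 0;
    for (const h of headings) {
      // Native h1-h6 carry their level in the tag name; role="heading"
      // relies on aria-level (the ARIA default level is 2 when absent).
      const native = /^H[1-6]$/.test(h.tagName) ? Number(h.tagName[1]) : NaN;
      const aria = Number(h.getAttribute("aria-level"));
      const level = native || aria || 2;

      if (previousLevel && level > previousLevel + 1) {
        issues.push({
          element: `<${h.tagName.toLowerCase()}>`,
          problem: `skips from level ${previousLevel} to level ${level}`,
        });
      }
      previousLevel = level;
    }
    return issues;
  }

A check like this only catches what a tool can see; as the minutes below note, it says nothing about headings that are semantically correct but visually misleading.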
Should we use IETF standard RFC 2119<https://tools.ietf.org/html/rfc2119>?

RFC 2119 defines specific meanings of MUST, SHOULD, MUST NOT, SHOULD NOT, etc. It is used by technical standards in many standards organizations. It is proposed that WCAG has advanced from being a guideline to a technical standard and could benefit from the precise language of RFC 2119. It's unambiguous and clear: failing means you didn't meet the requirement. It's about having clearly defined requirements with explicit language. Measurement and scoring should be unambiguous. Comments:

· WCAG 2.x doesn't use RFC 2119 because it is a guideline, and W3C recommends keeping RFC 2119 use for technical specs.
· The ARIA specification is an example that uses RFC 2119.
· As it stands now, the Silver structure has both normative and informative content. Anything that is a normative requirement is a must; a should or may would not be a normative requirement. We would need to be very careful using that language in informative documents. We've had charter issues where the language strayed into informative documents.
· I see value, intellectual rigor, and honesty in being explicit. We call these guidelines; they became standards. The dirty secret is that no site can be perfect. If we take on this kind of language we need to be clear that we aren't going to say "must".
· In terms of scoring, engineers need to have black and white decisions. Everything we do is based on binary decisions. In the must we have declared what that bright line is. Bright lines make measuring easier.
· Elements interact with each other. WCAG has them broken down into different elements. Bringing the needs of people with cognitive disabilities into the mix adds a layer of complexity and conflicts.
· I'm concerned about internationalization and not comfortable saying we need to put ourselves in the shoes of legislators.
· Concerns around the accessibility of the RFC 2119 capitalization of MUST, as it is not well identified by some assistive technologies.

Agree that more specific proposals are needed.

== Minutes ==

https://www.w3.org/2020/03/10-silver-minutes.html

=== Text of Minutes ===

[1]W3C

[1] http://www.w3.org/

- DRAFT -

Silver Virtual F2F Tuesday

10 Mar 2020

Attendees

Present
jeanne, sajkaj, ChrisLoiselle, Laura, Jennie, kirkwood, Lauriat, Lucy, alastairc, Makoto, Chuck, JF, stevelee, KimD, AndyS, PeterKorn, mattg, Rachael, Detlev

Regrets

Chair
Shawn, jeanne

Scribe
ChrisLoiselle, Chuck

Contents

* [2]Topics
1. [3]Review of homework: what insights did people gain from it?
2. [4]Conformance and Minimums
3. [5]Should we use IETF standard RFC 2119
* [6]Summary of Action Items
* [7]Summary of Resolutions
__________________________________________________________

<ChrisLoiselle> Scribe: ChrisLoiselle

Bruce Bailey: The directions for the conformance exercise for headings or visual contrast were misunderstood. Tallying for qualitative assessment may not work. I have written up some information that I'd like to share.

<bruce_bailey> [8]https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643&range=A1

[8] https://docs.google.com/spreadsheets/d/1G0KLv1Nfvy5QWN7t9jPxyE6UEcTHE5A8tKYiDOhuZRY/edit#gid=1833982643&range=A1

The spreadsheet is the one Jeanne shared before; "Sample Scoring example" is the name of the Google Sheet.

<AndyS> AndyS present+

Bruce Bailey: Ratings / Scores are Outstanding / 4, Very Good / 3, Acceptable / 2, Unacceptable / 1.

<Zakim> bruce_bailey, you wanted to talk about conformance work

Reviewing the use-of-headings homework.

<kirkwood> well done Bruce!
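Bruce's adjectival scale converts naturally to numbers, which is also what CharlesHall observes later in these minutes (a qualitative metric becomes quantitative via the number of categories). A minimal sketch in TypeScript, assuming the 4-point scale above; the plain-mean aggregation is just one of the options the group debated:

  // Bruce's rubric as data: Outstanding/4, Very Good/3, Acceptable/2, Unacceptable/1.
  type Adjectival = "Outstanding" | "Very Good" | "Acceptable" | "Unacceptable";

  const SCORE: Record<Adjectival, number> = {
    "Outstanding": 4,
    "Very Good": 3,
    "Acceptable": 2,
    "Unacceptable": 1,
  };

  // One adjectival rating per guideline for a single page; the page score
  // is the mean, which can then be averaged again across a site's pages.
  function pageScore(ratings: Adjectival[]): number {
    if (ratings.length === 0) return 0; // no rated guidelines, no score
    const total = ratings.reduce((sum, r) => sum + SCORE[r], 0);
    return total / ratings.length;
  }

  // e.g. Headings "Very Good", Clear Language "Acceptable", Contrast
  // "Outstanding" gives (3 + 2 + 4) / 3 = 3.0, back in "Very Good" territory.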
PeterKorn: Mirrors what we've been doing within Amazon. I really like the potential for this to work with a scoring rubric, i.e. for this product release, these items are very good, these items are acceptable, etc. This is good.

JF: This is getting better in terms of granularity. Where would we integrate EN 301 549 within the subsections? Where would we get into the 7 functional requirements?

Jeanne: They will be in a different location. We may merge this into scoring.

Lucy: Can you explain the scoring a bit more? (to Bruce B.)

Bruce: Not skipping a heading level is Outstanding / 4 or Very Good / 3.

Bruce Bailey: If you skip levels, you are at very best at Acceptable.

Lucy: Acceptable to me seems that you've met every point. Actually, Very Good means you've hit every point.

Bruce Bailey: I also looked at Clear Language and Visual Contrast of Text and went through the same rating. Step 1 is to assign this to a web page in a website, then assign the website a number as well, so the numbers (mean, mode, etc.) would give a person a score for WCAG 2.1. 3 out of 4 guidelines rated as outstanding... A rubric that works for assigning a silver, bronze, or gold rating to websites could be used, as well as the rating scores.

Shawn L: Opens to JF for comments on his work.

<JF> [9]http://john.foliot.ca/demos/HeadingsTestOne.html

[9] http://john.foliot.ca/demos/HeadingsTestOne.html

JF: Shares a Testing Headings sample page. Talks to heading structure being used properly. Each heading has a class.

Review of homework: what insights did people gain from it?

JF: The entire document is made up of headings, 18 heading 1's. My question: is the score going to be the same as for the previous example shared?

The pages John shares are:

Testing and Scoring Headings - Master Page, [10]http://john.foliot.ca/demos/HeadingsTestOne.html

[10] http://john.foliot.ca/demos/HeadingsTestOne.html

Testing Headings - Test 1, [11]http://john.foliot.ca/demos/HeadingsTestTwo.html

[11] http://john.foliot.ca/demos/HeadingsTestTwo.html

Testing Headings - Test 2, [12]http://john.foliot.ca/demos/HeadingsTestThree.html

[12] http://john.foliot.ca/demos/HeadingsTestThree.html

Testing Headings - Test 3, [13]http://john.foliot.ca/demos/HeadingsTestFour.html

[13] http://john.foliot.ca/demos/HeadingsTestFour.html

At the end of each page, the negative impact on functional requirements is listed in a table.

PeterKorn: These detailed examples are fantastic. Comment: some of the examples are easy to find with a programmatic tool. If a tool could have found it and you didn't fix it, it is not acceptable.

<jeanne> +1 Peter

JF: Tools aren't going to catch all examples. I wouldn't outright fail for all users; failing for some users is valid. The Functional Requirements are key to a scoring rubric.

Shawn L: We can talk to this in Minimums.

<david-macdonald> interesting that JAWS and NVDA announce the level without aria-level for <div role="heading" class="level_2" ...

Lucy G: I love the examples, John. I see Bruce's, where a tally needs to add up to 100 points. Everything would be weighted and have weights within it.

JF: A more granular score helps content creators as well.

Lucy G: I think we are on the correct path.

Jeanne: If we look at Bruce's example and drill down into a more granular approach, would that help?

JF: An explicit definition of semantic headings would be useful.
... The structure vs. the visual presentation helps the cognitive disability user group.
<Chuck> +1 to Jeanne's idea

Jeanne: What if we took each of the 4 examples JF has, turned them into the correct thing, and turned those into Methods? We could then reference the individual Methods within Bruce's example to know what they need to review for HTML, and score that way.

JF: How many Methods could be constructed using the ACT rules format?

<Lauriat> +1

Jeanne: Exactly, using ACT in Methods. Scoring could reference those. Referencing ACT tests would work well in Methods.

David McD: JAWS reads those headings properly. Wondering what the accessibility-supported score would be if semantic code is not totally correct?

Comment on normative content and Methods: Looking at Bruce's examples, column B could be Methods. Techniques would be technology-specific and non-normative.

<JF> +1 to Lucy

Lucy G: Assistive technology can change behavior at any moment, the same as browsers. Methods must stick to whether the code is correct or not.

<Chuck> +1 to lucy

<jeanne> +1 Lucy

<laura> +1

+1 to Lucy (Chris, without scribing)

Shawn L (to group): Let us shift gears to the next agenda item.

Jeanne: Scoring examples topic: testing real websites is best. It changed how I was approaching things. Test against a real website.

Conformance and Minimums

<sajkaj> +1 to Jeanne

Shawn L: Criticality is important. How does one express it in a score, and what is a fundamental critical issue of accessibility? Very poor, acceptable, etc. Within Silver, do we want to be the ones drawing the line on critical issues?

PeterKorn: Looking at JF's second example: if the page only had two headings, one a heading 1 and one a heading 2... for usage without vision, the hierarchy structure would be a fail, but would the impact to the user be significant?

<JF> exactly

Is it not usable without vision? We need to think of the overall functionality and impact of the fail, and how we view the site.

JF: Peter, I agree. I built the examples so the structure created for screen readers was OK. If I did unstyled divs, the structure would be off visually for sighted users (COGA). The impact on different user groups needs to be looked at: visual users vs. non-visual users / screen reader users.

<Zakim> bruce_bailey, you wanted to say that i like adjectival rating over tally because critical aspects could be in "average" and above

I.e. h2 to h5: still usable, or a fail? How is it rated / scored, as opposed to pass or fail?

Bruce Bailey: Critical items are Acceptable or above in my example.

Charles Hall: I wanted to add to the criticality/severity comments. If I'm evaluating a rubric against a subset vs. as an author of the website, I the author have the right to scope my page through a task flow.

<JF> Not sure if we've arrived at consensus to Charles' point
<JF> regarding scoping

Chris Loiselle to Charles: If I'm writing that wrong, please add in your comments. Sorry!

<jeanne> +1 John - The consensus on scope was a logical subset, not a task

PeterKorn: Scoring with numbers is a very easy way to get lost in "what does 85% mean?" An adjective-based approach may lead to more progress on scoring quickly.

<PeterKorn> (to respond)

Lucy G: If the adjectival route is the way we are going, numbers will also help in the end as well. I.e. a full point can be broken down as well. Each requirement has its own scoring. When it comes to a critical criterion, that is when we can weight it. Think of it as containers. Language would be its own container. Is language more critical than headings?
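An aside on David McD's observation above about <div role="heading" class="level_2">: screen readers can announce a level even without aria-level because the ARIA heading role has a default level of 2, and the class name is invisible to assistive technology. A small TypeScript sketch of the effective-level logic; the function name is illustrative, and this simplifies what real accessibility APIs do:

  // Why AT can announce a level for <div role="heading" class="level_2">:
  // only the tag name or aria-level counts; class names carry no semantics.
  function effectiveHeadingLevel(el: HTMLElement): number | null {
    const native = /^H([1-6])$/.exec(el.tagName);
    if (native) return Number(native[1]);

    if (el.getAttribute("role") === "heading") {
      const aria = Number(el.getAttribute("aria-level"));
      return aria >= 1 ? aria : 2; // ARIA's default level when aria-level is absent
    }
    return null; // not exposed as a heading at all
  }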
<Chuck> we are at time to change scribes, and someone else will need to monitor the participants list for raised hands, as I will not be able to scribe and watch that list.

AndyS: The page is structured. Ads are present.

@Chuck: I can continue to scribe after a five-minute break.

<CharlesHall> that was CharlesHall that mentioned the <aside> as part of the scope

I will return very soon.

<Chuck> scribe: Chuck

<sajkaj> I can

Jeanne: Any follow-up on what Andy said?

<Zakim> JF, you wanted to respond to Peter's comment about numbers

JF: Responding to Peter's comments on numbers. Appreciating all the issues a number brings, my understanding is...
... This exercise is about getting a number. We currently have 100% or zero (in WCAG 2.x).
... A number can be misleading at times. If anybody has used the Chrome tool (Lighthouse), at the end of the process...
... Chrome added a score. We don't know where that score came from. If we start from the premise that you never get 100, that percentile becomes an incentive for doing better (72%, let's try to get to 85%).
... Peter, as much as numbers can be a rat hole, they're a critical part of what we are trying to do in Silver.

<ChrisLoiselle> Chuck, I can scribe again.

<ChrisLoiselle> Scribe: ChrisLoiselle

<Chuck> PK: I wasn't here for all of the Silver discussions... my understanding is that the high-order bit is to move away from pass-fail perfection, to "mildly or largely"... numbers aren't the goal.

<Chuck> PK: The goal is to get away from pass/fail. I like the rubric. We can evaluate the rubric evaluations. Some things are acceptable, some things are good... or almost everything is great but some are good...

PeterKorn: The goal was to move away from pass / fail. I like the rubric. If we have a way to collect up the rubric evaluations, and most are very good, the product will be very good.

<Chuck> PK: Whether or not we assign numbers, we need to think on severity, the people, and the impacts. I like the adjectival approach. I don't know what 87% means. I do know what acceptable and very good mean.

<CharlesHall> to PeterKorn’s point, a qualitative metric can be converted to a quantitative score based on the number of adjectival categories

<Chuck> JF: I agree with that statement; in a regulatory environment we need to hit a bar. "Intellectually honest" means that there isn't something that is 100%.

<Chuck> Shawn: I think you are agreeing on different points. Let's return to the queue.

<Chuck> detlev: I was wondering... you mentioned the 9 user accessibility needs. In the rating, would there be a plan to issue different types of results for different impacts...?

<Chuck> detlev: Also, why is there a mismatch with the user accessibility needs in the EU implementation (there you are missing limited reach)...

<Chuck> detlev: You have two categories for people with hearing problems (hard of hearing and no hearing). Is there a conscious decision to drop the difference between hard of hearing and no hearing?

<Chuck> Jeanne: It may be that I took an old version of the EU directive. Can you send me a link to the current one?

<Chuck> Detlev: I was wondering why users with limited reach were left out. Someone explained... I'm not convinced there's a good reason to leave it out.

<CharlesHall> we have discussed adding functional needs to the EN standard, like “intersectional”

<Chuck> Jeanne: Great. We did have a discussion of adding things, for limited cognition and vestibular. We never discussed dropping any, so I may have had an old version.

<Chuck> detlev: Is there an intention in all the scoring to differentiate by user group?
At one point it was a no-no. At some point we decided we didn't want to differentiate, but I'm not sure if that has changed.

<Chuck> Shawn: We've been thinking about it more closely to what John demonstrated, in that for different guidelines we look at the effects on different user needs given the task or scope of testing.

<Chuck> Shawn: We don't have a fleshed-out illustration of that, but we have been considering it. We want to make sure that we are not inadvertently leaving user needs out.

<Chuck> Shawn: For example, if it works great for limited cognition but it completely leaves out users who use screen readers, we want the ability to highlight that, and vice-versa.

<Chuck> Shawn: Like if everything is semantically accurate, but visually not.

<Zakim> jeanne, you wanted to talk about testing criticality and severity

<Chuck> Jeanne: I'm glad we are having a conversation about criticality. I know there are a lot of... people who feel it's very important. But one of the things we agreed on in Silver is that we'll be data-driven and research-based.

<Chuck> Jeanne: Whenever we tried to score real websites against criticality, we consistently didn't find a way to do it that didn't penalize people with some disabilities.

<Chuck> Jeanne: We have to find a way to stop penalizing people with some sorts of disabilities. We could not find a way to test it that didn't structurally disadvantage people with low vision and cognitive issues.

<KimD> +1 to Jeanne

<Chuck> Jeanne: It's great to talk about in theory, but if you want to propose it, you need to show research that demonstrates everyone is treated equally.

<Chuck> lucy: Speaking to numbers, we need to offer something that leaders can relate to. If we offer something confusing, they will blank us out and do other tasks.

<AndyS> Comment: IMO the only way to treat all disabilities equally involves customization and personalization, so that individual needs are accommodated *as needed*

<JF> +1 to Lucy

<Chuck> lucy: We have to have those numbers. A lawyer is not going to understand what it means to have "this or that" level. They want a number and a way to improve that number.

<Chuck> Rachael: I have 3 things to keep in mind. They've been brought up before, maybe not in this call. If we don't take functionality into account and we come up with a %, that % will mean different things to different groups.

<Chuck> Rachael: 80% may mean one thing for seizures and another for a blind user.

<Chuck> Rachael: If you have a group of individuals who fall in the category of blind vs. people in COGA, that hierarchy may introduce discrimination.

<Chuck> Rachael: This is such a rich and fantastic discussion, but whatever we come up with needs to be understandable and simple.

<Chuck> JF: Addressing a comment from Jeanne. I support all user groups including COGA, and I do so in action and words. The reality is that if we take that user group into consideration, that is one user group.

<Chuck> JF: I want to recognize severities. Those heading examples: if I remove visual structure, a person who has a cognitive issue and is blind is doubly disadvantaged.

<Chuck> JF: If we determine that a failure has a greater impact against a group, we can factor that in. That's what I said 6 months ago. Not all things are created equal. We have to boil this down to a score and strategies to improve the score.

<Chuck> Jeanne: I only disagree with the weighting.

<Chuck> Jeanne: "This is more severe than that" - I don't have an issue with your example.
At the guideline level I think you can do that.

<Chuck> Lucy: The weighting should be by criteria.

<Chuck> Lucy: So many disabilities are impacted by this or that. I won't pit one against another, but if it's 4 vs. 1, we have no choice but to weight that.

<Chuck> Jeanne: I disagree. When you start looking at it as a whole, there are too many things that we give guidance for that are heavily weighted toward blind vs. cognitive.

<Chuck> Jeanne: The way we are set up, including Silver, is that we measure things granularly, and we say this is more important than that, but when we look at it as a whole...

<CharlesHall> would a priority or severity not be functional need agnostic?

<Chuck> Jeanne: Granularly, this is a complete blocker, but someone with a cognitive issue may be able to work it out. But in total the cognitive issues become a blocker, because you have to look at it in totality.

<Chuck> Jeanne: Cognitive loses when we say "this individual piece" is more important than "this one".

<jon_avila> I object to the notion that WCAG is heavily focused on Blind and visual impairment. There are many WCAG criteria that are aimed at a wide range of users with disabilities

<Chuck> Jeanne: That alt text is more important than captioning. It's the totality of the website; it's all the guidelines. If they are getting a lower weight on each individual guideline, they get a website that they cannot use.

<Chuck> Jeanne: Can someone else make the argument better?

<Chuck> Shawn: It's a complex topic and we should keep it in mind as we work through. We all have the same high-level goals for conformance.

<Chuck> detlev: Just to get to Jeanne's argument about critical issues penalizing others: why is that a problem? Why wouldn't it be possible to ask all the groups involved to basically identify show-stopper issues? We know those.

<Chuck> detlev: Keyboard traps, lack of captions, and so on. Maybe show stoppers for cognitive individuals. It's basically OK to collect those issues and make them critical issues.

<Chuck> detlev: Can you explain further, Jeanne?

<Chuck> Jeanne: Let's honor the queue.

<Zakim> bruce_bailey, you wanted to say numbers can be used to measure progress, but i have never seen numbers that were comparable from one website to another (or one tool to another)

<Chuck> Bruce: We've been trying (everyone!) to rate websites since the days of Bobby. All of these rules, all the years, it's only ever useful for the developer to make progress - to show that you aren't regressing.

<Chuck> Bruce: Lots of tools will give you a percentage.

<Chuck> Bruce: Those only make a difference within one domain. You can't compare an 87 on one site to an 86 on another site, or even across tools. I don't feel like we will make progress if we try to have granular scoring systems.

<Chuck> Bruce: Unless there are 1000s of data points. Enough data points that you aren't doing it manually, so that eventually the weaknesses of the tool average out.

<PeterKorn> +1 to Bruce

<Chuck> Bruce: I won't be able to say that one outstanding site compares well to another.

<Chuck> Matt: Returning to yesterday, it seems like there are 2 different criteria you are trying to assess: how well has the author created the content and what score you can give them, vs. how do you make this understandable to the user.

<david-macdonald> +1 to bruce

<CharlesHall> to Detlev’s point, the challenge is not that one functional need does or does not have a critical item. i think the challenge is where 2 or more functional needs have a critical item that is in conflict.
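A minimal sketch of Jeanne's "totality" concern above: instead of folding everything into one weighted number, report an accumulated per-functional-need total ("friction", in the summary's terms) alongside the overall score. All data shapes and names here are hypothetical; this illustrates the verbal proposals, not a worked-out design.

  // Per-guideline ratings contribute "friction" to each functional need they
  // affect; per-group totals then surface what a single overall score hides.
  type FunctionalNeed = "without-vision" | "limited-vision" | "limited-cognition"; // etc.

  interface GuidelineResult {
    guideline: string;
    score: number; // 1 (Unacceptable) through 4 (Outstanding)
    affects: FunctionalNeed[];
  }

  function frictionByGroup(results: GuidelineResult[]): Map<FunctionalNeed, number> {
    const friction = new Map<FunctionalNeed, number>();
    for (const r of results) {
      const f = 4 - r.score; // a lower rating means more friction; Outstanding adds none
      for (const need of r.affects) {
        friction.set(need, (friction.get(need) ?? 0) + f);
      }
    }
    return friction; // a high total flags a group that the overall mean would hide
  }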
<Chuck> Matt: There's a bucket approach; there's probably going to be a spectrum from not meeting the requirements to meeting them all. There are different methods which have different impacts on different groups.

<Chuck> Matt: That's different from the number score that the producer gets. I think that these two different aspects need to be viewed separately.

<Chuck> Shawn: Indeed.

<Chuck> JF: Two thoughts... Jeanne mentioned one requirement is captions. Let's look at the severity of captions. If you are deaf and a video is missing captions, that's critical. If I have a cognitive issue and I have captions...

<Chuck> JF: That helps me. Offering captions to a cognitive user offers some benefit, but to a deaf user it is completely critical.

<JF> [14]https://docs.google.com/spreadsheets/d/1EXw5W6SuMXk7mrFN33hHVQuCn0DhKysXmUCKNML3SKg/edit#gid=108726882

[14] https://docs.google.com/spreadsheets/d/1EXw5W6SuMXk7mrFN33hHVQuCn0DhKysXmUCKNML3SKg/edit#gid=108726882

<Chuck> JF: How do we make this fair, realizing we can't get to every individual out there? In an earlier example (6-8 months ago), I put forward a draft proposal that suggested that as we calculate scores we use a weighting mechanism, and use it as a multiplier.

<Chuck> JF: We would ultimately have better... more data points. More data points will give us a better score.

<jeanne> COGA with individual guideline weighting.

<Chuck> Shawn: Logistically, I'd like to get through the queue and then get to the last agenda item.

<Zakim> alastairc, you wanted to say that how we measure things is more important than how we weight them

<Chuck> Shawn: This discussion has been great in taking in all the things we need to consider.

<Detlev> @CharlesHall: "challenge is where 2 or more functional needs have a critical item that is in conflict" - can you speak to that? I'd love to know where solving one showstopper issue creates a real problem for another group...

<Chuck> Alastairc: Not sure weighting is necessarily going to be the answer. A lot of the problems in terms of coverage in WCAG 2.x are in how things are scoped and measured. That has been clear in addressing COGA issues.

<Chuck> Alastairc: If there is a functional outcome that has an equal weighting per guideline, that may be a reasonable way to proceed. It's within the guideline to decide what content authors need to do.

<Chuck> Alastairc: Let's start off even, and once we have better coverage of guidelines, maybe later we can weight the guidelines. Let's start with "what is a reasonable thing to ask content authors to do".

<Chuck> PK: Friction comes to mind. Every place where the language is harder to puzzle out is friction. The accumulation of friction can take a site from great to struggling to impossible. The spoons model.

<Chuck> pk: I think the same concept applies to other disabilities, to what we thought of traditionally as pass/fail. The header example: a page with 2 headers is a little bit of friction, a formal failure. Does it prevent blind individuals from using the page? Probably not.

<alastairc> +1 to thinking about friction, which again could be dealt with quite granularly with good measurement, where low scores across the board start adding up (or rather, not adding up!)

<Chuck> PK: Mel Brooks' Silent Movie has no dialog in it, except for one individual who has one word. Is it a problem for a deaf person who wants to watch that movie? I think we can look at a friction-based model.
<JF> +1 to Peter's point - barriers are based on the functional requirements of the different disability types

<Chuck> pk: ... come up with a notion that there is a little bit of friction for blind folks because some small pages don't have headings right, but there's a lot more friction for a cognitive user. We can elevate a site and say that can be no worse than "good"...

<JF> +1 to contemplating A,B,C,F scoring

<Chuck> pk: ...and therefore a site does or doesn't make it. Back to numbers: A+, B-. I think there are mechanisms that can be fairly granular, like "we are getting a C- and we want to get a C+". We can get caught up in a scale of 100% and argue whether 86 or 87 is good enough.

<Zakim> jeanne, you wanted to say to Detlev when each group gets a show stopper, then a bonus is given to that group. But for COGA, it's the overall sum of all the guidelines. So they

<david-macdonald> +1 Peter had a great concept of "friction"... at some point there is too much friction to use.

<Chuck> Jeanne: To Detlev, on weighting: when each group gets a show stopper, then a bonus is given to that group. For COGA it's the overall.

<Chuck> Jeanne: Each group gets more points for show stoppers, and COGA gets less. Physical or sensory issues have more points. But when COGA looks at the overall score... like a website gets a 93, but for COGA it's not accessible.

<Chuck> Jeanne: For COGA it's more of an overall issue. That's why I say that people need to test their proposals on real websites. We found that it wasn't reflecting the issues for COGA users correctly.

<alastairc> Jeanne - couldn't that be addressed by having plenty of guidelines for COGA (based on things we can't fit in WCAG 2.x), so that without meeting enough of those, you wouldn't pass?

<Detlev> Can I answer to that directly (briefly)?

<Chuck> Jeanne: It's great to have these proposals. John's proposal of putting the impact at the guideline level and trickling that down, that can work. But if we put weighting in, I'll ask for real examples with real websites that show it's fair.

<Chuck> Jeanne: I think we can do it without weighting and go back to an author and say "here's the total score, and here is how it breaks down by disability". I think that can work. I think we can avoid the weighting issue.

<Chuck> detlev: I think you are mixing 2 different things. Nobody argues that WCAG doesn't have enough for COGA; COGA issues are underrepresented. I don't see a real conflict for critical issues by groups.

<Chuck> detlev: For cognitive folk who are impacted by many different things combined, if the new guidelines and rubrics include them, I would rather think of subtracting points. The cognitive score would show the deficiencies clearly.

<Chuck> detlev: You'd still have a way of showing off critical issues by groups.

<jeanne> I would be interested to see a proposal with real data. I would help with testing.

<Chuck> detlev: There are absolute show stoppers for some users, and we need to show that. Jeanne, you said that these can drown out COGA issues, but I think they can be reflected properly and benefit COGA users.

<Chuck> Andy: The cognitive issue is such a complex subject. It overlaps with neurological issues, senses, perceptions. There are so many different varieties; A.D.D. will be different from educational handicaps.

<jeanne> alastair, that is possible -- again, I would like to see a proposal with mockup of the data with proposed guidelines.

<Chuck> Andy: It becomes this big mess of how we divide up, should we divide up... it becomes a broad spectrum.
Points towards the ability to customize and personalize as the way to ensure that all groups have equal access.

<alastairc> Jeanne - I think we need to leave weighting/criticality until we have better coverage of guidelines.

<Chuck> Andy: In terms of a triangle, the user is one point, the author is another point, and the technology is the 3rd point. Those 3 points need to work together in a way for every user to be accommodated.

<alastairc> we aren't sure what / how-many new guidelines are likely to come from COGA.

<Chuck> andy: I don't think there's a way for a website to address everyone all the time. I understand what Jeanne is saying in terms of weighting. Weighting vision perception gets the site up to a high level of acceptance, but the things that made that...

<Lauriat> +1 to Alastair

<Chuck> andy: ...made the site succeed on a vision number may actually hurt COGA issues. High contrast can cause COGA issues. There's a lot of interaction there. How do we really make that into a matrix, where you push this up and the other comes down?

<Chuck> andy: Like squeezing a toothpaste tube: one end decreases, the other gets larger. I don't know that weighting is the way of solving it. I think customization is the way to achieve the ultimate goal. These are thoughts in my mind.

<kirkwood> +1 to customization

<Chuck> Shawn: We are unlikely to come to complete resolution in this conversation; we need to finish the queue and move on.

<JF> FWIW, the Personalization TF is looking at 'customization' today - but we lack the technology to make it happen today

<Chuck> Lucy: If any disability is poorly affected by any system we come up with, that's our failing. We don't have the data and don't know, and if it's still failing cognitive users, it's our responsibility to address that group.

<jeanne> Andy, that's why I like that proposal of putting data into each guideline about each disability affected, but then giving a total score for each disability.

<Chuck> Lucy: We have fixated for so long on what we know; we still need to do the research and determine what we don't know.

<Chuck> lucy: I'll say that I don't know enough, but I do take it into account, and I want to know more.

<Chuck> Charles: Historically, we are all in general agreement that this is complex, which is why it takes a long time to get through one point of conversation, but I don't think it's impossible to account for something that is critical to one functional need in order to achieve a score.

<Lauriat> @JF: I'd like to know more about the technology needed to make that happen, or at least prototype things out to figure out how to make that happen.

<Chuck> Charles: If we have 9 functional need categories and one has an issue, then they all do. There have been historic conversations where there's a conflict. Where there is a challenge is where something in one category is in conflict with an issue in another category...

<Chuck> Charles: Headings large and in one color may conflict with someone for whom they could trigger anxiety. It's fine to consider this, but we need to be aware of the conflicts.

Should we use IETF standard RFC 2119

<Chuck> Shawn: With that, we have a lot to think about and work through. Let's move on to whether to use RFC 2119: must / should / must not / should not.

<Detlev> @Charleshall: I think research shows ALL CAPS is harder to read, cannot think who would benefit

<Chuck> Jeanne: W3C and standards organizations around the world rely on this particular RFC (Request for Comments). It is very old, and used in technical standards by many standards orgs.
<Chuck> Jeanne: W3C in the past has said they don't recommend it for guideline use, and it is not used in WCAG. It's designed for technical specs that require interoperability. More about... John may have lots of examples.

<sajkaj> Think APIs

<Chuck> Jeanne: Because it's not included, we are interested in including it in Silver. This is John's proposal. John...

<CharlesHall> @Detlev. i agree. was trying to create an example on the fly since I couldn’t recall some of the specific examples we discussed in the past. particularly with insights from Cybelle.

<Chuck> JF: Part of the issue is that WCAG has moved from being a guideline to being a standard. That's the reality. We have governments that say "you must meet WCAG 2.x AA". Because of that, we need to have these bright and measurable points...

<jon_avila> The ARIA specification is an example that uses RFC2119

<Chuck> JF: ...to meet that requirement. When we talk about the users, the 3 points, there's a 4th point: legal requirements. RFC 2119 calls out must, should, and may.

<Chuck> JF: It's unambiguous and clear. Failing means you didn't meet the requirement. It's about having clearly defined requirements with explicit language.

<jeanne> [15]https://tools.ietf.org/html/rfc2119

[15] https://tools.ietf.org/html/rfc2119

<Chuck> JF: I want to use it. When we created a guideline, that was guidance. Because we are at a point where we have measurement and scoring... as part of that requirement we should use very clear language.

<CharlesHall> +1 to JF on use of an unambiguous standard

<Chuck> JF: We've got clarity there.

<Zakim> alastairc, you wanted to say that must = the requirement, and should/may wouldn't be suitable for stating the requirement.

<Chuck> Alastairc: I think as it stands now... the Silver structure... normative and informative... anything that is a normative requirement is a must. A should or may would not be a normative requirement.

<Chuck> Alastairc: They are very tied together. We would need to be very careful using that language in informative documents. We've had charter issues; the language has strayed into informative documents.

<Chuck> Alastairc: It introduces unnecessary issues. I don't disagree or agree; I think our current approach takes it into account already.

<Chuck> DM: The SC were designed to be testable statements. If the statement is true, you meet the criterion. Every one of those statements is a must.

<jon_avila> I find it ironic that folks have said no disability should be weighted over others -- yet it was said cognitive disabilities are impacted by the wholistic site and thus trump everything in terms of importance

<Chuck> DM: We do have must statements; we don't have shoulds in the normative docs.

<Chuck> PK: I see value and intellectual rigor and honesty in being explicit. We call these guidelines; they became standards. The dirty secret is that no site can be perfect. If we take on this kind of language we need to be clear that we aren't going to say "must"...

<alastairc> jon_avila I think there is just recognition that there has been a gap, and there needs to be work to address the gap.

<Chuck> PK: ...when the must is not achievable. I don't have a strong feeling for or against. But if we use this language, we need to be clear about whether we are creating an impossible "must".
<Zakim> JF, you wanted to note that there is a real difference (in RFC 2119) between MUST and must (case sensitive)

<jon_avila> For what it's worth - WCAG 2.0 and 2.1 are standards according to WCAG itself - "The WCAG 2.0 document is designed to meet the needs of those who need a stable, referenceable technical standard."

<Chuck> JF: Alastair... concern about language. MUST, SHOULD, and MAY are always in upper case. MUST and must don't mean the same thing.

<alastairc> let's not go down that route!

<Chuck> JF: "must" is conversational. That avoids some of that problem.

<AndyS> Thats scary

<Chuck> JF: As Peter noted, our guidelines have been made standards. In terms of scoring, engineers need to have black and white decisions. Everything we do is based on binary decisions. In the must we have declared what that bright line is.

<sajkaj> Methinks JF forgot about analog engineering!

<david-macdonald> There are no "Must", "should", or "may" in WCAG 2 or 2.1

<Jennie> I would be concerned with upper case and lower case differences in meaning, from a cognitive standpoint.

Chris Loiselle, comment on standards: [16]https://www.iso.org/standard/58625.html, pointing to a standard.

[16] https://www.iso.org/standard/58625.html

<Chuck> Andy: I want to mention that all the other standards organizations use this verbiage.

<JF> +1 to bright lines

<JF> bright lines make measuring easier

<Chuck> Andy: But if we were going to go there, it's important to note that this is a very bright line. It requires additional diligence to make sure that "shall" is really understood and isn't going to create situations that cannot be absolutely achieved.

<jeanne> I worry that accessibility needs are not oriented toward bright lines.

<Chuck> Andy: With all of the many things we are talking about that interact with each other... WCAG has them broken down into different elements. These elements interact with each other. Bringing COGA into the mix adds a layer of complexity and conflicts.

<Chuck> Andy: If one "shall" conflicts with another "shall", we'll get into trouble.

<jeanne> +1 Andy

<jon_avila> I agree that use of these RFC 2119 terms will only complicate things

<Chuck> Andy: It's a bit more ambiguous than in other standards from other groups. The ANSI specs on displays and fonts: their language and examples are set in the technology of the late 80's and early 90's.

<KimD> I'm concerned about internationalization and not comfortable saying we need to put ourselves in the shoes of legislators.

<Chuck> Andy: We start to get into an ambiguous realm when we discuss how different browsers render fonts differently. I like the idea of adopting this more affirmative use of terminology, but it brings a great deal of complication.

<alastairc> Is it worth trying this language out in a method? That seems to be the most suitable place.

<Chuck> Lucy: I want to see it applied and see how it works. And then, when John responded... I'd say what Peter said... this may not be a possible thing to accomplish and remain accessible itself.

<Lauriat> @Alastair: No, that would make tech-specific methods normative.

<Chuck> Lucy: I like the idea; in terms of what we have been thinking of all along, I want to see it applied to some examples first.

<alastairc> Um, I'm not sure it will help with the clear language.

<Chuck> Lucy: I can't tell the difference between MUST and "must".

<jon_avila> There are settings in screen readers to communicate capitalization of text.
<JF> <span aria-label="RFC 2119 MUST">MUST</span>

<Chuck> Shawn: My proposal is to go through the minutes and pull out the pros and cons of going with this language vs. keeping the current language.

<Chuck> Shawn: And then we can use that as a summarization for folks who couldn't make it to this call.

<alastairc> Should a guideline include should/may?

<Chuck> JF: I pasted some code using the RFC term to address your concerns.

<Jennie> Won't the ARIA label only assist those using screen readers, but not those with reading challenges with vision?

<KimD> +1 to Jennie

<david-macdonald> There are no instances of "Must", "should", shall, or "may" in WCAG 2 or 2.1 success criteria

<Chuck> JF: <discusses the rfc-2119 must>

<Chuck> Shawn: Worth looking into annotations.

<jon_avila> ARIA labels on non-interactive text don't work well with screen readers.

<jeanne> +1 Jennie

<alastairc> JF - would a guideline include should or may?

<Chuck> Shawn: With that, thank you everyone, and thanks for bringing examples. Super helpful as part of these complex topics.

<Chuck> Shawn: Anything else, Jeanne?

<JF> @ Alastair - it could

<Chuck> Jeanne: Incredibly helpful, great conversations; we'll keep working.

[End of minutes]
__________________________________________________________
Received on Tuesday, 10 March 2020 20:53:48 UTC