Re: TED talk on algorithm bias

From: John Foliot <john.foliot@deque.com>
Date: Wed, 10 Jul 2019 13:16:06 -0500
Message-ID: <CAKdCpxxS5u2qRGTc9WGjdgM-UBhu_rmk42e-dofGzYQdrTDE4w@mail.gmail.com>
To: Jeanne Spellman <jspellman@spellmanconsulting.com>
Cc: Silver Task Force <public-silver@w3.org>
Thanks Jeanne for sharing these. I've not spent the requisite time with the
longer video, but did review the shorter one.

I have to say that I am personally concerned by the seemingly definitive
declaration that "...blind faith in big data must end..." as nobody
(certainly not me) has suggested that we put blind faith in anything. But
data is what we are working on and with; it is in many ways our stock in
trade, and is what "measurement" is all about. Measurement is data (whether
big or small).

*Without data, you have nothing but opinion.*

Ms. O'Neil states:

"(1:49) *Algorithms are opinions embedded in code...*"


That's one way of looking at it (Cathy O'Neil's *opinion*), however is it
the *only* way of looking at it?

I'll suggest that Ms. O'Neil has politicized "data" to fit her narrative;
Merriam-Webster *apolitically* defines algorithm
<https://www.merriam-webster.com/dictionary/algorithm>, as "...*broadly:* *a
step-by-step procedure for solving a problem or accomplishing some end.
... Algorithm is often paired with words specifying the activity for which
a set of rules have been designed.*"

That is what we are doing: we're defining "problems" (use-cases), and we're
also proposing methods (step-by-step procedures) and rules (Requirements)
to address those use-cases. Finally, we need a mechanism beyond Pass/Fail
(100% or 0%) to measure the progress or success of solving the use-case
scenario.

In WCAG, we assumed a simple Pass/Fail approach which we now know is
neither accurate nor fair, and so in Silver we're going with a "somewhere
between black and white - i.e. a shade of gray" approach.
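As a rough illustration of that difference only: the sketch below is
hypothetical, and the function names and the idea of counting passed tests
are assumptions made for the example, not anything the Task Force has
adopted.

```python
def pass_fail(passed: int, total: int) -> float:
    """Binary outcome: 1.0 only when every test passes, otherwise 0.0."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 1.0 if passed == total else 0.0


def graded(passed: int, total: int) -> float:
    """'Shade of gray' outcome: the fraction of tests passed (0.0 to 1.0)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return passed / total
```

Under these assumptions, a page that passes 9 of 10 tests is indistinguishable
from one that passes none in the first function, while the second reports the
partial progress.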

Defining and measuring that gray will require math, and yes also opinions -
the opinions of experts and concerned parties in the field. What makes one
"method" preferable to another? Says who, and why? (Says experts, in their
opinion, based on experience and... that's right, data).

"(2:03) *That's a marketing trick ...because you trust and fear
mathematics...*"

Pfft. That is one woman's opinion - I neither fear nor trust math any more
than I fear or trust physics - I know our understanding and use of physics
is not always perfect, but like democracy, it's better than any of the
other options available to me.

"(7:20) *...and we have plenty of evidence of bias policing and justice
system data...*"

Am I the only one who finds it ironic that Ms. O'Neil is using selective
evidence and data to make her point that evidence and data are biased? IMHO,
she shot down her own argument right there, and she spends a good portion
of the remainder of that video using and interpreting specifically selected
data to make her point. For example, she surfaces *one use-case* of a
recidivism risk algorithm that resulted in cultural bias in Florida to
"prove" her assertion that all algorithms have a bias: she used data
to arrive at a conclusion.

"(8:36) *...When they're secret, important and destructive...*"

If I found one important take-away from this video, it was this: *the
openness of our algorithm will be a critical component*. There is a world
of difference between "Black-box" algorithms and open and transparent
algorithms. Thankfully, we've already stated categorically that:

Scoring

Point scoring system - we have been working on a point system and
have a number of prototypes. This is what we most need help on. It must be
transparent and have rules that can be applied across different guidance.
We are not going to individually decide what a Method is worth because it
doesn't meet the needs of regulators for transparency, it doesn't scale,
and it is too vulnerable to influence.

(source:
https://docs.google.com/document/d/1wklZRJAIPzdp2RmRKZcVsyRdXpgFbqFF6i7gzCRqldc/edit#heading=h.acod2js7mcnj
)


Ms. O'Neil continues,

(8:52) "...*These are private companies building private algorithms for
private ends*..."


One of the advantages of doing this work in the W3C is to avoid this kind
of 'private company' bias. Will we be 'perfect' in that goal? Likely not,
because as Ms. O'Neil also noted, we don't live in a perfect world. But the
openness of the W3C in its mission will hopefully ensure that whatever we
end up with will be *MORE* open than a proprietary system or solution. But
it still won't be perfect.

(9:43) "...*We know this, though, in aggregate*..."

In aggregate? You mean, like "big data"? Funny how, when it supports her
opinion, big data isn't so bad after all...

(11:29) "*...We should look to the blind orchestra audition as an example.
...the people who are listening have decided what's important and they've
decided what's not important...*"

OK, so not so much then that big-data is "evil", or that algorithms are
biased, but rather we need to be mindful of bias and decide what's
important and what's not, so that we construct an algorithm (set of tests
and steps) that, if it does not completely eliminate bias, flattens it
significantly. That's a world of difference from saying that algorithms are
"Weapons of Math Destruction".

(Additionally, I'll note that "experts", aka *'the people who are
listening'*, decided what was and wasn't important, so they introduced a
bias there as well; perhaps we can call it an informed bias. It was
seemingly a positive one, as Ms. O'Neil's subsequent point that female
employment increased 5-fold proved. So bias, in and of itself, isn't the
real problem, is it? *Rather, it's the awareness of the role that bias
plays in the calculation of the data.*)

(12:12) "*...What is the cost of that failure?*"

Indeed. Everything - EVERYTHING - has a cost/benefit ratio, and at scale,
regulators, lawyers, and their kind do risk analysis to weigh that
cost/benefit ratio.

This is why I've proposed that - all other things being equal - the greater
the cost for success, the greater the value(*) in our scoring algorithm. If
it costs more to accommodate and test to ensure that some users with some
disabilities are not left behind, that needs to be rewarded appropriately,
otherwise the cost/benefit ratio doesn't matter: the decision will be "Pay
the fine - it's cheaper". Sadly, I've personally lived through that
specific mind-set at a previous employer (un-named for obvious reasons),
where a senior compliance person confided in me that they figured the
executives were waiting for exactly that to happen before they went any
further. So this is a real thing too.

(* One of the things I'm still struggling with is our unit of measurement,
so that it can be applied *proportionately* across our "rules" and "sets of
steps" as part of the cost/benefit analysis.)
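One hypothetical way to picture a proportionate weighting is sketched below.
Everything in it is an assumption made for illustration: the `Method` record,
the abstract "cost" units, and the cost-weighted average are not an agreed
Silver formula, and the open question of what the unit of measurement should
be is left exactly as open as it is above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """Hypothetical record for one method being scored."""
    name: str
    cost: float      # assumed effort to implement and test, in arbitrary units
    achieved: float  # assumed degree of success, from 0.0 to 1.0


def weighted_score(methods: list[Method]) -> float:
    """Cost-weighted average: costlier methods contribute proportionately more."""
    total_cost = sum(m.cost for m in methods)
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return sum(m.cost * m.achieved for m in methods) / total_cost
```

With two methods of cost 1 and cost 3, fully meeting only the cheaper one
yields 0.25 under this sketch, while fully meeting only the costlier one
yields 0.75, so the harder work is rewarded in proportion to its cost.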

Ms. O'Neil talks about recognizing what is and isn't important, and
focusing on that to ensure the algorithm is un-biased. OK, but before we
can determine if there is any bias, we also need to be thinking about bias
towards whom? All users in aggregate, or specific users with specific needs
(and if the latter, to what level of specificity?)

There is no disagreement that WCAG is today biased against people with
cognitive disabilities, but before we can even make that statement, we
also have to recognize that people with cognitive disabilities are not a
monolithic bloc, even when they are a sub-set of "all users". In real
world terms however, they *are* a specific sub-set, with needs and
requirements that are different or enhanced over the needs of others
(needs that WCAG currently fails to meet). That's the definition of bias
right there: "*prejudice in favor of or against one thing, person, or group
compared with another*" (source:
https://diversity.ucsf.edu/resources/unconscious-bias) - *but to recognize
bias is to also recognize "groups".* For this reason, I continue to believe
that accounting for the needs of these different groups will be a factor in
the cost/benefit computation. And in fact, we've already spoken and thought
at length about how to ensure that our new scoring system cannot be "gamed"
to favor one group over another, so this Task Force has already accepted
that there are different "groups" with differing needs.

To state now that our scoring system should not account for different
user-groups in the scoring algorithm, while at the same time working
towards ensuring that different or specific user-groups are not
biased-against by our scoring system is a contradiction that I am
struggling with, and that I've not seen a valid response to.

In the end, whatever we emerge with will need to be *consistently* measurable,
repeatable, report-able, and scale-able across all sizes of sites and types
of content. And like it or not, all of our documentation to date includes
"points", and/or "values" which will need to be added (or subtracted,
multiplied or otherwise processed), so math *will* be involved. (And that's
OK.)

My $0.05 Cdn.

JF

On Wed, Jul 10, 2019 at 9:21 AM Jeanne Spellman <
jspellman@spellmanconsulting.com> wrote:

> Cyborg asked me to send this around and asks that those working on
> conformance watch it:
>
> TED Talk:  Cathy O'Neil - Weapons of Math Destruction
>
> There is a short version and the full version
>
> Short version: https://www.youtube.com/watch?v=_2u_eHHzRto
>
> Full version: https://www.youtube.com/watch?v=TQHs8SA1qpk
>
> I watched the short version and  thought it was well done. It is about
> various kinds of bias and not specific to PwD.   Her points about the
> data of the past continuing a bias into the future are cautionary.   We
> do not collect big data and our formulas are not sophisticated AI
> algorithms, but the principles she cautions about apply, IMO.  There are
> people in accessibility doing research on algorithmic bias against PwD,
> and there are broader lessons from the research that could apply to our
> work.
>
>
>
>
>

-- 
*John Foliot* | Principal Accessibility Strategist | W3C AC Representative
Deque Systems - Accessibility for Good
deque.com
Received on Wednesday, 10 July 2019 18:17:23 UTC