Re: Requirements draft - objectivity

Hi all,

one could think that Objectivity is included in Reliability, because a
non-objective test is not reliable and in consequence also not valid.
In the worst case a test would not measure what we want to measure. I
wrote about that a few mails ago.

As Objectivity is an important concept, I think it is not only
necessary but essential to have it as its own Requirement, accompanied
by an explanation of the biases which can influence the result of a
test. Every testing procedure can produce its own violations against
Objectivity. As I see it there are three alternatives for an
Evaluation Methodology:

- Testing every SC on every page
- Testing a sample of X pages
- Testing a sample of X pages _and_, for those SCs which are not
violated, testing those SCs on other pages and parts of the website

Let us leave feasibility and pragmatism aside for a moment and just
have a look at Objectivity and Reliability.

1. Testing every SC on every page
A tester checks all SCs on every page. Afterwards he/she gives the
protocol to an independent second tester with the same qualification.
The second tester can find out whether every SC on every page was
checked. Two kinds of discrepancy are possible: A. the first tester
overlooked something, or B. he/she has not overlooked anything, but
the second tester comes to a different result.
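
By the way, the completeness part of this verification is mechanical
and could even be automated. A minimal sketch in Python, assuming the
protocol is kept as a simple table of (page, SC) pairs with verdicts;
all names and data here are invented:

    def missing_checks(protocol, pages, all_scs):
        # `protocol` maps (page, sc) -> "pass"/"fail"; any pair absent
        # from it was not checked by the first tester.
        return [(page, sc)
                for page in pages
                for sc in all_scs
                if (page, sc) not in protocol]

    # Invented example: SC 1.1.1 was never checked on /contact.
    protocol = {("/", "1.1.1"): "pass", ("/", "1.2.1"): "fail",
                ("/contact", "1.2.1"): "pass"}
    print(missing_checks(protocol, ["/", "/contact"], ["1.1.1", "1.2.1"]))
    # -> [('/contact', '1.1.1')]

The interesting questions, of course, are the reasons behind the gaps.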

One reason for A can be a measurement error: perhaps the tool was not
the right one for this test, or it is buggy. Measurement errors like
this don't belong to Objectivity or Reliability.

Some other reasons for A can be: the tester really overlooked
something ("measurement error"), or, and now "Objectivity" comes in,
the tester may think: "Well, the layout is nice, so the Accessibility
is also nice" or "I know the web agency, they are doing good work, so
there is no need to have a look at all SCs on every page". A bias may
also be: "not enough time". More are possible; I suggest collecting
them in a document for every alternative.

This is all very brief; it is probably clearer to imagine a group of
testers for the first step and a second group as the second testers.

Let's have a look at B. Both checked every SC on every page, but not
with the same result. This is a question of Reliability, as long as
there was no measurement error (buggy tool). Of course it is
influenced by whether we use pass/fail or a score, and a score can
itself be a bias.
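
To make the Reliability side of B a bit more tangible: the agreement
between two testers' pass/fail protocols can be quantified, for
example with Cohen's kappa, which corrects the raw agreement for
agreement expected by chance. A minimal sketch, with made-up verdicts:

    from collections import Counter

    def cohens_kappa(ratings_a, ratings_b):
        # One verdict per (page, SC) pair, in the same order for both
        # testers. Returns 1.0 for perfect agreement, 0.0 for
        # agreement at chance level.
        n = len(ratings_a)
        p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
        count_a, count_b = Counter(ratings_a), Counter(ratings_b)
        p_exp = sum((count_a[v] / n) * (count_b[v] / n)
                    for v in set(ratings_a) | set(ratings_b))
        return (p_obs - p_exp) / (1 - p_exp)

    tester_1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
    tester_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
    print(cohens_kappa(tester_1, tester_2))  # ~0.67 here

A score-based test would need a different agreement measure, which is
exactly where the Metrics discussion comes in.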

It is really complex, and what I'm writing here are some
considerations about these things. If I had the perfect test, there
would be no need for discussions.

2. Testing a sample of X pages
For testing a sample of X pages a preliminary proceeding is needed.
During this a tester will not only have the collection of pages in
mind but will also have a look at the SCs - I think it's an illusion
that a tester will only look at whether a page is typical/common or
not. Even if there is no protocol for the preliminary proceeding,
he/she will have this "protocol" in mind.

After this proceeding he/she has to decide which pages should be
checked. This is a very critical point, because at this point a tester
can influence the result in both directions. Of course this shouldn't
happen, but a tester is not a Buddha. One additional critical point at
this stage is the number of pages: the more pages, the less room there
is for influencing the result.

This doesn't sound nice, I know, but we have to speak about it and
should not act as if we were the already mentioned Buddhas. There can
be a lot of biases at this point:

- It's a nice layout, so the tester probably doesn't look very deeply,
or the tester doesn't like the layout...
- the tester knows the web developer ("he/she is a friend of mine"),
or he/she doesn't like the web developer

and so on

Even the character of the website can influence the selection of the X
pages and their number:

- it's the website of a political party the tester likes or doesn't
like, or which the second tester likes or doesn't like
- it's an organisation which a tester supports or doesn't support

Even the personal attitude towards some probable barriers could be a bias.
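
One way to take at least the selection decision out of the tester's
hands would be to draw the sample at random from the full page
inventory, with a documented seed so that the second tester can
reproduce the draw. A minimal sketch; the URLs are invented, and in
practice the inventory would come from a crawl or sitemap:

    import random

    def draw_page_sample(all_pages, sample_size, seed):
        # A fixed, documented seed makes the sample reproducible, so
        # a second tester can verify it was not hand-picked.
        rng = random.Random(seed)
        return rng.sample(sorted(all_pages),
                          min(sample_size, len(all_pages)))

    inventory = ["https://example.org/", "https://example.org/contact",
                 "https://example.org/news", "https://example.org/products",
                 "https://example.org/search"]
    print(draw_page_sample(inventory, sample_size=3, seed=2011))

Random sampling does not answer the "typical page" question, but it
closes one channel for conscious or unconscious influence.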

Another critical issue is what I have written in another mail: the
risk that a website owner has calculated the points/percentages of
passes/fails before the test.

If the second tester or the second group of testers comes to a
different result, and this has consequences for the question
accessible/not accessible, it could easily be a question of
Objectivity. I'm not sure if this is really controllable.

If the second independent tester comes to a different result when
checking the same pages, it's a question of Reliability. If the second
testers come to the same result even though they have checked other
(content) pages, we have a good degree of Objectivity and Reliability.
If the second tester(s) come to nearly the same result, especially
when they have checked other content pages, it's a question of the
metrics.

How can we control that, when a web page contains videos, those videos
will be checked? And if the tester didn't check the videos, how can we
control whether he/she really overlooked them or had other reasons?

And we also shouldn't forget the Levels A, AA, AAA...

The more the Methodology moves away from pass/fail (and maybe "nearly
the same", which is a question of the Metrics), and the fewer pages
are checked, the more critical it becomes and the less controlled the
test is.

3. Testing a sample of X pages _and_ those SCs which are not already
violated might be a good way. There would be room both for what the
tester found in the preliminary proceeding and for those SCs which are
not already violated. Even in this case there can be violations
against Objectivity (and of course Reliability). I haven't thought
about this alternative very deeply, but it could be a good way.
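
Just to make the idea concrete, here is a minimal sketch of how such a
two-phase procedure could look; check(page, sc) is an assumed callable
that returns True for a pass, and everything else is invented:

    def two_phase_check(sample_pages, extra_pages, all_scs, check):
        violated = set()
        # Phase 1: every SC on every page of the sample.
        for page in sample_pages:
            for sc in all_scs:
                if not check(page, sc):  # assumed: True means "pass"
                    violated.add(sc)
        # Phase 2: only the SCs not violated so far, re-checked on
        # other pages and parts of the website.
        for page in extra_pages:
            for sc in set(all_scs) - violated:
                if not check(page, sc):
                    violated.add(sc)
        return violated

The open questions - how the extra pages are chosen, and how the
tester documents what he/she found in the preliminary proceeding -
remain, of course.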

- The first Alternative is good for small websites but not very
comfortable for huge websites, and for every test we need some
additional testers to control the biases. It might be the best and
most controllable Methodology, but it costs a lot of time.
- The second Alternative is good for websites where we have roughly a
1:1 ratio (total number of pages, or fewer, to the number of pages
selected for the test) and probably for Processes, but I fear it
closes the door to Validity with the Conformance Requirements (even
for smaller websites), and there is also a high risk for Objectivity
and Reliability. And failing all three Criteria is even more likely
when we have a nearly uncontrollable test design *and* are adding
Tolerance/Metrics.

I would suggest spending some time on discussing the Third
Alternative, as a mix of page-centered and problem-centered testing is
a good way, especially in the light of the Conformance Level ("met in
full", "satisfies all the Level A Success Criteria"). No test design
is 100%, but this alternative, guided by documents (how to test,
definition of the metrics, an independent second tester or better a
group of testers), could come close.

I agree with Eric that we need a process for testing the Methodology.


Best

Kerstin

2011/9/15 Michael S Elledge <elledge@msu.edu>:
> Hi all--
>
> I'm not sure what is meant by a controlled test design. Is this the same as
> a test protocol?
>
> Also, when we are talking about objectivity, are we saying that a method
> must lead to an unbiased result, that the reviewer must be unbiased, that
> our criteria are not subjective, or all three?
>
> A bit confused.
>
> Mike
>
> On Sep 15, 2011, at 4:24 AM, Kerstin Probiesch <k.probiesch@googlemail.com>
> wrote:
>
>> Hi Detlev, all,
>>
>> because one cannot be sure about 100 percent objectivity, a Test Design
>> should be a controlled test design. In our case - we haven't decided about
>> the Approach - this can happen for example via the number of pages or the
>> number of pages per SC, or with other Descriptions for Testing Procedures.
>>
>> Best
>>
>> Kerstin
>>
>> Via Mobile
>>
>> Am 15.09.2011 um 07:39 schrieb Detlev Fischer <fischer@dias.de>:
>>
>>> Quoting Kerstin Probiesch <k.probiesch@googlemail.com>:
>>>
>>>> Central question:
>>>>
>>>> Do we want that a tester can manipulate the results?
>>>
>>> DF: of course not, but this cannot be ensured by objectivity (whatever
>>> that would mean in practice) but only by some measure of quality control: a
>>> second tester or independent verification of results (also, verification of
>>> the adequacy of the page sample)
>>>>
>>>> I don't mean the case that something was overlooked but the case that
>>>> something was willingly overlooked. Or the other Way round.
>>>
>>> DF: Well, if someone wants to distort results there will probably always
>>> be ways to do that; I would not start from that assumption. Is one imperfect
>>> or missing alt attribute TRUE or FALSE for SC 1.1.1 applied to the entire
>>> page? What about a less than perfect heading structure? etc, etc. There is,
>>> "objectively", always leeway, room for interpretation, and I think we
>>> unfortunately DO need agreement with reference to cases / examples that set
>>> out a model for how they should be rated.
>>>>
>>>> If not we need Objectivity as a Requirement. Just Agreement on something
>>>> is not enough.
>>>
>>> DF: Can you explain what in your view the requirement of "objectivity"
>>> should entail *in practice*, as part of the test procedure the methodology
>>> defines?
>>>
>>>>
>>>> And again: No Objectivity - no standardized methodology.
>>>>
>>>> Kerstin
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Via Mobile
>>>>
>>>> Am 14.09.2011 um 12:09 schrieb Detlev Fischer <fischer@dias.de>:
>>>>
>>>>> DF: Just one point on objective, objectivity:
>>>>> This is not an easy concept - it relies on a proof protocol. For
>>>>> example, you would *map* a page instance tested to a documented inventory of
>>>>> model cases to establish how you should rate it against a particular SC.
>>>>> Often this is easy, but there are many "not ideal" cases to be dealt with.
>>>>> So "objective" sounds nice but it does not remove the problem that
>>>>> there will be cases that do not fit the protocol, at which point a human (or
>>>>> group, community) will have to make an informed mapping decision or extend
>>>>> the protocol to include the new instance. I think "agreed interpretation"
>>>>> hits it nicely because there is the community element in it which is quite
>>>>> central to WCAG 2.0 (think of defining accessibility support)
>>>>>
>>>>> Regards,
>>>>> Detlev
>>>>>
>>>>>>
>>>>>> Comment (KP): I understand Denis' arguments. The more I think about
>>>>>> this: neither "unique interpretation" nor "agreed interpretation" works
>>>>>> very well. I would like to suggest "Objective", for the following
>>>>>> reason: it would be one of the Criteria for the quality of tests and
>>>>>> includes Execution objectivity, Analysis objectivity and Interpretation
>>>>>> objectivity. If in some cases we have 100 percent, fine; if not, we can
>>>>>> discuss the "tolerance". I would suggest:
>>>>>>
>>>>>> (VC) I'm still contemplating this one. I can see both arguments as
>>>>>> plausible. I'm okay with 'objectivity' but think it needs more
>>>>>> explanation, i.e. who defines how objective it is?
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> ---------------------------------------------------------------
>>> Detlev Fischer PhD
>>> DIAS GmbH - Daten, Informationssysteme und Analysen im Sozialen
>>> Geschäftsführung: Thomas Lilienthal, Michael Zapp
>>>
>>> Telefon: +49-40-43 18 75-25
>>> Mobile: +49-157 7-170 73 84
>>> Fax: +49-40-43 18 75-19
>>> E-Mail: fischer@dias.de
>>>
>>> Anschrift: Schulterblatt 36, D-20357 Hamburg
>>> Amtsgericht Hamburg HRB 58 167
>>> Geschäftsführer: Thomas Lilienthal, Michael Zapp
>>> ---------------------------------------------------------------
>>>
>>
>



-- 
-------------------------------------
Kerstin Probiesch - Freie Beraterin
Barrierefreiheit, Social Media, Webkompetenz
Kantstraße 10/19 | 35039 Marburg
Tel.: 06421 167002
E-Mail: k.probiesch@gmail.com
Web: http://www.barrierefreie-informationskultur.de

XING: http://www.xing.com/profile/Kerstin_Probiesch
Twitter: http://twitter.com/kprobiesch
------------------------------------

*** Neue Veröffentlichung ***

Barrierefreiheit verstehen und umsetzen:
Webstandards für ein zugängliches und nutzbares Internet
812 S., Dpunkt Verlag, Auflage: 1 (März 2011)
Kurzlink zu Amazon: http://is.gd/FIEntB

Received on Friday, 16 September 2011 08:53:17 UTC