Re: Do we share an understanding of "requirement"?

Hi Vivienne, Detlev, Shadi, all,

I don't understand: how can we speak about feasibility when we haven't
yet decided on the evaluation methodology itself? To me this means doing
the third step before the first. Feasibility - I sometimes call it
possibility (not quite the right term) - has nothing to do with
accessibility. It is a constraint of a website owner (time, money), a
testing organization (time, money, testers), a freelancer (time, money,
and also testers) and so on. The same methodology may be feasible for a
testing organization but not for a freelancer working alone; if the
freelancer can build a team with other freelancers, it may become
feasible for them too. What we don't have is "it".

What do we have? We have WCAG 2.0, and we have the three main
internationally known and recognized criteria for the quality of tests
already mentioned - reliability, objectivity, and validity - which have
a lot to do with credibility, as long as the goal is a standardized
test and not a non-standardized one. If we disregard them - and that
*includes* metrics and *tolerance* - we run a high risk (and not only
that) that our test will measure not what we want to measure but
something else entirely, for example how much a website owner is
willing to pay, or how much time a testing organization needs (for
what?). One small example: if we want to find out the sex of an
individual (yes, I know there is a third), we ask: "Are you A. male or
B. female?" We do not ask: "Do you feel more A. male or B. female?"
Asking either question is feasible (there is enough time, the questions
can be written on paper, etc.), but only the first gives us the
opportunity to verify the answer: we can go to the person and have a
look ;-). If we want to find out not the sex but the gender (OK, this
is very brief and there are many more aspects), we will not ask "Are
you male or female?", because that is not what we want to know - though
of course asking it would also be "feasible". What are we testing then?
Just whether it is feasible to ask any question at all?

The process of developing a test does not stop at the three criteria
for good tests. The next step is deciding on "tolerance" - on "what is
good enough" - for measuring what we want to measure: WCAG 2.0. Only
after that does one consider additional quality criteria for tests, and
this is where feasibility comes in. Perhaps we will find an evaluation
methodology that is highly reliable, highly objective, and highly
valid, but only a few clients would pay for it. Is it then feasible or
not? Some would say yes, some no. Perhaps we will have the next best
thing - reliable "enough", objective "enough", and highly valid - and
more clients would be willing to pay, so it would be more feasible. A
no-go, as I see the work of this TF and even more the work of the W3C,
would be: not reliable, only just objective enough, not valid - or,
worst of all, not reliable, not objective, and not valid at all, but
highly feasible.
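
To make "reliable enough" concrete: replicability can be quantified as
inter-tester agreement, for example with Cohen's kappa over
per-criterion pass/fail verdicts. Here is a minimal sketch in Python -
the verdicts are invented purely for illustration; real input would be
the results of two independent testers evaluating the same site:

from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each rater's marginal frequencies.
    fa, fb = Counter(a), Counter(b)
    expected = sum(fa[c] * fb[c] for c in fa) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented pass/fail verdicts of two testers for ten success criteria:
tester_1 = ["pass", "fail", "pass", "pass", "fail",
            "pass", "fail", "pass", "pass", "pass"]
tester_2 = ["pass", "fail", "pass", "fail", "fail",
            "pass", "fail", "pass", "pass", "fail"]

print("kappa = %.2f" % cohens_kappa(tester_1, tester_2))  # 0.57 here

A "tolerance" decision would then be a threshold on such a figure -
say, kappa of at least 0.8 counts as "reliable enough" - and that
decision has to come before any talk about what is feasible.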

The wording also _is_ important, not just because of translation, but
because we as a TF should know what we are talking about.

We all use different methodologies, and we all have our reasons for
using them - guided by different experiences, different ideas about
what is feasible, what should be or must be "feasible", "ideas" in the
widest sense about compliance, and so on. Some of us are freelancers,
some work for testing organizations. So we also have different personal
situations.

Some of us have been in this "business" for a very long time, some not
so long. Some of us know what it was like in 2002. That is when I
started with all this, and I think others have other stories to tell,
or were by that time already more familiar with testing: there was no
WCAG 2.0, many more things than nowadays were difficult to interpret,
many website owners did not want to make their websites accessible -
for example because of their ideas of what was "feasible" (well, we
still have this, but I think we all see improvements) - and some did
not even want to pay for any evaluation. And when I remember the
testing tools... one couldn't even speak of "testing tools" - not in
the sense of toolbars. Nearly every month, I think, we were searching
Google for bookmarklets. We found they were good for these testing
tasks: one bookmarklet for resizing windows, one for removing
background images, one for this, one for that. OK, a lot of these
bookmarklets had been released much earlier - some in 1998 - but this
is Germany, sometimes we need more time ;-) And then: hooray! The beta
of the Accessibility Toolbar 1.0 for IE arrived in December 2003, then
called AIS; the final version 1.0 came in July 2004, and it was time to
delete a lot of bookmarklets. First we had to check the headings (the
h1-h6 elements) directly in the code; later we worked with
bookmarklets, then with toolbars for different(!) browsers; and now we
also have toolbars and tools like HeadingsMap and other extensions -
even some automatic tools are not that bad. So much more is feasible in
less time than before.

"Feasible" depends on many, many factors. Even the time testers need
for the same test will not be the same: some work more quickly than
others, some know more testing tools, some have more experience or more
discipline. Time is money - another aspect of "feasible" - and people
are not testing machines. The money people will pay for tests differs
from client to client and from country to country. I still remember the
question: "Will it cost anything?" Now people ask: "What's it going to
cost?"

These are just some reasons why I think it is tricky to speak about
feasibility at this point in our discussions, and why feasibility is
*not* a main criterion for credible tests and never will be -
especially while the evaluation methodology is not yet defined. We
should not forget it, but we should not stress it too much - not at
this point. Perhaps by 2012 or 2013, which is defined as the endpoint
of this Task Force (quite ambitious, isn't it? ;-) ), much more will be
feasible than today. We do not know; we don't have a crystal ball.

I think the most important question of all is not "Do we share an
understanding of requirement?" The most important question has a lot to
do with what Denis wrote in his mail when discussing R10: "I wish that
we can draw from everybody's experience and come up with something new
and improved, compared to our respective approaches."

Best

Kerstin

Today with a little bit of what we call in German "Oma erzählt vom
Krieg" ("Grandma tells war stories"). I think I need not just Linguee
but also some dictionaries of idioms.




> -----Original Message-----
> From: public-wai-evaltf-request@w3.org [mailto:public-wai-evaltf-
> request@w3.org] On Behalf Of Vivienne CONWAY
> Sent: Tuesday, 13 September 2011 09:48
> To: Detlev Fischer; public-wai-evaltf@w3.org
> Subject: RE: Do we share an understanding of "requirement"?
> 
> Hi Detlev & TF'ers
> 
> Detlev, as usual, you are making me think way too hard.  Just kidding
> of course.
> Yes, of course I agree with your 3 top points.  And Yes, I think I am
> probably being overly optimistic thinking that if it's designed
> properly everyone will get the same outcome for the same site.  If I'm
> perfectly honest, I may not even get the same answer twice for the same
> site.  I'm going to have to bow to your superior reasoning on this one.
> At the moment, we have no idea (at least till we build something)
> whether it is replicable.  Perhaps we need to propose a test we all
> carry out on a certain page, using our own techniques to test it
> against WCAG 2.0 AA, and compare the answers?  This might give us an
> idea of
> how replicable our methods are.
> 
> 
> Regards
> 
> Vivienne L. Conway
> ________________________________________
> From: public-wai-evaltf-request@w3.org [public-wai-evaltf-
> request@w3.org] On Behalf Of Detlev Fischer [fischer@dias.de]
> Sent: Tuesday, 13 September 2011 3:35 PM
> To: public-wai-evaltf@w3.org
> Subject: Do we share an understanding of "requirement"?
> 
> Hi everyone,
> 
> I am getting quite concerned myself now, so please forgive me if I
> break
> my promise to “stay shtum” to kick off a discussion about what we
> mean when we are using the term *requirement*.
> 
> 1) Do we agree that we should not include requirements for
>     attributes which we have not shown to be *feasible*?
> 
> 2) Do we agree that a requirement identifies a *necessary* attribute,
>     capability, characteristic, or quality of a system in order for
>     it to have value and utility to a user?
> 
> 3) Do we further agree that requirements should be *verifiable*, i.e.
>     that tests can eventually prove that the thing built (our
>     methodology, in this case) meets the requirements we have
> specified?
> 
> If we agree on these three points (and I hope we do) then R03: Unique
> interpretation and R04: Replicability should be first of all feasible;
> they should be shown to be necessary (e.g., the methodology would have
> reduced credibility without them); finally, they should also be
> verifiable (e.g. replicability and uniqueness of interpretation can be
> proven in independent tests of real-world sites).
> 
> If you agree so far, where do we stand on this?
> 
> *Feasible:* I have not read a single statement on this mailing list so
> far that has offered any evidence that replicability and unique
> (unambiguous) interpretation are feasible - especially if the
> methodology stays on a fairly generic level (i.e., if it does not
> prescribe the tools to be used, a step-by-step procedure, and detailed
> instructions for evaluating test results).
> 
> *Verifiable:*  We do not know yet, we have not built anything so far
> that we could use to carry out tests independently and then compare
> results. So let’s move on to second-best, the various methods we
> currently use. I would ask all of you to report on any tests that were
> carried out by two independent testers and arrived at the same result.
> No one has come forward and claimed it has happened, or even that it
> can be done.
> 
> *Necessary:*  Some of you may believe that replicability and uniqueness
> of interpretation are necessary because the methodology would be less
> credible without them. But unless the methodology mandates that tests
> are actually replicated, the claim of replicability is just a red
> herring. I think that any claims that cannot be verified in practical
> application seriously undermine the credibility of a methodology.
> 
> Detlev
> 
> --
> ---------------------------------------------------------------
> Detlev Fischer PhD
> DIAS GmbH - Daten, Informationssysteme und Analysen im Sozialen
> Geschäftsführung: Thomas Lilienthal, Michael Zapp
> 
> Telefon: +49-40-43 18 75-25
> Mobile: +49-157 7-170 73 84
> Fax: +49-40-43 18 75-19
> E-Mail: fischer@dias.de
> 
> Anschrift: Schulterblatt 36, D-20357 Hamburg
> Amtsgericht Hamburg HRB 58 167
> Geschäftsführer: Thomas Lilienthal, Michael Zapp
> ---------------------------------------------------------------

Received on Tuesday, 13 September 2011 13:20:54 UTC