Re: AW: Do we share an understanding of "requirement"? from Detlev Fischer on 2011-09-13 (public-wai-evaltf@w3.org from September 2011)

From: Detlev Fischer <fischer@dias.de>
Date: Tue, 13 Sep 2011 17:14:47 +0200
To: public-wai-evaltf@w3.org
Message-ID: <20110913171447.2149665u518zr1if@webmail.dias.de>
When I brought up feasibility, I meant to suggest that at this  
important stage of setting requirements for our methodology, we should  
go over the list and check that we require only things that we believe  
we can achieve.

I am quite happy if we treat replicability and uniqueness of
interpretation as ideals that can guide us in making the methodology
more unambiguous, knowing that we may never actually reach them.
Personally, I prefer the term reliability, which seems more amenable to
qualification (ways of aggregating instance results, tolerances,
etc.).

Kerstin, thanks for the "auntie talks about the war" mail. It does put  
things into perspective. You are right that testing tools emerge all  
the time, so the best way to evaluate is a moving target.

Checking a modern web page with stages, image carousels,
mega-dropdowns, etc. is quite a task. Things have become a lot more
difficult compared to the days of WCAG 1.0 and of generally more static
sites. If we remain tool-agnostic (and we may have to), one tester
will use Firebug, the next one will use aChecker, the third one will
check the source code, etc.

So, picking up Michael's point: given that scenario, I still have my
doubts that any two testers will ever come up with the same result,
even for one (complex) page. But maybe I'll live to see the day...

I was interested to hear Eric report on double tests with identical
results that used different tools mapping onto UWEM 1.0. Is any of
this documented and still available? (I guess the sites won't exist in
the state they were in when they were tested; moving target again.)

Regards,
Detlev

Quoting Michael S Elledge <elledge@msu.edu>:

> Hi Everyone--
>
> I think there may be some confusion about the use of the term  
> "feasible." I understood it to mean the testing methodology itself,  
> as in "Is it feasible to determine if this item is accessible?", not  
> whether it was affordable, or important enough to the client to do  
> it. In other words: do we have the tools and methods to determine if
> something is accessible? I think we should be "client attitude"
> agnostic, i.e., not let requirements be determined by whether
> someone wants to do them, but by whether someone is capable of doing
> them.
>
> Another point. "Replication" to me means that someone is able to  
> conduct the test and come up with the same result for that specific  
> checkpoint or criterion. If someone chooses to test a different set of
> pages the overall results may of course be different. It is my  
> experience that two people using the same protocol will almost  
> always come to the same conclusion about whether a web page is  
> accessible so long as the criteria and methodology are clearly  
> defined.
>
> Mike
>
> On 9/13/2011 9:23 AM, Kerstin Probiesch wrote:
>> Hi Vivienne, Detlev, Shadi, all,
>>
>> I don't understand: how can we talk about feasibility when we haven't
>> decided on the evaluation methodology itself? For me this would mean
>> doing the third step before the first. Feasibility (I sometimes call it
>> "possible", not quite a good term) has nothing to do with accessibility.
>> It is a consideration for a website owner (time, money), a testing
>> organization (time, money, testers), a freelancer (time, money, and also
>> testers) and so on. The same methodology may be feasible for a testing
>> organization but not for a freelancer working alone; if the freelancer
>> can build a team with other freelancers, he can make it feasible. What we
>> don't have is "it".
>>
>> What do we have? We have WCAG 2.0, and we have the three already
>> mentioned, internationally recognized main criteria for test quality
>> (objectivity, reliability and validity), which have a lot to do with
>> credibility, as long as the goal is a standardized test rather than a
>> non-standardized one. If we disregard them (and this *includes* metrics
>> and *tolerance*), we run a high risk (and not only that) that our test
>> will not measure what we want to measure but something else, for example
>> how much a website owner wants to pay or how much time a testing
>> organization needs (for what?). One small example: if we want to find out
>> the sex of an individual (yes, I know about the third option), we ask:
>> "Are you A) male or B) female?" We don't ask: "Do you feel more A)
>> masculine or B) feminine?" Asking both questions is feasible (there is
>> enough time, they can be written on paper, etc.), but only the first
>> question gives us the opportunity to verify the answer: we can go to the
>> person and have a look ;-). If we want to find out not the sex but the
>> gender (ok, this is very, very brief and there are many more aspects), we
>> will not ask "Are you male or female?", because this is not what we want
>> to know, but of course it would also be "feasible" to ask this. What are
>> we testing then? Just: is it feasible to ask any question?
>>
>> The development of a test doesn't stop at the three criteria for good
>> tests; the next step will be decisions about "tolerance", about "what is
>> good enough" for measuring what we want to measure: WCAG 2.0. After that,
>> one has to consider additional criteria for the quality of tests, and at
>> this point feasibility comes in. Perhaps we find an evaluation methodology
>> which is highly reliable, highly objective and highly valid, but only a
>> few clients would pay for it. Is it then feasible or not? Some would say
>> yes, some no. Perhaps we end up with the next best: reliable "enough",
>> "objective" and "highly valid", and perhaps more clients would be willing
>> to pay, so it would be more feasible. A no-go, as I see the work of this
>> TF and even more the work of the W3C, is: not reliable, objective enough,
>> not valid, or in the worst case: not reliable, not objective and not valid
>> at all, but highly feasible.
>>
>> The wording also _is_ important, not just because of translation, but
>> also because we as a TF should know what we are talking about.
>>
>> We all are using different methodologies, we all have our reasons for using
>> them - guided by different experiences, different ideas about what is
>> feasible, what should be or what must be "feasible", "ideas" in the widest
>> sense about compliance, and so on. Some are freelancers, some work for
>> testing organizations. So we also have different personal situations.
>>
>> Some of us have been in this "business" for a very long time, some not
>> so long. Some of us know how it was in 2002. In 2002 I started with all
>> this, and I think others have other stories to tell or were more familiar
>> with testing by that time: no WCAG 2.0, many more things than nowadays
>> were difficult to interpret, a lot of website owners who didn't want to
>> make their websites accessible, for example because of their ideas of
>> "feasible" (well, we have this now too, but I think we all see
>> improvements), and some didn't even want to pay for any evaluation.
>>
>> And when I remember the testing tools... one couldn't even speak of
>> "testing tools", not in the sense of toolbars. I think nearly every month
>> we were searching Google for bookmarklets. We found out that they were
>> good for these testing tasks: one bookmarklet for resizing windows, one
>> for removing background images, one for this, one for that. Ok, a lot of
>> these bookmarklets were released much earlier, some in 1998, but this is
>> Germany; sometimes we need more time ;-) And then: Hooray! The beta of
>> Accessibility Toolbar 1.0 for IE came in December 2003 and was called
>> AIS; the final version 1.0 came in July 2004, and it was time to delete a
>> lot of bookmarklets. First we had to check headings (Hs) in the code,
>> then we worked with bookmarklets, then with toolbars for different(!)
>> browsers, and now we have the toolbars plus tools like HeadingsMap and
>> other extensions; even some automatic tools are not that bad. So much
>> more is feasible in less time than before.
>>
>> "Feasible" depends on many, many factors. Even the time a tester needs
>> for the same test will not be the same. Some work more quickly than
>> others, some know more testing tools, some have more experience, more
>> discipline. Time is money, another aspect when speaking about "feasible",
>> and people are not testing machines. The money people would pay for tests
>> differs from client to client and from country to country. I still
>> remember the question: "Will it cost something?" Now people are asking:
>> "What's it going to cost?"
>>
>> These are just some reasons why I think it is tricky to speak about
>> feasibility at this point of our discussions, and why feasibility is
>> *not* a main criterion for credible tests and never will be, especially
>> while the evaluation methodology is not defined. We should not forget
>> it, but we should not stress it too much, not at this point. Perhaps by
>> 2012 or 2013, which is defined as the end point of this Task Force
>> (that's very ambitious, isn't it? ;-) ), much more will be feasible than
>> today. We do not know; we don't have a crystal ball.
>>
>> I think the most important question of all is not "Do we share an
>> understanding of requirement?"  The most important question has a lot to do
>> with what Denis wrote in his mail, when discussing R10: "I wish that we can
>> draw from everybody's experience and come up with something new and
>> improved, compared to our respective approaches."
>>
>> Best
>>
>> Kerstin
>>
>> Today with a little bit of what we call in German "Oma erzählt vom Krieg"
>> (Grandma is telling about the war). I think I need not just Linguee but
>> also some dictionaries for idioms.
>>
>>
>>
>>
>>> -----Original Message-----
>>> From: public-wai-evaltf-request@w3.org [mailto:public-wai-evaltf-
>>> request@w3.org] On Behalf Of Vivienne CONWAY
>>> Sent: Tuesday, 13 September 2011 09:48
>>> To: Detlev Fischer; public-wai-evaltf@w3.org
>>> Subject: RE: Do we share an understanding of "requirement"?
>>>
>>> Hi Detlev & TF'ers
>>>
>>> Detlev, as usual, you are making me think way too hard. Just kidding,
>>> of course.
>>> Yes, of course I agree with your 3 top points. And yes, I think I am
>>> probably being overly optimistic in thinking that if it's designed
>>> properly, everyone will get the same outcome for the same site. If I'm
>>> perfectly honest, I may not even get the same answer twice for the same
>>> site. I'm going to have to bow to your superior reasoning on this one.
>>> At the moment, we have no idea (at least until we build something)
>>> whether it is replicable. Perhaps we need to propose a test that we all
>>> carry out on a certain page, each using our own techniques to test it
>>> against WCAG 2.0 AA, and then compare the answers? This might give us an
>>> idea of how replicable our methods are.
>>>
>>>
>>> Regards
>>>
>>> Vivienne L. Conway
>>> ________________________________________
>>> From: public-wai-evaltf-request@w3.org [public-wai-evaltf-
>>> request@w3.org] On Behalf Of Detlev Fischer [fischer@dias.de]
>>> Sent: Tuesday, 13 September 2011 3:35 PM
>>> To: public-wai-evaltf@w3.org
>>> Subject: Do we share an understanding of "requirement"?
>>>
>>> Hi everyone,
>>>
>>> I am getting quite concerned myself now, so please forgive me if I
>>> break my promise to “stay shtum” to kick off a discussion about what we
>>> mean when we use the term *requirement*.
>>>
>>> 1) Do we agree that we should not include requirements for
>>>     attributes which we have not shown to be *feasible*?
>>>
>>> 2) Do we agree that a requirement identifies a *necessary* attribute,
>>>     capability, characteristic, or quality of a system in order for
>>>     it to have value and utility to a user?
>>>
>>> 3) Do we further agree that requirements should be *verifiable*, i.e.
>>>     that tests can eventually prove that the thing built (our
>>>     methodology, in this case) meets the requirements we have
>>> specified?
>>>
>>> If we agree on these three points (and I hope we do) then R03: Unique
>>> interpretation and R04: Replicability should be first of all feasible;
>>> they should be shown to be necessary (e.g., the methodology would have
>>> reduced credibility without them); finally, they should also be
>>> verifiable (e.g. replicability and uniqueness of interpretation can be
>>> proven in independent tests of real-world sites).
>>>
>>> If you agree so far, where do we stand on this?
>>>
>>> *Feasible:* I have not read a single statement on this mailing list so
>>> far that has offered any evidence that replicability and unique
>>> (unambiguous) interpretation are feasible  -  especially if the
>>> methodology stays on a fairly generic level (i.e., if it does not
>>> prescribe the tools to be used, a step-by-step procedure, and detailed
>>> instructions for evaluating test results).
>>>
>>> *Verifiable:*  We do not know yet, we have not built anything so far
>>> that we could use to carry out tests independently and then compare
>>> results. So let’s move on to second-best, the various methods we
>>> currently use. I would ask all of you to report on any tests that were
>>> carried out by two independent testers and arrived at the same result.
>>> No one has come forward and claimed it has happened, or even, that it
>>> can be done.
>>>
>>> *Necessary:*  Some of you may believe that replicability and uniqueness
>>> of interpretation are necessary because the methodology would be less
>>> credible without them. But unless the methodology mandates that tests
>>> are actually replicated, the claim of replicability is just a red
>>> herring. I think that any claims that cannot be verified in practical
>>> application seriously undermine the credibility of a methodology.
>>>
>>> Detlev
>>>
>>> --
>>> ---------------------------------------------------------------
>>> Detlev Fischer PhD
>>> DIAS GmbH - Daten, Informationssysteme und Analysen im Sozialen
>>> Management: Thomas Lilienthal, Michael Zapp
>>>
>>> Phone: +49-40-43 18 75-25
>>> Mobile: +49-157 7-170 73 84
>>> Fax: +49-40-43 18 75-19
>>> E-Mail: fischer@dias.de
>>>
>>> Address: Schulterblatt 36, D-20357 Hamburg
>>> Registered at Amtsgericht Hamburg (Hamburg Local Court), HRB 58 167
>>> Managing Directors: Thomas Lilienthal, Michael Zapp
>>> ---------------------------------------------------------------
>>>
>>> This e-mail is confidential. If you are not the intended recipient you
>>> must not disclose or use the information contained within. If you have
>>> received it in error please return it to the sender via reply e-mail
>>> and delete any record of it from your system. The information contained
>>> within is not the opinion of Edith Cowan University in general and the
>>> University accepts no liability for the accuracy of the information
>>> provided.
>>>
>>> CRICOS IPC 00279B
>>
>>
>>
>
>



Received on Tuesday, 13 September 2011 15:15:15 UTC