- From: Annika Nietzio (for mailing lists) <an@ftb-volmarstein.de>
- Date: Fri, 05 Oct 2012 16:08:09 +0200
- To: public-wai-rd-comments@w3.org
Dear Markel, dear Giorgio, dear Joshue,

after the deadline for review was extended, I thought that I should use the opportunity to send you some feedback.

First of all, I think the report is very comprehensive. It highlights many different aspects and has a clear structure and language. You did a great job in summarising the online symposium. Now the details...

Comments on Research Report on Web Accessibility Metrics (W3C Working Draft 30 August 2012)

Section 1.1 Definition and background

At the beginning of this section the concepts "metric" and "indicator" get mixed up. I'd suggest using the term "indicator" to refer to single dimensions that can be assessed objectively (such as the number of pictures, violations, etc.). Maybe you mean the same when you refer to "basic metrics". In my opinion, a metric combines several indicators using different mathematical operations, weighting parameters, etc. - exactly as in your example (readability metrics). A small illustrative sketch follows my comment on Section 4.2 below.

As a side note: The item "The severity of an accessibility barrier." does not fit in the list, because it is not an indicator (at least I don't know how it could be measured objectively).

The list of "different types of data that can be produced" mixes "ordinal values" and "conformance levels". These should be distinguished:

* Conformance levels (AAA, AA, A) have a fixed frame of reference (WCAG). It is possible to determine the conformance level of a single website.

* Ordinal values (ordinal means "ordered") refer to something like a ranking, i.e. you can compare two websites and determine which one is better, but not necessarily to what extent one is better than the other. It does not make sense to compute an ordinal value for a single site.

The distinction you want to make here is maybe between discrete and continuous values:

* Discrete values: for instance school grades "A, B, C, D, E, F"

* Continuous values: for instance values between 0 and 1 (maybe this is what you call "Quantitative ratio values")

As a side note: There are other mathematical properties of the results that could be interesting, such as "bounded vs. unbounded".

Section 1.2 The Benefits of Using Metrics

The reasons given in this section all relate to automated calculation of metrics. Also, the last paragraph of the previous section discusses the relationship (mainly the disadvantages) between metrics and automated testing. Suggestions: Make it more explicit that metrics are not the same as automated testing. Discuss the benefits and disadvantages in the same section.

Section 2.1 Validity

The example (picture without alt) seems to question the validity of WCAG. The goal of the guidelines is to describe accessibility for the widest possible range of users. How can the definition of users in "accessibility-in-use" address this issue?

Section 2.3 Sensitivity

The logic of the sentence is strange. It should say: "how changes in a given website are reflected in the metric output". The website cannot reflect the metric, because it is independent of it.

Section 3.5 Novel Measurement Approaches

Wording: "counter-example techniques" -> "common failures"

Section 4.2 Validity

"Conformance" cannot be viewed independently of the requirements to which conformance is claimed. That means that "validity with respect to conformance" is directly related to "validity of the requirements". But the validity of requirements (or guidelines) is clearly beyond the scope of this TR. How can this research question be refined?
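To make the distinction between indicators and a metric (see my comment on Section 1.1) more concrete, here is a minimal sketch in Python. The indicator names, weights, and grade boundaries are of course only invented for illustration; they are not taken from the report.

    # Two indicators that can be measured objectively, combined by a
    # metric into a bounded, continuous score in [0, 1].
    def metric_score(images_total, images_missing_alt,
                     violations, tests_applied,
                     w_alt=0.5, w_fail=0.5):
        # Indicator 1: share of images without a text alternative.
        alt_fail_rate = images_missing_alt / images_total if images_total else 0.0
        # Indicator 2: share of applicable tests that report a violation.
        violation_rate = violations / tests_applied if tests_applied else 0.0
        # The metric: a weighted combination of the two indicators
        # (1.0 = no problems found, 0.0 = every check failed).
        return 1.0 - (w_alt * alt_fail_rate + w_fail * violation_rate)

    # Mapping the continuous score onto discrete "school grades".
    def to_grade(score):
        for lower_bound, grade in [(0.9, "A"), (0.8, "B"), (0.7, "C"),
                                   (0.6, "D"), (0.5, "E")]:
            if score >= lower_bound:
                return grade
        return "F"

    score = metric_score(images_total=40, images_missing_alt=4,
                         violations=12, tests_applied=100)
    print(score, to_grade(score))  # approx. 0.89 -> "B"

The counts are objective indicators; everything that turns them into a metric (the weights, the normalisation, the grade boundaries) is a modelling decision that needs justification - which is also my point about Section 4.4.3 below.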
Section 4.3 Reliability

Question about the first item: In other parts of this report you say that a tool produces data and the metric calculates the score from this data. So this research question can be interpreted in two ways: (1) compare the results of the same metric applied to the output of different tools; (2) compare the results of different metrics applied to the same tool output. Both could be interesting.

Question about the second item: Is the data really independent of the guidelines? I think it is not, but that is of course also the view presented in my paper. The guidelines already contain a lot of information that can help shape the indicators (i.e. the collected data) AND the metrics.

Section 4.4.3 Complexity

In some parts of the report you say that an easy metric is not necessarily a good metric. This is not the whole truth. Complex metrics (formulae with many unknown parameters, such as weights for disability and severity) also cause many problems in terms of parameter estimation and justification. So in these cases simple might be better.

Section 5. A Corpus for Benchmarking Metrics

A comment on tools: software has bugs, which can of course affect the validity of the results. A benchmarking corpus could be used to improve the quality of the software.

It is important to define whether the corpus should consist of labelled or unlabelled examples. And what would the labels be? Binary labels (accessible vs. not accessible) are not sufficient. But on the other hand, any more complex definition of labels would be a metric in itself.

Section 5.2 User-tailored metrics

It would be helpful to clarify the relationship of "user-tailored metrics" to the concept of "accessibility-in-use" mentioned earlier.

Other comment

And finally, some input from the discussions during the ICCHP session: a topic that came up several times was the idea of enhancing automated tests by combining them with expert or user input. This should also be mentioned in the road map.

I hope my comments are helpful for the finalisation of your report. I'd be happy to discuss and provide further details in case you have any questions.

Kind regards
Annika

--
Annika Nietzio
email an@ftb-volmarstein.de
web www.ftb-net.de
phone +49 (0) 2335 9681-29

Forschungsinstitut Technologie und Behinderung (FTB)
der Evangelischen Stiftung Volmarstein
Grundschoetteler Str. 40
D-58300 Wetter/Ruhr
Germany
Received on Friday, 5 October 2012 14:11:52 UTC