--------------------------------- Comments on section "Conformance Evaluation Procedure" ----

Comment #2:
- Priority: important to be addressed before publication
- Location: Step 5.c: Provide a Performance Score (Optional).
- Suggested revision: conformance scores should differentiate between Success Criteria that are met merely because they are not applicable, and those that are met because they are applicable *and* succeed. In particular, I would suggest including in the numerators and denominators only those criteria that are applicable somewhere on the website [or in the audited sample] (for per-website scoring), or in each web page (for per-webpage scoring). A different comment is provided below regarding per-instance scoring. I consider it important to clarify this before publishing the next Working Draft. Given that the score will probably be regarded as a "quantitative summary" of the evaluation results, its semantics should be clear-cut (either "pass over total" or "pass over total restricted to applicable", which is what I would vote for). In any case, if a consensus is not easily reached, I would be happy if the step were somehow annotated to account for this issue, and public feedback explicitly requested on that point.
- Rationale: WCAG 2.0 states that Success Criteria that do not apply to the content are deemed to have been met. That is a fact, and I indeed agree with it. However, WCAG 2.0 conformance is oriented to conformance levels, while step 5.c in WCAG-EM suggests the addition of performance scores, which have a *different nature*. Two important, desirable properties of a scoring metric are that a) two samples with approximately the same score are (more or less) similarly accessible, and b) of two samples with clearly different scores, the one with the higher score is more accessible. This is what anyone reading the scores would usually infer (although there is no universal, formal definition of what "more accessible" means, given that it is not a total order). However, these properties do not hold if non-applicable Success Criteria are counted in the scoring: including them in the numerator and denominator artificially compresses the results towards the upper range.

To demonstrate this, suppose we are evaluating some websites, targeting conformance level A. There are 25 Success Criteria for that level, 7 of which only apply to time-based content (audio, multimedia or animations) or time-based functionality. We will call them "time-based criteria". Consider a simple yet common type of website, which has no time-based content or functionality, and apply a per-website score. The 7 time-based criteria would automatically be met, as they are not applicable. If this website failed each and every other criterion, it would still get a score of 7/25 = 0.28. Even though it is an outright accessibility disaster, it would score the same as a bad-yet-trying site where all 25 criteria are applicable and the 7 time-based criteria indeed succeed when tested. On the other hand, had we excluded the non-applicable time-based criteria when computing the score of the first website, it would have been 0/18 = 0.

Consider another example: a website with ten pages; two of them (pages 1 and 2) have audio and animations, the others do not. A per-webpage score is applied. Pages 1 and 2 fail only the 7 time-based criteria; the other 18 criteria pass on all 10 pages. The per-webpage score would yield (18*2 + 25*8)/(25*10) = 0.944, even though the site systematically fails the time-based criteria wherever they apply. Now suppose audio and animations are added to the other eight web pages, and that they pass all the time-based (and other) accessibility criteria. The site would seem proportionally "more accessible", as we now only have an occasional oversight of the time-based criteria; however, the score would stay the same. This could even disincentivize adding new accessible content. On the other hand, had we not counted non-applicable criteria, the original website would have scored (18*2 + 18*8)/(25*2 + 18*8) = 0.928, which would improve to 0.944 when the new, time-based yet accessible content is added. A short script reproducing these figures is included at the end of this comment.

Moreover, see [1], which surveys different accessibility metrics; the concept of "potential barrier" is key to most of them. A non-applicable criterion can never be a potential barrier, and thus would not be counted by those metrics.

[1] André P. Freire, Renata P. M. Fortes, Marcelo A. S. Turine, and Debora M. B. Paiva. 2008. An evaluation of web accessibility metrics based on their attributes. In Proceedings of the 26th Annual ACM International Conference on Design of Communication (SIGDOC '08). ACM, New York, NY, USA, 73-80. DOI=10.1145/1456536.1456551
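To make the two conventions concrete, here is a minimal, illustrative Python sketch (not part of the suggested revision; page counts are those of the two examples above) that reproduces the figures under both the "pass over total" and "pass over applicable" readings:

    # Illustrative sketch only (not part of WCAG-EM): scores a sample of
    # pages under the two conventions discussed above. Each page is a pair
    # (applicable, passed): how many of the 25 level-A Success Criteria
    # apply to the page, and how many of those pass.

    TOTAL_SC = 25  # level-A Success Criteria in the examples

    def score(pages, count_non_applicable):
        """Aggregate pass ratio over a sample of (applicable, passed) pages.

        count_non_applicable=True  -> "pass over total": non-applicable
                                      criteria are deemed met on every page
        count_non_applicable=False -> "pass over applicable": only criteria
                                      that apply to a page are counted
        """
        if count_non_applicable:
            num = sum(p + (TOTAL_SC - a) for a, p in pages)
            den = TOTAL_SC * len(pages)
        else:
            num = sum(p for _, p in pages)
            den = sum(a for a, _ in pages)
        return num / den

    # First example: a site with no time-based content (18 applicable
    # criteria), failing all of them; modelled here as a one-page sample.
    disaster = [(18, 0)]
    print(score(disaster, True))              # 0.28
    print(score(disaster, False))             # 0.0

    # Second example: pages 1-2 fail the 7 time-based criteria; pages 3-10
    # have no time-based content and pass everything that applies.
    site = [(25, 18)] * 2 + [(18, 18)] * 8
    print(round(score(site, True), 3))        # 0.944
    print(round(score(site, False), 3))       # 0.928

    # After adding accessible time-based content to pages 3-10:
    improved = [(25, 18)] * 2 + [(25, 25)] * 8
    print(round(score(improved, True), 3))    # 0.944 -- no change visible
    print(round(score(improved, False), 3))   # 0.944 -- improvement visible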
Comment #3:
- Priority: important to be addressed before publication
- Location: Step 5.c: Provide a Performance Score (Optional), Per Instance, point 2.
- Suggestion: [added text between brackets] "During the evaluation of each web page within the selected sample (as per Step 3: Select a Representative Sample) calculate the sum of instances for which each Success Criterion is applicable for each web page and [, among these,] the number of instances for which they are met"
- Rationale: This is related to comment #2, although it should be considered independently. The final note of this step reminds us that "Success Criteria that do not apply to the content are deemed to have been met." If we apply this literally at the instance level, we could end up with a per-webpage score greater than 100%, as we would count instances where Success Criteria are deemed met (i.e., appearing in the numerator) but are not applicable (i.e., not appearing in the denominator). The proposed wording constrains the numerator to those instances that are both applicable *and* met. A small sketch of the arithmetic follows this comment.
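For illustration, a minimal sketch with made-up instance counts (the numbers are hypothetical, chosen only to expose the arithmetic) showing how the literal reading can exceed 100%, while the constrained reading cannot:

    # Hypothetical instance counts for one Success Criterion on one page:
    # (applicable, met) per instance; three instances are not applicable
    # and thus "deemed to have been met" under a literal reading.
    instances = [(True, True)] * 5 + [(False, True)] * 3

    applicable      = sum(1 for a, _ in instances if a)
    met_literal     = sum(1 for _, m in instances if m)        # literal note
    met_constrained = sum(1 for a, m in instances if a and m)  # proposed text

    print(met_literal / applicable)      # 1.6 -> a "160%" score
    print(met_constrained / applicable)  # 1.0 -> bounded by 100%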
Comment #4:
- Priority: mild suggestion
- Location: Step 5.c: Provide a Performance Score (Optional)
- Rationale/Suggested revision: are these the only performance scores that can conform to this Methodology? If so, this should be clarified (it currently reads just "The following scoring approaches may be used:"). If not, it should be clarified as well, stating any requirements that acceptable scores must abide by.

Comment #5:
- Priority: mild suggestion
- Location: Re-Running a Website Conformance Evaluation
- Suggestion: Rewrite the second and third sentences: "In such cases, for the issues that were identified, include at least: 1) a portion of the pages that were in the original sample, to facilitate comparability between the results; 2) additional web pages that were not in the original sample (as per Step 3: Select a Representative Sample), to xxxx; and 3) where possible, a different set of exemplar web page instances (as per Step 3.b: Include Exemplar Instances of Web Pages)."
- Rationale: The current wording basically suggests that _new web pages_ should be added to the new sample, but it provides a rationale for keeping _old web pages_. The suggestions and rationale should be reconciled (e.g., by providing suggestions and rationales for both new and old web pages).