Re: SC 4.1.1 source fails but DOM passes - must a page fail? from Christophe Strobbe on 2019-01-11 (w3c-wai-ig@w3.org from January to March 2019)

From: Christophe Strobbe <strobbe@hdm-stuttgart.de>
Date: Fri, 11 Jan 2019 17:25:04 +0100
To: w3c-wai-ig@w3.org
Message-ID: <bc17bb06-4013-0ecd-92fc-06c561f2a0dd@hdm-stuttgart.de>

Hi Mark,

Thank you for posting these cases of strange HTML markup. I think the
editors of WCAG Silver should bookmark your message ;-)

I also have a few comments and questions.

1. When WCAG 2.0 SC 4.1.1 was written, web content using markup
languages was based on SGML (or at least the HTML 4 spec made that
claim) or XML, but the working group did not assume that this would
always be the case. HTML5, whose HTML syntax is based on neither SGML
nor XML, shows that this caution was justified. Since a different style
of markup is imaginable and WCAG 2.0 wanted to be technology neutral,
arguments that are based solely on current HTML parsing don't make
SC 4.1.1 redundant.

2. "IDs and source IDREFs that only differ by case": in the case of label
(for attribute) and input (id attribute), wouldn't this lead to a
failure of SC 1.3.1, since the association between the label and the
form field can no longer be programmatically determined?

3. In the case of the misquoted alt attribute on the image, wouldn't
that lead to a failure of SC 1.1.1, because the alt attribute as
represented in the DOM isn't appropriate for the image?

4. The example of the unclosed comment at the top of the file would
affect all users alike, since the content is invisible to all users, not
just people with disabilities. So I don't think SC 4.1.1 needs to cover
this edge case.
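
For the label/id case in point 2, and for duplicate IDs in general, a
check on the source could look roughly like the sketch below. It uses
Python's standard html.parser, which is not an HTML5-conformant parser
and does not build a DOM, so it only illustrates the idea. The names
IdCollector and check_ids are made up for this example.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects id values and label 'for' values from HTML source."""

    def __init__(self):
        super().__init__()
        self.ids = []
        self.label_targets = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased,
        # values keep their original case.
        for name, value in attrs:
            if value is None:
                continue
            if name == "id":
                self.ids.append(value)
            elif name == "for" and tag == "label":
                self.label_targets.append(value)


def check_ids(source):
    """Return (duplicate_ids, case_only_mismatches) for an HTML string."""
    collector = IdCollector()
    collector.feed(source)
    collector.close()

    # IDs that occur more than once, compared case-sensitively.
    duplicates = sorted(
        value for value, count in Counter(collector.ids).items() if count > 1
    )

    # Label targets with no exact match but a match when case is ignored.
    exact = set(collector.ids)
    by_lowercase = {}
    for value in collector.ids:
        by_lowercase.setdefault(value.lower(), value)
    case_only = [
        (target, by_lowercase[target.lower()])
        for target in collector.label_targets
        if target not in exact and target.lower() in by_lowercase
    ]
    return duplicates, case_only


if __name__ == "__main__":
    sample = (
        "<label for='TextField'>Name:</label>"
        "<input id='TEXTFIELD' type='text' >"
    )
    print(check_ids(sample))
```

A check like this would flag Mark's label example whichever doctype is
declared, which is the kind of result I would expect under SC 1.3.1.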

Best regards,

Christophe Strobbe

On 11/01/2019 14:49, Mark Rogers wrote:
> The Understanding SC 4.1.1 Parsing doc says 'the Success Criterion requires that the content can be parsed using only the rules of the formal grammar.'
>
> The key bit is 'parsing' - the parsing phase in browsers transforms raw HTML source into the initial DOM tree. Once you have a DOM there's no more parsing involved unless you set innerHTML or outerHTML. If there are parsing problems you may lose information or get unexpected side effects, but in many cases the parser can recover with few problems for the end user.
>
> However, there are some assumptions in the SC that aren't true in practice:
>
> 1) The formal grammar (DTDs in the case of HTML 4 and XHTML) doesn't always match the normative text in the same spec, or match up with other specs. See below for examples of things that validate but don't work.
>
> 2) Duplicate attributes can't occur in the DOM because the DOM has no way to store duplicate attributes:
> https://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-1780488922
> and the subsequent attributes with the same name are ignored according to spec.
>
> 3) Most mismatched start and end tags aren't a problem
> For example <h1>Heading</h2> is parsed into <h1>Heading</h1> in the DOM.
>
> Things that do cause problems:
>
> 1) Duplicate IDs on different elements - the DOM can contain duplicate IDs, and the DOM spec says behaviour is undefined if it does:
> https://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-getElBId
> Screen reader behaviour when duplicate IDs are used is very random:
> https://www.powermapper.com/tests/screen-readers/labelling/dupe-ids/
>
> 2) IDs and source IDREFs that only differ by case. These don't produce validation errors with the HTML 4 doctype, and other doctypes that specify NAMECASE GENERAL YES in the DTD formal grammar (this makes IDs case insensitive). The normative text elsewhere in the HTML 4 recommendation marks ids as case-sensitive. These ID/IDREFs do produce validation errors with the HTML 5 and XHTML doctypes. For example:
>
> a) This code doesn't validate, and the label is not associated with the input because of the case mismatch:
> <!DOCTYPE html>
> <title>Example</title>
> <label for='TextField'>Name:</label>
> <input id='TEXTFIELD' type='text' >
>
> b) The same code with an HTML 4 doctype validates successfully, but the label is still not associated with the input because of the case mismatch:
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
> <title>Example</title>
> <label for='TextField'>Name:</label>
> <input id='TEXTFIELD' type='text' >
>
> 3) Misquoted attributes - for example:
> <img src='rota.png' alt='Teachers' class rota class='shadow' >
> Is parsed into the DOM as
> <img src="rota.png" alt="Teachers" class="" rota="">
> This is definitely not what the author intended
>
> 4) Unterminated HTML comments:
> <!-- where does this comment finish 
> <html>
> ...
> </html>
>
>> 4.1.1 Parsing: In content implemented using markup languages, 
>> elements have complete start and end tags,
>> elements are nested according to their specifications, 
>> elements do not contain duplicate attributes, 
>> and any IDs are unique, 
>> except where the specifications allow these features.
> If the SC is applied to the DOM most of the things the SC looks for can't happen:
>
> - the DOM can't have incomplete start and end tags because each element is represented as a single Element node https://www.w3.org/TR/dom/#node-tree
> - the DOM can't store duplicate attributes https://www.w3.org/TR/dom/#node-tree
> - most nesting problems can't happen, other than using nested interactive elements like <button>Button <a href='/'>Link</a></button>
> - but duplicate IDs can occur
>
> Best Regards
> Mark
>

-- 
Christophe Strobbe
Akademischer Mitarbeiter
Responsive Media Experience Research Group (REMEX)
Hochschule der Medien
Nobelstraße 10
70569 Stuttgart
Tel. +49 711 8923 2749

“I drink tea and I know things.” 
Falsely attributed to Christophe Lannister.
Received on Friday, 11 January 2019 16:25:30 UTC