Re: W3C validator and ASP.NET __VIEWSTATE

On Fri, May 14, 2010 at 1:33 PM, Thomas Gambet <tgambet@w3.org>
wrote:

> All I can say is that your page should not validate and you should
> see messages like: value of attribute "ID" invalid: "_" cannot
> start a name. I am pretty sure there are no exceptions made for
> .Net websites.

On Fri, May 14, 2010 at 11:37 PM, David Millier
<David.Millier@nao.gsi.gov.uk> wrote:

> I'm a web developer at the UK's National Audit Office and a few
> years ago my organisation published some reports about the
> provision of government services via the web that were quite
> influential in informing government policy on UK public sector
> websites. One of the recommendations that is now mandatory for all
> UK public sector sites is that they have to be WCAG 1.0 AA
> compliant by this year.

In passing, I hope that the government begins to migrate to WCAG 2.0
(which is now two years old).

> Imagine then, my horror this week when I discovered that every one
> of our pages fails to meet AA standards because they do not
> validate successfully to xhtml 1.0 strict as they are supposed to.

Specifically, the requirement is to "Create documents that validate
to published formal grammars."

http://www.w3.org/TR/WCAG10/#q22

> We had, until this week, complacently believed that they did
> because they pass the W3C validation check successfully.  However,
> they shouldn't.  Our website uses a content management system
> called Immediacy (recently renamed Alterian) which sits on a
> Microsoft .Net platform.  .Net outputs special tags for state
> management (eg VIEWSTATE) that use ID attribute values that start
> with a double underscore (eg id="__VIEWSTATE"). According to the
> W3C standard section C, id attributes are SGML tokens not CDATA
> data types and can only start with alphabetical characters, not
> underscores. Therefore, our pages should fail validation using the
> W3C validator.

/Pace/ Thomas Gambet, this is wrong. ;)

HTML 4.01 is an SGML dialect. The "id" attribute is of type ID, and
type ID 'must begin with a letter ([A-Za-z]) and may be followed by
any number of letters, digits ([0-9]), hyphens ("-"), underscores
("_"), colons (":"), and periods (".").' So "id='__VIEWSTATE'" is
invalid HTML 4.01.

http://www.w3.org/TR/REC-html40/sgml/dtd.html

http://www.w3.org/TR/REC-html40/types.html#type-id
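
As a rough illustration, the HTML 4.01 rule quoted above can be
written as a regular expression. This is only a sketch in Python;
the names HTML4_ID and is_html4_id are mine, not anything defined by
the specification.

```python
import re

# HTML 4.01 ID token rule as quoted above: one letter, then any number
# of letters, digits, hyphens, underscores, colons, and periods.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")


def is_html4_id(value: str) -> bool:
    """Return True if value is a legal HTML 4.01 ID token."""
    return HTML4_ID.fullmatch(value) is not None


print(is_html4_id("__VIEWSTATE"))  # False: starts with an underscore
print(is_html4_id("VIEWSTATE__"))  # True: underscores may follow a letter
```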

XHTML 1.0 is an XML dialect. In XHTML 1.0, the "id" attribute is of
XML type ID. Type ID must match the Name production, which is
defined in EBNF:

    NameStartChar  ::=    ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6]
                            | [#xD8-#xF6] | [#xF8-#x2FF] |
                            [#x370-#x37D] | [#x37F-#x1FFF] |
                            [#x200C-#x200D] | [#x2070-#x218F] |
                            [#x2C00-#x2FEF] | [#x3001-#xD7FF] |
                            [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
                            [#x10000-#xEFFFF]

    NameChar    ::=      NameStartChar | "-" | "." | [0-9] | #xB7
                            | [#x0300-#x036F] | [#x203F-#x2040]

    Name ::=    NameStartChar (NameChar)*

"__VIEWSTATE" matches the Name production, therefore
"id='__VIEWSTATE'" is valid XHTML 1.0.

http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Strict

http://www.w3.org/TR/REC-xml/#id

http://www.w3.org/TR/REC-xml/#NT-Name
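
For anyone who wants to check a value by hand, the Name production
above can be transcribed into a Python regular expression. This is a
sketch under my own naming (NAME_START, NAME_EXTRA, is_xml_name are
hypothetical helpers); it tests only the Name production and does
none of the other work a validating XML parser does.

```python
import re

# Character ranges copied from the NameStartChar production above.
NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF"
    "\u200C-\u200D\u2070-\u218F"
    "\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)
# Extra characters that NameChar allows after the first character.
NAME_EXTRA = "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# Name ::= NameStartChar (NameChar)*
XML_NAME = re.compile(f"[{NAME_START}][{NAME_START}{NAME_EXTRA}]*")


def is_xml_name(value: str) -> bool:
    """Return True if value matches the XML 1.0 Name production."""
    return XML_NAME.fullmatch(value) is not None


print(is_xml_name("__VIEWSTATE"))  # True: "_" is a NameStartChar
print(is_xml_name("1abc"))         # False: digits cannot start a Name
```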

"W3C standard section C" (Appendix C of XHTML 1.0, "HTML
Compatibility Guidelines") is marked "informative":

http://www.w3.org/TR/xhtml1/guidelines.html#guidelines

This is a bit of spec jargon: "normative" material sets conformance
requirements, "informative" material does not.

Specifically, this section offers /advice/ for publishers serving
documents that validate to the formal grammar of XHTML 1.0 with the
MIME Type of text/html. This is - almost certainly - the category of
documents the NAO are publishing. The section does say: "Note that
the collection of legal values in XML 1.0 Section 2.3, production 5
is much larger than that permitted to be used in the ID and NAME
types defined in HTML 4. When defining fragment identifiers to be
backward-compatible, only strings matching the pattern
[A-Za-z][A-Za-z0-9:_.-]* should be used." But again this is a piece
of information, *not* an additional conformance requirement.
Whether you follow this advice does not affect whether your document
validates as XHTML 1.0 or meets the WCAG 1.0 requirement to
"validate to published formal grammars".

> However they are failing other validator programs such as the
> Sitemorse checker that we subscribe to to give us monthly
> assessments of our code quality.

The W3C validator is behaving correctly by validating
id="__VIEWSTATE" as XHTML 1.0.

If the Sitemorse checker asserts that documents do not validate as
XHTML 1.0 because an "id" attribute begins with an underscore, then
the Sitemorse checker is wrong.

Indeed, an FAQ on their website about this topic suggests they do
not understand how to read the specifications their tool purports to
validate against:

http://www.sitemorse.com/kb.html?q=1269547103

Specifically, their claim that "The XHTML 1.0 Specification requires
XHTML content served with a text/html mime type to conform to
certain Compatibility Guidelines outlined in Section C" is simply
erroneous. As I explained above, no documents are required to follow
the Compatibility Guidelines; they are just advisory.

In passing, W3C has released an update to the Compatibility
Guidelines as a (purely informative) appendix to a W3C Note (a
"Note" means a document that is not a formal W3C Recommendation):

http://www.w3.org/TR/xhtml-media-types/#compatGuidelines

It still advises: "DO ensure that the values used for the id
attribute are limited to the pattern [A-Za-z][A-Za-z0-9:_.-]*."
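
If you want to know which of your ids fall outside that advisory
pattern, something like the following Python sketch would list them.
It uses the standard library's html.parser; IdCollector,
advisory_id_warnings, and the sample markup are made up for
illustration. Anything it reports is a departure from advice, not a
validation error.

```python
import re
from html.parser import HTMLParser

# Advisory pattern from the Compatibility Guidelines quoted above.
COMPAT_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


class IdCollector(HTMLParser):
    """Collect the value of every id attribute in the markup."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <input ... />.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)


def advisory_id_warnings(markup: str) -> list:
    """Return ids that do not match the advisory pattern."""
    collector = IdCollector()
    collector.feed(markup)
    collector.close()
    return [v for v in collector.ids if not COMPAT_ID.fullmatch(v)]


# Hypothetical fragment resembling the hidden field under discussion.
sample = (
    '<form id="form1"><input type="hidden" name="__VIEWSTATE" '
    'id="__VIEWSTATE" value="" /></form>'
)
print(advisory_id_warnings(sample))  # ['__VIEWSTATE']
```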

In practice, though, "id='__VIEWSTATE'" is unlikely to cause users
any problems in popular modern web browsers. If it did, .NET would
be in serious trouble. The HTML5 draft is codifying how to process
web content for maximum compatibility with the existing web corpus,
and it constrains "id" in text/html only as follows: "The value must
be unique amongst all the IDs in the element's home subtree and must
contain at least one character. The value must not contain any space
characters."

http://dev.w3.org/html5/spec/Overview.html#the-id-attribute
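
To make the contrast with the HTML 4.01 rule concrete, here is a
Python sketch of the three constraints in that quotation. The
function name is my own, and I am assuming the draft's "space
characters" are space, tab, line feed, form feed, and carriage
return.

```python
# Assumed set of HTML5 "space characters": space, tab, LF, FF, CR.
HTML5_SPACE_CHARS = set(" \t\n\f\r")


def html5_id_problems(ids):
    """Return (value, reason) pairs for ids that break the quoted rules."""
    problems = []
    seen = set()
    for value in ids:
        if value == "":
            problems.append((value, "empty"))
        elif any(ch in HTML5_SPACE_CHARS for ch in value):
            problems.append((value, "contains a space character"))
        if value in seen:
            problems.append((value, "duplicate"))
        seen.add(value)
    return problems


print(html5_id_problems(["__VIEWSTATE", "form1"]))  # []
print(html5_id_problems(["a b", "x", "x"]))
# [('a b', 'contains a space character'), ('x', 'duplicate')]
```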

In summary, id="__VIEWSTATE" is valid XHTML 1.0 Strict, is unlikely
to cause any real-world problems, and does not affect your WCAG 1.0
conformance.

Hope that helps.

--
Benjamin Hawkes-Lewis

Received on Saturday, 15 May 2010 09:44:15 UTC