- From: Michael[tm] Smith <mike@w3.org>
- Date: Mon, 13 Aug 2012 09:56:07 +0900
- To: "Tab Atkins Jr." <jackalmage@gmail.com>
- Cc: Florian Bösch <pyalot@gmail.com>, Dave Geddes <davidcgeddes@gmail.com>, public-webapps@w3.org
"Tab Atkins Jr." <jackalmage@gmail.com>, 2012-08-12 15:43 -0700: > What Dimitri said, but to address your comment directly, DTD-based > validation is long-dead, at least when applied to HTML. A DTD can't > capture the validity requirements that the HTML spec already imposes, > so it's irrelevant if it also can't validate a document containing > custom elements. The current validator used by the W3C is a > combination of (iirc) constrains expressed in Schematron and custom > Java code. The core of the backend for the W3C Nu Markup Validator (http://validator.w3.org/nu/) and validator.nu is James Clark's Jing, a Relax NG implementation. The backend doesn't actually use Schematron, for performance reasons. Instead it has some Java code to perform the equivalent the of assertions-based checking that Schematron provides but that can't be done with grammar-based checking alone (whether with Relax NG or anything else). No grammar-based schema language is capable of expressing all the constraints in HTML spec. Things like checking the data types (microsyntaxes) of attribute values requires custom code -- especially if you want to report useful messages for errors (something regexp-based checking it totally useless for). Also, more to the point here, things like the fact that arbitrary attribute names prefixed with "data-" are valid -- grammar-based checkers can't handle that at all. So the validator.nu backend has some custom code that Henri wrote that drops those data-* attributes -- basically, filters them out -- before the Jing part of the toolchain even sees them. --Mike -- Michael[tm] Smith http://people.w3.org/mike
Received on Monday, 13 August 2012 00:56:12 UTC