- From: Doug Schepers <schepers@w3.org>
- Date: Tue, 14 Jul 2009 03:16:07 -0400
- To: "public-html@w3.org" <public-html@w3.org>
Hi, HTML WG-
There are advantages and disadvantages to both the strict ("draconian")
and error-correcting parsing of markup. HTML evolved to have loose
parsing with undefined, browser-specific error correction, while XML
was designed from the outset with well-defined strict parsing (probably
as a reaction to the chaotic HTML approach).
We have come full circle on the matter, and the HTML5 spec marries many
of the advantages of both approaches by offering a well-defined
error-correction model. This has the advantage that it is sometimes
easier to author (though it can make debugging more difficult), the more
profound advantage that it hides problems from the reader, and the even
more important advantage that it is more or less how browsers already
parse HTML documents.
However, it cannot gracefully address all the situations in which strict
parsing is an advantage:
* For authoring, it is often useful to know when you have validity or
well-formedness errors, which helps in debugging script and CSS; doing
this on the fly in the browser is faster and easier during development
than repeated validation with a separate tool;
* Strict markup works predictably for mashups and mixtures of different
markup languages;
* Draconian error handling enforces structure and content models for
mission-critical applications, such as the canonical "financial
transactions" example, where the reader *wants* to know about problems
in the markup [1], and for use cases with low tolerance for potential
errors (such as government and some industries).
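To illustrate the authoring point, a strict (draconian) parser stops at the first well-formedness error and can say exactly where it is. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree only as a stand-in for a browser's XML parser; the helper name first_wellformedness_error is hypothetical, and it checks XML well-formedness only, not HTML validity.

```python
import xml.etree.ElementTree as ET


def first_wellformedness_error(markup):
    """Return None if markup is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple supplied by the expat parser
        line, column = err.position
        return line, column, str(err)
    return None


# An unclosed <li> is reported as a hard error instead of being silently repaired.
print(first_wellformedness_error("<ul>\n<li>one\n</ul>"))
```

The error surfaces at the closing </ul> on line 3, which is the kind of immediate, located feedback an author would otherwise get only from a separate validator run.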
To meet this need, I propose a new attribute, 'parsing', which, when
placed on the document root, defines the type of parsing which a UA must
use when parsing the document. The values would be "loose" and
"strict", with loose parsing as the default (an omitted @parsing
attribute would result in loose parsing).
When the parsing is loose, the error-correction algorithms defined in
HTML5 must be applied; when the parsing is strict, there must be no
error-correction (as is commonly the case for XHTML in most browsers).
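A minimal sketch of how a UA might dispatch on @parsing, assuming Python's standard library as a stand-in for browser internals: html.parser.HTMLParser plays the tolerant parser (it is not the HTML5 error-correction algorithm, only an approximation of "loose"), and xml.etree.ElementTree plays the draconian one. All class and function names here are hypothetical, the pre-scan of the root start tag is a simplification, and treating an unrecognized value as "loose" is an assumption, since the proposal only defines "loose" and "strict".

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class RootAttrSniffer(HTMLParser):
    """Records the attributes of the first start tag, i.e. the document root."""

    def __init__(self):
        super().__init__()
        self.root_attrs = None

    def handle_starttag(self, tag, attrs):
        if self.root_attrs is None:
            self.root_attrs = dict(attrs)


class LooseTagCollector(HTMLParser):
    """Tolerant stand-in for "loose" parsing: does not raise on unclosed tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parsing_mode(text):
    """Return "strict" or "loose" from the root element's @parsing value."""
    sniffer = RootAttrSniffer()
    sniffer.feed(text)
    sniffer.close()
    # An omitted (or valueless) attribute means the default, "loose".
    value = (sniffer.root_attrs or {}).get("parsing") or "loose"
    # Assumption: unrecognized values also fall back to "loose".
    return "strict" if value.strip().lower() == "strict" else "loose"


def parse_document(text):
    """Parse text in the mode selected by @parsing; return (mode, element names)."""
    mode = parsing_mode(text)
    if mode == "strict":
        # Draconian: any well-formedness error raises ET.ParseError, with no recovery.
        root = ET.fromstring(text)
        return mode, [el.tag for el in root.iter()]
    collector = LooseTagCollector()
    collector.feed(text)
    collector.close()
    return mode, collector.tags
```

With this split, the same malformed markup is accepted when the attribute is omitted and rejected when parsing="strict" is present, so removing the attribute is the only change needed to switch a document back to error-corrected parsing.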
This way, authors could optionally enforce strictness when they want or
need to, and then change/remove the value when they are ready for
publication, or when the needs change. It is possible that there would
be instances where strict parsing makes it out of development and into
production code, but this would have relatively few negative
consequences (the kind of author who uses this would probably produce
strict code anyway, and would know it if they didn't), and would be
easily corrected. And, quite frankly, some people simply prefer
stricter parsing for aesthetic or other reasons, and this would provide
them with that option while not imposing it on others.
Had this option been available in XML from the beginning, many problems
and community schisms might have been avoided. I believe that presenting
the option for strict parsing may change how the various communities
approach HTML5, and avoid further schisms. I see this as having
relatively low costs for the specification, and very little
implementation cost, since browsers already have both modes (even
IE has a built-in XML parser, though it doesn't use it for XHTML).
Please correct me if my assumption here is wrong.
I also believe that this is backwards-compatible, since the default would
be loose parsing, as browsers already apply, and forwards-compatible, since
any alternate future parsing models (such as the proposed XML2 or XML5,
or some use case we don't see today) can be specified as the value for
@parsing in a later specification without changing how it would be used
as defined in HTML5. It may lay the groundwork for a new formulation of
error-correcting XML, as Anne proposed.
I'm hoping that the dust has sufficiently settled about the parsing
debate that we can hold a logical discussion of this proposal on its merits.
(Meta: I chose the keywords of the attribute and values for brevity, and
I'm not at all married to them; treat them as placeholders for the
purposes of discussing this proposal; another option might be something
like @error-correction="true | false". Please don't suggest different
names quite yet unless they represent a functional difference to this
proposal. Also, I've BCC'ed the TAG just so they know.)
[1] http://www.tbray.org/ongoing/When/200x/2004/01/11/PostelPilgrim
Regards-
-Doug Schepers
W3C Team Contact, SVG and WebApps WGs
Received on Tuesday, 14 July 2009 07:17:20 UTC