Re: XHTML character entity support from Aryeh Gregor on 2009-11-11 (public-html@w3.org from November 2009)

From: Aryeh Gregor <Simetrical+w3c@gmail.com>
Date: Wed, 11 Nov 2009 11:44:33 -0500
To: John Cowan <cowan@ccil.org>
Cc: James Graham <jgraham@opera.com>, Boris Zbarsky <bzbarsky@mit.edu>, Henri Sivonen <hsivonen@iki.fi>, David Carlisle <davidc@nag.co.uk>, public-html@w3.org
Message-ID: <7c2a12e20911110844v428c06c0vaca919b2342670e3@mail.gmail.com>

On Wed, Nov 11, 2009 at 10:41 AM, John Cowan <cowan@ccil.org> wrote:
> Most programmers *want* draconian error handling of their code.

Only if the error is either impossible to recover from sensibly (e.g.,
a segfault), or possible to reliably find at authoring time (e.g., a
parse error), or if the code is being run in a testing environment.
In production code, most programmers do not want draconian error
handling where automatic recovery might be possible.  That is why, for
instance, assert() is usually a no-op in production builds.
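
To illustrate the assert() point, here is a minimal sketch in Python
rather than C.  It assumes that Python's optimize flag (the same
switch "python -O" sets) is a fair analogue of building C code with
NDEBUG; checked_ratio and load are hypothetical names made up for
this example.

```python
# Hypothetical function whose precondition is guarded by an assert.
SOURCE = """
def checked_ratio(a, b):
    assert b != 0, "b must be non-zero"
    # Fallback used only when the assert has been compiled out.
    return a / b if b else 0.0
"""


def load(optimize):
    """Compile SOURCE at the given optimization level and return the function."""
    namespace = {}
    # optimize=0 keeps assert statements; optimize=1 strips them,
    # which is what running the interpreter with -O does.
    exec(compile(SOURCE, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace["checked_ratio"]


debug_ratio = load(optimize=0)    # "testing" build: asserts are live
release_ratio = load(optimize=1)  # "production" build: asserts are gone

try:
    debug_ratio(1, 0)
except AssertionError as exc:
    print("debug build aborted:", exc)

# The same bad input falls through to the recovery path instead.
print("release build recovered with:", release_ratio(1, 0))
```

The testing build fails loudly on the bad input, while the production
build quietly takes the fallback branch.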

When documents are being constructed dynamically by scripts, it's not
possible to reliably find errors at authoring time.  A small bug in
your script might create a misplaced quotation mark, and in XML this
means the entire document becomes unusable -- even though auto-closing
the attribute will almost certainly leave you with a usable document
(albeit perhaps with one or a few elements mangled).  This means your
site crashes for no good reason.  That's undesirable from any
perspective.
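
A rough sketch of the difference, using only the Python standard
library.  The BROKEN string is a made-up example of script output
with one stray quotation mark, and html.parser is only a stand-in
for a lenient consumer; it is not the HTML5 parsing algorithm.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical script output: one stray quotation mark after the href value.
BROKEN = '<p><a href="/page"">link</a> and the rest of the text</p>'


def parse_as_xml(markup):
    """Return the root element, or None if the input is not well-formed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        # Draconian handling: one bad character and nothing is usable.
        return None


class TextCollector(HTMLParser):
    """Lenient consumer that simply gathers whatever text it can find."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_leniently(markup):
    """Return the text content recovered from possibly malformed markup."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


print("XML result:", parse_as_xml(BROKEN))
print("Lenient result:", repr(parse_leniently(BROKEN)))
```

The XML parser rejects the whole document, while the lenient parser
still recovers the text around the mangled element.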

Note that in contrast, source code for programs is almost never
constructed dynamically.  It's written by hand, or sometimes generated
by a tool from another language and then immediately compiled (or
interpreted).  Syntax errors can be caught immediately, before
production, so draconian error handling is fine in that case.  The
same is true for hand-authored XML documents, but many (most?) web
pages are not hand-authored.

On Wed, Nov 11, 2009 at 11:21 AM, John Cowan <cowan@ccil.org> wrote:
> James Graham scripsit:
>
>> See section 9.2 "Parsing HTML Documents" [1]
>
> "This section only applies to user agents, data mining tools, and
> conformance checkers."
>
> 9.1 is the section for documents.

9.2 specifies an algorithm that all user agents must apply to process
a given input stream.  There are no fatal errors anywhere in the
section -- given any input, the algorithm will always produce a
well-defined output tree that must be processed normally.  There are
parse errors, but recovery behavior is specified.  (IIRC, UAs are
allowed to abort on errors, but if they don't they must follow the
recovery behavior specified.)

This says precisely that "every single possible sequence of bytes
is a[n] HTML5 document with a fixed interpretation".  It may not be a
*valid* HTML5 document, but it does have a fixed interpretation.
Therefore HTML5 is not fragile as XML is; authoring errors will never
result in a fatal error.
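
To make "parse errors with specified recovery" concrete, here is a
toy tree builder.  It is emphatically not the algorithm from section
9.2: the token grammar, the recovery rules, and the name parse are
all invented for this sketch.  The only property it shares with the
real thing is that every input string yields a tree plus a list of
non-fatal errors.

```python
import re

# Every character matches exactly one alternative: a simple tag, a run
# of text, or a lone "<" that does not start a tag.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)|(<)")


def parse(text):
    """Toy tree builder: always returns (tree, errors) and never raises."""
    root = {"tag": "#root", "children": []}
    stack = [root]
    errors = []
    for closing, name, chars, stray in TOKEN.findall(text):
        if chars or stray:
            # Text, or a stray "<" that is recovered as literal text.
            stack[-1]["children"].append(chars or stray)
        elif not closing:
            node = {"tag": name.lower(), "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            wanted = name.lower()
            if wanted not in [n["tag"] for n in stack[1:]]:
                # Recovery rule 1: an end tag with no open element is dropped.
                errors.append(f"stray </{wanted}> ignored")
                continue
            # Recovery rule 2: close any elements left open inside it.
            while stack[-1]["tag"] != wanted:
                errors.append(f"<{stack[-1]['tag']}> closed implicitly")
                stack.pop()
            stack.pop()
    # Recovery rule 3: end of input closes everything still open.
    for node in stack[1:]:
        errors.append(f"<{node['tag']}> unclosed at end of input")
    return root, errors


tree, errors = parse("<p>Hello <b>world</p>")
print(errors)  # one recorded error, but the tree is still built
```

Because the recovery rules are fixed, the same input always gives
the same tree, which is the sense of "fixed interpretation" above.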

Received on Wednesday, 11 November 2009 16:53:06 UTC