Re: XHTML character entity support from John Cowan on 2009-11-11 (public-html@w3.org from November 2009)

From: John Cowan <cowan@ccil.org>
Date: Wed, 11 Nov 2009 12:33:36 -0500
To: Aryeh Gregor <Simetrical+w3c@gmail.com>
Cc: John Cowan <cowan@ccil.org>, James Graham <jgraham@opera.com>, Boris Zbarsky <bzbarsky@mit.edu>, Henri Sivonen <hsivonen@iki.fi>, David Carlisle <davidc@nag.co.uk>, public-html@w3.org
Message-ID: <20091111173336.GG6506@mercury.ccil.org>

Aryeh Gregor wrote:

> On Wed, Nov 11, 2009 at 10:41 AM, John Cowan <cowan@ccil.org> wrote:
> > Most programmers *want* draconian error handling of their code.
> 
> Only if the error is either impossible to recover from sensibly (e.g.,
> a segfault), or possible to reliably find at authoring time (e.g., a
> parse error), or if the code is being run in a testing environment.

I'm speaking of source-code syntax errors, not run-time errors.

> In production code, most programmers do not want draconian error
> handling where automatic recovery might be possible.  That is why, for
> instance, assert() is usually a no-op in production builds.

Otherwise known as wearing your seat belt on test drives but not on
your commute.  (I'm not speaking of cases where the compiler can prove
that the check can never fail.)

> When documents are being constructed dynamically by scripts, it's not
> possible to reliably find errors at authoring time.  A small bug in
> your script might create a misplaced quotation mark, and in XML this
> means the entire document becomes unusable

That's why we have (a) XML-writing libraries (available for your favorite
language) and (b) output validation.  Not using the first is inexcusable.
Not using the second is living dangerously, though I grant that most
people don't bother.
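
To make (a) and (b) concrete, here is a minimal sketch in Python using
the standard library's xml.etree.ElementTree.  The <order> element, its
attributes, and the helper names are hypothetical, chosen only for
illustration; "validation" here means a well-formedness check, not
validation against a schema.

```python
import xml.etree.ElementTree as ET


def build_order(side: str, note: str) -> str:
    """Serialize a hypothetical <order> element.

    The library escapes quotes and ampersands in attribute values,
    so a stray quotation mark in the data cannot break the markup.
    """
    order = ET.Element("order", {"side": side, "note": note})
    return ET.tostring(order, encoding="unicode")


def is_well_formed(markup: str) -> bool:
    """Output validation: re-parse what is about to be sent."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# (a) Library-generated output survives awkward characters in the data.
good = build_order("sell", 'he said "now" & left')

# Hand-concatenated output with the same data does not.
bad = '<order side="sell" note="he said "now"/>'

print(is_well_formed(good))
print(is_well_formed(bad))
```

The check in is_well_formed() is the cheap end of output validation;
it catches the misplaced-quote case before the document leaves the
script, rather than at the recipient.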

> even though auto-closing the attribute will almost certainly leave
> you with a usable document (albeit perhaps with one or a few elements
> mangled).

"A few elements mangled" may be no big deal on a web page, but when
dealing with XML that carries financial transactions, it can be a very
big deal indeed.  On one occasion, a single dropped *dit* in a telegram
converted an encrypted sell order to a buy order in an early demonstration
of the perils of digital communication.  The sender sued Western Union
for his losses, but -- per the contract -- only got back the price of
the telegram.

Simplicity of authoring is a false excellence if the interpretation
rules take thousands of pages to express.  In the general case (and I
grant that HTML is not the general case), simplicity of interpretation
is far superior.

> Note that in contrast, source code for programs is almost never
> constructed dynamically.  It's written by hand, or sometimes generated
> by a tool from another language and then immediately compiled (or
> interpreted).  Syntax errors can be caught immediately, before
> production, so draconian error handling is fine in that case.  The
> same is true for hand-authored XML documents, but many (most?) web
> pages are not hand-authored.

I thought the whole point of the HTML5 format was to preserve legacy
hand-authored documents.  If you're generating HTML (with a library,
I hope), why bother with all the complexities of HTML5 syntax?

> Therefore HTML5 is not fragile as XML is; authoring errors will never
> result in a fatal error.

I concede this point.  It still doesn't make XML5 a good idea.
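
For a concrete sense of the difference being conceded, here is a rough
sketch in Python.  It feeds the same sloppy markup to a draconian XML
parser and to a lenient tokenizer.  Note the assumption: the standard
library's html.parser is only a forgiving tokenizer, not a conforming
HTML5 parser, so it stands in for the recovery behaviour rather than
reproducing the HTML5 algorithm.  The helper names are made up.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag the lenient tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def xml_accepts(markup: str) -> bool:
    """Draconian: any well-formedness error rejects the whole document."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def html_tags(markup: str) -> list:
    """Lenient: report whatever start tags can be recognized."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Unquoted attribute value and an unclosed <br>: fatal in XML.
sloppy = "<p class=note>one<br>two</p>"

print(xml_accepts(sloppy))
print(html_tags(sloppy))
```

The XML parser refuses the input outright, while the tokenizer carries
on and still reports the p and br elements.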

-- 
The duties of a lecturer are diverse,           John Cowan
those of the audience too.                      cowan@ccil.org
      --Edsger Dijkstra                         http://www.ccil.org/~cowan

Received on Wednesday, 11 November 2009 17:34:27 UTC