Re: Error recovery spec from James Clark on 2012-12-18 (public-microxml@w3.org from December 2012)

From: James Clark <jjc@jclark.com>
Date: Tue, 18 Dec 2012 16:29:45 +0700
To: John Cowan <cowan@mercury.ccil.org>
Cc: public-microxml@w3.org
Message-ID: <CANz3_EZDw=za=b5iTu21NasTSbWBREp_vKcCt8O5w2XL2JDE_A@mail.gmail.com>

On Mon, Dec 17, 2012 at 9:24 PM, John Cowan <cowan@mercury.ccil.org> wrote:

> James Clark scripsit:
>
> > Comments are welcome.
>
> I should like to propose that the section "Start- and end-tag matching"
> be replaced with the following more complicated mechanism.  It is a
> stripped-down version of TagSoup's algorithm, and will take advantage of
> element relationships and properties derived from schemas or elsewhere,
> if they are available.

Before getting into the details of the algorithm, I think it's useful to
start by considering what element relationships/properties it makes sense
to use.

My general feeling is that the PossibleChild/NonPossibleChild (ExcludedChild?)
property is the most basic. It's hard to imagine any schema language that
doesn't have something to say about which elements are possible children of
other elements.

It also makes very good sense to me to handle character data by treating it
as just another possible child.

I also think it's helpful to be able to say what are the possible top-level
elements; given the treatment of character data, it seems natural to do
this by saying what are the possible children of a pseudo root element.

Beyond this, things become less clear.  I can certainly see the use of
PreferredParent in HTML, but how would I get this from a schema?  In some
cases (where there is only one possible parent, e.g. html/head/title) I can
infer it from the PossibleChild properties.
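
To illustrate the single-parent case, here is a minimal Python sketch. It
assumes the PossibleChild information is available as a plain mapping from
parent name to a finite set of child names; the table contents and the
function name infer_preferred_parent are hypothetical and are not taken from
any schema language or from the error recovery spec.

```python
# Hypothetical PossibleChild table: parent name -> possible child names.
# The entries are illustrative only, not a real HTML schema.
POSSIBLE_CHILD = {
    "html": {"head", "body"},
    "head": {"title", "meta"},
    "body": {"p", "div"},
    "div": {"p", "div"},
}


def infer_preferred_parent(child, possible_child):
    """Return the unique parent that may contain `child`, else None.

    A preferred parent is inferred only when exactly one element lists
    `child` as a possible child; with zero or several candidates the
    table alone does not say which parent to prefer.
    """
    parents = [p for p, kids in possible_child.items() if child in kids]
    return parents[0] if len(parents) == 1 else None
```

With this table, "title" resolves to "head" because no other element can
contain it, while "p" resolves to nothing because both "body" and "div"
can contain it.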

So I think a good starting point would be an algorithm that just uses the
PossibleChild property (including treating root and character data as
pseudo-elements at the top and leaves of the tree respectively). Note that
the value of this property has to be a conceptually infinite set of element
names: sometimes you want to say that element X can have only elements Y, Z
as a child, but sometimes you want to say that element X can have any
element except Y, Z as a child.
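
One possible way to represent such a conceptually infinite set is as a
finite set of names plus a flag saying whether the set is complemented.
The Python sketch below is only an illustration under that assumption:
the class NameSet, the pseudo-element names "#root" and "#text", and the
sample table are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical names for the two pseudo-elements.
ROOT = "#root"   # pseudo-element whose children are the top-level elements
TEXT = "#text"   # pseudo-element standing for character data


@dataclass(frozen=True)
class NameSet:
    """A finite set of names, or the complement of one."""
    names: frozenset
    complement: bool = False

    @classmethod
    def only(cls, *names):
        # "X can have only these names as a child"
        return cls(frozenset(names))

    @classmethod
    def any_except(cls, *names):
        # "X can have any name except these as a child"
        return cls(frozenset(names), True)

    def __contains__(self, name):
        # Listed names are members unless the set is complemented.
        return (name in self.names) != self.complement


# Illustrative PossibleChild table, not a real schema.
possible_child = {
    ROOT: NameSet.only("html"),
    "html": NameSet.only("head", "body"),
    "title": NameSet.only(TEXT),
    "div": NameSet.any_except("html", "head", "body", "title"),
}


def is_possible_child(parent, child):
    # Assumption of this sketch: a parent missing from the table allows nothing.
    allowed = possible_child.get(parent)
    return allowed is not None and child in allowed
```

Note that a complemented set also admits the pseudo-element names unless
they are listed explicitly, so in this sketch "div" accepts character
data; whether that is the right default is a separate design question.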

James

Received on Tuesday, 18 December 2012 09:30:35 UTC