Re: Error recovery spec

James Clark scripsit:

> Comments are welcome.

I should like to propose that the section "Start- and end-tag matching"
be replaced with the following more complicated mechanism.  It is a
stripped-down version of TagSoup's algorithm, and will take advantage of
element relationships and properties derived from schemas or elsewhere,
if they are available.  These relationships and properties are given
BiCapitalized names here.  By default there are no relationships and
no properties.

1) The start-tags and end-tags are scanned once in document order,
with tags inserted and deleted along the way as described below.  A stack
is maintained of currently open elements, and a queue is maintained of
elements not currently open that are to be opened as soon as possible.

2) When the start-tag of an element that is a PossibleChild of the
currently open element is seen, the element is pushed on the stack.
Then, if the queue is non-empty and its front element is a PossibleChild
of the newly opened element, the front element is removed from the queue
and a start-tag is generated for it.  This is repeated until the queue
is empty or the front element is not a PossibleChild.

3) When the start-tag of an element that is not a PossibleChild of the
currently open element is seen, an end-tag for the current element is
inserted and it is removed from the stack.  This is repeated until the
new element is a PossibleChild of the current element, or all elements
except the root element have been closed.  If an element being closed
has the ReStartable property, its start-tag with all attributes is pushed
on the front of the queue.  Finally, the new element is pushed on the
stack.

4) However, when the start-tag of an element that is not a PossibleChild
of *any* currently open element is seen, then if the element has a
PreferredParent, a start-tag for the PreferredParent with no attributes
is generated and the PreferredParent is pushed on the stack.  This is
applied recursively to the PreferredParent itself, until an element
without a PreferredParent is found.  Then the original element is pushed
on the stack.

5) An end-tag with no corresponding open start-tag is deleted with no
effect on the stack or queue.

6) An end-tag with a corresponding open start-tag causes end-tags to be
inserted for all currently open elements, which are removed from the
stack, up to and including the element with the corresponding start-tag.
However, if any generated end-tags are for elements that have the
ReStartable property, those elements with all their attributes are
pushed onto the front of the queue as well.
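
To make the steps above concrete, here is a minimal Python sketch of one
possible reading of them.  It is not TagSoup's code and not a reference
implementation.  The class and method names are hypothetical, and several
points the text leaves open are settled by assumption: an element with no
declared PossibleChild set accepts any child; the queue is drained after
every real start-tag, not only in step 2; reopened elements are checked
against the innermost open element; in step 6 the element named by the
end-tag itself is not re-queued; and finish() closes whatever is still
open at end of input.

```python
from collections import deque


class TagBalancer:
    """One possible reading of steps 1-6; every name here is hypothetical."""

    def __init__(self, possible_children=None, preferred_parent=None,
                 restartable=()):
        self.possible_children = possible_children or {}  # parent -> names
        self.preferred_parent = preferred_parent or {}    # child -> parent
        self.restartable = set(restartable)
        self.stack = []       # open elements as (name, attrs), innermost last
        self.queue = deque()  # closed ReStartable elements awaiting reopening
        self.out = []         # repaired stream of start/end events

    def can_contain(self, parent, child):
        # Assumption: a parent with no declared children accepts any child.
        allowed = self.possible_children.get(parent)
        return allowed is None or child in allowed

    def _open(self, name, attrs):
        self.stack.append((name, attrs))
        self.out.append(("start", name, attrs))

    def _close_top(self, requeue):
        name, attrs = self.stack.pop()
        self.out.append(("end", name))
        if requeue and name in self.restartable:
            self.queue.appendleft((name, attrs))  # front of the queue

    def start(self, name, attrs=None):
        attrs = dict(attrs or {})
        fits = any(self.can_contain(p, name) for p, _ in self.stack)
        if not fits and name in self.preferred_parent:
            # Step 4: synthesize the chain of PreferredParents, outermost
            # first, each with no attributes.  Nothing is closed here.
            chain = []
            parent = self.preferred_parent.get(name)
            while parent is not None and parent not in chain:
                chain.append(parent)
                parent = self.preferred_parent.get(parent)
            for p in reversed(chain):
                self._open(p, {})
        else:
            # Step 3: close until the element fits or only the root is left.
            # When it already fits, the loop does nothing (step 2).
            while (len(self.stack) > 1
                   and not self.can_contain(self.stack[-1][0], name)):
                self._close_top(requeue=True)
        self._open(name, attrs)
        # Step 2: reopen queued elements while the front one fits.
        while self.queue and self.can_contain(self.stack[-1][0],
                                              self.queue[0][0]):
            self._open(*self.queue.popleft())

    def end(self, name):
        if all(n != name for n, _ in self.stack):
            return  # step 5: stray end-tag is dropped
        # Step 6: generated end-tags may re-queue ReStartable elements.
        while self.stack[-1][0] != name:
            self._close_top(requeue=True)
        self._close_top(requeue=False)

    def finish(self):
        # Assumption: close whatever is still open at end of input.
        while self.stack:
            self._close_top(requeue=False)
        return self.out


# Hypothetical relationships, loosely modelled on HTML.
demo = TagBalancer(
    possible_children={"body": {"p", "ul"}, "p": {"b", "i"},
                       "b": {"i"}, "i": {"b"}},
    preferred_parent={"li": "ul"},
    restartable={"b", "i"},
)
demo.start("body")
demo.start("p")
demo.start("b", {"class": "x"})
demo.start("p")   # step 3: closes <b> and <p>; step 2 then reopens <b>
demo.end("b")
demo.end("p")
demo.end("i")     # step 5: no open <i>, so this is dropped
demo.start("li")  # step 4: <ul> is synthesized first
out = demo.finish()
print(" ".join(("<%s>" if e[0] == "start" else "</%s>") % e[1] for e in out))
```

Under these assumptions the second <p> comes out with a reopened
<b class="x"> inside it, and the <li> is wrapped in a synthesized <ul>.
Character data is not modelled.  The queue is a deque so that elements
closed innermost-first can be pushed on the front and later reopened
outermost-first.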


-- 
John Cowan      cowan@ccil.org        http://www.ccil.org/~cowan
        Is it not written, "That which is written, is written"?

Received on Monday, 17 December 2012 14:25:13 UTC