- From: John Cowan <cowan@mercury.ccil.org>
- Date: Mon, 17 Dec 2012 09:24:45 -0500
- To: James Clark <jjc@jclark.com>
- Cc: public-microxml@w3.org
James Clark scripsit:

> Comments are welcome.

I should like to propose that the section "Start- and end-tag matching"
be replaced with the following more complicated mechanism. It is a
stripped-down version of TagSoup's algorithm, and will take advantage
of element relationships and properties derived from schemas or
elsewhere, if they are available. These relationships and properties
are given BiCapitalized names here. By default there are no
relationships and no properties.

1) The start-tags and end-tags are given a single scan in document
order, inserting and deleting as we go in the following ways. A stack
is maintained of currently open elements, and a queue is maintained of
elements not currently open that are to be opened as soon as possible.

2) When the start-tag of an element that is a PossibleChild of the
currently open element is seen, the element is pushed on the stack.
Whenever the queue is non-empty, and the front element is a
PossibleChild of the newly opened element, the front element is removed
from the queue and a start-tag is generated for it. This is iterated
until the queue is empty or the front element is not a PossibleChild.

3) When the start-tag of an element that is not a PossibleChild of the
currently open element is seen, an end-tag for the current element is
inserted and it is removed from the stack. This is done recursively
until the start-tag is a PossibleChild, or all elements except the root
element have been closed. If an element being closed has the
ReStartable property, its start-tag with all attributes is pushed on
the front of the queue. Then the element is pushed on the stack.

4) However, when the start-tag of an element that is not a
PossibleChild of *any* currently open element is seen, then if the
element has a PreferredParent, a start-tag for that element with no
attributes is pushed on the stack. This is done recursively until an
element without a PreferredParent is found. Then the element is pushed
on the stack.

5) An end-tag with no corresponding open start-tag is deleted with no
effect on the stack or queue.

6) An end-tag with a corresponding open start-tag inserts end-tags to
close all currently open elements, removing them from the stack, until
and including the corresponding start-tag. However, if any generated
end-tags are for elements that have the ReStartable property, those
elements with all their attributes are pushed onto the front of the
queue as well.

-- 
John Cowan  cowan@ccil.org  http://www.ccil.org/~cowan
Is it not written, "That which is written, is written"?
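
The six steps above can be sketched in Python roughly as follows. This
is an illustration only, not part of the proposal: the token tuples,
the function name repair_tags, and its parameter names are all
hypothetical. The sketch also fills several gaps with assumptions: an
element with no entry in the PossibleChild table accepts any child, the
queue is drained after every element that is opened (not only in step
2), restarted elements nest inside one another, and elements still open
at the end of input are closed.

```python
from collections import deque


def repair_tags(tokens, possible_children=None, preferred_parent=None,
                restartable=frozenset()):
    """Balance a token stream of ("start", name, attrs), ("end", name)
    and other tuples (e.g. ("text", data)), following steps 1-6."""
    possible_children = possible_children or {}
    preferred_parent = preferred_parent or {}
    stack = []       # currently open elements, as (name, attrs)
    queue = deque()  # closed ReStartable elements waiting to be reopened
    out = []

    def is_child(parent, child):
        # Assumption: a parent with no table entry accepts any child.
        allowed = possible_children.get(parent)
        return allowed is None or child in allowed

    def open_element(name, attrs):
        stack.append((name, attrs))
        out.append(("start", name, attrs))

    def close_top(requeue):
        name, attrs = stack.pop()
        out.append(("end", name))
        if requeue and name in restartable:
            queue.appendleft((name, attrs))

    for tok in tokens:
        if tok[0] == "start":
            _, name, attrs = tok
            if stack and not any(is_child(p, name) for p, _ in stack):
                # Step 4: no open element accepts it, so open the chain
                # of PreferredParents (outermost first, no attributes).
                chain, seen = [], {name}
                parent = preferred_parent.get(name)
                while parent is not None and parent not in seen:
                    chain.append(parent)
                    seen.add(parent)
                    parent = preferred_parent.get(parent)
                for p in reversed(chain):
                    open_element(p, {})
            else:
                # Step 3: close elements until one accepts the new
                # element, but never close the root.
                while len(stack) > 1 and not is_child(stack[-1][0], name):
                    close_top(requeue=True)
            open_element(name, attrs)
            # Step 2: reopen queued elements while they fit.
            while queue and is_child(stack[-1][0], queue[0][0]):
                open_element(*queue.popleft())
        elif tok[0] == "end":
            name = tok[1]
            if not any(n == name for n, _ in stack):
                continue  # Step 5: stray end-tag is dropped.
            # Step 6: generated end-tags requeue ReStartable elements;
            # the explicitly closed element itself is not requeued.
            while stack[-1][0] != name:
                close_top(requeue=True)
            close_top(requeue=False)
        else:
            out.append(tok)

    # Assumption: anything still open at end of input is closed.
    while stack:
        close_top(requeue=False)
    return out
```

For example, with restartable={"b"} and no other relationships, the
stream for <p><b class="x">one</p><p>two</p> comes out with the b
element closed before the first </p> and reopened, attributes intact,
inside the second p.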
Received on Monday, 17 December 2012 14:25:13 UTC