Re: Error recovery spec

James Clark scripsit:

> - The HTML5 parsing algorithm of this UnClosable property is hugely
> complicated (and I don't claim to fully understand it). How much
> complexity is it worth adding to replicate features of HTML(5) parsing?
>
> - Is this going to be useful other than for HTML?  Are there any
> heuristics that could be used to infer UnClosable properties from a
> schema in, say, RELAX NG or XSD?

The first thing to say about UnClosable elements is that they are not
ReStartable, so a word about ReStartable elements first.  They arise, as
you say, when an element provides properties of individual characters
rather than acting as a container.  There is no semantic difference
between <b>abcdef</b> and <b>abc</b><b>def</b>, whereas there is a lot
of difference between <p>abc</p><p>def</p> and <p>abcdef</p>.  I don't
see any way to get this information from a schema except to interpret
a semantic annotation embedded in the schema.

TagSoup distinguishes between ReStartable and FullyReStartable elements.
A FullyReStartable element is one where nesting it in itself has
semantic meaning, like HTML "small": it is not the same to specify
<small>abc</small> and <small><small>abc</small></small>.  Operationally,
if a ReStartable element that is not FullyReStartable is about to be
pushed on the queue but is already there, it is not pushed again.
Again, only a semantic annotation can distinguish these.
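
To make the queue rule concrete, here is a minimal sketch in Python.
It is not TagSoup's actual code: the names RESTARTABLE, FULLY_RESTARTABLE
and push_for_restart are invented for illustration, and the element
sets are only my assumption of what a semantic annotation would say.

```python
# Illustrative only: these sets stand in for semantic annotations,
# and none of these names come from TagSoup itself.
RESTARTABLE = {"b", "i", "small"}
FULLY_RESTARTABLE = {"small"}  # nesting in itself changes the meaning


def push_for_restart(queue, name):
    """Record a force-closed element so it can be reopened later."""
    if name not in RESTARTABLE:
        return  # ordinary containers such as "p" are never reopened
    if name not in FULLY_RESTARTABLE and name in queue:
        return  # <b><b>x</b></b> means the same as <b>x</b>: keep one entry
    queue.append(name)


queue = []
for name in ["b", "b", "small", "small", "p"]:
    push_for_restart(queue, name)

print(queue)  # ['b', 'small', 'small']
```

The second "b" is dropped because it is already queued, both "small"
entries are kept because their nesting is meaningful, and "p" never
enters the queue at all.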

UnClosable elements arise where an element has children that are,
generally speaking, forbidden elsewhere: "form" is one, because its child
"input" can only appear inside "form".  So the rectification of
<p>...<form>...</p>...</form> is neither
to force "form" to close (normal element), nor to force it to close and
then reopen it (ReStartable element).  It is rather to ignore the </p>
as inconsistent with the UnClosability of "form".  In that way, the next
"input" element does not create a new form.
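
The same case can be sketched as a toy rectifier in Python.  This is
only an illustration under my own assumptions: the token format, the
rectify function and its unclosable parameter are hypothetical, and
restarting is left out entirely so that only the UnClosable rule shows.

```python
def rectify(tokens, unclosable=frozenset({"form"})):
    """Toy tag rectifier; tokens are ("start"|"end"|"text", value) pairs."""
    stack, out = [], []
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
            out.append(f"<{value}>")
        elif kind == "text":
            out.append(value)
        elif kind == "end":
            if value not in stack:
                continue  # stray end tag with nothing to close: drop it
            # Position of the innermost open element with this name.
            idx = len(stack) - 1 - stack[::-1].index(value)
            if any(name in unclosable for name in stack[idx + 1:]):
                continue  # would force an UnClosable element shut: ignore it
            while len(stack) > idx:
                out.append(f"</{stack.pop()}>")
    while stack:  # close whatever is still open at end of input
        out.append(f"</{stack.pop()}>")
    return "".join(out)


# <p>a<form>b</p>c</form>
tokens = [("start", "p"), ("text", "a"), ("start", "form"), ("text", "b"),
          ("end", "p"), ("text", "c"), ("end", "form")]

print(rectify(tokens))                          # <p>a<form>bc</form></p>
print(rectify(tokens, unclosable=frozenset()))  # <p>a<form>b</form></p>c
```

With "form" treated as UnClosable the </p> is ignored and "c" stays
inside the form; with an empty set, "form" behaves as a normal element,
is forced closed by the </p>, and "c" falls outside it.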

-- 
Using RELAX NG compact syntax to        John Cowan <cowan@ccil.org>
develop schemas is one of the simple    http://www.ccil.org/~cowan
pleasures in life....
        --Jeni Tennison                 <cowan@ccil.org>

Received on Tuesday, 18 December 2012 10:07:37 UTC