- From: John Cowan <cowan@mercury.ccil.org>
- Date: Tue, 18 Dec 2012 05:07:12 -0500
- To: James Clark <jjc@jclark.com>
- Cc: public-microxml@w3.org
James Clark scripsit:

> - The HTML5 parsing algorithm of this UnClosable property is hugely
>   complicated (and I don't claim to fully understand it). How much
>   complexity is it worth adding to replicate features of HTML(5) parsing?
>
> - Is this going to be useful other than for HTML? Are there any
>   heuristics that could be used to infer UnClosable properties from a
>   schema in, say, RELAX NG or XSD.

The first thing to say about UnClosable elements is that they are not ReStartable, so a word about ReStartable elements. They arise, as you say, when an element is providing properties of individual characters rather than specifying a container. There is no semantic difference between <b>abcdef</b> and <b>abc</b><b>def</b>, whereas there is a lot of difference between <p>abc</p><p>def</p> and <p>abcdef</p>. I don't see any way to get this information from a schema except to interpret a semantic annotation embedded in the schema.

TagSoup distinguishes between ReStartable and FullyReStartable elements. A FullyReStartable element is one where nesting it in itself has semantic meaning, like HTML "small": it is not the same to specify <small>abc</small> and <small><small>abc</small></small>. Operationally, if a ReStartable element is not FullyReStartable, and it is going to be pushed on the queue but is already there, don't push it on the queue. Again, only a semantic annotation can distinguish these.

UnClosable elements arise where an element has children that are generally speaking forbidden elsewhere, like "input" (which can only appear inside "form"). So the rectification of <p>...<form>...</p>...</form> is neither to force "form" to close (normal element), nor to force it to close and then reopen it (ReStartable element). It is rather to ignore the </p> as inconsistent with the UnClosability of "form". In that way, the next "input" element does not create a new form.
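The three behaviours described above (normal, ReStartable/FullyReStartable, and UnClosable elements) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TagSoup's actual code: the names `rectify` and `serialize`, the token tuples, and the three element tables are all hypothetical, and the choices to reopen queued elements after the next start tag (or before the next text) and to let a stray end tag cancel a pending restart are my own simplifications.

```python
# Hypothetical element classes; in practice they would have to come from
# semantic annotations, since a schema alone does not carry this information.
RESTARTABLE = {"b", "i", "small"}
FULLY_RESTARTABLE = {"small"}
UNCLOSABLE = {"form"}


def rectify(tokens):
    """Turn a stream of (kind, value) tokens into a well-nested stream."""
    out = []    # rectified tokens
    stack = []  # currently open elements, outermost first
    queue = []  # closed ReStartable elements waiting to be reopened

    def reopen():
        # Reopen queued elements, outermost first.
        while queue:
            name = queue.pop(0)
            out.append(("start", name))
            stack.append(name)

    for kind, value in tokens:
        if kind == "text":
            reopen()
            out.append((kind, value))
        elif kind == "start":
            out.append((kind, value))
            stack.append(value)
            reopen()  # assumption: restarts go inside the new element
        elif kind == "end":
            if value not in stack:
                # Assumption: a stray end tag cancels one pending restart.
                if value in queue:
                    queue.remove(value)
                continue
            # Index of the innermost open element with this name.
            idx = len(stack) - 1 - stack[::-1].index(value)
            above = stack[idx + 1:]
            if any(name in UNCLOSABLE for name in above):
                continue  # ignore the end tag: it would close an UnClosable
            del stack[idx:]
            for name in reversed(above):
                out.append(("end", name))
            out.append(("end", value))
            for name in above:  # outermost first
                if name not in RESTARTABLE:
                    continue  # normal element: just forced closed
                if name in queue and name not in FULLY_RESTARTABLE:
                    continue  # already queued, and nesting adds no meaning
                queue.append(name)

    # Close whatever is still open at end of input.
    for name in reversed(stack):
        out.append(("end", name))
    return out


def serialize(tokens):
    """Render tokens back to markup (no escaping; illustration only)."""
    parts = []
    for kind, value in tokens:
        if kind == "text":
            parts.append(value)
        elif kind == "start":
            parts.append(f"<{value}>")
        else:
            parts.append(f"</{value}>")
    return "".join(parts)
```

With these tables, the token stream for <p>a<form>b</p>c</form> keeps the form open across the ignored </p>, while <p><b>abc</p><p>def</p> closes the "b" with the first paragraph and restarts it inside the second.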
-- 
Using RELAX NG compact syntax to        John Cowan <cowan@ccil.org>
develop schemas is one of the simple    http://www.ccil.org/~cowan
pleasures in life.... --Jeni Tennison   <cowan@ccil.org>
Received on Tuesday, 18 December 2012 10:07:37 UTC