- From: John Cowan <cowan@mercury.ccil.org>
- Date: Mon, 17 Dec 2012 10:51:40 -0500
- To: James Clark <jjc@jclark.com>
- Cc: public-microxml@w3.org
James Clark scripsit:

> I assume this means that every element is a PossibleChild of every
> other element.

Yes. That should be reformulated in terms of NonPossibleChild properties.

> In the default case (when there is no document type info available),
> does this produce the same result as what's in the spec currently?

I believe so but have not proved it. Someone might want to use a protocol verifier.

In addition, some language is needed for what to do at EOF. Essentially, EOF is the end-tag of an element that is a NonPossibleChild of every element.

TagSoup has three more wrinkles:

1) The "form" and "table" elements have the UnClosable property, which means that end-tags are never inserted for them except at EOF.

2) Character data can be a NonPossibleChild and may have a PreferredParent too (for HTML it is "p"). It is never necessary to push it on the stack, fortunately.

3) Attempts to explicitly close the root element are ignored, leaving the matter to EOF processing. This means that a second root element will be the final child of the first root element, and so on recursively.

This could replace #doc-insertion. I don't know how strongly you feel about keeping it; with wrinkles #2 and #3 in place, streaming processing is now possible.

TagSoup has the strong property that pushing an element on to the stack always involves emitting a start-tag (when viewed in terms of streaming), and popping an element from the stack always involves emitting an end-tag. This guarantees that the output forms a hierarchy and is thus well-formed, provided that character data and names meet the repertoire restrictions (which TagSoup makes sure they do by brute force).

It is further true (again, I have not proved this formally) that provided there are no loops in the PreferredParent graph, TagSoup will always make progress and always terminate, however convoluted the input.
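The stack discipline described above can be sketched roughly as follows. This is a minimal Python sketch of one reading of the rules in this message, not TagSoup's actual code or API: the names `Schema`, `SoupFixer`, `PCDATA` and the toy schema are hypothetical, and two behaviors the message leaves open are assumptions (an end-tag that matches no open element is ignored, and an element with neither a possible container nor a PreferredParent is simply nested under the current element). Output is produced only when an element is pushed (start-tag) or popped (end-tag), so it is always balanced.

```python
from dataclasses import dataclass, field

# Hypothetical pseudo-name for character data, which may itself be a
# NonPossibleChild and may have a PreferredParent.
PCDATA = "#PCDATA"


@dataclass
class Schema:
    # parent -> names that may NOT be its children (NonPossibleChild).
    # Anything not listed is a PossibleChild, as in the default case.
    non_possible_child: dict[str, set[str]] = field(default_factory=dict)
    preferred_parent: dict[str, str] = field(default_factory=dict)
    unclosable: set[str] = field(default_factory=set)

    def can_contain(self, parent: str, child: str) -> bool:
        return child not in self.non_possible_child.get(parent, set())


class SoupFixer:
    """Streaming repair: every push emits a start-tag, every pop an end-tag."""

    def __init__(self, schema: Schema) -> None:
        self.schema = schema
        self.stack: list[str] = []
        self.out: list[str] = []

    def _push(self, name: str) -> None:
        self.stack.append(name)
        self.out.append(f"<{name}>")

    def _pop(self) -> None:
        self.out.append(f"</{self.stack.pop()}>")

    def _make_room(self, name: str) -> None:
        """Pop back to an open element that may contain `name`; failing that,
        insert the PreferredParent of `name` (if it has one)."""
        for i in range(len(self.stack) - 1, -1, -1):
            if self.schema.can_contain(self.stack[i], name):
                while len(self.stack) > i + 1:
                    self._pop()
                return
            if self.stack[i] in self.schema.unclosable:
                break  # never insert an end-tag for an UnClosable element
        parent = self.schema.preferred_parent.get(name)
        if parent is not None:
            # Terminates provided the PreferredParent graph has no loops.
            self.start(parent)
        # Assumption: with no container and no PreferredParent, `name` is
        # nested under the current element anyway.

    def start(self, name: str) -> None:
        if self.stack:
            self._make_room(name)
        self._push(name)  # with an empty stack, this element becomes the root

    def text(self, data: str) -> None:
        if not self.stack:
            return  # assumption: character data before the root is dropped
        self._make_room(PCDATA)
        self.out.append(data)  # character data is never pushed

    def end(self, name: str) -> None:
        # Index 0 (the root) is skipped, so explicit root end-tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i] == name:
                while len(self.stack) > i:
                    self._pop()
                return
            if self.stack[i] in self.schema.unclosable:
                return  # closing `name` would insert an UnClosable end-tag
        # Assumption: an end-tag matching no open element is ignored.

    def finish(self) -> str:
        # EOF acts as an end-tag that no element may contain: close
        # everything, including UnClosable elements and the root.
        while self.stack:
            self._pop()
        return "".join(self.out)


if __name__ == "__main__":
    toy = Schema(
        non_possible_child={"html": {PCDATA}, "p": {"p"}},
        preferred_parent={PCDATA: "p"},
        unclosable={"table"},
    )
    f = SoupFixer(toy)
    f.start("html")
    f.text("hi")      # html may not hold text, so a "p" is inserted
    f.start("p")      # p may not hold p, so the open p is closed first
    f.text("x")
    f.end("p")
    f.end("html")     # ignored: explicit close of the root
    f.start("html")   # second root becomes a child of the first
    print(f.finish())  # <html><p>hi</p><p>x</p><html></html></html>
```

In this sketch, character data goes straight to the output and only elements touch the stack, which is what makes a single streaming pass possible.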
--
John Cowan    cowan@ccil.org    http://www.ccil.org/~cowan
As you read this, I don't want you to feel sorry for me, because, I believe everyone will die someday.
        --From a Nigerian-type scam spam
Received on Monday, 17 December 2012 15:52:25 UTC