- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Sun, 1 Jul 2007 21:33:14 +0300
- To: "public-html@w3.org WG" <public-html@w3.org>
(This is part of my detailed review of the parsing algorithm.)

In the tree construction part of the parsing algorithm, the rationale for formulating the generic [R]CDATA parsing algorithm the way it is formulated is not given. The formulation is unusual compared to the rest of the chapter, so it is reasonable to expect that there is a specific reason for writing it that way.

My practical concern is this: In my implementation the tokenizer owns the main processing loop. Therefore, the tree builder can only change its state on a per-token basis and cannot pull another token in response to processing one token. (Instead, it can set its own flags, return control to the tokenizer and wait for the tokenizer to call back into the tree builder again.)

I have solved the problem as follows:

cdataOrRcdataTimesToPop is initialized to 0.

When the spec invokes the generic [R]CDATA parsing algorithm, instead of running it, do the following:
1. If the context node is the current node,
 1a. Create an element for the token.
 1b. Push the element.
 1c. Set the content model flag of the tokenizer.
 1d. Set cdataOrRcdataTimesToPop to 1.
2. Otherwise, if the context node is not the current node,
 2a. Push the context node.
 2b. Create an element for the token.
 2c. Push the element.
 2d. Set the content model flag of the tokenizer.
 2e. Set cdataOrRcdataTimesToPop to 2.

Modify the processing of character tokens and end tag tokens as follows:
3. If a character token is seen and cdataOrRcdataTimesToPop > 0,
 3a. Append the character token to the current node.
 3b. Omit the normal processing of character tokens.
4. If an end tag token is seen and cdataOrRcdataTimesToPop > 0,
 (The token will always be the end tag for the [R]CDATA element.)
 4a. Pop cdataOrRcdataTimesToPop times.
 4b. Set cdataOrRcdataTimesToPop to 0.
 4c. Omit normal end tag token processing.

I'd like to know if this transformation breaks some important property that the spec's formulation is meant to guarantee.
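For illustration, the transformation above can be sketched roughly as follows. This is a minimal Python sketch, not the actual implementation: the names Node, Tokenizer, TreeBuilder and start_cdata_or_rcdata are hypothetical, times_to_pop stands in for cdataOrRcdataTimesToPop, and the tokenizer is reduced to a stub that only records the content model flag. Appending the new element to the context node is an assumption taken from the spec's algorithm; the steps above leave it implicit.

```python
class Node:
    """Minimal stand-in for a DOM element (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.children = []
        self.text = ""


class Tokenizer:
    """Stub tokenizer: only records the content model flag."""
    def __init__(self):
        self.content_model = "PCDATA"


class TreeBuilder:
    def __init__(self, tokenizer, root):
        self.tokenizer = tokenizer
        self.stack = [root]      # stack of open elements
        self.times_to_pop = 0    # cdataOrRcdataTimesToPop

    def current_node(self):
        return self.stack[-1]

    def start_cdata_or_rcdata(self, tag_name, context_node, content_model):
        """Called where the spec invokes the generic [R]CDATA algorithm."""
        if context_node is self.current_node():
            self.times_to_pop = 1                  # step 1d
        else:
            self.stack.append(context_node)        # step 2a
            self.times_to_pop = 2                  # step 2e
        element = Node(tag_name)                   # steps 1a / 2b
        # Assumption: the element is inserted under the context node.
        context_node.children.append(element)
        self.stack.append(element)                 # steps 1b / 2c
        self.tokenizer.content_model = content_model  # steps 1c / 2d

    def character(self, ch):
        """Return True if the token was consumed here (step 3)."""
        if self.times_to_pop > 0:
            self.current_node().text += ch         # step 3a
            return True                            # step 3b
        return False  # caller falls through to normal processing

    def end_tag(self, tag_name):
        """Return True if the token was consumed here (step 4)."""
        if self.times_to_pop > 0:
            for _ in range(self.times_to_pop):     # step 4a
                self.stack.pop()
            self.times_to_pop = 0                  # step 4b
            return True                            # step 4c
        return False  # caller falls through to normal processing


# Usage: <title> seen while the current node is not the context node.
root = Node("html")
head = Node("head")
root.children.append(head)
tb = TreeBuilder(Tokenizer(), root)
tb.start_cdata_or_rcdata("title", head, "RCDATA")
for ch in "Hi":
    tb.character(ch)
tb.end_tag("title")
```

The tree builder never pulls a token itself; each method handles exactly one token and returns control to the tokenizer, with the counter carrying the state between calls.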
Specifically, the spec says:
> 7. If the next token is an end tag token with the same tag name as
> the start tag token, ignore it. Otherwise, this is a parse error.

How could you see any other token but an end tag token with the same tag name as the start tag token, a character token or EOF?

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Sunday, 1 July 2007 18:33:25 UTC