RE: CfC: publish LCWD of DOM Parsing and Serialization; deadline December 3

Internal Subset:

The latest Firefox, Chrome and IE all support the doctype.internalSubset property in the DOM. Their behavior diverges slightly when parsing and serializing:
For HTML parsing, the internal subset is ignored, as specified in HTML5, and the internalSubset property returns null. For XHTML parsing, IE and Firefox store the literal contents of the internal subset, up until the closing angle bracket, in the 'internalSubset' property. Chrome does not.
For serializing, if the browser has stored an internalSubset value, it is serialized as part of the doctype.

Since two of the three main browsers do this, I added this serialization step as optional, conditional on the browser storing an internalSubset. If browsers choose to remove their internalSubset support, they will still conform to this specification.
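To make the optional step concrete, here is a minimal TypeScript sketch of doctype serialization where the internal subset is emitted only when the implementation has stored one. The DoctypeLike interface and serializeDoctype function are hypothetical stand-ins, not the real DOM API. The sketch assumes, following DOM Level 3 Core, that the stored internalSubset string excludes the delimiting square brackets, and it treats null, undefined or an empty string as "not stored".

```typescript
// Hypothetical, minimal stand-in for a DocumentType node; not the real DOM interface.
interface DoctypeLike {
  name: string;
  publicId: string;
  systemId: string;
  // Historical DOM Level 3 Core property. null/undefined models an implementation
  // that does not store the internal subset (HTML parsing, or Chrome for XHTML).
  internalSubset?: string | null;
}

function serializeDoctype(dt: DoctypeLike): string {
  let out = "<!DOCTYPE " + dt.name;
  if (dt.publicId !== "") {
    out += ' PUBLIC "' + dt.publicId + '"';
  }
  if (dt.systemId !== "") {
    if (dt.publicId === "") {
      out += " SYSTEM";
    }
    out += ' "' + dt.systemId + '"';
  }
  // Optional step: only runs when an internal subset was stored (assumed to be
  // a non-empty string). The stored value excludes the square brackets, so they
  // are added back here.
  if (dt.internalSubset) {
    out += " [" + dt.internalSubset + "]";
  }
  return out + ">";
}

// Same doctype, with and without a stored internal subset.
const withSubset = serializeDoctype({
  name: "html",
  publicId: "",
  systemId: "about:legacy-compat",
  internalSubset: '<!ENTITY foo "bar">',
});
const withoutSubset = serializeDoctype({
  name: "html",
  publicId: "",
  systemId: "about:legacy-compat",
  internalSubset: null,
});
console.log(withSubset);
console.log(withoutSubset);
```

An implementation that never stores the property always takes the second path, which is why dropping internalSubset support would not affect conformance.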

CDATASection:

From what I can determine from the DOM spec (DOM4), the CDATASection object has been removed to "simplify the DOM platform" (Section 10.2). That seems reasonable, since CDATASections cannot be parsed by the HTML parser defined in HTML5. However, CDATASection (as a parser concept) is alive and well in XHTML and XML documents, and these sections are parsed into CDATASection objects on all browsers today. In these cases (XHTML and XML documents), I presume the DOM spec would like browsers to store parsed CDATASection content as Text objects? Today, no browser does this.

I can't see any material problem with browsers treating CDATASections parsed from XHTML/XML as Text. Characters accepted without escaping in CDATASections, like "<" and ">", would be put into a Text node literally and then escaped on serialization. This will make serialized text containing lots of angle brackets much larger than the original, but that's not a technical downside. There may be a compat risk to making this change, but that's another story. Since leaving CDATASection in the platform doesn't hurt browsers, I wonder whether any browser implementers actually want to make this change. It certainly isn't on IE's radar.
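To illustrate the size difference, here is a minimal TypeScript sketch contrasting the two serialization paths for the same character data. The node shapes and function names (CharacterNode, escapeText, serializeCharacterNode) are hypothetical. The escaping covers only "&", "<" and ">", as assumed for XML Text serialization, and the sketch assumes the data never contains "]]>", which cannot appear inside a single CDATA section.

```typescript
// Hypothetical node shapes; real DOM nodes carry far more state.
type CharacterNode =
  | { kind: "cdata"; data: string } // identity preserved as a CDATASection
  | { kind: "text"; data: string }; // parsed CDATA content stored as Text

// Assumed escaping for Text data on XML serialization. "&" is replaced first
// so the entities introduced for "<" and ">" are not escaped a second time.
function escapeText(data: string): string {
  return data.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function serializeCharacterNode(node: CharacterNode): string {
  if (node.kind === "cdata") {
    // Emitted literally; assumes data never contains "]]>".
    return "<![CDATA[" + node.data + "]]>";
  }
  return escapeText(node.data);
}

// The same content, serialized once per identity.
const content = "if (a < b && b > c) { swap(); }";
const asCdata = serializeCharacterNode({ kind: "cdata", data: content });
const asText = serializeCharacterNode({ kind: "text", data: content });
console.log(asCdata);
console.log(asText);
```

Both outputs parse back to the same character data; only the serialized form differs, with each "<" or ">" growing from one character to four when the content is held as Text.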

I suppose I could make CDATASection serialization a historical (optional) behavior for platforms that preserve the identity of CDATASection objects in the DOM. I'd hate to pull it out altogether, because this is something that all platforms support interoperably today. Leaving it in the spec is not a problem: once a browser starts converting CDATASection input to Text, the object being serialized is a Text node, and the CDATASection serialization rules no longer apply.

There may be a separate concern with the references, though. I don't currently reference DOM L3 Core for CDATASection or internalSubset. Should I?

-Travis

From: annevankesteren@gmail.com [mailto:annevankesteren@gmail.com] 
On Wed, Nov 27, 2013 at 5:22 PM, Travis Leithead <travis.leithead@microsoft.com> wrote:
> I did end up talking about the (historical) internalSubset property of the Doctype object for serialization--since browsers will include it if they support it. Is this what you're referring to?

Do all browsers include it or only some?

I was referring to CDATASection. I had not noticed this doctype-related change, which also seems substantive. If you want to change the tree model relative to DOM, you really ought to argue that against the DOM specification, and not make willy-nilly changes on the serialization side.

--
http://annevankesteren.nl/

Received on Tuesday, 3 December 2013 22:01:17 UTC