- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Fri, 05 Aug 2011 14:24:17 +0300
- To: public-html-xml@w3.org
As promised, here's my re-review of the TF report. My apologies for being over a week late with this.

- s/The principle impedement/The principal impediment/

- "Even this is not a 100% solution as is still possible to encounter HTML documents that cannot be represented perfectly in XML."

I suggest downplaying the severity of this a bit: "It is still possible to encounter HTML documents whose document tree needs to be modified slightly for the document tree to be representable as XML. For conforming input, the modifications are on the level of replacing form feeds with spaces."

- "HTML5 toolchains are widespread and popular."

I think it would be prudent to drop "5" from that sentence at this time.

- s/conent/content/

- s/difficlut/difficult/

- ", combining the resulting DOMs through some other process"

I'd drop the above-quoted words, since chances are that environments that use both HTML and XML can do so just fine without combining them into one tree at any time.

- s/intrinsicly/intrinsically/

- "There are still details of implementation to be considered in the case where HTML5 is represented with well-formed XML. Is the markup to be “clipped out” and handed to an HTML5 parser, or is the entire XML DOM going to be handed to the HTML5 engine?"

Of these two approaches, clipping out the markup and handing it to an HTML5 parser is clearly not a correct implementation. It would be a layering violation (access to XML source from the layer that processes the XML tree and identifies which part is to be clipped out) and wouldn't work in the general case (most obviously when XHTML element names are prefixed, but there are other issues as well).

The correct solution is extracting the HTML subtree and passing it to an HTML engine if there's a tree input interface, or serializing it as HTML and passing it to an HTML parser if there's only a source-text-based interface. (Or, if the HTML subsystem supports XML parsing but not tree input, serializing the extracted subtree as XML.)
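The tree-based approach described above (extract the XHTML subtree, then serialize it as HTML for a subsystem that only takes source text) might look roughly like the following minimal Python sketch. It uses only the standard library's xml.etree.ElementTree; the helper names find_xhtml_root and serialize_html and the sample document are hypothetical. It assumes the subtree contains no foreign content (SVG, MathML), drops namespaced attributes such as xml:lang, and ignores several finer points of HTML serialization. The sample deliberately uses an h: prefix: clipping that source text would hand the HTML parser elements literally named h:div and h:p, whereas working on the parsed tree makes the prefix irrelevant.

```python
import xml.etree.ElementTree as ET
from html import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"

# HTML void elements: serialized as a start tag only, never an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}
# Raw text elements: their text content is emitted without escaping.
RAW_TEXT_ELEMENTS = {"script", "style"}


def is_xhtml(elem):
    """True if elem is an element (not a comment/PI) in the XHTML namespace."""
    return isinstance(elem.tag, str) and elem.tag.startswith("{" + XHTML_NS + "}")


def find_xhtml_root(root):
    """Return the first XHTML element in document order, or None.

    Works on the parsed tree, so the prefix used in the source is irrelevant.
    """
    for elem in root.iter():
        if is_xhtml(elem):
            return elem
    return None


def serialize_html(elem):
    """Serialize an extracted XHTML subtree as HTML source text (simplified)."""
    parts = []
    _emit(elem, parts)
    return "".join(parts)


def _emit(elem, parts):
    if not is_xhtml(elem):
        # Assumption of this sketch: no SVG/MathML or other foreign content.
        raise ValueError(f"unsupported non-XHTML element: {elem.tag!r}")
    name = elem.tag.split("}", 1)[1]  # drop the '{namespace}' part
    # Only attributes in no namespace are kept (e.g. xml:lang is dropped).
    attrs = "".join(
        f' {k}="{escape(v, quote=True)}"'
        for k, v in elem.attrib.items()
        if not k.startswith("{")
    )
    parts.append(f"<{name}{attrs}>")
    if name in VOID_ELEMENTS:
        return  # no content, no end tag; the parent emits our tail text
    if elem.text:
        raw = name in RAW_TEXT_ELEMENTS
        parts.append(elem.text if raw else escape(elem.text, quote=False))
    for child in elem:
        _emit(child, parts)
        if child.tail:
            parts.append(escape(child.tail, quote=False))
    parts.append(f"</{name}>")


if __name__ == "__main__":
    source = """<doc xmlns:h="http://www.w3.org/1999/xhtml">
  <meta-info>not HTML</meta-info>
  <h:div class="note"><h:p>Fish &amp; chips<h:br/>today</h:p></h:div>
</doc>"""
    subtree = find_xhtml_root(ET.fromstring(source))
    print(serialize_html(subtree))
    # prints: <div class="note"><p>Fish &amp; chips<br>today</p></div>
```

Because extraction happens on the parsed tree, switching the source to a different prefix or to a default namespace would not change the output. This sketch covers only the serialization side; what an HTML parser then builds from the text (for example, inserting tbody under table) is outside its scope.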
I suggest rewriting the paragraph like this: "If the HTML subsystem has an interface that allows document trees to be passed to it, the XHTML subtree should be extracted from the larger XML tree and passed to the HTML subsystem. If the HTML subsystem only accepts HTML source text as its input, the XHTML subtree needs to be serialized as HTML and passed to the HTML subsystem for parsing using an HTML parser. In the latter case, some non-conforming constructs may not round-trip to the same tree shape when serialized as HTML and reparsed as HTML. Also, conforming trees that have tr elements as children of table elements will be replaced with a semantically equivalent but tree-wise different construct where the tr elements gain a tbody parent which is a child of the table."

- "A third solution is to process the compound messages using MIME multipart/related semantics, perhaps through facilities such as [MTOM] or [XOP]. This is very much like the escaped markup case where downstream processing must be sophisticated enough to reconstruct the authors intent."

This isn't really putting HTML inside an XML document, is it? It looks to me like it's putting both XML and HTML inside a third format. I suggest removing this paragraph.

- My regrets for my unavailability over the next three weeks. Please forward the report to the TAG on the planned schedule without waiting for me to agree or disagree with how you chose to handle the above feedback.

--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Friday, 5 August 2011 11:24:51 UTC