Re: HTML5 output method

On Jan 3, 2011, at 21:26, Michael Kay wrote:
> One of the policy questions is whether every XDM instance should have an HTML5 serialization (and if so, whether that serialization should always be prescriptively defined in the specification), or whether some instances should result in serialization errors.

I think it would be onerous to require serializers to implement a non-trivial subset of HTML validation just to detect serialization errors, if errors are defined to include anything that doesn't round-trip (e.g. something other than thead, tbody, or tfoot appearing as a child of table, p appearing as a descendant of p without a scoping element or a button in between, etc.).

Furthermore, in streaming implementations, you can't really withdraw what's already been serialized, so what would you do with the error? Cutting the stream short, writing an error message into the stream or behaving in an implementation-dependent way are all unsatisfactory solutions. (The last one would cause vendor lock-in.)
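To make the streaming problem concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the event tuples, the stream_serialize function and the "p inside p is an error" policy are assumptions, not anything from the serialization spec. The point is only that by the time the error is detected, part of the output has already left the serializer.

```python
import io


class SerializationError(Exception):
    """Hypothetical error type for this sketch."""


def stream_serialize(events, sink):
    """Write each event to the sink as soon as it is seen.

    Assumed (hypothetical) error policy: a <p> start tag while another
    <p> is still open is a serialization error.
    """
    open_names = []
    for kind, value in events:
        if kind == "start":
            if value == "p" and "p" in open_names:
                raise SerializationError("p inside p")
            open_names.append(value)
            sink.write("<%s>" % value)
        elif kind == "end":
            open_names.pop()
            sink.write("</%s>" % value)
        else:  # text; escaping omitted for brevity
            sink.write(value)


sink = io.StringIO()
events = [
    ("start", "p"), ("text", "a"),
    ("start", "p"), ("text", "b"), ("end", "p"),
    ("end", "p"),
]
try:
    stream_serialize(events, sink)
except SerializationError:
    pass

# Whatever was written before the error is already downstream.
partial = sink.getvalue()
```

Here partial ends up as "<p>a": the consumer has a truncated document and the serializer has no way to take it back.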

I think the output method should have well-defined (prescriptive) output for all XDMs. This necessarily leads to a situation where there exist XDMs whose serialization when reparsed doesn't result in the same XDM. I think the serialization should have the property that if the input XDM represents a document tree that can be constructed by the HTML parsing algorithm, serializing the XDM and parsing the result yields the same XDM.
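A small sketch of that property, again in Python. The tree model (tuples and strings), serialize, ToyParser and round_trips are all hypothetical names for this example. ToyParser is only a stand-in for the HTML parsing algorithm: it models the single rule that a p start tag closes an already open p, ignores scoping elements and attributes, and silently drops stray end tags, which the real algorithm does not.

```python
from html.parser import HTMLParser

# Assumed tree model: an element is (name, [children]); text is a str.


def serialize(node):
    """Prescriptive output for every tree; there is no error path."""
    if isinstance(node, str):
        return node  # escaping omitted for brevity
    name, children = node
    inner = "".join(serialize(child) for child in children)
    return "<%s>%s</%s>" % (name, inner, name)


class ToyParser(HTMLParser):
    """Toy stand-in for the HTML parsing algorithm (one rule only)."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Simplified rule: a <p> start tag closes any open <p>.
        if tag == "p" and any(name == "p" for name, _ in self.stack):
            while self.stack.pop()[0] != "p":
                pass
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Stray end tags are ignored here (a simplification).
        if any(name == tag for name, _ in self.stack[1:]):
            while self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        self.stack[-1][1].append(data)


def parse(text):
    parser = ToyParser()
    parser.feed(text)
    parser.close()
    return parser.root[1]  # list of top-level nodes


def round_trips(nodes):
    text = "".join(serialize(node) for node in nodes)
    return parse(text) == nodes


flat = [("p", ["a"]), ("p", ["b"])]    # a parser can produce this
nested = [("p", ["a", ("p", ["b"])])]  # a parser cannot produce this
```

With these definitions, round_trips(flat) is True and round_trips(nested) is False. Both trees still get well-defined output; only the one that the parser could have built in the first place comes back unchanged.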

The last sentence reminds me that I forgot to mention that if pretty-printing isn't requested, the output method shouldn't even put a line feed at the end of the stream (after the </html> tag) to please old Unixish tools, since doing so would violate the round-trippability property.

Henri Sivonen

Received on Tuesday, 4 January 2011 09:37:35 UTC