[DOM-Parsing] XMLSerializer production of non-well-formed XML

I'm trying to take an arbitrary DOM or DOM fragment and serialize it to XML.

A little while ago there appears to have been some discussion on how to deal
with serializing invalid XML, which resulted in the "require well-formed"
flag being created. However, the TR still specifies that XMLSerializer
produces XML with the "require well-formed" flag turned _off_. I find it
curious that a process called XMLSerializer could produce a string that is
possibly not well-formed XML.

I don't know of any use case where I would want to serialize to a string
that contains accurate XML namespace information but is not well-formed
XML. If there is such a use case, could a note be added to the TR?

Here is a test case:

var e = document.createElement('div');
e.innerHTML = '<b>Hello, world!</b><!-- foo -- bar --> baz';
(new XMLSerializer()).serializeToString(e);

Currently, this will be returned:

"<div xmlns="http://www.w3.org/1999/xhtml"><b>Hello, world!</b><!-- foo -- bar --> baz</div>"

However, this is not well-formed XML because of the "--" inside the
comment. Attempting to parse it will cause Web browsers and XML parsers to
stop parsing and report an error. Both Firefox and Chromium will produce
this string, but reject it if you attempt to parse it as XML.
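XML 1.0's Comment production rejects two kinds of comment data: data containing "--" anywhere, and data ending in "-", because that would produce "--->". When the "require well-formed" flag is set, the DOM Parsing spec has the serializer throw on such comment data, and on characters outside the XML Char production, rather than emit it. The following is a minimal JavaScript sketch of that check. The function name isWellFormedCommentData is hypothetical and is not part of any browser API.

```javascript
// Hypothetical helper mirroring the comment-data check that the DOM Parsing
// spec applies when the "require well-formed" flag is set.
function isWellFormedCommentData(data) {
  // XML 1.0 forbids "--" anywhere inside a comment...
  if (data.includes('--')) return false;
  // ...and a trailing "-", which would form "--->" with the closing delimiter.
  if (data.endsWith('-')) return false;
  // Every code point must also match the XML 1.0 Char production.
  // (for...of iterates by code point; a lone surrogate fails the range test.)
  for (const ch of data) {
    const cp = ch.codePointAt(0);
    const isChar =
      cp === 0x9 || cp === 0xa || cp === 0xd ||
      (cp >= 0x20 && cp <= 0xd7ff) ||
      (cp >= 0xe000 && cp <= 0xfffd) ||
      (cp >= 0x10000 && cp <= 0x10ffff);
    if (!isChar) return false;
  }
  return true;
}
```

For the test case, the HTML parser produces the comment data " foo -- bar ", and this check rejects it.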

Perhaps the rationale is "better to produce something than an error";
however, this means that I cannot reliably use the output of XMLSerializer
in a larger XML document.

The relevant bug seems to be
<https://www.w3.org/Bugs/Public/show_bug.cgi?id=25168>. Can this still be
revisited?

Is there an alternative that's guaranteed to produce well-formed XML
including xmlns information, when possible?
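One possible workaround is to serialize, then re-parse the result as application/xml and reject it if the parser reports an error. This sketch assumes a browser environment with DOMParser. It also relies on a common heuristic: Firefox and Chromium signal XML parse failures by returning a document that contains a parsererror element, rather than by throwing. The approach only detects bad output and cannot repair it. The name serializeWellFormed is hypothetical.

```javascript
// Hypothetical helper: serialize a node, then re-parse the string as XML and
// throw if the parser reports an error. The serializer and parser default to
// the browser globals but can be passed in explicitly.
function serializeWellFormed(node,
                             serializer = new XMLSerializer(),
                             parser = new DOMParser()) {
  const xml = serializer.serializeToString(node);
  const doc = parser.parseFromString(xml, 'application/xml');
  // Browsers report XML errors by inserting a <parsererror> element, so look
  // for one. Caveat: content that legitimately contains an element named
  // "parsererror" would also be rejected by this heuristic.
  if (doc.getElementsByTagName('parsererror').length > 0) {
    throw new Error('XMLSerializer output is not well-formed XML');
  }
  return xml;
}
```

With the div e from the test case, this should throw instead of returning the string. Re-parsing requires a single root element. A serialized DocumentFragment with several top-level nodes would need to be wrapped in a dummy element before the check.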

Austin Wright.

Received on Monday, 30 June 2014 03:09:39 UTC