[w3c/DOM-Parsing] Provide an API to serialize with the "require well-formed" parameter set to true (Issue #84)

The `serializeToString` method of [`XMLSerializer`](https://w3c.github.io/DOM-Parsing/#the-xmlserializer-interface) is specified to "produce an [XML serialization](https://w3c.github.io/DOM-Parsing/#dfn-xml-serialization) of _root_ passing a value of `false` for the _[require well-formed](https://w3c.github.io/DOM-Parsing/#dfn-require-well-formed)_ parameter, and return the result." It's a little confusing that something called `XMLSerializer` might return something that isn't actually well-formed XML, but I understand that this can't be changed for backwards compatibility. Still, it would be useful to have a mechanism that sets the "require well-formed" parameter to `true`, i.e., one that throws if the node cannot be serialized to XML.

Background: I'm trying to use [the technique in this blog post](https://ronvalstar.nl/render-html-to-an-image) to render HTML to an image by creating an SVG with a `<foreignObject>` containing the HTML. As noted on the page, because SVG is XML, the contents of `<foreignObject>` need to be well-formed XML. Doing this with `serializeToString`, which the post suggests, works for _most_ documents, but not for certain less-than-well-formed HTML documents that still parse successfully in the browser. The specific case I ran into was an attribute with unescaped quotation marks in its value:

```html
<meta property="og:description" content="I forgot to "escape" this value">
```

which gets parsed as
```html
<meta property="og:description" content="I forgot to " escape"="" this="" value"="">
```

i.e., it picks up some attributes whose names have a quotation mark in them. (You can see this by setting an element's `innerHTML` to the first string and then reading `innerHTML` again.) This can't be represented in XML, but `serializeToString` successfully returns an "XML" document with this syntax, which the browser cannot deserialize as XML (e.g., in an `<img>` with SVG source, or with `new DOMParser().parseFromString(xml, "text/xml")`).
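
To illustrate why those attribute names cannot be represented, here is a minimal sketch that tests names against the XML 1.0 `Name` production. The helper names `isXmlName` and `findInvalidXmlNames` are hypothetical, not part of any DOM API, and the check covers only names, not every condition the "require well-formed" flag would enforce.

```javascript
// Character classes transcribed from the XML 1.0 (Fifth Edition) Name production.
const NAME_START_CHAR = String.raw`:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\u{10000}-\u{EFFFF}`;
const NAME_CHAR = NAME_START_CHAR + String.raw`\-.0-9\u00B7\u0300-\u036F\u203F\u2040`;
const XML_NAME = new RegExp(`^[${NAME_START_CHAR}][${NAME_CHAR}]*$`, "u");

// Hypothetical helper: true if `name` matches the XML Name production.
function isXmlName(name) {
  return XML_NAME.test(name);
}

// Hypothetical helper: returns the names that cannot appear in an XML document.
function findInvalidXmlNames(names) {
  return names.filter((name) => !isXmlName(name));
}

// Attribute names the HTML parser produces for the <meta> example above.
const parsedNames = ["property", "content", 'escape"', "this", 'value"'];
console.log(findInvalidXmlNames(parsedNames));
```

In a browser, the input could come from `Array.from(element.attributes, (attr) => attr.name)`. For the example above, the two names containing a quotation mark are the ones reported.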

I can check whether `DOMParser` succeeds and discard the parsed result if it does, or catch the `error` event from the `<img>`, but it would be cleanest if I could just get `serializeToString` to fail in the first place. Is it possible to add an optional boolean parameter, as in `serializeToString(document, requireWellFormed)`, that defaults to `false`, or a property of the `XMLSerializer`, or something?
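
For reference, the `DOMParser` workaround mentioned above might look roughly like the sketch below. The function name `serializeWellFormed` and its `deps` parameter are hypothetical (the latter exists only so the serializer and parser can be swapped out in tests), and detecting failure by looking for a `parsererror` element is a common heuristic, not something the serializer itself reports.

```javascript
// Hypothetical wrapper: serialize, then re-parse as XML and throw on failure.
function serializeWellFormed(node, deps = {}) {
  // Fall back to the browser globals when no replacements are injected.
  const serializer = deps.serializer ?? new XMLSerializer();
  const parser = deps.parser ?? new DOMParser();

  const xml = serializer.serializeToString(node);

  // With "text/xml", a failed parse yields a document containing a
  // <parsererror> element instead of throwing.
  const doc = parser.parseFromString(xml, "text/xml");
  if (doc.getElementsByTagName("parsererror").length > 0) {
    throw new Error("Serialized output is not well-formed XML");
  }

  // The parsed document is discarded; only the string is returned.
  return xml;
}
```

This costs a full extra parse per call, which is the overhead a built-in "require well-formed" option would avoid.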

(Originally reported as https://bugzilla.mozilla.org/1914813 because I didn't realize the spec requires this, but it does, and the behavior is the same in Firefox, Safari, and Chrome. See also mdn/content#35585.)


Received on Monday, 26 August 2024 19:39:08 UTC