[w3c/DOM-Parsing] DocumentType XML serialization doesn't handle the presence of double quotes in system ID (#71)

In https://w3c.github.io/DOM-Parsing/#dfn-xml-serializing-a-documenttype-node we read:
> 2. If the require well-formed flag is true and the node's systemId attribute contains characters that are not matched by the XML Char production or that contains both a """ (U+0022 QUOTATION MARK) and a "'" (U+0027 APOSTROPHE), then throw an exception; the serialization of this node would not be a well-formed document type declaration.
> ...
> 9. If the node's systemId is not the empty string then append the following, in the order listed, to markup:
> 9.1 " " (U+0020 SPACE);
> 9.2 """ (U+0022 QUOTATION MARK);
> 9.3 The value of the node's systemId attribute;
> 9.4 """ (U+0022 QUOTATION MARK).

The intention here *seems* to be to surround the `systemId` with single quotes if it contains a double quote, and with double quotes otherwise, throwing an exception only if *both* a single quote and a double quote are present in the `systemId` attribute. But that good idea got lost between step 2 and step 9: step 9 always uses double quotes, so a `systemId` that contains a double quote passes the check in step 2 and is then serialized as a malformed declaration.
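To make the mismatch concrete, here is a minimal sketch of steps 2 and 9 as currently written. The function name `serializeSystemIdAsSpecified` is hypothetical, a plain `Error` stands in for whatever exception the spec intends, and the XML Char check from step 2 is left out for brevity.

```js
// Hypothetical helper: mirrors only the quote handling of steps 2 and 9
// of the quoted algorithm. The XML Char check in step 2 is omitted.
function serializeSystemIdAsSpecified(systemId, requireWellFormed) {
  // Step 2: throw only when BOTH quote characters are present.
  if (requireWellFormed && systemId.includes('"') && systemId.includes("'")) {
    throw new Error("systemId contains both a quotation mark and an apostrophe");
  }
  // Step 9: nothing is appended for an empty systemId.
  if (systemId === "") {
    return "";
  }
  // Steps 9.1 to 9.4: space, quotation mark, value, quotation mark.
  return ' "' + systemId + '"';
}

// A lone double quote passes step 2 and ends up inside a double-quoted literal.
console.log(serializeSystemIdAsSpecified('foo"bar', true)); // prints:  "foo"bar"
```

The literal is terminated early by the embedded quotation mark, so the resulting declaration is not well-formed even though the require well-formed flag was set.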

One of two fixes should be made: A. Change step 2 to drop the mention of U+0027 APOSTROPHE and throw the exception whenever the systemId contains U+0022 QUOTATION MARK; or B. Change steps 9.2 and 9.4 to both say "U+0022 QUOTATION MARK if the node's systemId does not contain a U+0022 QUOTATION MARK, otherwise U+0027 APOSTROPHE".
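Option B could be sketched roughly as follows. This is an illustration rather than proposed spec text: `quoteSystemId` is a hypothetical name, a plain `Error` stands in for the spec's exception, and the XML Char check is again omitted.

```js
// Hypothetical helper illustrating Option B: choose the delimiter based on
// the content of systemId, keeping the existing step 2 check unchanged.
function quoteSystemId(systemId, requireWellFormed) {
  const hasQuotationMark = systemId.includes('"');
  const hasApostrophe = systemId.includes("'");

  // Step 2 (unchanged): no delimiter works if both characters are present.
  if (requireWellFormed && hasQuotationMark && hasApostrophe) {
    throw new Error("systemId contains both a quotation mark and an apostrophe");
  }

  // Steps 9.2 and 9.4 under Option B: quotation mark unless systemId contains one.
  const delimiter = hasQuotationMark ? "'" : '"';
  return delimiter + systemId + delimiter;
}

console.log(quoteSystemId('foo"bar', true)); // prints: 'foo"bar'
console.log(quoteSystemId("foo'bar", true)); // prints: "foo'bar"
```

With this choice the existing wording of step 2 becomes consistent with step 9, since the only input that cannot be delimited is one containing both characters.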

Option B is what Firefox appears to do:
```js
// Parse a document whose system ID contains a double quote, then serialize it again.
const doc = new DOMParser().parseFromString("<!DOCTYPE root SYSTEM 'foo\"bar'><root><child>text</child></root>", "text/xml");
new XMLSerializer().serializeToString(doc);
```
outputs
```
<!DOCTYPE root SYSTEM 'foo"bar'>
<root><child>text</child></root>
```

Reply to this email directly or view it on GitHub:
https://github.com/w3c/DOM-Parsing/issues/71

Received on Friday, 2 July 2021 05:55:21 UTC