[w3c/DOM-Parsing] Attribute value serialization does not take whitespace normalization into account (#59)

The steps for "to serialize an attribute value" only escape the characters `"`, `&`, `<` and `>` in the attribute value. White space characters are passed through to the serialization as-is. However, according to https://www.w3.org/TR/xml11/#AVNormalize, XML processors replace each literal space, tab, carriage return or line feed character with a space unless the character was written as a character reference. It therefore seems that the attribute value serialization algorithm should include a step mapping tab to `&#9;`, carriage return to `&#xD;` and line feed to `&#xA;`.
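To make the proposal concrete, here is a rough sketch of what such a step might look like. `serializeAttributeValue` is a hypothetical helper name, not a browser API, and the sketch covers only the escaping, not the well-formedness checks in the specification:

```javascript
// Hypothetical helper, not part of any browser API: a sketch of the
// attribute value escaping with the extra white space mappings proposed above.
function serializeAttributeValue(value) {
  return value
    // Existing escapes. `&` goes first so that the references
    // produced by the later replacements are not escaped again.
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    // Proposed additions: emit white space other than the plain space
    // as character references so a parser does not normalize it away.
    .replace(/\t/g, '&#9;')
    .replace(/\n/g, '&#xA;')
    .replace(/\r/g, '&#xD;');
}

console.log(serializeAttributeValue(' \t\r\n'));
// prints: " &#9;&#xD;&#xA;" (without the quotes)
```

The plain space is left alone on purpose, since normalization maps it to itself.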

Testing in various browsers shows that several of them already apply a similar substitution:

```
new XMLSerializer().serializeToString(
    new DOMParser().parseFromString('<root attr="&#x20;&#x9;&#xD;&#xA;"/>', 'text/xml')
)
// <root attr=" &#9;&#xD;&#xA;"/> in Firefox
// <root attr=" &#9;&#13;&#10;"/> in Edge / Chrome
```

The algorithm as described in this specification would generate `<root attr=" \t\r\n"/>` (where `\t`, `\r` and `\n` represent tab, carriage return and line feed respectively). Only Safari seems to follow the specification here. Unfortunately, this serialization does not survive a round-trip, as processors such as the DOMParser normalize the white space to plain spaces (three of them, since XML end-of-line handling first collapses the carriage return and line feed pair into a single line feed):

```
new XMLSerializer().serializeToString(
    new DOMParser().parseFromString('<root attr=" \t\r\n"/>', 'text/xml')
)
// <root attr="   "/>
```
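The parser side can be modelled without a browser. The following is a simplified sketch, assuming a CDATA attribute and handling only numeric character references. `normalizeAttributeValue` is a hypothetical name, not an existing API:

```javascript
// Hypothetical helper: a simplified model of what an XML parser does to a
// raw attribute value. Named entity references are not handled.
function normalizeAttributeValue(raw) {
  // End-of-line handling: CR LF and a lone CR each become a single LF.
  const lineNormalized = raw.replace(/\r\n?/g, '\n');
  // Single pass over the value: a numeric character reference yields the
  // referenced character itself, literal white space becomes a space.
  return lineNormalized.replace(
    /&#x([0-9A-Fa-f]+);|&#([0-9]+);|[\t\n\r ]/g,
    (match, hex, dec) => {
      if (hex !== undefined) return String.fromCodePoint(parseInt(hex, 16));
      if (dec !== undefined) return String.fromCodePoint(parseInt(dec, 10));
      return ' ';
    }
  );
}

// Literal white space is lost...
console.log(JSON.stringify(normalizeAttributeValue(' \t\r\n')));
// prints: "   "
// ...while character references survive.
console.log(JSON.stringify(normalizeAttributeValue(' &#9;&#xD;&#xA;')));
// prints: " \t\r\n"
```

In this model, only the escaped form gives back the original four characters, which is the round-trip property the proposed step is meant to restore.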


Received on Tuesday, 7 January 2020 14:38:27 UTC