Re: [whatwg/dom] Valid/Invalid characters in document.createElement() (#849)

@mfreed7 asked me about my thoughts on security implications of being more lenient in tag names. So here are my two cents.

@domenic wrote:
> So I guess the relevance here is, if someone does
> 
> ```js
> const el = document.createElement(userSuppliedString);
> await storeInServer(el.outerHTML);
> 
> // ... later ...
> const html = await getFromServer();
> container.innerHTML = html;
> ```
> 
> and `userSuppliedString` is `p><script>alert(1);</script`, then `el.outerHTML` would end up as `<p><script>alert(1);</script></p><script>alert(1);</script>`. That is indeed bad!
> 
> If we disallow tab, LF, CR, FF, space, /, >, NULL, then nothing immediately sticks out to me as problematic. But I worry that I'm just not doing enough security analysis. It would be a shame if this simplification fix ended up causing everyone security problems.

I think it's a reasonable assumption that the user would also be able to control at least some parts of the content if they can supply the tag name. If that's the case, then `userSuppliedString` might be just `script`. Or (if we're being lenient) `_<script` which would be serialized to `<_<script>` starting in fact a `SCRIPT` element.

That being said, I've never been under impression that the design goal of DOM APIs is to disallow reparsing attacks. There are countless examples of "manually" created DOM trees that will execute JavaScript after reparsing. [Even the HTML spec admits the problem](https://html.spec.whatwg.org/multipage/parsing.html#serialising-html-fragments:the-textarea-element).  Three examples of that:

```js
// Example 1
const div = document.createElement('div');
div.append(document.createComment('--><script>alert(1)</script>'));
console.log("<div><!----><img src onerror=alert(1)>--></div>");
// logs: "<div><!----><img src onerror=alert(1)>--></div>"

// Example 2
const style = document.createElement('style');
style.textContent = '</style><img src onerror=alert(1)>';
console.log(style.outerHTML);
// logs: "<style></style><img src onerror=alert(1)></style>"

// Example 3
const textarea = document.createElement('textarea');
const div2 = document.createElement('div');
div2.setAttribute('title', '</textarea><img src onerror=alert(1)>');
textarea.append(div2);
// logs: "<textarea><div title="</textarea><img src onerror=alert(1)>"></div></textarea>"
```

Furthermore, going back to the example given by @domenic:

> ```js
> const el = document.createElement(userSuppliedString);
> await storeInServer(el.outerHTML);
> 
> // ... later ...
> const html = await getFromServer();
> container.innerHTML = html;
> ```

I'd say that no matter what happens in `document.createElement`, this piece of code is still a security issue because the `html` should be sanitized before assigning it to `innerHTML`.

So my personal take is that if we disallow tab, LF, CR, FF, space, /, >, NULL, we should be fine. This way, we disallow creating a reparsing issue from a single call to `document.createElement`. If some other conditions are met, then you can still have a mutation XSS but it's also the case for many other DOM APIs. I don't consider it that much of a problem, since in a real world mXSS usually happens when parser-created HTML is mutated on reparsing (check [this research of mine to see an example](https://research.securitum.com/mutation-xss-via-mathml-mutation-dompurify-2-0-17-bypass/)).

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/dom/issues/849#issuecomment-1090076902
You are receiving this because you are subscribed to this thread.

Message ID: <whatwg/dom/issues/849/1090076902@github.com>

Received on Wednesday, 6 April 2022 09:51:57 UTC