[whatwg/dom] Proposal: DOM APIs in web workers? (Issue #1217)

I think there are valid use cases for DOM APIs like `DOMParser`, `XMLSerializer`, `document.implementation.createDocument()` etc. to be available in web workers. I don't mean having direct access to the current document (that wouldn't make sense, of course), but being able to parse, create, modify and serialize "offscreen" documents. Use cases for this include:
- **Parsing & serializing XML files off the main thread**: For example, I'm currently working on a web-based rich text editor and Microsoft Word alternative, and I'm planning to add DOCX (Microsoft Word document file) support to it in the future. A DOCX file basically consists of a bunch of XML files zipped into a compressed archive. I can [(un)compress the zip file with the help of (De)CompressionStream](https://dev.to/ndesmic/writing-a-simple-browser-zip-file-decompressor-with-compressionstreams-5che) and then parse the XML files with `DOMParser` or serialize them with `XMLSerializer`. Currently, the parsing and serialization have to happen on the main thread, which can make the page unresponsive while DOCX files are being read or written.
Some projects like @jakearchibald's [SVGOMG](https://github.com/jakearchibald/svgomg), an SVG optimizer & minifier based on [SVGO](https://npmjs.com/package/svgo), even use XML parsing libraries like [Sax](https://www.npmjs.com/package/@trysound/sax) instead of the browser's `DOMParser`, amongst other reasons to make them work in web workers.
- **Generating HTML files off the main thread**: Applications that generate HTML files – be it website builders, [math document editors](https://github.com/BenjaminAster/PAMM), Markdown to HTML transpilers, etc. – could profit immensely from being able to convert their internal representations to HTML off the main thread.
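
Without DOM APIs in workers, an HTML-generating worker today typically falls back to string concatenation and has to escape text and attribute values by hand. A minimal sketch of that workaround, assuming a hypothetical internal node shape `{ tag, attrs, children }` (plain strings being text nodes) and trusted tag and attribute names:

```javascript
// Workaround sketch: build an HTML string from a hypothetical internal tree
// ({ tag, attrs, children } objects, with plain strings as text nodes)
// without any DOM APIs, so it can run inside a worker today.

// Escape characters that are special in HTML text content.
function escapeText(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Attribute values are double-quoted below, so quotes must be escaped too.
function escapeAttr(s) {
  return escapeText(s).replace(/"/g, "&quot;");
}

// Elements that have no end tag in HTML (a subset, for illustration).
const VOID_ELEMENTS = new Set(["br", "hr", "img", "input", "link", "meta"]);

// Assumes tag and attribute names are trusted; only values and text are escaped.
function toHtml(node) {
  if (typeof node === "string") return escapeText(node);
  const attrs = Object.entries(node.attrs ?? {})
    .map(([name, value]) => ` ${name}="${escapeAttr(String(value))}"`)
    .join("");
  if (VOID_ELEMENTS.has(node.tag)) return `<${node.tag}${attrs}>`;
  const inner = (node.children ?? []).map((child) => toHtml(child)).join("");
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}
```

In a worker this could be wired up as `self.onmessage = (e) => self.postMessage(toHtml(e.data));`. With `createElement()` and `outerHTML` available in workers, the escaping and void-element bookkeeping would be left to the browser's HTML serializer instead.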

As of a few months ago, all three major browser engines support module workers and OffscreenCanvas, so I think websites will increasingly move expensive work off the main thread, as people like @surma have [advocated](https://www.youtube.com/watch?v=7Rrv9qFMWNM) for years.

From a technical perspective, my proposal is that e.g. a global `self.document` property is exposed in workers, which is a stripped-down version of `Document` containing only the following properties and functions:
- `self.document.implementation`
- `self.document.createAttribute()`
- `self.document.createAttributeNS()`
- `self.document.createCDATASection()`
- `self.document.createComment()`
- `self.document.createDocumentFragment()` (?)
- `self.document.createElement()`
- `self.document.createElementNS()`
- `self.document.createEvent()`
- `self.document.createExpression()`
- `self.document.createProcessingInstruction()`
- `self.document.createRange()` (?)
- `self.document.createTextNode()`
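
Code written against just this subset would run unchanged on the main thread (passing `document`) and, if the proposal were adopted, in a worker (passing `self.document`). A minimal sketch; `buildBadge` is a hypothetical helper and the SVG it builds is only an illustration:

```javascript
const SVG_NS = "http://www.w3.org/2000/svg";

// Hypothetical helper: builds a small SVG badge using only Document members
// from the list above (createElementNS, createTextNode, createComment) plus
// ordinary Element/Node methods. `doc` is any Document.
function buildBadge(doc, label) {
  const svg = doc.createElementNS(SVG_NS, "svg");
  svg.setAttribute("viewBox", "0 0 100 20");

  const text = doc.createElementNS(SVG_NS, "text");
  text.setAttribute("x", "4");
  text.setAttribute("y", "15");
  text.appendChild(doc.createTextNode(label));

  svg.appendChild(doc.createComment(" generated badge "));
  svg.appendChild(text);
  return svg;
}
```

On the main thread this works today, e.g. `new XMLSerializer().serializeToString(buildBadge(document, "v1.0"))`; under the proposal, the same call with `self.document` would work inside a worker.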

Additionally, the following interfaces should be exposed in workers:
- `Document` & `XMLDocument`
- `DocumentType`
- `DOMImplementation`
- `DocumentFragment`
- [`DOMParser`](https://html.spec.whatwg.org/multipage/dynamic-markup-insertion.html#dom-parsing-and-serialization)
- [`XMLSerializer`](https://w3c.github.io/DOM-Parsing/#the-xmlserializer-interface)
- `XSLTProcessor`
- [`Sanitizer`](https://wicg.github.io/sanitizer-api/#sanitizer-api)
- `Node`
- `ParentNode`
- `Attr`
- `CharacterData`
- `Text`
- `CDATASection`
- `Element`
- `Comment`
- [`HTMLElement`](https://html.spec.whatwg.org/multipage/dom.html#htmlelement) and all HTML element interfaces
- [`SVGElement`](https://svgwg.org/svg2-draft/types.html#InterfaceSVGElement) and all SVG element interfaces
- [`MathMLElement`](https://w3c.github.io/mathml-core/#dom-and-javascript)
- `NodeList`
- `HTMLCollection`
- `AbstractRange`, `StaticRange` & `Range`
- `MutationObserver` & `MutationRecord` (?)
- `NamedNodeMap`
- `ProcessingInstruction`
- `XPathResult`, `XPathExpression` & `XPathEvaluator`
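
Until such a change ships everywhere, worker code would need to feature-detect. A minimal sketch that reports which of the interfaces listed above are missing from a given global scope; the list is abbreviated (the per-element HTML/SVG interfaces and `Sanitizer` are omitted), and `ParentNode` is left out because it is an interface mixin rather than an exposed interface object:

```javascript
// Interface objects from the proposal that a worker could probe for.
const PROPOSED_WORKER_INTERFACES = [
  "Document", "XMLDocument", "DocumentType", "DOMImplementation",
  "DocumentFragment", "DOMParser", "XMLSerializer", "XSLTProcessor",
  "Node", "Attr", "CharacterData", "Text", "CDATASection", "Element",
  "Comment", "HTMLElement", "SVGElement", "MathMLElement",
  "NodeList", "HTMLCollection", "AbstractRange", "StaticRange", "Range",
  "MutationObserver", "MutationRecord", "NamedNodeMap",
  "ProcessingInstruction", "XPathResult", "XPathExpression", "XPathEvaluator",
];

// Returns the names of proposed interfaces that are not exposed on `scope`
// (globalThis by default, i.e. `self` inside a worker).
function missingDomInterfaces(scope = globalThis) {
  return PROPOSED_WORKER_INTERFACES.filter((name) => typeof scope[name] !== "function");
}
```

Inside a dedicated worker in current browsers this would be expected to report all of these names, which is exactly the gap this proposal is about.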

One could then use `new DOMParser().parseFromString()` or `self.document.implementation.{createDocument(), createHTMLDocument()}` to create a new document, modify it with all the usual and beloved DOM methods, and stringify it with `new XMLSerializer().serializeToString()` or `myOffscreenDocument.documentElement.outerHTML`.
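
The parse, modify, serialize round trip could look like this inside a module worker. This is a sketch that only takes the DOM path if the proposal has been implemented, i.e. if `DOMParser` and `XMLSerializer` exist in the worker's global scope; otherwise it reports failure so the caller can fall back to the main thread, which is what has to happen today. `stripXmlComments` is a hypothetical example task:

```javascript
// Hypothetical task: remove all comments from an XML document.
// `scope` defaults to globalThis (`self` in a worker); passing `window`
// lets the same function run on the main thread today.
function stripXmlComments(xmlText, scope = globalThis) {
  if (typeof scope.DOMParser !== "function" || typeof scope.XMLSerializer !== "function") {
    return { ok: false, reason: "DOMParser/XMLSerializer not exposed in this scope" };
  }

  const doc = new scope.DOMParser().parseFromString(xmlText, "application/xml");
  // XML parse errors are reported in-band as a <parsererror> element, not thrown.
  if (doc.getElementsByTagName("parsererror").length > 0) {
    return { ok: false, reason: "malformed XML" };
  }

  // Collect comments first, then remove them, so iterating the live
  // childNodes lists is not disturbed by the mutation.
  const comments = [];
  (function collect(node) {
    for (const child of node.childNodes) {
      if (child.nodeType === 8 /* Node.COMMENT_NODE */) comments.push(child);
      else collect(child);
    }
  })(doc);
  for (const comment of comments) comment.remove();

  return { ok: true, xml: new scope.XMLSerializer().serializeToString(doc) };
}
```

Wiring it up could then be `self.onmessage = (e) => self.postMessage(stripXmlComments(e.data));`, with the page doing the work itself whenever `ok` is `false`.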

Things like [`Element.prototype.getClientRects()`](https://drafts.csswg.org/cssom-view/#ref-for-dom-element-getclientrects) or [`Element.prototype.computedStyleMap()`](https://drafts.css-houdini.org/css-typed-om/#ref-for-dom-element-computedstylemap) don't make sense with offscreen documents, of course, but that is already the case with documents created on the main thread with `DOMParser` or `document.implementation.createHTMLDocument()`.

https://github.com/whatwg/dom/issues/1217

Received on Sunday, 30 July 2023 10:54:04 UTC