Native DOM way to get nodes of arbitrary type/name from Marat Tanalin on 2013-10-04 (www-dom@w3.org from October to December 2013)

From: Marat Tanalin <mtanalin@yandex.ru>
Date: Fri, 04 Oct 2013 23:27:45 +0400
To: www-dom@w3.org
Message-Id: <13101380914865@web17g.yandex.ru>
Hello.

It would be nice to have a native (usable and performant) DOM way for retrieving DOM nodes by node type (or, alternatively, by node name).

This could be represented by these two simple methods:

    * element.getNodesByType(type) -- to get _all_ nodes
      of specified type contained in the element
      (like `element.getElementsByTagName('*')` for elements);

    * element.getChildNodesByType(type) -- to get _direct child_
      nodes of the element that have the specified type
      (like `element.children` for elements).

where the `type` argument is an integer (either a literal or the corresponding predefined named constant exposed as a property of the global `Node` object [1]) that identifies the type of nodes to retrieve.

For example:

    element.getNodesByType(Node.COMMENT_NODE)

would return all comment nodes inside the element.

And:

    element.getChildNodesByType(Node.TEXT_NODE)

would return all text nodes that are direct child nodes of the element.
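
To illustrate the intended semantics, both methods can be roughly approximated in script today. This is only a sketch under assumptions: the standalone functions `getNodesByType` and `getChildNodesByType` below are hypothetical helpers, not existing DOM methods, and they return plain arrays in document order rather than any kind of live collection.

```javascript
// Hypothetical script-level approximation of the proposed methods.
// Works on any node-like object exposing `nodeType` and `childNodes`.

// Direct children of `parent` whose nodeType equals `type`.
function getChildNodesByType(parent, type) {
  return Array.prototype.filter.call(parent.childNodes, function (node) {
    return node.nodeType === type;
  });
}

// All descendants of `root`, in document order, whose nodeType equals `type`.
function getNodesByType(root, type) {
  var result = [];
  (function walk(parent) {
    for (var i = 0; i < parent.childNodes.length; i++) {
      var node = parent.childNodes[i];
      if (node.nodeType === type) {
        result.push(node);
      }
      // Text and comment nodes have an empty childNodes list in a real DOM;
      // also tolerate node-like objects that omit it altogether.
      if (node.childNodes) {
        walk(node);
      }
    }
  })(root);
  return result;
}
```

In a browser this could be called as `getNodesByType(document.body, Node.COMMENT_NODE)`.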

=====================================================
Can get elements natively, cannot get arbitrary nodes
=====================================================

Currently we have dedicated DOM methods and properties for retrieving _elements_:

    * element.getElementsByTagName();
    * element.children;

and _all_ child nodes regardless of their type:

    * element.childNodes.

But we have no native DOM way to retrieve nodes of any arbitrary type as easily.

=========
Use cases
=========

    * For example, a server-side script could minify HTML code
      by removing all HTML comments as DOM nodes. (The DOM is not
      only about JavaScript inside a browser. It can also be used
      on the server side for arbitrary DOM-tree modifications.)

    * Another use case is processing text nodes via JavaScript
      in the browser.
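
As a rough illustration of the first use case, here is how comment removal might look in script today. The function name `removeComments` is hypothetical, and the sketch assumes only that nodes expose `nodeType`, `childNodes` and `removeChild()`, as DOM nodes do.

```javascript
// Hypothetical helper: removes every comment node below `root`
// and returns how many were removed.
function removeComments(root) {
  var COMMENT_NODE = 8; // numeric value of Node.COMMENT_NODE
  var removed = 0;
  // Iterate backwards so that removals do not shift the indexes
  // of the children still to be visited (childNodes is live in a real DOM).
  for (var i = root.childNodes.length - 1; i >= 0; i--) {
    var node = root.childNodes[i];
    if (node.nodeType === COMMENT_NODE) {
      root.removeChild(node);
      removed++;
    } else if (node.childNodes) {
      removed += removeComments(node);
    }
  }
  return removed;
}
```

With a native `getNodesByType(Node.COMMENT_NODE)`, the recursion and the type check would presumably collapse into a single call followed by a removal loop.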

Currently, we are forced to do this purely in script: either with regular expressions (widely recognized as the wrong tool for parsing or processing markup) or by traversing the DOM, visiting every node and filtering manually by checking node types one by one.

Retrieving all nodes means first retrieving all elements via the `getElementsByTagName('*')` method and then, in a loop, retrieving the direct child nodes of each of them via the `element.childNodes` property. This is not only unfriendly to developers (painful, actually), but also just _slow_.
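
A minimal sketch of that workaround, for illustration only. The helper name `collectNodesByType` is hypothetical; the DOM members it relies on (`getElementsByTagName('*')`, `childNodes`, `nodeType`) are the existing ones mentioned above.

```javascript
// Hypothetical helper showing the two-step workaround:
// 1) collect the element itself plus all descendant elements,
// 2) loop over each element's direct child nodes and filter by type.
function collectNodesByType(element, type) {
  var result = [];
  var elements = [element].concat(
    Array.prototype.slice.call(element.getElementsByTagName('*'))
  );
  for (var i = 0; i < elements.length; i++) {
    var children = elements[i].childNodes;
    for (var j = 0; j < children.length; j++) {
      if (children[j].nodeType === type) {
        result.push(children[j]);
      }
    }
  }
  return result;
}
```

Note that the result is grouped by parent element, so it is not necessarily in document order, which is one more thing a script author has to keep in mind with this approach.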

With native `getNodesByType(type)` and `getChildNodesByType(type)` methods, retrieving DOM nodes of an arbitrary type would become trivial and much faster than with a pure-script implementation.

======================
Node type or node name
======================

Alternatively, we could have methods that search for nodes not by type, but by node name. For example:

    element.getNodesByNodeName('#comment')

would return all comment nodes exactly like `element.getNodesByType(Node.COMMENT_NODE)` described above.

Using the node name looks more flexible. For example, it would make it possible to get all child elements with a specified tag name, which is currently impossible: `element.children` returns all child elements regardless of their tag name, and there is no `element.getChildElementsByTagName()` method. This will probably become possible with `findAll('> SOME_TAG_NAME')`, but the `findAll()` approach would probably still be slower than `element.getChildNodesByType('SOME_TAG_NAME')`, since `findAll()` involves selector parsing while `getChildNodesByType()` does not.

Maybe the best option is simply to accept either a node-type integer or a node-name string as the argument of `getNodesByType()` / `getChildNodesByType()`, without having to choose one. For example:

    element.getNodesByType('#comment')

could effectively be the exact equivalent of:

    element.getNodesByType(Node.COMMENT_NODE)
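
One way such a dual-purpose argument might be interpreted is sketched below. The helper name `matchesTypeOrName` is hypothetical, and comparing node names case-insensitively is my assumption (element `nodeName` values are upper-case in HTML documents), not something decided in this proposal.

```javascript
// Hypothetical helper: does `node` match a numeric node type
// or a node-name string such as '#comment' or 'div'?
function matchesTypeOrName(node, typeOrName) {
  if (typeof typeOrName === 'number') {
    return node.nodeType === typeOrName;
  }
  // Assumption: names are compared case-insensitively.
  return node.nodeName.toLowerCase() === String(typeOrName).toLowerCase();
}
```

A filter like this could sit behind both methods, so that passing `8` (the value of `Node.COMMENT_NODE`) or `'#comment'` would select the same nodes.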

Anyway, whether to search by node type, by node name, or by both does not matter too much.

What really matters is the idea of a native DOM way (usable and performant compared with pure-script approaches) to retrieve nodes of _any_ arbitrary type or name.

Thanks.

[1] https://developer.mozilla.org/en-US/docs/Web/API/Node.nodeType
Received on Friday, 4 October 2013 19:28:14 UTC