[whatwg] Proposal: Add HTMLElement.innerText from Mike Wilcox on 2010-08-15 (public-whatwg-archive@w3.org from August 2010)

From: Mike Wilcox <mike@mikewilcox.net>
Date: Sun, 15 Aug 2010 10:17:43 -0500
Message-ID: <AF5F80C1-4C77-4023-9AF1-48282A6CE514@mikewilcox.net>
innerText is one of those things IE got right, just like innerHTML. Let's please consider making that a standard instead of removing it. Also, please don't make the mistake of thinking it is the same thing as textContent. Think of textContent as pre-formatted text, and innerText as plain text. IE even correctly handles a span with display:block, adding a line break.


Michael, good try, but I've been down that road; it's pretty hard to do. You left in the script text, spaces were missing, and there were no line breaks. You'd almost need an HTML parser. Take a list of tags like these: 
p span span em strong p script ul li span li span

You need to know where there are line breaks, or spaces, or neither. And that's without considering all the other block or HTML5 elements, or tables, etc. However, it's still not as easy as testing whether the node is a block (or list-item, etc.), because you also need to know how it relates to the previous and next nodes; otherwise a span inside a p would get line breaks.
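
To make that bookkeeping concrete, here is a rough sketch, not IE's actual algorithm. It assumes a hypothetical simplified tree of plain objects instead of real DOM nodes: text nodes look like { text: "..." } and elements like { tag, display, children }. The function name sketchInnerText, the BLOCK_TAGS table and the display property are all made up for illustration. The idea is to remember that a block boundary or some white space was crossed, and only emit the separator when the next piece of text actually arrives:

```javascript
// Hypothetical node shapes (plain objects, not a real DOM API):
//   text node:    { text: "..." }
//   element node: { tag: "p", display: "block", children: [...] }
// "display" is optional; when absent, a small tag table decides.
var BLOCK_TAGS = { div: true, p: true, ul: true, li: true };
var SKIPPED_TAGS = { script: true, style: true };

function sketchInnerText(root) {
    var out = "";              // text emitted so far
    var pendingBreak = false;  // a block boundary was crossed since the last text
    var pendingSpace = false;  // white space was seen since the last text

    function emitText(s) {
        // Collapse every run of white space, then drop it from both ends.
        var core = s.replace(/\s+/g, " ").trim();
        if (/^\s/.test(s)) {
            pendingSpace = true;
        }
        if (core.length > 0) {
            // Separators are only written between two pieces of text,
            // and a line break wins over a space.
            if (out.length > 0) {
                if (pendingBreak) {
                    out += "\n";
                } else if (pendingSpace) {
                    out += " ";
                }
            }
            out += core;
            pendingBreak = false;
            pendingSpace = false;
        }
        if (/\s$/.test(s)) {
            pendingSpace = true;
        }
    }

    function walk(node) {
        if (typeof node.text === "string") {
            emitText(node.text);
            return;
        }
        var display = node.display || (BLOCK_TAGS[node.tag] ? "block" : "inline");
        if (SKIPPED_TAGS[node.tag] || display === "none") {
            return; // script/style content and hidden nodes contribute nothing
        }
        var isBlock = (display === "block" || display === "list-item");
        if (isBlock) {
            pendingBreak = true; // boundary before the block
        }
        var children = node.children || [];
        for (var i = 0; i < children.length; i++) {
            walk(children[i]);
        }
        if (isBlock) {
            pendingBreak = true; // boundary after the block
        }
    }

    walk(root);
    return out;
}
```

Because the break is only written when text follows, nested or empty blocks (such as a p holding nothing but a script) don't pile up blank lines, and a plain span inside a p gets no line break while a span with display "block" does. Tables, pre, br and real computed styles are all left out of this sketch.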

Mike Wilcox
http://clubajax.org
mike at mikewilcox.net



On Aug 15, 2010, at 7:41 AM, Michael A. Puls II wrote:

> On Sat, 14 Aug 2010 20:03:30 -0400, Mike Wilcox <mike at mikewilcox.net> wrote:
> 
>> Wow, I was just thinking of proposing this myself a few days ago.
>> 
>> In addition to Adam's comments, there is no standard, stable way of *getting* the text from a series of nodes. textContent returns everything, including tabs, white space, and even script content.
> 
> Well, you can do stuff like this:
> 
> ------
> (function() {
>    function trim(s) {
>        return s.replace(/^\s\s*/, '').replace(/\s\s*$/, '');
>    }
>    function setInnerText(v) {
>        this.textContent = v;
>    }
>    function getInnerText() {
>        var iter = this.ownerDocument.createNodeIterator(this,
>        NodeFilter.SHOW_TEXT, null, null);
>        var ret = "";
>        var first = true;
>        for (var node; (node = iter.nextNode()); ) {
>            var fixed = trim(node.nodeValue.replace(/\r|\n|\t/g, ""));
>            if (fixed.length > 0) {
>                if (!first) {
>                    ret += " ";
>                }
>                ret += fixed;
>                first = false;
>            }
>        }
>        return ret;
>    }
>    HTMLElement.prototype.__defineGetter__('myInnerText', getInnerText);
>    HTMLElement.prototype.__defineSetter__('myInnerText', setInnerText);
> })();
> ------
> 
> and adjust how you handle spaces and build the string etc. as you see fit. Then, it's just alert(el.myInnerText).
> 
> NodeIterator is standard. __defineGetter__/__defineSetter__ are de facto standard (and you have Object.defineProperty as the standard for those that support it). How newlines, tabs and spaces are stripped/normalized just isn't standardized in this case. But that might differ depending on the application.
> 
> Or, just run a regex on textContent.
> 
> -- 
> Michael
Received on Sunday, 15 August 2010 08:17:43 UTC