
[whatwg] HTML-to-plaintext conversion (innerText and Selection.toString())

From: Glenn Maynard <glenn@zewt.org>
Date: Wed, 2 Feb 2011 20:16:49 -0500
Message-ID: <AANLkTi=9V98ktjA1OFF59W8qXfdXwMSvwY=ud+20HHr2@mail.gmail.com>
On Wed, Feb 2, 2011 at 5:30 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:

> I should note that it's not clear to me how much we want to standardize
> what browsers actually copy when the user copies.  This seems like something
> that users may want to configure and where we want to let browsers
> experiment with heuristics and such; I have a really hard time believing
> that the current browser behavior here is the best we can do.
>

Given how often I've had poor results from copying (hidden blocks being
included, copying image alts, sprinkling newlines in strange places, and so
on), this seems important--browsers should be free to improve on copying
without violating the spec.

> That leaves the question of whether Selection.toString should produce the
> same string as the user copying and pasting would, of course. Perhaps it
> shouldn't.  I'm not sure we'd want to make what toString produce depend on
> new CSS layout modes, for example, since that could break scripts... but the
> user-facing copied text might want to depend on those.
>

I'd intuitively expect toString to give the same results the user would get
from a copy.  If the two differ, there should be a separate method that
returns what a copy would, including any browser-specific heuristics and so
on.  That way, scripts can get the best possible text representation
available, rather than the most precisely defined one, when that's what they
want.
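
To make the distinction concrete, here is a minimal TypeScript sketch of the
two kinds of conversion.  It runs on a toy node tree rather than the real DOM,
and every name in it (ToyNode, strictText, copyLikeText) is made up for
illustration; neither function is an existing or proposed browser API, and the
heuristics shown are only the complaints listed above (hidden blocks, image
alts).

```typescript
// Toy document model, NOT the DOM. All names here are hypothetical.
interface TextNode { kind: "text"; data: string }
interface ImageNode { kind: "img"; alt: string }
interface BlockNode { kind: "block"; hidden: boolean; children: ToyNode[] }
type ToyNode = TextNode | ImageNode | BlockNode;

// Precisely defined conversion: every node contributes its text,
// regardless of visibility or layout.
function strictText(node: ToyNode): string {
  switch (node.kind) {
    case "text":
      return node.data;
    case "img":
      return node.alt;
    case "block":
      return node.children.map(strictText).join("");
  }
}

// Copy-like conversion: free to apply heuristics. This sketch assumes
// two of them: skip hidden blocks and drop image alt text.
function copyLikeText(node: ToyNode): string {
  switch (node.kind) {
    case "text":
      return node.data;
    case "img":
      return "";
    case "block":
      return node.hidden ? "" : node.children.map(copyLikeText).join("");
  }
}

// A stand-in for a selection covering visible text, an image,
// and a hidden block.
const selection: ToyNode = {
  kind: "block",
  hidden: false,
  children: [
    { kind: "text", data: "Hello " },
    { kind: "img", alt: "[logo]" },
    { kind: "block", hidden: true, children: [{ kind: "text", data: "secret " }] },
    { kind: "text", data: "world" },
  ],
};

console.log(strictText(selection));   // "Hello [logo]secret world"
console.log(copyLikeText(selection)); // "Hello world"
```

A script that needs a stable, spec-defined string would call the first; one
that wants whatever the user would actually get from a copy would call the
second, and browsers could keep improving the second without breaking the
first.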

-- 
Glenn Maynard
Received on Wednesday, 2 February 2011 17:16:49 UTC
