- From: Aryeh Gregor <Simetrical+w3c@gmail.com>
- Date: Wed, 2 Feb 2011 19:50:06 -0500
On Wed, Feb 2, 2011 at 5:30 PM, Boris Zbarsky <bzbarsky at mit.edu> wrote:
> This doesn't work for disconnected subtrees.  Or rather, it presupposes
> certain things about the browser's architecture that I don't think we want
> to presuppose.

Specifically what? That browsers might not resolve CSS for disconnected subtrees? Note that AFAICT, WebKit treats innerText like textContent for such subtrees, and Gecko returns the empty string when you stringify a Selection that's not displayed. This seems unreasonable from an author perspective, but it's not a big deal, so I can spec something different if it would be simpler for browsers. (Not sure what it should be, though. Empty string, textContent-like behavior, or something that behaves like the normal algorithm except ignoring CSS? The last seems like the most complicated by far. I'd lean toward an empty string, because it seems the least mysterious.)

> That may be ok for Selection (though not sure it is for programmatic ones;
> see https://bugzilla.mozilla.org/show_bug.cgi?id=585229), but I fail to see
> why it's OK for a DOM property like innerText.

In WebKit, innerText is essentially the same as selecting the node and stringifying the Selection -- they use the same code and produce almost exactly the same results in my tests (modulo stuff like trailing newlines). So maybe it shouldn't have been a DOM property, but that's how it works. IE8 behaves similarly.

> Note that until recently Gecko had no such dependency in
> selection.toString().  We made some changes because of the "it's not what
> the user sees" issue, but it's a pretty complicated problem, because due to
> CSS out-of-flows "what the user sees" and "a DOM range" might have very
> little to do with each other.
>
> You may want to read https://bugzilla.mozilla.org/show_bug.cgi?id=39098 for
> some background on this part.

What do you mean by "out-of-flows"? Clearly we can't do better than just an approximation here, since we're not going to handle stuff like absolute positioning and so on.

> Generated content is tough, because there's no way to capture it with DOM
> ranges.  So if you're using DOM ranges to represent your selections, there's
> just no sane way to handle generated content.

From a UI perspective it's weird, yeah, but it doesn't seem hard. You'd have to have the selection jump, so that it includes either the whole stretch of generated content or none of it. This is the way the UI looks in Gecko right now for images that are displaying their alt text, like:

  data:text/html,<img alt=test>

From a programmatic perspective, it's also fairly straightforward to see how it would work, as long as you don't demand that it be possible to partially select generated content. Of course, it might not be straightforward to implement.

> Looking briefly over the code we use to serialize to text for copy/paste
> (but also for other purposes, so this code has several different modes,
> which complicates things), there's stuff there to deal specially with tabs,
> nested ordered lists, <h*> vertical spacing and indentation, non-breaking
> spaces, blockquote (especially of type="cite"), noscript/noframes/iframe,
> <p>, <pre> (especially inside blockquotes), <tr>, <td>/<th>, <dl>/<dt>,
> <span> (nesting level affects whether pretty line-wrapping happens or
> something like that), <q>, tags that are "block-level" in the HTML4 sense,
> <sup> and <sub>, <code>, <strong> and <b>, <em> and <i>, <u>.
>
> Plus there's the black magic about when to rewrap things and when to
> preserve the original whitespace or whatnot.
>
> See
> http://hg.mozilla.org/mozilla-central/file/1c2d53a2dcfb/content/base/src/nsPlainTextSerializer.cpp
> for details.

Thanks, I'll test those and take a look at that code.

> I should note that it's not clear to me how much we want to standardize what
> browsers actually copy when the user copies.  This seems like something that
> users may want to configure and where we want to let browsers experiment
> with heuristics and such; I have a really hard time believing that the
> current browser behavior here is the best we can do.

This occurred to me too. It seems like a must to standardize how innerText and Selection.toString() behave, because those are visible to script and pretty widely used, and the interop story right now is terrible. Of course, there's nothing to stop implementations from experimenting and passing the improvements back to the spec.

> That leaves the question of whether Selection.toString should produce the
> same string as the user copying and pasting would, of course.  Perhaps it
> shouldn't.  I'm not sure we'd want to make what toString produce depend on
> new CSS layout modes, for example, since that could break scripts... but the
> user-facing copied text might want to depend on those.

I'm not sure why it would break many existing pages if it only kicks in with new layout modes. But maybe I don't have a good enough grasp on how these functions are actually used. I should probably comb through a sample of web pages to see how people use this stuff. (Unfortunately it's not so easy to search for Selection stringification, but I can look for innerText.)
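
As a rough illustration of the "whole stretch of generated content or none of it" idea above, here is a minimal TypeScript sketch. Everything in it is hypothetical: the `GeneratedSpan` and `TextRange` types, the choice to index into rendered text rather than the DOM, and the nearest-edge rule are assumptions made for the example, not what Gecko or WebKit actually do.

```typescript
// Hypothetical model: offsets index into the rendered text, and each
// GeneratedSpan marks a half-open stretch [start, end) of generated content
// (for example, alt text shown in place of an image).
interface GeneratedSpan { start: number; end: number; }
interface TextRange { start: number; end: number; }

// If a boundary falls strictly inside a generated span, move it to the
// nearer edge of that span (ties go to the start edge). This nearest-edge
// policy is an assumption of the sketch.
function snapBoundary(offset: number, spans: GeneratedSpan[]): number {
  for (const span of spans) {
    if (offset > span.start && offset < span.end) {
      return offset - span.start <= span.end - offset ? span.start : span.end;
    }
  }
  return offset;
}

// Snap both boundaries, so that each generated span ends up either fully
// inside or fully outside the selection. The end is clamped so the result
// never has end < start.
function snapSelection(sel: TextRange, spans: GeneratedSpan[]): TextRange {
  const start = snapBoundary(sel.start, spans);
  const end = Math.max(start, snapBoundary(sel.end, spans));
  return { start, end };
}
```

With a single generated span covering offsets 5 to 9, a selection from 2 to 6 would snap back to 2 to 5 (none of the generated content), while a selection from 2 to 8 would snap out to 2 to 9 (all of it). A partial selection is never produced, which is the only property the argument above relies on.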
Received on Wednesday, 2 February 2011 16:50:06 UTC