proposal to expose word breaker through JavaScript: range.expand()

I propose adding a method that expands a range to a word boundary.
The related WebKit bug is <https://bugs.webkit.org/show_bug.cgi?id=27632>.
The spec is below.
Any feedback is appreciated.


*Syntax*: expands the range to the 'unit' boundary.
interface Range {
    void expand(in DOMString unit);
};

*Parameters*:
unit: String that specifies the unit to expand the range to, using one of the
following values:
word -- expand the range to include complete words. A word is the smallest
semantic unit in a language. In languages that use spaces to separate words,
such as English, a word is a sequence of characters terminated by a space or
punctuation.
sentence -- expand the range to include complete sentences. A sentence is a
sequence of words terminated by punctuation.
block -- expand the range to include complete paragraphs.
document -- expand the range to include the whole document.
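
To make the 'word' definition concrete for space-delimited languages, here is
a minimal sketch that applies it to a plain string rather than a DOM Range.
The helper name expandToWord and its set of punctuation characters are
assumptions made for illustration and are not part of the proposal. A real
implementation would rely on the browser's own word breaker.

```javascript
// Hypothetical helper, not part of the proposal: emulates expand('word')
// on a plain string for space-delimited languages such as English.
function expandToWord(text, offset) {
  // Assumption: whitespace and this small set of punctuation end a word.
  var isBreak = function (ch) { return /[\s.,;:!?"()\[\]]/.test(ch); };
  var start = offset;
  var end = offset;
  // Walk left until the previous character is a terminator.
  while (start > 0 && !isBreak(text.charAt(start - 1))) start--;
  // Walk right until the current character is a terminator.
  while (end < text.length && !isBreak(text.charAt(end))) end++;
  return { start: start, end: end, word: text.slice(start, end) };
}
```

For example, expandToWord("Hello, world!", 9) yields start 7, end 12, and the
word "world".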

*Use case*:
To identify a semantic unit (such as a word) when the user moves the mouse
over or clicks on a page. A specific use case is a dictionary, which shows the
word's definition when the user moves the mouse over or clicks a word in a
web page.

*Example*:
This example returns the range containing the word that the user moused over
or clicked (not double-clicked). caretRangeFromPoint is defined in CSSOM View:
<http://dev.w3.org/csswg/cssom-view/#dom-documentview-caretrangefrompoint>

var range = document.caretRangeFromPoint(event.clientX, event.clientY);
range.expand('word');
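
For the dictionary use case, the two calls could be wrapped with feature
detection, since neither caretRangeFromPoint nor the proposed expand() can be
assumed to exist in every browser. This is only a sketch: wordAtPoint and
lookUp are hypothetical names, and the document is passed in as a parameter
purely to keep the helper self-contained.

```javascript
// Sketch only: wordAtPoint is a hypothetical helper. It returns the text of
// the word at (x, y), or null when the needed methods are unavailable.
function wordAtPoint(doc, x, y) {
  if (typeof doc.caretRangeFromPoint !== 'function') return null;
  var range = doc.caretRangeFromPoint(x, y);
  if (!range || typeof range.expand !== 'function') return null;
  range.expand('word');      // the method proposed in this message
  return range.toString();   // text now covered by the expanded range
}

// Possible use in a page (lookUp is a placeholder for the dictionary call):
// document.addEventListener('click', function (event) {
//   var word = wordAtPoint(document, event.clientX, event.clientY);
//   if (word) lookUp(word);
// });
```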

*Reference*: Microsoft's spec
<http://msdn.microsoft.com/en-us/library/ms536421(VS.85).aspx>.

Thanks,
Xiaomei


On Thu, Jul 30, 2009 at 4:55 AM, Anne van Kesteren <annevk@opera.com> wrote:

> Thanks for the reply!
>
> On Wed, 29 Jul 2009 20:17:13 +0200, Xiaomei Ji <xji@chromium.org> wrote:
> > "word" is a keyword.  Like Microsoft's
> > spec<http://msdn.microsoft.com/en-us/library/ms536421%28VS.85%29.aspx>,
> > a range could be extended to a 'character', a 'word', a 'sentence', or a
> > 'line' etc.
> >
> > As to whether it should be a method in Document or in Range, it is open
> > to discussion.
>
> I think putting it on Range similar to what Microsoft has done would be
> better. Document is already pretty bloated.
>
>
> >> And finally, use cases would help as well as a definition of "word
> >> boundary" and how this works/won't work in an international context.
> >
> > A word is the smallest semantic form in one language. In languages use
> > space to break word, such as English, a word is a collection of
> > characters terminated by a space or punctuation. In languages do not
> > use space to break word, such as Chinese, word breaker is needed to
> > break a word.
> >
> > The API should work for English, at least. Whether it works in an
> > international context depends on whether the layout engine/browser
> > supports correct word breaker in that language.
>
> I do think we should define this.
>
>
> I think the main problem here is that we do not have an editor for a new
> version of DOM Range. There are several other extensions that browsers have
> implemented and emulated from each other that would be nice to document
> clearly, but so far nobody has volunteered.
>
> (It could maybe be done as a separate draft as well, similarly to how CSSOM
> View extends DOM Range, but updating DOM Range to today's standards for
> specifications would be good I think.)
>
>
> --
> Anne van Kesteren
> http://annevankesteren.nl/
>

Received on Thursday, 13 August 2009 22:05:19 UTC