Re: SpellCheck API?

From: 坊野 博典 <hbono@google.com>
Date: Thu, 12 May 2011 14:06:53 +0900
Message-ID: <BANLkTi=LrNcPx6tPWrVZu_xVKBz5EbXjPA@mail.gmail.com>
To: Aryeh Gregor <Simetrical+w3c@gmail.com>
Cc: Olli@pettay.fi, Boris Zbarsky <bzbarsky@mit.edu>, public-webapps@w3.org

Greetings Aryeh, et al,

Thank you for your alternative suggestion.
To be honest, I am not attached to my interfaces if there are better
alternatives. My proposal is simply based on my prototype, which has
been uploaded to <http://webkit.org/b/59693>, and I hope someone on
this list can provide better ones.

On Thu, May 12, 2011 at 5:42 AM, Aryeh Gregor <Simetrical+w3c@gmail.com> wrote:

> Hmm, okay.  This means authors will have to reimplement a lot of things:
>
> * Word-breaking.
> * Handling changes: they want to make sure to re-check only the text
> the user changed, not the whole textarea, to avoid their checking
> being O(N) in the length of the text.
> * When text is preloaded, the custom spellchecker will have to check
> all the text, not just visible text.  Maybe this is fast enough to be
> okay, though, if it's only on load and not on every change.
>
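To illustrate the "re-check only the text the user changed" point, here
is a minimal sketch of how a page script might locate the edited region
by stripping the common prefix and suffix of the old and new text. The
names findChangedSpan and widenToWords are hypothetical and appear in
neither proposal, and splitting words on whitespace is only an
assumption (real word-breaking is language dependent).

```typescript
// Hypothetical helpers for illustration; neither proposal in this thread defines them.
interface ChangedSpan {
  start: number;   // first offset at which the old and new text differ
  oldEnd: number;  // exclusive end of the differing region in the old text
  newEnd: number;  // exclusive end of the differing region in the new text
}

// Strip the common prefix and suffix to find the region that one edit touched.
function findChangedSpan(oldText: string, newText: string): ChangedSpan | null {
  if (oldText === newText) return null;
  const limit = Math.min(oldText.length, newText.length);
  let start = 0;
  while (start < limit && oldText.charAt(start) === newText.charAt(start)) start++;
  let oldEnd = oldText.length;
  let newEnd = newText.length;
  while (oldEnd > start && newEnd > start &&
         oldText.charAt(oldEnd - 1) === newText.charAt(newEnd - 1)) {
    oldEnd--;
    newEnd--;
  }
  return { start, oldEnd, newEnd };
}

// Widen a span of the new text to whitespace boundaries, so that a partly
// edited word is re-checked as a whole (assumption: words are separated by
// whitespace).
function widenToWords(text: string, start: number, end: number): [number, number] {
  const isSpace = (i: number) => /\s/.test(text.charAt(i));
  while (start > 0 && !isSpace(start - 1)) start--;
  while (end < text.length && !isSpace(end)) end++;
  return [start, end];
}
```

With this, only the words between the two returned offsets need to be
passed to the custom spellchecker after each change, so the cost per
keystroke depends on the size of the edit rather than on the length of
the text.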
> However, maybe this API will only be useful to very large sites
> anyway, which can do all these things.  Other sites can use the
> built-in spellchecker, or rely on a library that did all the hard
> work.  Then we want to be flexible, even if it's harder to use.
>
> But also, we'll have to specify extra things, like: how should markers
> change when the text changes?  If I type "Foo bar" and the author's
> spellchecker marks "Foo", and I type "baz" so it's now "Foo bar baz",
> does the marker on "Foo" get cleared automatically?  What if I change
> it to "Fooo bar"?  Or "Floo bar"?

Yes, it is a difficult question, and it was out of scope when I sent
my original e-mail. To handle this case with my API, a web application
needs to compare the text in the focused node whenever it receives a
DOM event (such as keydown or keyup), clear all markers, re-check all
the text, and add the markers again. (This is indeed inefficient, even
when the web application has a cache.)
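
As a rough sketch of that flow: the MarkerController interface below
only mirrors the removeMarkers() and addMarker() signatures from my
original proposal, which no user agent ships. The recheckAll function,
the CustomChecker type, the in-memory stand-in controller, and the
whitespace-based word splitting are all assumptions made for
illustration.

```typescript
// Mirrors the two marker methods of the proposed SpellCheckController.
// The proposal is not a shipped API; N stands in for a DOM node.
interface MarkerController<N> {
  removeMarkers(node: N): void;
  addMarker(node: N, start: number, length: number, suggestions: string[]): boolean;
}

// Hypothetical custom spellchecker: returns null for a well-spelled word,
// otherwise a (possibly empty) list of suggestions.
type CustomChecker = (word: string) => string[] | null;

// Clear every marker, re-check all the text, and add the markers again.
// Returns the number of markers added.
function recheckAll<N>(
  controller: MarkerController<N>,
  node: N,
  text: string,
  check: CustomChecker
): number {
  controller.removeMarkers(node);
  const wordPattern = /\S+/g; // assumption: words are runs of non-whitespace
  let marked = 0;
  let match: RegExpExecArray | null;
  while ((match = wordPattern.exec(text)) !== null) {
    const suggestions = check(match[0]);
    if (suggestions !== null) {
      controller.addMarker(node, match.index, match[0].length, suggestions);
      marked++;
    }
  }
  return marked;
}

// In-memory stand-in so the sketch can run without a browser.
interface RecordedMarker {
  start: number;
  length: number;
  suggestions: string[];
}

class InMemoryController implements MarkerController<string> {
  markers: RecordedMarker[] = [];
  removeMarkers(_node: string): void {
    this.markers = [];
  }
  addMarker(_node: string, start: number, length: number, suggestions: string[]): boolean {
    this.markers.push({ start, length, suggestions });
    return true;
  }
}
```

A web application would call recheckAll() from its keydown/keyup
handler. Every call walks the whole text, which is the inefficiency
mentioned above.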

> Anyway, here's some more detailed feedback on your original idea,
> taking the above into account:
>
> 2011/5/9 Hironori Bono (坊野 博典) <hbono@google.com>:
>> This example adds two methods.
>>  * The window.spellCheckController.removeMarkers() method
>>     Removes all the misspelling underlines and suggestions in the specified node.
>>     The node parameter represents the DOM node in which a web
>> application would like to remove all the misspelling underlines and
>> suggestions.
>
> Why do you want to put it on a new global object?  Wouldn't it make
> more sense as a method on the node itself?  Like
> HTMLElement.removeSpellingMarkers().
>
> Also, what if the author wants to remove only one spelling marker?  If
> markers don't get automatically cleared, and the user changed some
> text, maybe the author wants to only clear a few existing markers
> without recalculating all the others.

Thank you for pointing this out. I had not considered this case,
because I added this method just before sending my original e-mail.
As in your alternative, it is much better to have a method that
removes a single misspelling marker.

>>  * The window.spellCheckController.addMarker() method
>>    Attaches a misspelled underline and suggestions to the specified
>> range of a node.
>>     The node parameter represents a DOM node in which a user agent
>> adds a misspelled underline.
>>     The start and length parameters represent a range of text in the
>> DOM node specified by the node parameter. (We do not use a Range
>> object here because it is hard to specify a range of text in a
>> <textarea> element or an <input> element with it.)
>>     The suggestions parameter represents a list of words suggested by
>> the custom spellchecker. When a custom spellchecker does not provide
>> any suggestions, this parameter should be an empty list.
>
> Do we want this to be usable for contenteditable/designMode documents
> as well as textarea/input?  If so, we also need an API that supports
> Ranges, or something equivalent.
>
>> This example adds two more methods to merge the results of the
>> spellcheckers integrated to user agents.
>>  * The window.spellCheckController.checkWord() method
>>    Checks the spellings of the specified word with the spellchecker
>> integrated to the hosting user agent. When the specified word is a
>> well-spelled one, this method returns true. When the specified word is
>> a misspelled one or the user agent does not have integrated
>> spellcheckers, this method returns false.
>>    The word parameter represents the DOM string to check its spelling.
>>    The language parameter represents a BCP-47
>> <http://www.rfc-editor.org/rfc/bcp/bcp47.txt> tag indicating the
>> language code used by the integrated spellchecker.
>>  * The window.spellCheckController.getSuggestionsForWord() method
>>    Returns the list of suggestions for the specified word. This
>> method returns a DOMStringList object consisting of words suggested by
>> the integrated spellchecker.  When the specified word is a
>> well-spelled word, this method returns an empty list. When the user
>> agent does not have integrated spellcheckers, this method returns
>> null.
>>    The word parameter represents the DOM string to check its spelling.
>>    The language parameter represents a BCP-47
>> <http://www.rfc-editor.org/rfc/bcp/bcp47.txt> tag indicating the
>> language code used by the integrated spellchecker.
>
> As noted, this has privacy problems.  Is it really needed?  Instead,
> why don't we let the author specify how they want their suggestions to
> interact with the default ones?  E.g., let them specify 1) all default
> spellchecking for this textarea should be ignored, 2) suggestions for
> a particular range should be added to the default suggestions (if
> any), 3) suggestions for a particular range should replace the default
> suggestions, 4) a particular range should not be marked as misspelled
> at all.

In brief, I added these methods, without considering the privacy
concerns, only because of requests from web-application developers who
build SVG-based (or canvas-based) text editors, such as SVG-edit
<http://code.google.com/p/svg-edit/>. Although they would like to
integrate a spellchecker into their web applications, they cannot
afford to implement one themselves and would like to use the one
integrated into the user agent. (My proposal is a mixture of requests
from many web-application developers.) These methods are not needed
for implementing a custom spellchecker.

>> [Supplemental, NoInterfaceObject]
>> interface SpellCheckController {
>>  void removeMarkers(Node node);
>>  boolean addMarker(Node node, long start, long length, DOMStringList suggestions);
>>  boolean checkWord(DOMString word, DOMString language);
>>  DOMStringList getSuggestionsForWord(DOMString word, DOMString language);
>> };
>
> Here's an alternative suggestion that addresses the issues I had
> above, while (I think) still addressing all your use-cases.  Create a
> new interface:
>
> interface SpellcheckRange {
>  readonly unsigned long start;
>  readonly unsigned long length;
>  readonly DOMStringList suggestions;
>  readonly unsigned short options = 0;
>  const unsigned short NO_ERROR = 1;
>  const unsigned short ADD_SUGGESTIONS = 2;
> }
>
> "length" could be "end" instead, whichever is more consistent.
> options is a bitfield.  NO_ERROR means that there is no error in this
> range, and the UA should not mark any words there as being errors even
> if the spellcheck attribute is enabled.  (If the author wants to
> completely disable built-in suggestions, they can set
> spellcheck=false.)  ADD_SUGGESTIONS means that the provided
> suggestions should be given in addition to the UA's suggestions,
> instead of replacing them -- by default, the UA's suggestions for that
> range are replaced.  (The default could be the other way around if
> that's better.)  These two features allow the author to control
> default UA suggestions without being able to know what they are, so
> there's no privacy violation.
>
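To make the options bitfield concrete, here is a minimal sketch of how
a user agent might combine its own suggestions with the author's for
one range. This logic would run inside the UA, so the page never sees
the UA's suggestions. The function name resolveSuggestions, the
SpellcheckOption object, the ordering of merged suggestions, and the
choice that NO_ERROR wins when both bits are set are assumptions; the
proposal above does not specify them.

```typescript
// Bit values taken from the SpellcheckRange constants quoted above.
const SpellcheckOption = { NO_ERROR: 1, ADD_SUGGESTIONS: 2 };

// uaSuggestions: what the built-in spellchecker offers for the range,
//   or null when it has nothing to offer.
// Returns the suggestions to show, or null when the range must not be
// marked as misspelled at all.
function resolveSuggestions(
  uaSuggestions: string[] | null,
  authorSuggestions: string[],
  options: number
): string[] | null {
  // NO_ERROR: the author says this range is fine, so show no marker.
  if ((options & SpellcheckOption.NO_ERROR) !== 0) return null;
  // Default (bit not set): the author's suggestions replace the UA's.
  if ((options & SpellcheckOption.ADD_SUGGESTIONS) === 0) {
    return authorSuggestions.slice();
  }
  // ADD_SUGGESTIONS: UA suggestions first, then the author's, without
  // duplicates (the ordering is an assumption).
  const merged = (uaSuggestions || []).slice();
  for (const s of authorSuggestions) {
    if (merged.indexOf(s) < 0) merged.push(s);
  }
  return merged;
}
```

For example, passing SpellcheckOption.ADD_SUGGESTIONS keeps the
built-in suggestions and appends the author's, while passing 0 shows
only the author's.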
> Then add this to HTMLTextareaElement and HTMLInputElement:
>
>  readonly sequence<SpellcheckRange> spellcheckRanges;
>  void addSpellcheckRange(unsigned long start, unsigned long length,
> DOMStringList suggestions);
>  void addSpellcheckRange(unsigned long start, unsigned long length,
> DOMStringList suggestions, unsigned short options);
>  void removeSpellcheckRange(SpellcheckRange range);
>
> On getting, spellcheckRanges must return an array of SpellcheckRange
> objects, representing the current spellcheck suggestions for that
> textarea or input.  These include only suggestions set by the author,
> not built-in suggestions.  addSpellcheckRange() must add the given
> SpellcheckRange to the list, first removing any ranges that the new
> range overlaps with ("options" defaults to 0 if not provided).
> removeSpellcheckRange() must remove the given SpellcheckRange from the
> list, unless it's not in the list, in which case it must throw a
> NOT_FOUND_ERR exception.  All of these should throw NOT_SUPPORTED_ERR
> if used on an input of a type where they don't make sense (e.g.,
> number).
>
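The add/remove semantics described above can be sketched as a small
in-memory list. This is only a model of the proposed behavior, not an
implementation of HTMLTextAreaElement: the class name
SpellcheckRangeList and the AuthorRange type are hypothetical, a plain
Error stands in for the NOT_FOUND_ERR DOM exception, and ranges are
treated as half-open intervals [start, start + length).

```typescript
// Hypothetical stand-in for the proposed SpellcheckRange interface.
interface AuthorRange {
  readonly start: number;
  readonly length: number;
  readonly suggestions: readonly string[];
  readonly options: number;
}

class SpellcheckRangeList {
  private ranges: AuthorRange[] = [];

  // Author-set ranges only; returns a copy so callers cannot mutate the list.
  get spellcheckRanges(): AuthorRange[] {
    return this.ranges.slice();
  }

  // Adds a range, first removing every existing range it overlaps with.
  // "options" defaults to 0 when not provided.
  addSpellcheckRange(start: number, length: number, suggestions: string[], options: number = 0): void {
    const end = start + length;
    this.ranges = this.ranges.filter(
      (r) => r.start + r.length <= start || r.start >= end
    );
    this.ranges.push({ start, length, suggestions: suggestions.slice(), options });
  }

  // Removes the given range object, or throws if it is not in the list.
  removeSpellcheckRange(range: AuthorRange): void {
    const index = this.ranges.indexOf(range);
    if (index < 0) {
      throw new Error("NOT_FOUND_ERR"); // stand-in for the DOM exception
    }
    this.ranges.splice(index, 1);
  }
}
```

Because removal is by object identity, an author who wants to clear a
single marker first reads it back from spellcheckRanges and then passes
that object to removeSpellcheckRange().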
> When the content of the textarea/input changes, the ranges need to be
> updated.  I suggest the following behavior:
>
> * If a character is added or deleted inside an existing range, remove
> the range from spellcheckRanges.  "Inside" means it's inserted at an
> offset with start < offset < start + length, or deleted at an offset
> with start <= offset < start + length.
> * If a character is added at or before the start offset of a range,
> increment the start offset of the range, so it still contains the same
> text.
> * If a character is deleted before the start offset of a range,
> decrement the start offset of the range, so it still contains the same
> text.
>
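The three update rules above can be sketched as a pure function over
single-character edits. The names OffsetRange, TextEdit, and
updateRanges are hypothetical, and the sketch follows the rules
literally: an insertion exactly at start + length is not "inside", so
it leaves the range untouched.

```typescript
// Hypothetical types for illustration; the proposal does not define them.
interface OffsetRange {
  start: number;
  length: number;
}

interface TextEdit {
  kind: "insert" | "delete"; // one character inserted or deleted
  offset: number;            // offset of the character in the text
}

// Applies one single-character edit to a list of ranges and returns the
// updated list; the input list is not modified.
function updateRanges(ranges: OffsetRange[], edit: TextEdit): OffsetRange[] {
  const updated: OffsetRange[] = [];
  for (const r of ranges) {
    const end = r.start + r.length;
    if (edit.kind === "insert") {
      // Inserted strictly inside the range: drop the range.
      if (edit.offset > r.start && edit.offset < end) continue;
      // Inserted at or before the start: shift right by one.
      updated.push(
        edit.offset <= r.start ? { start: r.start + 1, length: r.length } : r
      );
    } else {
      // Deleted a character belonging to the range: drop the range.
      if (edit.offset >= r.start && edit.offset < end) continue;
      // Deleted before the start: shift left by one.
      updated.push(
        edit.offset < r.start ? { start: r.start - 1, length: r.length } : r
      );
    }
  }
  return updated;
}
```

With "Foo bar" and a marker on "Foo" (start 0, length 3), typing at the
end of the text leaves the marker alone, while inserting "l" at offset
1 to get "Floo bar" removes it.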
> If support for contenteditable/designMode regions is needed, that
> could be done the same way, except that instead of start/length
> members in SpellcheckRange, you'd have a Range.  Maybe the name
> "SpellcheckRange" is bad, though.
>
> Advantages this API has:
>
> * Allows authors to see all existing ranges they've marked as
> misspelled, and remove them selectively
> * Does not have privacy implications, but still allows authors to
> control how their suggestions interact with the UA's suggestions
>
> I'd be interested to hear what you think about this approach.

I think your alternative makes more sense than mine. (Thank you so
much.) It is cleaner and satisfies all the requests from
web-application developers who develop custom spellcheckers. (It would
be fantastic to discuss how to satisfy the requests from developers of
SVG-based applications, as noted above.)

Regards,

Hironori Bono
E-mail: hbono@google.com
Received on Thursday, 12 May 2011 05:07:18 GMT
