Re: SpellCheck API?

On Tue, May 10, 2011 at 1:42 PM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:
> Something like that might be better. Do you have the exact API in mind?

Well, just the same as I originally proposed, except with arrays
instead of scalars.  But Hironori Bono's reply has made this idea
moot anyway.

2011/5/11 Hironori Bono (坊野 博典) <hbono@google.com>:
> When I talked with web-application developers, some of them liked to
> check spellings of words not only when a user types words but also
> when JavaScript code creates an editable element with prepopulated
> text. For example, a web application checks text in a To: field with
> its custom spellchecker and adds misspelled underlines under "invalid"
> e-mail addresses. (This example is mainly for notifying users that
> they may be replying phishing e-mails.) Some other web-application
> developers also like to check spelling of words in an editable element
> before sending text to a server. To satisfy these requests, a user
> agent may need to send spellcheck events also when JavaScript code
> creates an editable node or changes text in an editable node. (Even
> though I have not measured how much time it takes to send these events
> without JavaScript execution, it may hurt the speed of JavaScript
> code.)

This shouldn't be a problem to do.  For instance, we could have a
method like .spellcheck() that asks the browser to fire spellcheck
events for particular nodes.

> When I talked with web-application developers, some of them liked to
> integrate n-gram spellcheckers so they can correct simple grammatical
> errors, such as article-noun mismatches ("a apple" -> "an apple") and
> subject-verb mismatches ("he have" -> "he has"). To satisfy their
> requests, a user agent may need to send two words or more (up to all
> words in an editable element).

Hmm, okay.  This means authors will have to reimplement a lot of things:

* Word-breaking.
* Handling changes: they want to make sure to re-check only the text
the user changed, not the whole textarea, to avoid doing work that's
O(N) in the length of the text on every change.
* When text is preloaded, the custom spellchecker will have to check
all the text, not just visible text.  Maybe this is fast enough to be
okay, though, if it's only on load and not on every change.
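
To give a feel for the first two items, here's a minimal TypeScript
sketch of what a page or library might have to write.  All the names
(Word, breakWords, dirtyRegion, recheck) are made up for illustration
and aren't part of any proposal, and the word-breaker assumes ASCII
letters and apostrophes only, which is far too naive for real text.

```typescript
// Hypothetical sketch; none of these names come from the proposal.
interface Word { start: number; length: number; text: string; }

// Assumption: a "word" is a run of ASCII letters and apostrophes.
// Real word-breaking is language-dependent and much harder.
function isWordChar(ch: string): boolean {
  return /[A-Za-z']/.test(ch);
}

// Break text[from, to) into words.  A word that starts before `to`
// is allowed to run past it, so words are never cut in half.
function breakWords(text: string, from: number = 0, to: number = text.length): Word[] {
  const words: Word[] = [];
  let i = from;
  while (i < to) {
    if (!isWordChar(text.charAt(i))) { i++; continue; }
    let j = i;
    while (j < text.length && isWordChar(text.charAt(j))) j++;
    words.push({ start: i, length: j - i, text: text.slice(i, j) });
    i = j;
  }
  return words;
}

// Widen an insertion at `offset` of `insertedLength` characters out to
// word boundaries, so only the words touched by the edit get re-checked.
function dirtyRegion(text: string, offset: number, insertedLength: number): [number, number] {
  let start = offset;
  while (start > 0 && isWordChar(text.charAt(start - 1))) start--;
  let end = offset + insertedLength;
  while (end < text.length && isWordChar(text.charAt(end))) end++;
  return [start, end];
}

// Re-check only the dirty region with the page's own checker.
function recheck(text: string, offset: number, insertedLength: number,
                 isMisspelled: (word: string) => boolean): Word[] {
  const [start, end] = dirtyRegion(text, offset, insertedLength);
  return breakWords(text, start, end).filter(w => isMisspelled(w.text));
}
```

For example, if the user appends " baz" to "Foo bar" (4 characters
inserted at offset 7), dirtyRegion() gives [4, 11], so only "bar" and
"baz" are re-checked and "Foo" is left alone.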

However, maybe this API will only be useful to very large sites
anyway, which can do all these things.  Other sites can use the
built-in spellchecker, or rely on a library that has done all the hard
work.  In that case we want the API to be flexible, even if that makes
it harder to use.

But also, we'll have to specify extra things, like: how should markers
change when the text changes?  If I type "Foo bar" and the author's
spellchecker marks "Foo", and I type "baz" so it's now "Foo bar baz",
does the marker on "Foo" get cleared automatically?  What if I change
it to "Fooo bar"?  Or "Floo bar"?


Anyway, here's some more detailed feedback on your original idea,
taking the above into account:

2011/5/9 Hironori Bono (坊野 博典) <hbono@google.com>:
> This example adds two methods.
>  * The window.spellCheckController.removeMarkers() method
>     Removes the all misspelled underlines and suggestions in the specified node.
>     The node parameter represents the DOM node in which a web
> application like to remove all the misspelling underlines and
> suggestions.

Why do you want to put it on a new global object?  Wouldn't it make
more sense as a method on the node itself?  Like
HTMLElement.removeSpellingMarkers().

Also, what if the author wants to remove only one spelling marker?  If
markers don't get automatically cleared, and the user changed some
text, maybe the author wants to only clear a few existing markers
without recalculating all the others.

>  * The window.spellCheckController.addMarker() method
>    Attaches a misspelled underline and suggestions to the specified
> range of a node.
>     The node parameter represents a DOM node in which a user agent
> adds a misspelled underline.
>     The start and length parameters represent a range of text in the
> DOM node specified by the node parameter. (We do not use a Range
> object here because it is hard to specify a range of text in a
> <textarea> element or an <input> element with it.)
>     The suggestions parameter represents a list of words suggested by
> the custom spellchecker. When a custom spellchecker does not provide
> any suggestions, this parameter should be an empty list.

Do we want this to be usable for contenteditable/designMode documents
as well as textarea/input?  If so, we also need an API that supports
Ranges, or something equivalent.

> This example adds two more methods to merge the results of the
> spellcheckers integrated to user agents.
>  * The window.spellCheckController.checkWord() method
>    Checks the spellings of the specified word with the spellchecker
> integrated to the hosting user agent. When the specified word is a
> well-spelled one, this method returns true. When the specified word is
> a misspelled one or the user agent does not have integrated
> spellcheckers, this method returns false.
>    The word parameter represents the DOM string to check its spelling.
>    The language parameter represents a BCP-47
> <http://www.rfc-editor.org/rfc/bcp/bcp47.txt> tag indicating the
> language code used by the integrated spellchecker.
>  * The window.spellCheckController.getSuggestionsForWord() method
>    Returns the list of suggestions for the specified word. This
> method returns a DOMStringList object consisting of words suggested by
> the integrated spellchecker.  When the specified words is a
> well-spelled word, this method returns an empty list. When the user
> agent does not have integrated spellcheckers, this method returns
> null.
>    The word parameter represents the DOM string to check its spelling.
>    The language parameter represents a BCP-47
> <http://www.rfc-editor.org/rfc/bcp/bcp47.txt> tag indicating the
> language code used by the integrated spellchecker.

As noted, this has privacy problems.  Is it really needed?  Instead,
why don't we let the author specify how they want their suggestions to
interact with the default ones?  E.g., let them specify 1) all default
spellchecking for this textarea should be ignored, 2) suggestions for
a particular range should be added to the default suggestions (if
any), 3) suggestions for a particular range should replace the default
suggestions, 4) a particular range should not be marked as misspelled
at all.

> [Supplemental, NoInterfaceObject]
> interface SpellCheckController {
>  void removeMarkers(Node node);
>  bool addMarker(Node node, long start, long length, DOMStringList suggestions);
>  void checkWord(DOMString word, DOMString language);
>  DOMStringList getSuggestionsForWord(DOMString word, DOMString language);
> };

Here's an alternative suggestion that addresses the issues I had
above, while (I think) still addressing all your use-cases.  Create a
new interface:

interface SpellcheckRange {
  readonly attribute unsigned long start;
  readonly attribute unsigned long length;
  readonly attribute DOMStringList suggestions;
  readonly attribute unsigned short options;
  const unsigned short NO_ERROR = 1;
  const unsigned short ADD_SUGGESTIONS = 2;
};

"length" could be "end" instead, whichever is more consistent.
options is a bitfield.  NO_ERROR means that there is no error in this
range, and the UA should not mark any words there as being errors even
if the spellcheck attribute is enabled.  (If the author wants to
completely disable built-in suggestions, they can set
spellcheck=false.)  ADD_SUGGESTIONS means that the provided
suggestions should be given in addition to the UA's suggestions,
instead of replacing them -- by default, the UA's suggestions for that
range are replaced.  (The default could be the other way around if
that's better.)  These two features allow the author to control
default UA suggestions without being able to know what they are, so
there's no privacy violation.
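
To make the intended semantics of the bitfield concrete, here's a
rough TypeScript sketch of what a UA might do internally for a single
author-marked range.  effectiveSuggestions() is a made-up helper, not
part of the proposal, and two details are my assumptions: NO_ERROR wins
if both flags are set, and with ADD_SUGGESTIONS the UA's suggestions
are listed first.  The page never gets to see the "builtin" list.

```typescript
// Constants from the proposed SpellcheckRange interface.
const NO_ERROR = 1;
const ADD_SUGGESTIONS = 2;

// Hypothetical UA-internal helper: decide what to show for one range.
// Returns null when the range must not be marked as misspelled at all.
function effectiveSuggestions(authorSuggestions: string[], options: number,
                              builtin: string[]): string[] | null {
  // Assumption: NO_ERROR takes precedence over ADD_SUGGESTIONS.
  if ((options & NO_ERROR) !== 0) return null;

  if ((options & ADD_SUGGESTIONS) !== 0) {
    // Assumption: built-in suggestions first, then the author's,
    // skipping duplicates.
    const merged = builtin.slice();
    for (const s of authorSuggestions) {
      if (merged.indexOf(s) === -1) merged.push(s);
    }
    return merged;
  }

  // Default: the author's suggestions replace the UA's.
  return authorSuggestions.slice();
}
```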

Then add this to HTMLTextAreaElement and HTMLInputElement:

  readonly attribute sequence<SpellcheckRange> spellcheckRanges;
  void addSpellcheckRange(unsigned long start, unsigned long length,
                          DOMStringList suggestions);
  void addSpellcheckRange(unsigned long start, unsigned long length,
                          DOMStringList suggestions,
                          unsigned short options);
  void removeSpellcheckRange(SpellcheckRange range);

On getting, spellcheckRanges must return an array of SpellcheckRange
objects, representing the current spellcheck suggestions for that
textarea or input.  These include only suggestions set by the author,
not built-in suggestions.  addSpellcheckRange() must create a new
SpellcheckRange from its arguments ("options" defaults to 0 if not
provided) and add it to the list, first removing any existing ranges
that the new range overlaps.
removeSpellcheckRange() must remove the given SpellcheckRange from the
list, unless it's not in the list, in which case it must throw a
NOT_FOUND_ERR exception.  All of these should throw NOT_SUPPORTED_ERR
if used on an input of a type where they don't make sense (e.g.,
number).
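
Here's a small TypeScript model of the list behavior described above,
just to pin down the add/remove semantics.  SpellcheckRangeList is a
stand-in I made up for the per-element state; a plain Error carrying
the message "NOT_FOUND_ERR" stands in for the DOM exception, and the
NOT_SUPPORTED_ERR case is left out because it depends on the input
type.

```typescript
// Plain-object stand-in for the proposed SpellcheckRange interface.
interface SpellcheckRange {
  readonly start: number;
  readonly length: number;
  readonly suggestions: readonly string[];
  readonly options: number;
}

// Hypothetical model of the state a textarea/input would keep.
class SpellcheckRangeList {
  private ranges: SpellcheckRange[] = [];

  // Mirrors the spellcheckRanges getter; returns a copy of the list.
  get spellcheckRanges(): SpellcheckRange[] {
    return this.ranges.slice();
  }

  addSpellcheckRange(start: number, length: number,
                     suggestions: string[], options: number = 0): void {
    const end = start + length;
    // First remove every existing range that overlaps [start, end).
    this.ranges = this.ranges.filter(
      r => r.start + r.length <= start || r.start >= end);
    this.ranges.push({ start, length, suggestions: suggestions.slice(), options });
  }

  removeSpellcheckRange(range: SpellcheckRange): void {
    const i = this.ranges.indexOf(range);
    // Stand-in for throwing a NOT_FOUND_ERR DOM exception.
    if (i === -1) throw new Error("NOT_FOUND_ERR");
    this.ranges.splice(i, 1);
  }
}
```

Note that in this sketch two ranges that merely touch, like [0, 3) and
[3, 5), don't count as overlapping.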

When the content of the textarea/input changes, the ranges need to be
updated.  I suggest the following behavior:

* If a character is added or deleted inside an existing range, remove
the range from spellcheckRanges.  "Inside" means it's inserted at an
offset where start < offset < start + length, or deleted at an offset
where start <= offset < start + length.
* If a character is added at or before the start offset of a range,
increment the start offset of the range, so it still contains the same
text.
* If a character is deleted before the start offset of a range,
decrement the start offset of the range, so it still contains the same
text.
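
The three rules above can be sketched in TypeScript as follows.
Marked, afterInsert() and afterDelete() are illustrative names only,
and the sketch handles one character at a time, as the rules do.

```typescript
// Illustrative stand-in for the start/length part of a SpellcheckRange.
interface Marked { start: number; length: number; }

// One character was inserted at `offset`.
function afterInsert(ranges: Marked[], offset: number): Marked[] {
  const out: Marked[] = [];
  for (const r of ranges) {
    // Strictly inside the range: drop it.
    if (offset > r.start && offset < r.start + r.length) continue;
    // At or before the start: shift right so it covers the same text.
    out.push(offset <= r.start ? { start: r.start + 1, length: r.length } : r);
  }
  return out;
}

// The character at `offset` was deleted.
function afterDelete(ranges: Marked[], offset: number): Marked[] {
  const out: Marked[] = [];
  for (const r of ranges) {
    // The deleted character was part of the range: drop it.
    if (offset >= r.start && offset < r.start + r.length) continue;
    // Before the start: shift left so it covers the same text.
    out.push(offset < r.start ? { start: r.start - 1, length: r.length } : r);
  }
  return out;
}
```

Applied to the earlier example with "Foo" marked in "Foo bar": typing
" baz" at the end leaves the marker alone, and inserting "l" at offset
1 to get "Floo bar" removes it.  A character typed at offset 3, right
after "Foo", is not "inside" by the definition above, so the range is
kept and still covers only the first three characters.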

If support for contenteditable/designMode regions is needed, that
could be done the same way, except that instead of start/length
members in SpellcheckRange, you'd have a Range.  Maybe the name
"SpellcheckRange" is bad, though.

Advantages this API has:

* Allows authors to see all existing ranges they've marked as
misspelled, and remove them selectively
* Does not have privacy implications, but still allows authors to
control how their suggestions interact with the UA's suggestions

I'd be interested to hear what you think about this approach.

Received on Wednesday, 11 May 2011 20:49:01 UTC