Re: SpellCheck API?

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Mon, 09 May 2011 13:41:52 +0300
Message-ID: <4DC7C4F0.1070607@helsinki.fi>
To: "Hironori Bono (坊野 博典)" <hbono@google.com>
CC: public-webapps@w3.org
On 05/09/2011 11:58 AM, Hironori Bono (坊野 博典) wrote:
> Greetings,
>
> I'm Hironori Bono, a software engineer for Google Chrome.
> We recently received requests from web-application developers (and
> extension developers) that they would like to use the spellchecker
Quite different targets.

> integrated into Google Chrome and to replace the spellchecker with
> their spellcheckers implemented in JavaScript as written in the
> following document. To satisfy their requests, I would like to propose
> adding an API that controls the spellchecker integrated into a user
> agent, if it has one. Although I am not sure whether all user agents
> need this API, I would appreciate your feedback.
>
> Thank you for your interest in advance.
>
> 1. Introduction
> HTML5 provides a spellcheck attribute to enable or disable the
> spellcheckers integrated into user agents in an editable element. This
> attribute prevents the spellcheckers from checking text in an editable
> element where web applications do not like it, e.g. e-mail addresses,
> URLs, etc. Some user agents provide scripting access to spellcheckers.
Providing scripting access to the built-in spellchecker is a privacy
violation (this has been discussed on the WHATWG mailing list):
a web page could learn which languages the user uses or has installed
for spellchecking, and whether the user has added new words to the
known-words list.
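
To make the concern concrete, here is a minimal sketch of how a page
could probe a user's setup if checkWord() were exposed without
restriction. The fake controller, its dictionary contents, and the probe
words are all hypothetical stand-ins, since no user agent ships this API;
only the checkWord(word, language) shape comes from the proposal.

```javascript
// Hypothetical stand-in for the proposed window.spellCheckController.
// It simulates a user with some dictionaries installed and a list of
// words the user has taught the spellchecker.
function makeFakeController(dictionaries, customWords) {
  return {
    checkWord: function (word, language) {
      var dict = dictionaries[language];
      if (!dict) return false; // No dictionary for this language.
      return dict.indexOf(word) !== -1 || customWords.indexOf(word) !== -1;
    }
  };
}

// Probe which languages have a dictionary by checking one common word each.
function probeLanguages(controller, probes) {
  var installed = [];
  for (var language in probes) {
    if (controller.checkWord(probes[language], language)) {
      installed.push(language);
    }
  }
  return installed;
}

// Probe for words that no stock dictionary should accept; a positive
// answer suggests the user added the word to the known-words list.
function probeCustomWords(controller, candidates, language) {
  return candidates.filter(function (word) {
    return controller.checkWord(word, language);
  });
}

var fakeController = makeFakeController(
    { 'en-US': ['hello'], 'fi': ['hei'] }, ['Foobarium']);
var languages = probeLanguages(
    fakeController, { 'en-US': 'hello', 'fi': 'hei', 'ja': 'こんにちは' });
var learned = probeCustomWords(
    fakeController, ['Foobarium', 'Xyzzyq'], 'en-US');
```

With this fake user, languages lists the two installed dictionaries and
learned contains the one word the user added, all without any prompt.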



> For example, Internet Explorer allows using the spellchecker
> integrated into Microsoft Word via ActiveX as listed in the following
> code snippet.
>
> function CheckText(text) {
>    var result = new Array;
>    var app = new ActiveXObject('Word.Application');
>    var doc = app.Documents.Add();
>    doc.Content = text;
>    for (var i = 1; i <= doc.SpellingErrors.Count; i++) {
>      var spellingError = doc.SpellingErrors.Item(i);
>      for (var j = 1; j <= spellingError.Words.Count; j++) {
>        var word = spellingError.Words.Item(j);
>        var error = {};
>        error.word = word.Text;
>        error.start = word.Start;
>        error.length = word.Text.length;
>        error.suggestions = new Array;
>        var suggestions = word.GetSpellingSuggestions();
>        for (var k = 1; k <= suggestions.Count; k++) {
>          error.suggestions.push(suggestions.Item(k).Name);
>        }
>        result.push(error);
>      }
>    }
>    return result;
> }
>
> On the other hand, it is not so easy for web-application developers to
> integrate custom spellcheckers (e.g. a spellchecker that uses a
> contact list to check e-mail addresses, names, street addresses, etc.)
> into their web applications. Even though several web applications
> (such as Gmail)
Oh, I didn't know that if I teach my browser's spellchecker to know
the words I use commonly, GMail can't handle that. Interesting.


> have integrated custom spellcheckers, such web
> applications use content-editable <div> elements to render misspelling
> underlines and the ‘z-index’ property to show suggestions,
> respectively. Unfortunately, it is not easy to apply these
> techniques when web applications use <textarea> elements or <input>
> elements for user input, because it is hard to identify the
> position of misspelled words in these elements. To solve this problem,
> it would be great for user agents to provide scripting access to their
> spell-checker framework so web-application developers can integrate
> their custom spellcheckers
Adding support for custom spellcheckers seems reasonable.
We just need to make sure that the web page doesn't get access to the
native spellcheck data (at least not without permission).
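
For comparison, a custom spellchecker written purely in script never
touches the native dictionary. A minimal sketch, assuming a hypothetical
contact list and a deliberately naive suggestion rule (known names with
the same first letter); the result shape mirrors the one returned by the
CheckText() example quoted above.

```javascript
// Hypothetical contact list; a real application would load its own data.
var knownNames = ['Alice', 'Bob', 'Carol'];

// Returns an array of {word, start, length, suggestions} for every
// capitalized word in the text that is not in knownNames.
function checkNames(text) {
  var result = [];
  var pattern = /[A-Z][a-z]+/g;
  var match;
  while ((match = pattern.exec(text)) !== null) {
    var word = match[0];
    if (knownNames.indexOf(word) !== -1) continue; // Known name, no error.
    result.push({
      word: word,
      start: match.index,
      length: word.length,
      // Naive suggestion rule: known names sharing the first letter.
      suggestions: knownNames.filter(function (name) {
        return name.charAt(0) === word.charAt(0);
      })
    });
  }
  return result;
}

var errors = checkNames('mail to Alise and Bob');
```

Each entry in errors could be passed to the proposed addMarker() as
(node, start, length, suggestions) without the page ever querying the
user agent's own dictionary.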


> to their web applications as listed in the
> following code snippet.
>
> function CheckTextOfNode(node) {
>    // Remove all the previous spellchecking results.
>    window.spellCheckController.removeMarkers(node);
>
>    // Check the text in the specified node.
>    var result = CheckText(node.innerText ? node.innerText : node.value);
>    for (var i = 0; i < result.length; i++) {
>      // Add a misspelled underline and suggestions to the specified word.
>      window.spellCheckController.addMarker(
>          node, result[i].start, result[i].length, result[i].suggestions);
>    }
> }
>
> This example adds two methods.
>    * The window.spellCheckController.removeMarkers() method
>      Removes all the misspelling underlines and suggestions in the specified node.
>      The node parameter represents the DOM node from which a web
> application wants to remove all the misspelling underlines and
> suggestions.
>    * The window.spellCheckController.addMarker() method
>      Attaches a misspelled underline and suggestions to the specified
> range of a node.
>      The node parameter represents a DOM node in which a user agent
> adds a misspelled underline.
>      The start and length parameters represent a range of text in the
> DOM node specified by the node parameter. (We do not use a Range
> object here because it is hard to specify a range of text in a
> <textarea> element or an <input> element with it.)
>      The suggestions parameter represents a list of words suggested by
> the custom spellchecker. When a custom spellchecker does not provide
> any suggestions, this parameter should be an empty list.
>
> Even though these functions are sufficient for web-application
> developers who use only their custom spellcheckers, they are not
> sufficient for ones who use both their custom spellcheckers and the
> one integrated into user agents. (For example, web applications that use
> the integrated spellcheckers only for words which their custom
> spellcheckers treat as misspelled.)
>
> function CheckTextOfNode(node) {
>    // Reset all the previous spellcheck results.
>    window.spellCheckController.removeMarkers(node);
>
>    // Check the text with our custom spellchecker.
>    var result = CheckText(node.innerText ? node.innerText : node.value);
>    for (var i = 0; i < result.length; i++) {
>      // Use the integrated spellchecker to check a misspelled word.
>      if (!window.spellCheckController.checkWord(result[i].word)) {
>        // The returned DOMStringList (or null) is not an array, so copy
>        // its items into the custom suggestions one by one.
>        var extra =
>            window.spellCheckController.getSuggestionsForWord(result[i].word);
>        for (var j = 0; extra && j < extra.length; j++) {
>          result[i].suggestions.push(extra.item(j));
>        }
>        window.spellCheckController.addMarker(
>            node, result[i].start, result[i].length, result[i].suggestions);
>      }
>    }
> }
>
> This example adds two more methods to merge in the results of the
> spellchecker integrated into the user agent.
>    * The window.spellCheckController.checkWord() method
>      Checks the spelling of the specified word with the spellchecker
> integrated into the hosting user agent. When the specified word is
> spelled correctly, this method returns true. When the specified word is
> misspelled or the user agent does not have an integrated
> spellchecker, this method returns false.
>      The word parameter represents the DOMString whose spelling is checked.
>      The language parameter represents a BCP-47
> <http://www.rfc-editor.org/rfc/bcp/bcp47.txt>  tag indicating the
> language code used by the integrated spellchecker.

This is the privacy violation, and it is not acceptable as such.
I wonder how we could avoid exposing native spellchecker data to the web
page yet still support this use case. Or do we need yet another
permission, which the user has to give to the page before the
spellchecker API fully works?
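
One possible shape for such a permission, sketched under heavy
assumptions: the native lookups stay unavailable until the user grants
access, while the marker methods, which return no dictionary data, always
work. createGatedController(), grant(), the thrown error, and the stub
spellchecker are hypothetical names for illustration, not part of the
proposal.

```javascript
// Wraps a native-backed controller so that checkWord() and
// getSuggestionsForWord() throw until permission has been granted.
function createGatedController(nativeController) {
  var granted = false;
  function requirePermission() {
    if (!granted) throw new Error('Spellcheck permission not granted');
  }
  return {
    // In a real user agent this would be driven by a user prompt,
    // never by the page itself.
    grant: function () { granted = true; },
    removeMarkers: function (node) {
      nativeController.removeMarkers(node);
    },
    addMarker: function (node, start, length, suggestions) {
      return nativeController.addMarker(node, start, length, suggestions);
    },
    checkWord: function (word, language) {
      requirePermission();
      return nativeController.checkWord(word, language);
    },
    getSuggestionsForWord: function (word, language) {
      requirePermission();
      return nativeController.getSuggestionsForWord(word, language);
    }
  };
}

// Hypothetical stand-in for the user agent's spellchecker.
var nativeStub = {
  removeMarkers: function () {},
  addMarker: function () { return true; },
  checkWord: function (word) { return word === 'hello'; },
  getSuggestionsForWord: function (word) {
    return word === 'helo' ? ['hello'] : [];
  }
};

var gated = createGatedController(nativeStub);
var deniedBeforeGrant = false;
try {
  gated.checkWord('hello', 'en-US');
} catch (e) {
  deniedBeforeGrant = true;
}
gated.grant();
var allowedAfterGrant = gated.checkWord('hello', 'en-US');
```

A page that only uses its own spellchecker would never trigger the
prompt, since addMarker() and removeMarkers() are not gated in this
sketch.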



>    * The window.spellCheckController.getSuggestionsForWord() method
>      Returns the list of suggestions for the specified word. This
> method returns a DOMStringList object consisting of words suggested by
> the integrated spellchecker. When the specified word is
> spelled correctly, this method returns an empty list. When the user
> agent does not have an integrated spellchecker, this method returns
> null.
>      The word parameter represents the DOMString whose spelling is checked.
>      The language parameter represents a BCP-47
> <http://www.rfc-editor.org/rfc/bcp/bcp47.txt>  tag indicating the
> language code used by the integrated spellchecker.
This is also part of the privacy problem.



>
> 2. Interfaces
>
> Window implements SpellCheckController;
>
> [Supplemental, NoInterfaceObject]
> interface SpellCheckController {
>    void removeMarkers(Node node);
>    boolean addMarker(Node node, long start, long length, DOMStringList suggestions);
>    boolean checkWord(DOMString word, DOMString language);
>    DOMStringList getSuggestionsForWord(DOMString word, DOMString language);
> };
>
> Regards,
>
> Hironori Bono
> E-mail: hbono@google.com
>
>
>


So the API itself looks reasonable, but the privacy problem is quite a
major one.


-Olli
Received on Monday, 9 May 2011 10:42:20 GMT
