Re: SpellCheck API? from 坊野博典 on 2011-05-11 (public-webapps@w3.org from April to June 2011)

From: 坊野博典 <hbono@google.com>
Date: Wed, 11 May 2011 14:38:41 +0900
To: public-webapps@w3.org
Message-ID: <BANLkTin0dU3AhaCkbWPZ9z9JLqLoG2HAUg@mail.gmail.com>
Greetings all,

Thank you very much for all of your comments.
I cannot answer all of them, but I have added my responses to some of
them below.

On Mon, May 9, 2011 at 5:58 PM, Hironori Bono (坊野 博典) <hbono@google.com> wrote:

> function CheckText(text) {
>   var result = new Array;
>   var app = new ActiveXObject('Word.Application');
>   var doc = app.Documents.Add();
>   doc.Content = text;
>   for (var i = 1; i <= doc.SpellingErrors.Count; i++) {
>     var spellingError = doc.SpellingErrors.Item(i);
>     for (var j = 1; j <= spellingError.Words.Count; j++) {
>       var word = spellingError.Words.Item(j);
>       var error = {};
>       error.word = word.Text;
>       error.start = word.Start;
>       error.length = word.Text.length;
>       error.suggestions = new Array;
>       var suggestions = word.GetSpellingSuggestions();
>       for (var k = 1; k <= suggestions.Count; k++) {
>         error.suggestions.push(suggestions.Item(k).Name);
>       }
>       result.push(error);
>     }

Sorry, I forgot to add app.Quit(false) here to terminate Microsoft Word.

>   }
>   return result;
> }
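
As a rough sketch of where that call could go, the variant below quits
Word once, after all spelling errors have been collected, and does so in
a finally block so that Word is also terminated when an exception is
thrown. It assumes the intent is to quit once per call rather than once
per error. The createApp parameter is a hypothetical addition that lets
the function run against a stand-in object; by default it starts Word
through ActiveX, which works only in a Windows scripting host with Word
installed.

```javascript
// Sketch only: `createApp` is a hypothetical parameter, not part of the
// code quoted above.
function checkTextAndQuit(text, createApp) {
  var app = createApp ? createApp() : new ActiveXObject('Word.Application');
  var result = [];
  try {
    var doc = app.Documents.Add();
    doc.Content = text;
    for (var i = 1; i <= doc.SpellingErrors.Count; i++) {
      var spellingError = doc.SpellingErrors.Item(i);
      for (var j = 1; j <= spellingError.Words.Count; j++) {
        var word = spellingError.Words.Item(j);
        var error = {
          word: word.Text,
          start: word.Start,
          length: word.Text.length,
          suggestions: []
        };
        var suggestions = word.GetSpellingSuggestions();
        for (var k = 1; k <= suggestions.Count; k++) {
          error.suggestions.push(suggestions.Item(k).Name);
        }
        result.push(error);
      }
    }
  } finally {
    // Terminate Word without saving, even if an exception was thrown.
    app.Quit(false);
  }
  return result;
}
```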

On Mon, May 9, 2011 at 7:41 PM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:

> Providing scripting access to built-in spellchecker is a privacy
> violation (this has been discussed in @whatwg mailing list) -
> web page could know which language users uses/has for spellchecking
> and if user has added new word to the known-words list.

Thank you for pointing this out. I remember this discussion. I do not
have a clear solution to this privacy problem yet, so it may be a good
idea to focus on methods that help implement custom spellcheckers until
we find one. (That is, I will remove the methods that raise privacy
concerns until we find solutions for them.)

On Tue, May 10, 2011 at 4:39 AM, Aryeh Gregor <Simetrical+w3c@gmail.com> wrote:

> It would be much simpler for authors if the UA just fired an event
> every time it did a spellcheck.  The event might work like this:

Thank you for your comment. I initially considered this option, but I
abandoned it because I could not find a design for this event that
satisfied all the requests from web-application developers.

> * Every time the UA would normally invoke its spellchecker on a word,
> it fires a spellcheck event at the element in question, which bubbles
> (so authors can set a handler on the body if they like).  This has to
> occur when a spellcheckable element first loads, if an element becomes
> spellcheckable when it wasn't before, or whenever the user modifies a
> spellcheckable element such that the spellchecker would normally fire
> (e.g., when they finish typing a word).

When I talked with web-application developers, some of them wanted to
check spelling not only when a user types words but also when
JavaScript code creates an editable element with prepopulated text. For
example, a web application checks the text in a To: field with its
custom spellchecker and draws misspelling underlines under "invalid"
e-mail addresses. (This example is mainly for warning users that they
may be replying to phishing e-mails.) Other web-application developers
want to check the spelling of words in an editable element before
sending its text to a server. To satisfy these requests, a user agent
may need to send spellcheck events also when JavaScript code creates an
editable node or changes the text in an editable node. (I have not
measured how much time it takes to send these events, excluding
JavaScript execution, but it may hurt the speed of JavaScript code.)
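
As a rough illustration of the To: field example, a custom checker could
return the ranges to underline in the same word/start/length shape used
by the CheckText code quoted earlier. The function name, the separator
rule, and the address pattern below are all assumptions made for this
sketch; a real application would apply its own notion of an "invalid"
address.

```javascript
// Hypothetical helper: returns one range per recipient that does not look
// like an e-mail address. The pattern is deliberately simplistic.
function findInvalidRecipients(toFieldText) {
  var looksLikeAddress = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  // Assumption: recipients are separated by commas, semicolons, or spaces.
  var tokenPattern = /[^,;\s]+/g;
  var ranges = [];
  var match;
  while ((match = tokenPattern.exec(toFieldText)) !== null) {
    if (!looksLikeAddress.test(match[0])) {
      ranges.push({
        word: match[0],
        start: match.index,
        length: match[0].length
      });
    }
  }
  return ranges;
}
```

A page could call such a function both right after it fills in the
field from script and again just before it submits the text, which are
the two cases described above.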

> * The event object should provide the text of the word whose spelling
> needs to be checked.  It should give the node and start/end offsets,
> either of the input/textarea or the text node.  (Not sure what should
> happen for a misspelled word that's not all in one text node.)

When I talked with web-application developers, some of them wanted to
integrate n-gram spellcheckers so that they can correct simple
grammatical errors, such as article-noun mismatches ("a apple" -> "an
apple") and subject-verb mismatches ("he have" -> "he has"). To satisfy
these requests, a user agent may need to send two or more words (up to
all the words in an editable element).
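
To show why one word at a time is not enough, here is a minimal sketch
of a two-word (bigram) check covering just the two examples above. The
function name and both rules are assumptions made for illustration; a
real n-gram checker would rely on a language model rather than
hand-written rules.

```javascript
// Hypothetical bigram check. Every word in the input may be spelled
// correctly on its own; only the pair reveals the problem.
function checkBigrams(words) {
  var issues = [];
  for (var i = 0; i + 1 < words.length; i++) {
    var first = words[i].toLowerCase();
    var second = words[i + 1].toLowerCase();
    // Article-noun mismatch. Testing the first letter is a rough
    // heuristic and would wrongly flag phrases such as "a university".
    if (first === 'a' && /^[aeiou]/.test(second)) {
      issues.push({
        index: i,
        found: words[i] + ' ' + words[i + 1],
        suggestion: 'an ' + words[i + 1]
      });
    }
    // Subject-verb mismatch for third-person singular pronouns.
    if ((first === 'he' || first === 'she' || first === 'it') &&
        second === 'have') {
      issues.push({
        index: i,
        found: words[i] + ' ' + words[i + 1],
        suggestion: words[i] + ' has'
      });
    }
  }
  return issues;
}
```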

> This means authors wouldn't have to do word-breaking themselves, which
> is a big advantage, since word-breaking can be very complicated.

I agree that a word-breaking algorithm is complicated, especially for
languages that do not insert space characters between words (e.g.
Chinese, Japanese, Thai). It becomes even more complicated if it
includes breaking a compound word (as used in German, Hungarian,
Turkish, etc.) into its component words. Some web-application
developers would like a user agent to do it. (Maybe a user agent needs
to provide another method that breaks text into words?) On the other
hand, other web-application developers would like to split text into
words themselves. As noted above, some developers want to integrate
n-gram spellcheckers, and they do not want a user agent to split text
into words.
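
For comparison, the simplest word breaker a developer might write is a
whitespace split that records offsets. The sketch below (the function
name is an assumption) handles space-separated text, but returns text
written without spaces as a single "word" and does not split compound
words, which is why the languages mentioned above need more than this.

```javascript
// Naive word breaker: splits on whitespace and keeps each word's offset.
// Punctuation stays attached to the neighbouring word.
function breakOnWhitespace(text) {
  var words = [];
  var pattern = /\S+/g;
  var match;
  while ((match = pattern.exec(text)) !== null) {
    words.push({
      word: match[0],
      start: match.index,
      length: match[0].length
    });
  }
  return words;
}
```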

> It would be *much* simpler to just plug in a spell-checker, without
> having to write a lot of scaffolding code to track what text the user
> is entering.  The only downside I can see is that it does force you to
> use the browser's word-breaking behavior, but I don't think that's a
> big disadvantage compared to the advantages.

My proposal started from trying to satisfy requests from
web-application developers, including the ones described above. That
is, my proposal sacrifices simplicity to satisfy all of their requests.
I am not sure what the best option is, though. (This is exactly why I
sent my draft idea here.)

Regards,

Hironori Bono
E-mail: hbono@google.com
Received on Wednesday, 11 May 2011 05:39:05 UTC