
[whatwg] Spellchecking mark III

From: Mikko Rantalainen <mikko.rantalainen@peda.net>
Date: Fri, 30 Jun 2006 13:23:49 +0300
Message-ID: <44A4FBB5.3050603@peda.net>
The more I think about this, the more I believe that the correct 
choice would be to describe the expected content more accurately. 
The UA could then turn spellchecking on or off accordingly. 
The problem is that the lang attribute allows only values defined in 
RFC 3066, which seems to support only language tags defined by ISO 639. 
That is, the expressible languages are limited to *spoken* languages.

Ian Hickson wrote:
> On Sun, 11 Jun 2006, Alexey Feldgendler wrote:
>> > Information like "this input field should have autoindent" is 
>> > presentational.
> 
> Yeah, but you'd have to say "auto-indent this like C++", which isn't. 
> IMHO.

Perhaps, instead of using the |spellcheck| attribute as a toggle, we 
could allow a whitespace-separated list of expected input languages. 
If the user is expected to enter C++ code with English comments, then 
the author should use markup such as

<textarea lang="zxx" spellcheck="c++ en">

for "no linguistic content" (ISO 639-2 code "zxx") with spellchecking 
for C++ and English.

Another option would be to extend the lang attribute to allow 
languages other than human languages. This has the added bonus that 
the lang attribute could also describe other content more accurately. 
RFC 3066 reserves language tags starting with "x-" for private use, 
and that could be used to aid spellchecking, too. Unfortunately, only 
A-Z and 0-9 are allowed, so perhaps something like

<textarea lang="x-cpp-en">

for the private language cpp-en, or "C++ with English comments". Or, 
if the lang attribute is extended to allow a list of multiple 
languages, one could write

<textarea lang="en x-cpp">

for English text mixed with C++ code (which is less accurate than 
the x-cpp-en above).
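RFC 3066 (section 2.1) restricts a tag to a primary subtag of 1 to 8 
letters, followed by optional subtags of 1 to 8 letters or digits, 
compared case-insensitively. A minimal sketch of that syntax check, 
assuming Python and using hypothetical helper names, shows why "x-c++" 
is rejected while "x-cpp-en" and even "x-html-fragment-sv-fi" pass, 
although a single subtag such as "htmlfragment" would be too long. 
Note that a space-separated list like "en x-cpp" is not a single tag, 
which is why the list form would need the attribute itself extended:

```python
import re

# RFC 3066 grammar (section 2.1):
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
# Tags are case-insensitive, so both letter cases are accepted.
_RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_tag(tag: str) -> bool:
    """Return True if `tag` is syntactically a valid RFC 3066 tag."""
    return _RFC3066_TAG.fullmatch(tag) is not None


def is_private_use(tag: str) -> bool:
    """Tags whose primary subtag is "x" are reserved for private use."""
    return is_rfc3066_tag(tag) and tag.split("-", 1)[0].lower() == "x"


if __name__ == "__main__":
    for tag in ["en", "x-cpp-en", "x-c++", "x-html-fragment-sv-fi",
                "x-htmlfragment", "en x-cpp"]:
        print(f"{tag!r}: valid={is_rfc3066_tag(tag)}, "
              f"private={is_private_use(tag)}")
```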

The GMail "To:" input field could be expressed as

<textarea lang="x-mail-to">

and UAs that don't recognize the language "x-mail-to" should turn off 
spellchecking.

A typical blog input field could be encoded as

<textarea lang="x-html-fragment-en">

Here one sees even more need for multiple language tags inside the 
"lang" attribute. It would make more sense to use 
lang="x-html-fragment en"; otherwise there would be a need for *very* 
many private languages starting with "x-html-fragment-", including 
"x-html-fragment-sv-fi".

> On Fri, 23 Jun 2006, Sander Tekelenburg wrote:
>> 	[AUTHOR REQUIREMENTS]
>>
>>> Authors should set the document's language information, to enable user 
>>> agents to accurately determine which dictionary to use when checking 
>>> the spelling or grammar of user input.
>> IMO this "should" should be a "must".
> 
> What about if the author doesn't know the language?

ISO 639-2 includes "und" for "undetermined language". A sane default 
for the UA is to disable spellchecking, or to determine the language 
itself using some (unspecified) heuristic.

> On Sat, 24 Jun 2006, Alexey Feldgendler wrote:
>> Even worse: when entering text in textarea, the user actually has a 
>> choice which language to write in. I think the user agent should 
>> provide, besides just the control to turn spellchecking on and off, a 
>> choice of languages.
> 
> Agreed.

If a form expects English text to be entered, it would be wise to 
mark text written in any other language as incorrectly spelled. If 
the author expects any language, then he should specify lang="mul" 
for "multiple languages" (again, defined by ISO 639-2).

Again, a list of acceptable languages would be nice here.
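A minimal sketch of how a UA might act on such a list, assuming the 
lang attribute were extended to take whitespace-separated tags. The 
dictionary names and contents and the helpers select_dictionaries and 
is_misspelled are hypothetical. Unknown tags (such as "x-mail-to") and 
"und"/"zxx" select no dictionary, so an empty selection turns 
spellchecking off; otherwise a word is accepted if any selected 
dictionary contains it:

```python
from typing import Dict, List, Set

# Hypothetical dictionaries a UA might ship with, keyed by lower-case tag.
# Real UAs use full word lists; these toy sets only illustrate the logic.
DICTIONARIES: Dict[str, Set[str]] = {
    "en": {"the", "pointer", "is", "null"},
    "sv": {"och", "hej"},
    "x-cpp": {"int", "void", "nullptr", "return"},
}

# ISO 639-2 codes that never select a dictionary:
# "und" (undetermined) and "zxx" (no linguistic content).
NON_LINGUISTIC = {"und", "zxx"}


def select_dictionaries(lang_attr: str) -> List[str]:
    """Map a whitespace-separated lang list to known dictionaries.

    An empty result means the UA should turn spellchecking off.
    """
    selected: List[str] = []
    for tag in lang_attr.lower().split():
        if tag in NON_LINGUISTIC or tag not in DICTIONARIES:
            continue  # skip und/zxx and tags the UA does not recognize
        if tag not in selected:
            selected.append(tag)
    return selected


def is_misspelled(word: str, lang_attr: str) -> bool:
    """Flag `word` only if no selected dictionary contains it."""
    dicts = select_dictionaries(lang_attr)
    if not dicts:
        return False  # spellchecking is off, so nothing is flagged
    return not any(word.lower() in DICTIONARIES[d] for d in dicts)
```

Under these assumptions, lang="en" flags the Swedish word "hej", 
lang="en sv" accepts it, and lang="x-mail-to" leaves spellchecking off.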

-- 
Mikko
Received on Friday, 30 June 2006 03:23:49 UTC
