Re: [css-text] word-break for Korean from Florian Rivoal on 2015-03-05 (www-style@w3.org from March 2015)

From: Florian Rivoal <florian@rivoal.net>
Date: Thu, 5 Mar 2015 11:17:50 +0100
To: Koji Ishii <kojiishi@gmail.com>
Cc: Asmus Freytag <asmusf@ix.netcom.com>, "Jungshik SHIN (신정식)" <jshin1987+w3@gmail.com>, www-style list <www-style@w3.org>, www International <www-international@w3.org>
Message-Id: <D497DDD1-504B-4A10-A412-C5BB589D7FA7@rivoal.net>

> On 05 Mar 2015, at 06:41, Koji Ishii <kojiishi@gmail.com> wrote:
> 
> Maybe the better approach is to have lang="auto", and do the content
> sniffing; e.g., if the paragraph contains a Hangul character, consider
> it a Korean, and enable lang(ko) selector?

Interesting idea, which could be useful not only for language-dependent text formatting, but also for spell checkers.

For it to be useful, though, you would definitely need more sophisticated heuristics than switching to Korean as soon as there is one or more Hangul character. Many languages share the same characters, so you'll need dictionaries, and in case of contradictory information (Hangul and hiragana, English and Spanish words...) you'll have to decide one way or the other.
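
As a rough illustration of why script counting alone falls short, here is a minimal Python sketch. It is not a proposal for how a browser would do this: the function names and the SCRIPT_TO_LANG table are hypothetical, and using the first word of the Unicode character name as a stand-in for the script is a simplifying assumption.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Hypothetical table: scripts that (nearly) identify a single language.
# Latin and CJK ideographs are left out on purpose, since many languages share them.
SCRIPT_TO_LANG = {
    "HANGUL": "ko",
    "HIRAGANA": "ja",
    "KATAKANA": "ja",
}

def script_counts(text: str) -> Counter:
    """Count letters by the first word of their Unicode character name."""
    counts: Counter = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts

def guess_lang(text: str) -> Optional[str]:
    """Return a language tag, or None when the script evidence is not enough."""
    votes: Counter = Counter()
    for script, n in script_counts(text).items():
        lang = SCRIPT_TO_LANG.get(script)
        if lang is not None:
            votes[lang] += n
    if not votes:
        # Only shared scripts (Latin, CJK ideographs...): dictionaries would be needed.
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        # Contradictory evidence with equal weight: refuse to decide here.
        return None
    return ranked[0][0]
```

With this sketch, guess_lang("한국어") gives "ko", while guess_lang("hello world") and guess_lang("漢字") both give None, because Latin letters and CJK ideographs do not single out one language. Telling English from Spanish would require a dictionary or a statistical model on top of this.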

But it's certainly doable, as you can easily check by playing around with Google Translate's language auto-detection.

But we don't want to encourage people to use this instead of tagging their documents. Even if we add this, it certainly shouldn't be the default, and we may want to restrict its use to contexts where the user is entering the text (form controls, contenteditable).

Arguably, this could be done by a piece of author-provided JavaScript. However, besides the fact that making this work reliably and for many languages is beyond the reach of most authors (though they can use a third-party service, so that's not a blocker), this is probably something where you'd want consistency across websites, especially as it ties into spellchecking.

 - Florian

Received on Thursday, 5 March 2015 10:18:15 UTC