
Re: [csswg-drafts] [css-text] Should 'hyphens: auto' work if lang="" is not declared?

From: Simon Pieters via GitHub <sysbot+gh@w3.org>
Date: Fri, 20 Jan 2017 13:01:24 +0000
To: public-css-archive@w3.org
Message-ID: <issue_comment.created-274066878-1484917283-sysbot+gh@w3.org>
F2F discussion: 
https://logs.csswg.org/irc.w3.org/css/2017-01-11/#e757028

> in httparchive hyphens:auto appears in 27,173 resources from 494,891 pages

Did some more analysis today. (The January data set is 494,956 pages.)

The total number of *pages* using `hyphens: auto` is 24,246 (~4.9%).

138,458 pages specify a language in `<html lang>` (~28.0%).

Of the pages that do *not* specify a language, 16,402 use `hyphens: auto`, across 18,282 resources (~3.3% of all pages; ~67.6% of all `hyphens: auto` pages).

https://gist.githubusercontent.com/zcorpan/150389ac7804f648202e3b6f99eb1c67/raw/5450a7d1e344d8311fc323cd3dfe8333fcf18615/no_lang_hyphens_auto.csv
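As a quick check of the arithmetic above, the percentages can be recomputed from the quoted counts. This is a minimal sketch: the constant names and the `pct` helper are hypothetical, and the counts are copied from the figures in this comment.

```python
# Counts quoted above (HTTP Archive, January 2017 data set).
TOTAL_PAGES = 494_956
HYPHENS_AUTO_PAGES = 24_246
LANG_DECLARED_PAGES = 138_458
NO_LANG_HYPHENS_AUTO_PAGES = 16_402


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


shares = {
    "hyphens: auto, of all pages": pct(HYPHENS_AUTO_PAGES, TOTAL_PAGES),
    "<html lang> declared, of all pages": pct(LANG_DECLARED_PAGES, TOTAL_PAGES),
    "no lang + hyphens: auto, of all pages": pct(NO_LANG_HYPHENS_AUTO_PAGES, TOTAL_PAGES),
    "no lang, of hyphens: auto pages": pct(NO_LANG_HYPHENS_AUTO_PAGES, HYPHENS_AUTO_PAGES),
}

for label, value in shares.items():
    print(f"{label}: ~{value}%")
```

The last ratio (16,402 / 24,246) is the "two thirds" figure used below.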

---

So 3.3% of the top 500,000 pages, or two thirds of the pages using `hyphens: auto`, are affected here, which is quite a lot. At the F2F it was argued that this is most likely to have a negative impact for European users with a non-English system language reading English pages.

These matches can be analyzed further to determine which behavior is a net win for users. For example, one could apply a language-detection heuristic, select the pages detected as English, apply the hyphenation rules of some European languages to them, and judge whether the resulting hyphenations could cause confusion or unintended meaning. I do not have the bandwidth to do this myself at the moment, so it is up for grabs.
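The first two steps of that analysis (apply a language heuristic, keep the English-detected pages) could be sketched as below. This is only an illustration under loud assumptions: the stop-word list, the 0.2 threshold, the function names and the example URLs are all hypothetical, and a real study would use a proper language-identification tool. The hyphenation step itself is not shown, since it needs real hyphenation dictionaries for the chosen European languages.

```python
import re

# Hypothetical, deliberately crude list of common English words.
ENGLISH_STOPWORDS = frozenset(
    "the and of to in is that for it with as was on are be this by".split()
)


def english_ratio(text: str) -> float:
    """Fraction of word tokens in `text` that are common English stop words."""
    # [^\W\d_]+ matches runs of Unicode letters, so words like "straße" stay whole.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    return sum(w in ENGLISH_STOPWORDS for w in words) / len(words)


def looks_english(text: str, threshold: float = 0.2) -> bool:
    """Guess that `text` is English; the threshold is an arbitrary assumption."""
    return english_ratio(text) >= threshold


def select_english_pages(pages: dict[str, str]) -> list[str]:
    """Return the URLs whose page text looks English, in input order."""
    return [url for url, text in pages.items() if looks_english(text)]


pages = {
    "https://example.com/en": "The cat sat on the mat and it was happy.",
    "https://example.com/de": "Der Hund läuft schnell über die Straße.",
}
print(select_english_pages(pages))  # → ['https://example.com/en']
```

A stop-word ratio is chosen here only because it is short and dependency-free; it misclassifies short or mixed-language text easily.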

(For an anecdote showing that applying hyphenation with the wrong language can be a real problem, see http://indesignsecrets.com/words-hyphenating-wrong-indesign.php# )

cc @litherum

-- 
GitHub Notification of comment by zcorpan
Please view or discuss this issue at 
https://github.com/w3c/csswg-drafts/issues/869#issuecomment-274066878 
using your GitHub account
Received on Friday, 20 January 2017 13:01:31 UTC
