
Re: wrong language warning

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Sat, 24 Sep 2016 16:23:52 +0300
To: Marcus Beyer <contact@take-a-screenshot.org>, www-validator@w3.org
Message-ID: <bc475810-5198-b21d-3d02-81b90f4c2a14@cs.tut.fi>
22.9.2016, 23:43, Marcus Beyer wrote:

> Nu Html Checker thinks my Chinese page is in English:
>
> https://validator.w3.org/nu/?showoutline=yes&showimagereport=yes&doc=http%3A%2F%2Fwww.take-a-screenshot.org%2Fzh%2F

It’s just a wrong guess. Ignore it.

If you are interested in knowing why the tool makes the wrong guess, 
here’s the start of the textual content of your page:

    English
    Español
    Português
    Deutsch
    Nederlands
    ­

    close
    Tweet

    close

      * Mac
      * Windows
      * iOS
      * Android
      * Chrome OS
      * KDE Plasma
      * GNOME
      * Websites

Windows

Looks mostly English (and surely not Chinese) to me, and apparently to 
the checker, which seems to look at the start of the content only.
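To illustrate the effect, here is a minimal Python sketch. It is not the checker's actual algorithm, which I have not inspected; it only assumes that a guesser samples a fixed-length prefix of the text. The function name cjk_share, the prefix length of 100 characters, and the Chinese sample sentence are all made up for illustration.

```python
import unicodedata


def cjk_share(text, prefix_len=None):
    """Fraction of letters that are CJK ideographs in the first
    prefix_len characters of text (the whole text if prefix_len is None)."""
    sample = text[:prefix_len]
    letters = [ch for ch in sample if ch.isalpha()]
    if not letters:
        return 0.0
    # Unicode names of Han characters start with "CJK UNIFIED IDEOGRAPH".
    cjk = [ch for ch in letters if "CJK" in unicodedata.name(ch, "")]
    return len(cjk) / len(letters)


# Start of the page's text, as listed above (menus and navigation).
page_start = (
    "English Español Português Deutsch Nederlands close Tweet close "
    "Mac Windows iOS Android Chrome OS KDE Plasma GNOME Websites Windows"
)
# Hypothetical stand-in for the Chinese body text that follows.
body_text = "如何在 Windows 上截取屏幕截图"
page = page_start + " " + body_text

print(cjk_share(page, prefix_len=100))  # share in the sampled prefix only
print(cjk_share(page))                  # share in the whole text
```

Under these assumptions, the sampled prefix contains no Chinese characters at all, so any guess based on it alone cannot come out as Chinese, however much Chinese text follows.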

I think the experimental language guesser in the checker should be 
disabled. It guesses wrong too often, causes confusion like this, and 
makes it more difficult to deal with real problems.

And when it guesses right, detecting a mismatch between the actual 
content language and the declared one (typically caused by authoring 
tools that routinely insert lang="en"), what’s the use? Sometimes it may 
help people fix the lang attribute value, but such attributes are 
generally ignored by relevant software anyway. For example, Google 
ignores them and uses its own language-guessing technology.

Yucca
Received on Saturday, 24 September 2016 13:24:18 UTC
