Re: wrong language warning from Michael[tm] Smith on 2016-09-26 (www-validator@w3.org from September 2016)

From: Michael[tm] Smith <mike@w3.org>
Date: Mon, 26 Sep 2016 21:30:58 +0900
To: Marcus Beyer <contact@take-a-screenshot.org>
Cc: www-validator@w3.org
Message-ID: <20160926123058.o7pz6emailygqcvo@sideshowbarker.net>

Marcus Beyer <contact@take-a-screenshot.org>, 2016-09-22 22:43 +0200:
> Archived-At: <http://www.w3.org/mid/3D58E0B8-3886-4CD4-A35B-CDD2B2DFC227@take-a-screenshot.org>
> 
> Nu Html Checker thinks my Chinese page is in English:
> 
> https://validator.w3.org/nu/?showoutline=yes&showimagereport=yes&doc=http%3A%2F%2Fwww.take-a-screenshot.org%2Fzh%2F

Thanks for taking time to report this. I’ve pushed a change that should
cause the checker to no longer misidentify the language of that document.

> I’m sorry, but this is not correct.

Yeah, in a small number of cases (mostly documents with a relatively small
amount of text) the checker misidentifies the language. I’ve dealt with it
for now by raising the minimum number of non-whitespace characters it needs
to see before it will attempt language detection. I previously had that
number set to 256 but have now raised it to 512, though I may still need to
raise it further.
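
To illustrate the kind of threshold described above, here is a minimal
sketch in Python. It is not the checker's actual code: the names
MIN_CHARS, count_non_whitespace and should_detect_language are
hypothetical, and treating exactly 512 characters as enough is an
assumption.

```python
# Hypothetical sketch, not the checker's implementation.
# 512 is the threshold mentioned in the message (previously 256).
MIN_CHARS = 512


def count_non_whitespace(text: str) -> int:
    """Count the characters in text that are not whitespace."""
    return sum(1 for ch in text if not ch.isspace())


def should_detect_language(text: str, min_chars: int = MIN_CHARS) -> bool:
    """Return True only if text has enough non-whitespace characters.

    Assumption: a document with exactly min_chars characters qualifies.
    """
    return count_non_whitespace(text) >= min_chars
```

With a gate like this, a short page is simply skipped rather than guessed
at, which trades some coverage for fewer wrong-language warnings.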

So if you run into other cases where it still misidentifies the language
of a document, please do report them on this mailing list or at
https://github.com/validator/validator/issues/new

  —Mike

-- 
Michael[tm] Smith https://people.w3.org/mike

Received on Monday, 26 September 2016 12:31:33 UTC