Re: HTML checker has misidentified the language of this document

From: Jukka K. Korpela <jukkakk@gmail.com>
Date: Mon, 15 May 2017 09:41:15 +0300
Message-ID: <CAGHxYa7sHJSmwqC1Ygj0ZBxXac+-PRPhLT8cOiaJyE6_WXn7JA@mail.gmail.com>
To: "dean.bullen@googlemail.com" <dean.bullen@googlemail.com>
Cc: "www-validator@w3.org" <www-validator@w3.org>
On Fri, May 12, 2017 at 5:20 PM, dean.bullen@googlemail.com <dean.bullen@googlemail.com> wrote:

> Just to let you know the checker thinks the following page is French –
> it’s not, it’s English.
>
> http://www.impartica-training.co.uk/course-schedule.aspx

This is a mysterious case, as wrong language guesses often are. There is
nothing in the textual content that suggests French as the language (except
that a few English words in it are also French). I tried to isolate the
issue by validating a document that essentially consists of just a part of
the <table> element there. If I take about 600 lines, it does not make the
complaint; if I take about 1,000 lines, it does. I don’t think it’s
anything specific in the lines between; they look very similar in textual
content to the other lines. It’s probably some oddity in the heuristic
that computes frequencies of letters or letter combinations, or something
like that.
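
To illustrate the kind of frequency heuristic meant here, the following is a
toy sketch in Python. It is not the checker's actual code, which I have not
inspected; the function names (trigram_profile, similarity, guess_language)
and the tiny reference samples are made up purely for illustration.

```python
from collections import Counter

def trigram_profile(text):
    """Relative frequencies of character trigrams in whitespace-normalized, lowercased text."""
    padded = f" {' '.join(text.lower().split())} "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()}

def similarity(profile_a, profile_b):
    """Overlap score: for each shared trigram, add the smaller of the two frequencies."""
    return sum(min(freq, profile_b[gram])
               for gram, freq in profile_a.items() if gram in profile_b)

def guess_language(text, references):
    """Return the key of the reference profile that overlaps most with the text."""
    doc = trigram_profile(text)
    return max(references, key=lambda lang: similarity(doc, references[lang]))

# Hypothetical, far-too-small reference samples; a real guesser would be
# trained on large corpora for each language.
EN_SAMPLE = "the quick brown fox jumps over the lazy dog and the cat"
FR_SAMPLE = "le renard brun saute par dessus le chien paresseux et le chat"
REFERENCES = {
    "en": trigram_profile(EN_SAMPLE),
    "fr": trigram_profile(FR_SAMPLE),
}

print(guess_language("the dog and the fox", REFERENCES))
```

A guesser of this kind only compares statistics, so text that is heavy in
letter combinations common to two languages could plausibly tip the score the
wrong way, and the result may change as more lines are added.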

Nothing to worry about, of course. The validator’s heuristic language
guesser just made an odd wrong guess.

Yucca
Received on Monday, 15 May 2017 06:41:49 UTC
