- From: Michael[tm] Smith <mike@w3.org>
- Date: Mon, 10 Nov 2014 12:07:24 +0900
- To: "Jukka K. Korpela" <jkorpela@cs.tut.fi>, Roman Grinyov <w3lifer@gmail.com>
- Cc: www-validator@w3.org
- Message-ID: <20141110030724.GQ4173@jay.w3.org>
Hi Jukka,

> Date: Mon, 27 Oct 2014 00:29:57 +0200
> From: "Jukka K. Korpela" <jkorpela@cs.tut.fi>
> Archived-At: <http://www.w3.org/mid/544D75E5.9030602@cs.tut.fi>
...
> The culprit appears to be on line 48:
>
> <p>10 строка | &w3 </p>
>
> Validating this line in isolation, with a minimal document around it,
> results in a correct message that points to the “&w3” construct.
>
> The bug in the validator is that it does not report this properly at
> all in the given context but instead flags completely correct
> character references *before* it as erroneous.
>
> The bug is reproducible at http://validator.nu too.

Thanks for examining this, and thanks to Roman for reporting it. It's
definitely a bug. The message in this case is coming from the HTML parser,
but I can't reproduce it in "View source" in Firefox (which uses the same
HTML parser):

  view-source:http://websnippets.ru/article.php?id=30
  (mouse over the "&w3")

...so it seems to be a problem specific to the validator's usage of the
HTML parser.

This is minimally reproducible with the following document:

  <!doctype html><title>test</title>&gt;<textarea>&w3</textarea>

  http://validator.w3.org/nu/?showsource=yes&doc=data%3Atext%2Fhtml%3Bcharset%3Dutf-8%2C%3C%2521doctype%2520html%3E%3Ctitle%3Etest%3C%252Ftitle%3E%2526gt%253B%3Ctextarea%3E%2526w3%3C%252Ftextarea%3E

If you replace the `textarea` with a `span` or whatever, you can't reproduce
it. That makes some sense, because `textarea` elements have a special code
path in the parser, along with `title` elements. So I kinda expect the core
problem here is that the validator code isn't passing on line-number info
correctly to the parser when processing `textarea` and `title` elements.

Here's an even more minimal case:

  <!doctype html><title>&w3</title>

  http://validator.w3.org/nu/?showsource=yes&doc=data%3Atext%2Fhtml%3Bcharset%3Dutf-8%2C%3C%2521doctype%2520html%3E%3Ctitle%3E%2526w3%3C%252Ftitle%3E

For that case, the validator just reports "Error: & did not start a
character reference. (& probably should have been escaped as &amp;.)",
without reporting line+col numbers at all or flagging the position.

So I think the root cause of the problem Roman ran into is that the
validator doesn't have any line-number info to report in this case, and
then the parser's character-reference reporting isn't getting
re-initialized correctly, so it reports the position of the last character
reference it checked that did have line+col numbers.

Anyway, I've filed a bug:

  http://bugzilla.validator.nu/show_bug.cgi?id=1010

and I'll try to make some time soon to investigate the code around this.

  --Mike

-- 
Michael[tm] Smith https://people.w3.org/mike
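
For anyone who wants to try further test documents against the validator: the links above embed the markup as a data: URL that has been percent-encoded twice, once for the data: URL itself and once more as the value of the `doc` query parameter. The following is a minimal sketch of that encoding in Python. The function name `validator_url` is made up for illustration, and the choice to leave "<" and ">" unencoded in the first pass is an assumption inferred from the shape of the links in this message, not a documented requirement of the validator.

```python
from urllib.parse import quote

# Endpoint and query parameters as they appear in the links in this message.
VALIDATOR = "http://validator.w3.org/nu/"


def validator_url(document: str) -> str:
    """Build a validator link for an inline HTML document (hypothetical helper)."""
    # First pass: percent-encode the markup for use inside a data: URL.
    # "<" and ">" stay literal here (assumption, to match the links above).
    payload = quote(document, safe="<>")
    data_url = "data:text/html;charset=utf-8," + payload
    # Second pass: encode the whole data: URL as a query-string value,
    # which is why "%26" from the first pass ends up as "%2526".
    return VALIDATOR + "?showsource=yes&doc=" + quote(data_url, safe="")


if __name__ == "__main__":
    print(validator_url("<!doctype html><title>&w3</title>"))
```

Swapping `title` for `span` in the argument gives a link for the case that does not trigger the bug, which makes it easy to compare the two reports side by side.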
Received on Monday, 10 November 2014 03:07:26 UTC