- From: Jaime Iniesta <jaimeiniesta@gmail.com>
- Date: Sun, 29 Sep 2013 20:37:11 +0200
- To: "Michael[tm] Smith" <mike@w3.org>
- Cc: "www-validator@w3.org" <www-validator@w3.org>
- Message-ID: <CAKFFWV-5dU6aWJKNrQoCL6zNqzxogyhQToaTXXDLD8hzEp2XYw@mail.gmail.com>
Hey Mike,

2013/9/29 Michael[tm] Smith <mike@w3.org>

> Hi Jaime,
>
> Jaime Iniesta <jaimeiniesta@gmail.com>, 2013-09-25 21:15 +0200:
>
> > Hi all,
> >
> > Validating this URL on Validator.nu makes it crash with a "Fatal Error:
> > Cannot recover after last error. Any further errors will be ignored."
> >
> > http://validator.nu/?doc=http%3A%2F%2Fvalidationhell.com%2Fpages%2Fabyss%2F99
> >
> > Please notice that this site is intentionally invalid, it's the one I use
> > to test validators, but I thought you'd like to be notified of this issue.
>
> The reason for any "Cannot recover after last error" message you get from
> the validator is that in the backend we run the HTML parser in truly
> streaming mode. That's for performance reasons, among other reasons.
>
> However, there are some cases of parsing behavior defined in the spec
> which require non-streaming behavior. The document at
> http://validationhell.com/pages/abyss/99 has an instance of such a case.
> Another example is <table><input></table> which per the HTML spec needs to
> end up in the DOM with the <input> element foster-parented to be before the
> <table> start tag.
>
> If you want the HTML parser in a validator instance to not stop when it
> encounters errors that require non-streaming recovery, you need to alter
> the code for your instance to not have the parser call
> setStreamabilityViolationPolicy(XmlViolationPolicy.FATAL), which is the
> thing in the existing code that causes the validator to run in truly
> streaming mode.
>
> Of course if you do that you're no longer going to be running the validator
> in streaming mode, and no longer going to be getting the benefits of that.

Thanks a lot for the detailed explanation!

Is there a measure of the performance boost we get when working in
streaming mode? Is it worth not being able to fully validate many
documents?

Don't take http://validationhell.com for more than what it is: a site that
is intentionally invalid, built to test validators. But consider that real
sites out there have the same kind of errors, and they can't be fully
validated because of this streaming mode. For example, http://twitter.com:

http://validator.w3.org/nu/?doc=http%3A%2F%2Ftwitter.com

And a Google search for the "Cannot recover after last error..." message
returns more than 200K results:

https://www.google.com/search?q=%22cannot+recover+after+last+error.+any+further+errors+will+be+ignored%22

Since Validator.nu would be able to validate those documents in
non-streaming mode, I think a good solution would be to try streaming mode
first and, if this particular error happens, retry in non-streaming mode.

If Validator.nu can't be changed so that it fully validates these
documents, then I think a clearer error message would help: something that
says it couldn't recover after the last error because of the streaming
mode.

Anyway, just my 2 cents about improving the validator.

Thanks!
Jaime
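
The "try streaming first, then retry in non-streaming mode" idea from this
message could be sketched roughly as follows. This is only an illustration
of the control flow, not Validator.nu code: StreamabilityViolation,
StreamingPass, BufferedPass and validate are hypothetical names, and the
error message in the demo is made up. How the real parser reports a fatal
streamability violation is not covered here.

```java
import java.util.List;

final class FallbackValidation {

    /** Hypothetical signal: the streaming pass hit an error it cannot recover from. */
    static final class StreamabilityViolation extends Exception {
        StreamabilityViolation(String message) {
            super(message);
        }
    }

    /** Hypothetical fast pass: parses in streaming mode and may give up part-way. */
    interface StreamingPass {
        List<String> run(String document) throws StreamabilityViolation;
    }

    /** Hypothetical slow pass: buffers the document, so it can always finish. */
    interface BufferedPass {
        List<String> run(String document);
    }

    /** Runs the fast pass and falls back to the slow one only when the fast pass gives up. */
    static List<String> validate(String document, StreamingPass fast, BufferedPass slow) {
        try {
            return fast.run(document);
        } catch (StreamabilityViolation e) {
            // Partial results from the streaming pass are discarded; start over.
            return slow.run(document);
        }
    }

    public static void main(String[] args) {
        // Toy stand-ins for the two parser configurations (assumptions, not real parsers).
        StreamingPass fast = doc -> {
            if (doc.contains("<table><input>")) {
                throw new StreamabilityViolation("recovery requires non-streaming behavior");
            }
            return List.of();
        };
        BufferedPass slow = doc -> List.of("made-up message: misplaced <input> inside <table>");

        System.out.println(validate("<p>ok</p>", fast, slow));
        System.out.println(validate("<table><input></table>", fast, slow));
    }
}
```

With this shape, documents that never need non-streaming recovery keep the
fast path, and only the ones that would otherwise stop at "Cannot recover
after last error" pay for a second, buffered pass.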
Received on Sunday, 29 September 2013 18:37:39 UTC