- From: Sam Ruby <rubys@intertwingly.net>
- Date: Wed, 06 Dec 2006 09:13:26 -0500
James Graham wrote:
> Ian Hickson wrote:
>> On Tue, 5 Dec 2006, James Graham wrote:
>>> As someone in the process of implementing a HTML5 parser from the
>>> spec, my _only_ complaint so far is that there aren't (yet) any
>>> testcases.
>>
>> If you could get together with the other people writing parsers and
>> come up with a standard format for test cases, that would be really
>> helpful. I have a few tests I could contribute, but I'd need a format
>> to provide them in (they're currently not in a form that would be
>> useful to you).
>
> Did you have a list for implementers somewhere? I think it would be a
> very worthwhile effort to come up with a set of implementation
> independent, self-describing (i.e. where the testcase itself contains
> the expected parse tree in some form), testcases - but I think the
> discussion should be on a separate list.

Count me in. This is actually closer to the reason I originally
subscribed to this list. If given a few tests, I could convert them
into a useful form, and this form could serve as a model for future
tests.

My original interest was to write a replacement for Python's SGMLLIB,
i.e., one that was not based on the theoretical ideal of how SGML
vocabularies work, but one based on the practical notion of how HTML
is actually parsed.

My background: I originally wrote most of the back end for the feed
validator, and continue to be its primary maintainer. I also
contribute to the universal feed parser.

The test case formats for the validator and the parser are very
similar: a standalone document with a structured comment. In the
structured comment is an assertion. In the validator's case, it
describes a message that is, or is not, expected to occur. In the
parser's case, it describes what amounts to an XPath expression.

I do believe that a similar approach could work here, not for 100% of
the test cases, but close enough to handle the bulk of the cases. The
rest can be handled separately.
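To make the idea concrete, here is a minimal sketch of a self-describing test case and a runner for it. Everything in it is an assumption for illustration only: the "Description" and "Expect" field names, the shape of the parse result, and the toy_parse stand-in, which is not a real HTML parser.

```python
import re

# Hypothetical header layout: the first comment in the document holds
# "Key: value" lines, one of which is the assertion to check.
HEADER_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)

SAMPLE = """<!--
Description: a title element ends up under head
Expect: result["head"]["title"] == "Hello"
-->
<title>Hello</title><p>Body text
"""


def read_header(document):
    """Return the key/value pairs from the first comment in the document."""
    match = HEADER_RE.search(document)
    if match is None:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip blank lines and lines without a "Key:" prefix
            fields[key.strip().lower()] = value.strip()
    return fields


def toy_parse(document):
    """Stand-in for a real HTML parser: it only extracts the <title> text."""
    m = re.search(r"<title>(.*?)</title>", document, re.DOTALL)
    return {"head": {"title": m.group(1) if m else None}}


def run_test(document, parse):
    """Parse the document and evaluate the header's assertion on the result."""
    header = read_header(document)
    result = parse(document)
    # eval is tolerable here only because test files are trusted and
    # hand-written; builtins are removed to keep assertions simple.
    return bool(eval(header["expect"], {"__builtins__": {}}, {"result": result}))


if __name__ == "__main__":
    print(run_test(SAMPLE, toy_parse))
```

Because the assertion travels with the document, any implementation that can expose its parse result in an agreed shape could run the same files; only the parse argument changes.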
Additional things like MIME type overrides could also be specified in
this header.

Samples:

  http://feedvalidator.org/testcases/
  http://feedparser.org/tests/

My goal would be to produce something that I could use within the
feedparser (and therefore, planet).

- Sam Ruby
Received on Wednesday, 6 December 2006 06:13:26 UTC