- From: Simon Sapin <simon.sapin@kozea.fr>
- Date: Tue, 12 Jun 2012 16:48:31 +0200
- To: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>
- CC: "Tab Atkins Jr." <jackalmage@gmail.com>, WWW Style <www-style@w3.org>
On 12/06/2012 08:52, Kang-Hao (Kenny) Lu wrote:
> (12/06/12 14:32), Simon Sapin wrote:
>> On 12/06/2012 08:15, Kang-Hao (Kenny) Lu wrote:
>>> I only feel strongly that we should document the difference between
>>> "Parse Error" and the CSS 2.1 "Core Grammar", so for whoever implements
>>> this grammar (e.g. tinycss) this is still trackable.
>>
>> I plan to update tinycss as soon as css3-syntax is stable enough.
>>
>> I realize this might be a breaking change for pretty much any usage of
>> tinycss, but I think that the project is still young enough to afford it.
>
> Can you provide some examples about this? Some objections to Core
> Grammar changes are based on the assumption that changing it is breaking
> tools, so it would be helpful to understand more about it.

Ok, here is an example: tinycss 0.2 does not implement the CSS 2.1 core
grammar exactly, but something based on it. In particular, it has
different token types for INTEGER and (non-integer) NUMBER.

In WeasyPrint 0.9 I have a function for each property that takes a list
of tokens and parses the value. For example, the 'orphans' property
checks that there is a single INTEGER token.

https://github.com/Kozea/WeasyPrint/blob/v0.9/weasyprint/css/validation.py#L652

Now if tinycss 0.3 changes to match css3-syntax, the INTEGER token type
will disappear and NUMBER tokens will get an 'is_integer' flag. When
WeasyPrint 0.9 gets such a token for 'orphans', it will incorrectly
reject it as invalid.

Therefore, tinycss 0.3 will be backward-incompatible with 0.2 and
WeasyPrint will need to be adapted. This is not too much of a problem
because I maintain both, but breaking stuff like this is not very nice
to other users of tinycss. (I don’t know of any, but maybe they just
don’t tell me.)

> Also, I have some questions out of curiosity.
>
> 1. What is the benefit of making the CSS 2.1 parser throw when there's
> an input not following the core grammar? Would giving warnings be a
> better approach?

If you give it a string, tinycss is never supposed to raise an
exception. (This is the Python name for what I assume you mean by
"throw".) If it does, it’s a bug. Instead, it is supposed to return a
Stylesheet object. In addition to rules (statements), this object has a
list of "parse errors".

On invalid input (input that does not match the core grammar), tinycss
should log a "parse error", skip to the end of the declaration or rule,
and continue. This is the specified error recovery behavior.

Maybe the "parse error" name is bad, because these are effectively
warnings. Nothing fatal.

> 2. Is it possible to build a parser on top of tinycss which never throws
> and follows the error handling rules of CSS 2.1 like a browser?

That is what it should do. And what it does, as far as I know.

I use exceptions internally for flow control, but these are not supposed
to interrupt the parser or to be propagated to the user. If you have a
specific input that causes tinycss to raise an exception that is
propagated to the user, it is a bug. I am interested to know about
these. (Reports can go to the github issue tracker, the WeasyPrint
mailing list, or private email to me.)

Selectors however are another story. I took over maintenance of
cssselect after extracting it from lxml, but I’m not the original
author. cssselect has its own tokenizer and parser which (in the current
version, 0.6.1) is broken in more ways than I know. The git version is
better (with backslash-escapes actually implemented) but can still
produce invalid XPath expressions (which in turn cause exceptions). This
is work-in-progress.

> 3. Does tinycss, as it is, need a special conformance class so that it
> can be considered conforming (e.g. The HTML spec defines a bunch of
> non-browser conformance classes. It also says a UA can do the error
> handling *or* fail at the first error encountered.),

I don’t think that a special conformance class is needed.
Actually css3-syntax already has this:

# Certain points in the parsing algorithm are said to be parse errors.
# The error handling for parse errors is well-defined: user agents
# must either act as described below when encountering such problems,
# or must abort processing at the first error that they encounter for
# which they do not wish to apply the rules described below.

But I’m not sure that allowing implementations to stop at the first
"error" is a good idea. At least this is not what I want to do in my
implementation. Error recovery and fallback are pretty fundamental in
CSS.

> since there are a
> bunch of test cases in the test suite which will just make tinycss throw?

Such test cases just mean that I haven’t spent enough time testing. Is
this in the CSS 2.1 test suite?

By the way, we have a "test runner":

    python -m weasyprint.tests.w3_test_suite.web

It’s not really polished, packaged or documented but it is better than
nothing. Please ask if you’re interested and I can help.

-- 
Simon Sapin
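
To make the INTEGER / NUMBER example in this message concrete, here is a
minimal sketch of the breakage and of one way a validator could accept
both token models. The Token class and the two parse_orphans_* functions
are hypothetical stand-ins written for illustration; they are not the
actual tinycss or WeasyPrint API. Only the INTEGER and NUMBER type names
and the 'is_integer' flag come from the message.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Token:
    """Hypothetical token; NOT the real tinycss token class."""
    type: str                       # e.g. 'INTEGER', 'NUMBER', 'IDENT'
    value: Union[int, float, str]
    is_integer: bool = False        # only meaningful in the new token model


def parse_orphans_old(tokens: List[Token]) -> Optional[int]:
    """Old-model check: the value must be exactly one INTEGER token."""
    if len(tokens) == 1 and tokens[0].type == 'INTEGER':
        return int(tokens[0].value)
    return None  # invalid value; the caller would drop the declaration


def parse_orphans_compat(tokens: List[Token]) -> Optional[int]:
    """Accept the old INTEGER token or a NUMBER token flagged is_integer."""
    if len(tokens) != 1:
        return None
    token = tokens[0]
    if token.type == 'INTEGER' or (token.type == 'NUMBER' and token.is_integer):
        return int(token.value)
    return None


# Sample inputs (made up for this sketch).
old_style = [Token('INTEGER', 2)]                  # old token model
new_style = [Token('NUMBER', 2, is_integer=True)]  # css3-syntax-like model
fractional = [Token('NUMBER', 2.5)]                # never valid for 'orphans'

print(parse_orphans_old(old_style))       # 2
print(parse_orphans_old(new_style))       # None: the old check rejects it
print(parse_orphans_compat(new_style))    # 2
print(parse_orphans_compat(fractional))   # None
```

The second call shows the incompatibility: a validator written against
the old token types rejects a value that is perfectly valid, simply
because the tokenizer now reports it under a different type name.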
Received on Tuesday, 12 June 2012 14:48:58 UTC