- From: Joe Gregorio <jcgregorio@google.com>
- Date: Mon, 24 Feb 2014 13:11:19 -0500
- To: "Dr. Olaf Hoffmann" <Dr.O.Hoffmann@gmx.de>
- Cc: www-svg@w3.org, bugzilla@jessica.w3.org
- Message-ID: <CAA6qSUBaxcAjUgT3vWmyHZjQk6=Xkjk=U2YCHhmSqeg5eLzOpg@mail.gmail.com>
On Sat, Feb 22, 2014 at 9:25 AM, Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de>wrote: > Hello, > > I'm no expert in BNF notations, but the '|' obviously means only 'or' > and 'a or b' or 'b or a' is equivalent. Per meaning of 'or' the order in > such a list implies never an order of importance or relevance or > preference. > Because many W3C recommendations (for example) CSS use > a similar syntax of '|' without implying an order, all these should > be in trouble, in case the bug applies. > > About nonnegative-number: > In the given example one has the alternative between integers > and floating point numbers. > Examples for an integer according to the BNF: 117, 000117 > According to the definition of the floating point number, one can > note for example: 117E0, .117E+3, 11700e-2, 117., 117.0, 000117., > but not 117 or 000117 > > About number: > The examples above apply as well, one just can add an additional > sign: +117, -117 > and +117., -117. etc > > By analysing the number, it should be always possible to decide, > which alternative is noted, in case this matters. > OK, so I think part of the problem here is my use of the ambiguous terms first-match vs longest-match. What we are really seeing here is a difference between a PEG (parsing expression grammar) and a CFG (context free grammar). http://en.wikipedia.org/wiki/Parsing_expression_grammar http://en.wikipedia.org/wiki/Context-free_grammar http://en.wikipedia.org/wiki/Comparison_of_parser_generators The BNF supplied is a valid CFG, parsers built for CFGs do backtracking, which is one way to handle the issue around the ambiguity in the "nonnegative-number" rule. The other way to handle it is to rearrange the grammar so that it is non-ambiguous. >From the page http://en.wikipedia.org/wiki/Parsing_expression_grammar: """Syntactically, PEGs also look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator selects the first match in PEG, while it is ambiguous in CFG.""" This isn't a theoretical problem, I am building a parser for SVG paths and as a start I copied the supplied BNF into a parser generator, unfortunately the parser generator I am using is a PEG and not a CFG parser generator: https://github.com/dmajda/pegjs#expression1--expression2----expressionn I am apparently not the only one to get tripped up by this, as this parser also has the same problem: https://www.npmjs.org/package/svg-path-parser If you change the following line in their example: , 'A25,35 -80 1,1 450,220 Z' to: , 'A25.0,35 -80 1,1 450,220 Z' then the parser will crash on what should be a valid path, since the 25 gets parsed as an"integer-constant" and leaves .0 on the input, which fails to parse as anything in the grammar. In this case I think the best solution is to rearrange the grammar slightly as suggested, as it makes it valid for both PEG and CFG. The alternative would be to add a sentence to the spec that the BNF is a CFG and not to be used in a PEG. > Can you give an example for an ambiguity, where a rule like > 'first match wins' would be meaningful or helpful for something? > I think, if this is the case and causes a difference for presentation, > indeed, this might be not intended for paths. > > If we have for example 'M117 117.' > Does it really matter, that the first one is noted as integer, the second > as > floating point number? For analysing coordinate pairs after M, I think it > is not relevant for presentation, if all coordinates are interpreted as > floating point numbers to simplify processing of the data. > > > Olaf >
Received on Monday, 24 February 2014 21:45:30 UTC