Re: [Bug 24769] New: SVG Path BNF is ambigious

On Sat, Feb 22, 2014 at 9:25 AM, Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de>wrote:

> Hello,
>
> I'm no expert in BNF notations, but the '|' obviously means only 'or'
> and 'a or b' or 'b or a' is equivalent. Per meaning of 'or' the order in
> such a list implies never an order of importance or relevance or
> preference.
> Because many W3C recommendations (for example) CSS use
> a similar syntax of '|' without implying an order, all these should
> be in trouble, in case the bug applies.
>
> About nonnegative-number:
> In the given example one has the alternative between integers
> and floating point numbers.
> Examples for an integer according to the BNF: 117, 000117
> According to the definition of the floating point number, one can
> note for example: 117E0, .117E+3, 11700e-2, 117., 117.0,  000117.,
> but not 117 or 000117
>
> About number:
> The examples above apply as well, one just can add an additional
> sign: +117, -117
> and +117., -117. etc
>
> By analysing the number, it should be always possible to decide,
> which alternative is noted, in case this matters.
>

OK, so I think part of the problem here is my use of the ambiguous terms
first-match vs longest-match. What we are really seeing here is a difference
between a PEG (parsing expression grammar) and a CFG (context free grammar).

http://en.wikipedia.org/wiki/Parsing_expression_grammar
http://en.wikipedia.org/wiki/Context-free_grammar
http://en.wikipedia.org/wiki/Comparison_of_parser_generators

The BNF supplied is a valid CFG, parsers built for CFGs do backtracking,
which
is one way to handle the issue around the ambiguity in the
"nonnegative-number" rule.
The other way to handle it is to rearrange the grammar so that it is
non-ambiguous.
>From the page http://en.wikipedia.org/wiki/Parsing_expression_grammar:

"""Syntactically, PEGs also look similar to context-free grammars (CFGs),
but they have a different
   interpretation: the choice operator selects the first match in PEG,
while it is ambiguous in CFG."""

This isn't a theoretical problem, I am building a parser for SVG paths and
as a start
I copied the supplied BNF into a parser generator, unfortunately the parser
generator
I am using is a PEG and not a CFG parser generator:

  https://github.com/dmajda/pegjs#expression1--expression2----expressionn

I am apparently not the only one to get tripped up by this, as this parser
also has the same problem:

  https://www.npmjs.org/package/svg-path-parser

If you change the following line in their example:

    , 'A25,35 -80 1,1 450,220 Z'

to:

    , 'A25.0,35 -80 1,1 450,220 Z'

then the parser will crash on what should be a valid path, since the 25
gets
parsed as an"integer-constant" and leaves .0 on the input,
which fails to parse as anything in the grammar.

In this case I think the best solution is to rearrange the grammar
slightly as suggested, as it makes it valid for both PEG and CFG.
The alternative would be to add a sentence to the spec that the BNF
is a CFG and not to be used in a PEG.


> Can you give an example for an ambiguity, where a rule like
> 'first match wins' would be meaningful or helpful for something?
> I think, if this is the case and causes a difference for presentation,
> indeed, this might be not intended for paths.
>
> If we have for example 'M117 117.'
> Does it really matter, that the first one is noted as integer, the second
> as
> floating point number? For analysing coordinate pairs after M, I think it
> is not relevant for presentation, if all coordinates are interpreted as
> floating point numbers to simplify  processing of the data.
>
>
> Olaf
>

Received on Monday, 24 February 2014 21:45:30 UTC