[svgwg] Change arc grammar to coordinates rather than numbers and flags. (#755)

tatarize has just created a new issue for https://github.com/w3c/svgwg:

== Change arc grammar to coordinates rather than numbers and flags. ==
Change the Path EBNF grammar for arc arguments to eliminate other number classes.

https://www.w3.org/TR/SVG/paths.html#PathElement

> elliptical-arc-argument:
>     nonnegative-number comma-wsp? nonnegative-number comma-wsp? 
>         number comma-wsp flag comma-wsp? flag comma-wsp? coordinate-pair

Current draft SVG 2.0

> elliptical_arc_argument::=
>     number comma_wsp? number comma_wsp? number comma_wsp
>     flag comma_wsp? flag comma_wsp? coordinate_pair

I request that the grammar be tweaked so that it instead accepts:

> elliptical_arc_argument::=
>     coordinate comma_wsp? coordinate comma_wsp? coordinate comma_wsp
>     coordinate comma_wsp? coordinate comma_wsp? coordinate_pair

Only arc wants the signless `number` or the `flag` values, which makes the arc grammar decidedly weird. Everything else just takes some number of coordinate operands, and it would simplify things if everything overtly did this. I am suggesting that every operand be explicitly parseable as a float.

The problem is this path:

```
 <path d="M200,120 h-25 a25,25 0 1125,25 z" fill="lime"/>
```

This path breaks a really nice pattern in the SVG path grammar. I've used similar compact grammars for other things too, and it's actually great. Even outside the context of SVG, the compact path-like grammar is highly useful, because it uses a single letter followed by a sequence of numbers. Everything fits the same pattern, which makes parsing easy. Except this path blows that all up.

We can almost always find the command letter, read in the floats that follow it, and then shoehorn the numbers into their required ranges as the need arises. I thought it was always valid to parse this easy grammar, and that it would just be a more liberal superset of the full grammar, which sometimes grammatically excludes values that don't fall into a particular range (non-negative numbers, or 0/1). Doing it this way makes capturing the grammar easy, or so I thought. In the above case, since a flag is only `0` or `1`, `1125` is unambiguously `1 1 25`. So by the rules you can scrunch the numbers together like that, which breaks the larger pattern. A mere float parser would read that as 1125 and wonder where the remaining values are.
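To make the current behaviour concrete, here is a minimal sketch of a grammar-aware reader for a single arc argument set, where the fourth and fifth operands are consumed as single `0`/`1` characters. The names `parse_arc_args`, `FLOAT_RE`, and `SEP_RE` are made up for illustration, and the sketch is more lenient about separators than the real grammar.

```python
import re

# Float pattern as quoted later in this issue; separators are commas/whitespace.
FLOAT_RE = re.compile(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?')
SEP_RE = re.compile(r'[\s,]*')

def parse_arc_args(text):
    """Read one arc argument set: rx ry rotation flag flag x y.

    Hypothetical helper: operands 3 and 4 (zero-based) are flags and
    consume exactly one character, which is what lets them run together.
    """
    pos = 0
    values = []
    for slot in range(7):
        pos = SEP_RE.match(text, pos).end()  # skip optional separators
        if slot in (3, 4):
            if text[pos:pos + 1] not in ("0", "1"):
                raise ValueError(f"expected flag at offset {pos}")
            values.append(int(text[pos]))
            pos += 1
        else:
            m = FLOAT_RE.match(text, pos)
            if m is None:
                raise ValueError(f"expected number at offset {pos}")
            values.append(float(m.group()))
            pos = m.end()
    return values

arc = parse_arc_args("25,25 0 1125,25")
```

With the arguments from the path above, `arc` comes out as seven operands, with `1125` split into flag `1`, flag `1`, and the number `25`.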

Reading the grammar we are told that it's an overt parsing error if you try to express negative rx and ry. But, in this case we are also told exactly how to get these values into range.

- "If either rx or ry have negative signs, these are dropped; the absolute value is used instead."

The same is true for the flags, but they are either 0 or 1. And any non-0 value is taken as a 1. Though this should be changing in SVG 2.0 to flip at 0.5.
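As a rough sketch of what "coaxing values into range" could look like once all seven operands have been read as plain floats: the function name `coerce_arc_args` is hypothetical, and the flag rule (any non-zero value counts as 1) is simply the one described in this issue, not checked against the spec text.

```python
def coerce_arc_args(rx, ry, rotation, large_arc, sweep, x, y):
    """Bring arc operands into range after a liberal float parse.

    Assumptions taken from the issue text:
    - negative radii lose their sign (absolute value is used);
    - any non-zero flag value is treated as 1.
    """
    return (
        abs(rx),
        abs(ry),
        rotation,
        1 if large_arc != 0 else 0,
        1 if sweep != 0 else 0,
        x,
        y,
    )

fixed = coerce_arc_args(-25.0, 25.0, 0.0, 1.0, 0.0, 25.0, 25.0)
```

Here the negative `rx` and the float-valued flags are accepted by the parser and only normalised afterwards, rather than rejected as a grammar error.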

But literally everywhere other than in arc, parsing floats accepts a superset of the grammar. In fact, everywhere else it's 1:1. So you can do `re.compile(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?').findall(string)` and get every operand you need. Except here. Because the flags can unambiguously scrunch together, they break the otherwise lovely pattern. It means the SVG grammar cannot be a subset of greedy regex float parsing.
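A small sketch of that greedy scan, run against the segments of the problem path from earlier in this issue. The helper name `scan_floats` is made up for illustration; the pattern is the one quoted above.

```python
import re

# Float pattern quoted in this issue.
FLOAT_RE = re.compile(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?')

def scan_floats(segment):
    """Greedily collect every float-looking token in a path segment."""
    return [float(tok) for tok in FLOAT_RE.findall(segment)]

line_args = scan_floats("h-25")              # one operand, as expected
arc_args = scan_floats("a25,25 0 1125,25")   # arc needs seven operands
```

For the `h` segment the scan yields the single operand it should. For the arc segment it yields only five values, because `1125` is swallowed as one number instead of two flags and a coordinate.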

I request that everything be treated as a signed floating point number, so that flags can no longer be scrunched together unambiguously like that; run together, they would instead just look like parts of a number.

Doing this will also simplify the grammar. Basically, we accept any number for any operand, so we only really need one operand type: a signed number, the same thing CSS takes as a number. We then just say how out-of-range values are coaxed into the correct range. We are breaking robustness a bit by grammatically excluding things like a negative rx or ry, or 1.0 as a flag, when these things aren't ambiguous. These values can be more easily handled by explaining how they are brought into spec, rather than throwing a parse error for them out of the gate. We should obviously produce paths that fit this grammar, but we should read paths liberally, as sequences of floats.

It seems like grammar is being used to exclude these values, when it's easier to say they are grammatically correct but functionally wrong. But it also seems like doing that weirdly permits things like `M0,0A 5,5 0 10.050.05`, which I would read wrongly if I didn't know the grammar, since everywhere else in a path `10.050.05` is 10.05 and 0.05.

Please view or discuss this issue at https://github.com/w3c/svgwg/issues/755 using your GitHub account

Received on Tuesday, 26 November 2019 07:44:07 UTC