[svgwg] Subtleties of the path data grammar (#752)

dalboris has just created a new issue for https://github.com/w3c/svgwg:

== Subtleties of the path data grammar ==
As a lot of us know, the path data grammar is a little convoluted for the sake of path data conciseness.

For example, the spec makes it clear that something like "M0.6.5L100-200" is intentionally valid path data, equivalent to "M 0.6 0.5 L 100 -200".

In this GitHub issue, I'd like to dig deeper into the rabbit hole by giving more contrived (or not so contrived) examples, and asking the following three questions for each:

1. Is it currently formally allowed by the grammar?
2. Is it intentionally allowed (or disallowed)?
3. If the answer to 2. is "no", should we change the grammar to reflect intent?

I'll just start with a few examples about whitespaces, and we can add more later. I name the examples so we can reference them. This GitHub issue could be used as a clarification of intent.

References:

SVG 1.1 grammar: https://www.w3.org/TR/SVG11/paths.html#PathDataBNF
SVG 2-draft grammar: https://svgwg.org/svg2-draft/paths.html#PathDataBNF

# Examples about whitespaces (W)

## Example W1 - trailing whitespaces after path

```
"M1 2L3 4 "
         ^
```

**1. Is it currently formally allowed by the grammar?**

I believe it is allowed by SVG 1.1 grammar, but forbidden by the SVG 2 draft. Relevant parts of the grammar:

```
SVG 1.1      svg-path ::= wsp* moveto-drawto-command-groups? wsp*
SVG 2-draft  svg_path ::= wsp* moveto? (moveto drawto_command*)?
```

(Of course, one has to dig deeper into the SVG 2 draft's definitions to reach that conclusion.)

**2. Is it intentionally disallowed in SVG 2?**

I hope not. Whitespaces can be used if we prefer to place the closing `"` on its own line, for example, for manually edited SVGs. Besides, leading whitespaces are allowed, so this would be inconsistent.

**3. Should we change the grammar to reflect intent?**

I personally think we should.

## Example W2 - trailing whitespaces after command

```
"M1 2 L3 4"
     ^
```

**1. Is it currently formally allowed by the grammar?**

I believe it is allowed by SVG 1.1 grammar, but forbidden by the SVG 2 draft. Relevant parts of the grammar:

```
SVG 1.1      moveto-drawto-command-group ::= moveto wsp* drawto-commands?
SVG 2-draft  svg_path ::= wsp* moveto? (moveto drawto_command*)?
```

**2. Is it intentionally disallowed in SVG 2?**

I believe this must be an accidental oversight, since even the [first example](https://svgwg.org/svg2-draft/paths.html#PathDataGeneralInformation) of path data contains those whitespaces: `"M 100 100 L 300 100 L 200 300 z"`.

**3. Should we change the grammar to reflect intent?**

Obviously, we should. Together with W1, I believe we should have:

```
path ::= wsp* (moveto wsp* commands)? wsp*
commands ::= command | (command wsp* commands)
```

That is, reverting to something more similar to SVG 1.1 (but still slightly simpler), and removing from SVG 2 the initial `moveto?`, which is completely redundant with the one within the `(moveto drawto_command*)?` group, unless I'm missing something.
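To make the two proposed productions concrete, here is a minimal Python sketch that transcribes them into a regular expression. The `moveto` and `command` productions are hypothetical placeholders (a command letter followed by any number of loosely separated numbers), so argument counts are not checked; the names `WSP`, `NUM`, `ARGS`, and `matches_proposal` are introduced only for this illustration and come from neither spec.

```python
import re

# Assumed character classes and a simplified number production (illustration only).
WSP = r"[ \t\r\n\f]"
NUM = r"[+-]?(?:\d*\.\d+|\d+)(?:[eE][+-]?\d+)?"

# Placeholder argument list: numbers separated by optional whitespace and/or one comma.
ARGS = rf"(?:{WSP}*{NUM}(?:(?:{WSP}*,{WSP}*|{WSP}+)?{NUM})*)?"
MOVETO = rf"[Mm]{ARGS}"
COMMAND = rf"[MmZzLlHhVvCcSsQqTtAa]{ARGS}"

# commands ::= command | (command wsp* commands)
COMMANDS = rf"{COMMAND}(?:{WSP}*{COMMAND})*"
# path ::= wsp* (moveto wsp* commands)? wsp*
PATH = re.compile(rf"{WSP}*(?:{MOVETO}{WSP}*{COMMANDS})?{WSP}*")


def matches_proposal(d: str) -> bool:
    """Return True if d has the overall shape allowed by the proposed productions."""
    return PATH.fullmatch(d) is not None


print(matches_proposal("M1 2L3 4 "))   # W1: trailing whitespace after the path
print(matches_proposal("M1 2 L3 4"))   # W2: whitespace between commands
print(matches_proposal("L1 2"))        # does not start with a moveto
```

Under these assumptions, both W1 and W2 are accepted, while a path that does not start with a moveto is rejected.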


## Example W3 - optional whitespaces between arguments

`"M12L34"`

**1. Is it currently formally allowed by the grammar?**

Yes and no.

Formally, it is allowed by the [E]BNF in either SVG 1.1 or SVG 2, since whitespaces are almost always optional between arguments (the only exception is before the first `flag` of an elliptical arc, where it is mandatory). So this example can be parsed as `"M 1 2 L 3 4"`.

However, the prose below says:

> The processing of the EBNF **must** consume as much of a given EBNF production as possible, stopping at the point when a character is encountered which no longer satisfies the production.
>
> (emphasis mine) 

which means that when parsing the number after the M, we are not allowed to stop at `1`: we must continue and also consume the `2`. So the first number becomes `12`, and the lack of a second number makes the path invalid.
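The greedy rule can be illustrated with a small Python tokenizer. This is only a sketch under stated assumptions: the `NUMBER` pattern is a simplified stand-in for the spec's number production (no trailing decimal point), separators are handled leniently, arc flags are ignored, and `tokenize` is a name made up for this example.

```python
import re

# Simplified number: optional sign, digits with optional fraction, optional exponent.
NUMBER = re.compile(r"[+-]?(?:\d*\.\d+|\d+)(?:[eE][+-]?\d+)?")
COMMAND = re.compile(r"[MmZzLlHhVvCcSsQqTtAa]")
# Optional whitespace around at most one comma (lenient: allowed before any token).
SEPARATOR = re.compile(r"[ \t\r\n\f]*,?[ \t\r\n\f]*")


def tokenize(d: str) -> list[str]:
    """Split path data into command letters and numbers, consuming greedily."""
    tokens = []
    pos = 0
    while pos < len(d):
        pos = SEPARATOR.match(d, pos).end()  # may match the empty string
        if pos == len(d):
            break
        # Regex quantifiers are greedy, so each number takes as many characters as it can.
        m = COMMAND.match(d, pos) or NUMBER.match(d, pos)
        if m is None:
            raise ValueError(f"unexpected character {d[pos]!r} at offset {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens


print(tokenize("M0.6.5L100-200"))  # ['M', '0.6', '.5', 'L', '100', '-200']
print(tokenize("M12L34"))          # ['M', '12', 'L', '34']
```

With greedy consumption, `"M0.6.5L100-200"` still yields two numbers per command, whereas `"M12L34"` yields only one number after each command letter, which is why it ends up invalid.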

**2. What's the intent?**

I believe the intent is that `"M12L34"` should be invalid, with which I would wholeheartedly agree.

Interestingly, though, this example is unambiguous: there is only one way to parse it as valid path data, namely `"M 1 2 L 3 4"`. The same is not true for `"M123L45"`, which could *in theory* be parsed as either `"M 1 23 L 4 5"` or `"M 12 3 L 4 5"`.

This is because the EBNF is in fact ambiguous as is, and only disambiguated via the prose, which mandates that parsers be "greedy".
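The ambiguity can be made concrete by enumerating, in Python, every way a run of digits could be split into two numbers if the separator between them is optional and the greedy rule is ignored. The helper name `two_number_splits` is hypothetical and exists only for this illustration.

```python
def two_number_splits(digits: str) -> list[tuple[str, str]]:
    """List every way to cut a digit run into two non-empty numbers."""
    return [(digits[:i], digits[i:]) for i in range(1, len(digits))]


print(two_number_splits("12"))   # [('1', '2')]
print(two_number_splits("123"))  # [('1', '23'), ('12', '3')]
```

A two-digit run has exactly one split, while a three-digit run already has two, so the bare EBNF cannot tell `"M 1 23"` from `"M 12 3"` without the greedy rule from the prose.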

**3. Should we change the grammar to reflect intent?**

Changing the grammar to formally allow `"M0.6.5"` but disallow `"M12"` would be extremely difficult and make the grammar quite unreadable, I believe. Which I guess is why we have the prose "clarification".

However, this whole grammar ambiguity is because we allowed things like `"M0.6.5"` in the first place. I would personally advocate requiring comma_wsp at least between command arguments (to make the grammar unambiguous):

```
"M0.6.5L100-200"   =>   "M0.6 .5L100 -200"
```

I would also require whitespaces between commands and between a command character and its arguments (to make it, I believe, more consistent with the CSS tokenizer -- to be confirmed by someone who knows better):

```
"M0.6 .5L100 -200"   =>   "M 0.6 .5 L 100 -200"
```

This is obviously a backward-incompatible change, like the one disallowing the trailing decimal `"nn."`. However, we could simply note in the documentation that it has been disallowed in SVG 2 since data compactness isn't as important today as it was in the early days of SVG, and that consistency with CSS parsing rules, as well as making the formal grammar unambiguous, was deemed more important.

It's a rather trivial change for tooling generating SVG 2 documents to force the generation of these whitespaces. And obviously, they already know how to parse them.
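As a rough illustration of how small that tooling change could be, here is a Python sketch that re-emits compact path data with a single space between every token. It assumes a simplified number pattern, silently skips characters it does not recognize, and does not handle arc flags written without separators; `TOKEN` and `expand` are names invented for this example.

```python
import re

# A command letter, or a simplified number (sign, digits/fraction, optional exponent).
TOKEN = re.compile(
    r"[MmZzLlHhVvCcSsQqTtAa]|[+-]?(?:\d*\.\d+|\d+)(?:[eE][+-]?\d+)?"
)


def expand(d: str) -> str:
    """Re-serialize path data with one space between all commands and arguments."""
    return " ".join(TOKEN.findall(d))


print(expand("M0.6.5L100-200"))  # M 0.6 .5 L 100 -200
```

The output for the running example matches the fully separated form shown above, `"M 0.6 .5 L 100 -200"`.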

Any thoughts?

Please view or discuss this issue at https://github.com/w3c/svgwg/issues/752 using your GitHub account

Received on Tuesday, 19 November 2019 18:45:22 UTC