[svgwg] Use shorter symbol names in SVG path data grammar (#751)

dalboris has just created a new issue for https://github.com/w3c/svgwg:

== Use shorter symbol names in SVG path data grammar ==
# TL;DR

I propose the following grammar. It is exactly the same as the current one, but with shorter symbol names so that each rule fits on a single line, which makes the grammar more readable.

```
path ::= wsp* moveto? (moveto command*)?

command ::=
    moveto
    | closepath
    | lineto
    | hlineto
    | vlineto
    | ccurveto
    | scurveto
    | qcurveto
    | tcurveto
    | arcto

moveto    ::= ("M" | "m") wsp* coord_2s
closepath ::= ("Z" | "z")
lineto    ::= ("L" | "l") wsp* coord_2s
hlineto   ::= ("H" | "h") wsp* coords
vlineto   ::= ("V" | "v") wsp* coords
ccurveto  ::= ("C" | "c") wsp* coord_6s
scurveto  ::= ("S" | "s") wsp* coord_4s
qcurveto  ::= ("Q" | "q") wsp* coord_4s
tcurveto  ::= ("T" | "t") wsp* coord_2s
arcto     ::= ("A" | "a") wsp* arcto_args

coord     ::= sign? unsigned
coord_2   ::= coord cw? coord
coord_4   ::= coord cw? coord cw? coord cw? coord
coord_6   ::= coord cw? coord cw? coord cw? coord cw? coord cw? coord
arcto_arg ::= unsigned cw? unsigned cw? coord cw flag cw? flag cw? coord_2

coords     ::= coord     | (coord     cw? coords)
coord_2s   ::= coord_2   | (coord_2   cw? coord_2s)
coord_4s   ::= coord_4   | (coord_4   cw? coord_4s)
coord_6s   ::= coord_6   | (coord_6   cw? coord_6s)
arcto_args ::= arcto_arg | (arcto_arg cw? arcto_args)

unsigned ::= frac exp?
frac     ::= (digit* "." digit+) | digit+
exp      ::= ("e" | "E") sign? digit+

cw    ::= (wsp+ ","? wsp*) | ("," wsp*)

sign  ::= "+" | "-"
flag  ::= "0" | "1"
wsp   ::= #x9 | #xA | #xC | #xD | #x20
digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
```
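To show how directly the one-line rules translate into code, here is a rough Python sketch. It is my own illustration, not part of the spec or of any existing parser. It transcribes the argument rules into regular expressions, one assignment per rule. The helpers `coord_n`, `seq`, `COMMANDS` and `matches` are hypothetical names. The top-level `path` and `command` rules are left out, and `closepath` is omitted because it takes no arguments.

```python
import re

# Illustrative sketch: each proposed rule becomes one regex string (helper names are hypothetical).
wsp      = r"[\t\n\f\r ]"                        # wsp ::= #x9 | #xA | #xC | #xD | #x20
digit    = r"[0-9]"
sign     = r"[+-]"
flag     = r"[01]"
exp      = rf"[eE]{sign}?{digit}+"               # exp ::= ("e" | "E") sign? digit+
frac     = rf"(?:{digit}*\.{digit}+|{digit}+)"   # frac ::= (digit* "." digit+) | digit+
unsigned = rf"(?:{frac}(?:{exp})?)"              # unsigned ::= frac exp?
coord    = rf"(?:{sign}?{unsigned})"             # coord ::= sign? unsigned
cw       = rf"(?:{wsp}+,?{wsp}*|,{wsp}*)"        # cw ::= (wsp+ ","? wsp*) | ("," wsp*)


def coord_n(n: int) -> str:
    """coord_N: N coords separated by optional cw (coord_2, coord_4, coord_6)."""
    return f"{cw}?".join([coord] * n)


def seq(x: str) -> str:
    """x_s ::= x | (x cw? x_s), written as the equivalent x (cw? x)*."""
    return f"(?:{x})(?:{cw}?(?:{x}))*"


arcto_arg = (f"{unsigned}{cw}?{unsigned}{cw}?{coord}{cw}"
             f"{flag}{cw}?{flag}{cw}?{coord_n(2)}")

# Command letters and argument rules; closepath ("Z" | "z") takes no arguments, so it is omitted.
COMMANDS = {
    "moveto":   ("Mm", seq(coord_n(2))),
    "lineto":   ("Ll", seq(coord_n(2))),
    "hlineto":  ("Hh", seq(coord)),
    "vlineto":  ("Vv", seq(coord)),
    "ccurveto": ("Cc", seq(coord_n(6))),
    "scurveto": ("Ss", seq(coord_n(4))),
    "qcurveto": ("Qq", seq(coord_n(4))),
    "tcurveto": ("Tt", seq(coord_n(2))),
    "arcto":    ("Aa", seq(arcto_arg)),
}


def matches(name: str, text: str) -> bool:
    """True if `text` is one complete `name` command under the proposed rules."""
    letters, args = COMMANDS[name]
    return re.fullmatch(f"[{letters}]{wsp}*{args}", text) is not None


print(matches("arcto", "A 25 25 -30 0 1 50 -25"))  # True
print(matches("ccurveto", "c1,2 3,4 5,6"))          # True
print(matches("scurveto", "S 1 2 3"))               # False: coord_4 needs four coords
```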

# Long Version

I would like to suggest purely stylistic changes to the path data grammar to make it more readable. Importantly, I believe the current symbol names are way too long, and that this is the most significant barrier to readability.

For comparison, here is the currently published grammar draft, as of 2019-11-18:

https://svgwg.org/svg2-draft/paths.html#PathDataBNF

```
svg_path::= wsp* moveto? (moveto drawto_command*)?

drawto_command::=
    moveto
    | closepath
    | lineto
    | horizontal_lineto
    | vertical_lineto
    | curveto
    | smooth_curveto
    | quadratic_bezier_curveto
    | smooth_quadratic_bezier_curveto
    | elliptical_arc

moveto::=
    ( "M" | "m" ) wsp* coordinate_pair_sequence

closepath::=
    ("Z" | "z")

lineto::=
    ("L"|"l") wsp* coordinate_pair_sequence

horizontal_lineto::=
    ("H"|"h") wsp* coordinate_sequence

vertical_lineto::=
    ("V"|"v") wsp* coordinate_sequence

curveto::=
    ("C"|"c") wsp* curveto_coordinate_sequence

curveto_coordinate_sequence::=
    coordinate_pair_triplet
    | (coordinate_pair_triplet comma_wsp? curveto_coordinate_sequence)

smooth_curveto::=
    ("S"|"s") wsp* smooth_curveto_coordinate_sequence

smooth_curveto_coordinate_sequence::=
    coordinate_pair_double
    | (coordinate_pair_double comma_wsp? smooth_curveto_coordinate_sequence)

quadratic_bezier_curveto::=
    ("Q"|"q") wsp* quadratic_bezier_curveto_coordinate_sequence

quadratic_bezier_curveto_coordinate_sequence::=
    coordinate_pair_double
    | (coordinate_pair_double comma_wsp? quadratic_bezier_curveto_coordinate_sequence)

smooth_quadratic_bezier_curveto::=
    ("T"|"t") wsp* coordinate_pair_sequence

elliptical_arc::=
    ( "A" | "a" ) wsp* elliptical_arc_argument_sequence

elliptical_arc_argument_sequence::=
    elliptical_arc_argument
    | (elliptical_arc_argument comma_wsp? elliptical_arc_argument_sequence)

elliptical_arc_argument::=
    number comma_wsp? number comma_wsp? number comma_wsp
    flag comma_wsp? flag comma_wsp? coordinate_pair

coordinate_pair_double::=
    coordinate_pair comma_wsp? coordinate_pair

coordinate_pair_triplet::=
    coordinate_pair comma_wsp? coordinate_pair comma_wsp? coordinate_pair

coordinate_pair_sequence::=
    coordinate_pair | (coordinate_pair comma_wsp? coordinate_pair_sequence)

coordinate_sequence::=
    coordinate | (coordinate comma_wsp? coordinate_sequence)

coordinate_pair::= coordinate comma_wsp? coordinate

coordinate::= sign? number

sign::= "+"|"-"

exponent::= ("e" | "E") sign? digit+

fractional-constant::= (digit* "." digit+) | digit+

number::= fractional-constant exponent?

flag::= ("0" | "1")

comma_wsp::= (wsp+ ","? wsp*) | ("," wsp*)

wsp::= (#x9 | #x20 | #xA | #xC | #xD)

digit::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
```

Don't get me wrong: when designing an API or choosing variable names in code, I'm 100% in favor of long, descriptive names. They are self-documenting (for the poor maintainer who comes across your function name or implementation out of context ten years later), and they can help avoid name conflicts.

However, none of these reasons are relevant here. Anyone bothering to read the SVG path data grammar is most likely already familiar with the concepts each symbol of the grammar represents. In other words, there is no need for self-documentation: we've already read the prose documentation and mastered the concepts. We just want to look at the intricate details of the grammar (where commas, whitespace, signs, etc. are allowed). These details are captured by the rules themselves, not by the symbol names, and longer names just get in the way of mentally parsing the rules. So what we need are the **shortest possible names that are still instantly recognizable to anyone familiar with the concepts**.

I'll give a few specific examples and justifications below, then propose a list of name changes; you can see the resulting grammar in the TL;DR at the top of this post.

## Command names

A pet peeve of mine is the inconsistency in name length and pattern between `[smooth_]curveto` and `[smooth_]quadratic_bezier_curveto`. The only difference between these concepts is that one is a degree-3 polynomial while the other is a degree-2 polynomial, yet their names make them look like completely different beasts. In this regard, the [SVG 1.1 DOM spec](https://www.w3.org/TR/SVG11/paths.html#InterfaceSVGPathSeg) is at least consistent, if lengthy: `PATHSEG_CURVETO_CUBIC_ABS` vs `PATHSEG_CURVETO_QUADRATIC_ABS`. But in the grammar, once you add the `_coordinate_sequence` suffix, even dropping the useless `bezier` gives something like `[smooth_]quadratic_curveto_coordinate_sequence`, which is too long to write rules as one-liners and makes them hard to parse mentally.

Since anyone reading the grammar already knows the list of existing commands and what they do, we can in fact shorten this:

```
drawto_command::=
    moveto
    | closepath
    | lineto
    | horizontal_lineto
    | vertical_lineto
    | curveto
    | smooth_curveto
    | quadratic_bezier_curveto
    | smooth_quadratic_bezier_curveto
    | elliptical_arc
```

to:

```
drawto_command::=
    moveto
    | closepath
    | lineto
    | hlineto
    | vlineto
    | ccurveto
    | scurveto
    | qcurveto
    | tcurveto
    | arcto
```

See how all the `curveto`s are now beautifully aligned, and how this intuitively hints that they have very similar syntax. Also, I've removed `elliptical`, which is 100% noise, but I've added the `to` suffix, which improves consistency. Short names are important, but consistency is also important, and here consistency only costs 2 characters, so it's a no-brainer.
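The short names also follow a simple pattern: each curve command is its own lowercase letter followed by `curveto`, and the two axis-aligned lines are `h`/`v` followed by `lineto`. A hypothetical lookup table makes this easy to check. The name `COMMANDS`, the `symbol_for` helper, and the per-segment counts are my own additions for illustration, not part of the grammar.

```python
# Illustrative sketch; COMMANDS and symbol_for are hypothetical names.
# Command letter -> (proposed symbol, numbers consumed per segment).
# arcto's 7 numbers are rx, ry, x-axis rotation, the two flags, and the end point (x, y).
COMMANDS = {
    "M": ("moveto", 2),
    "Z": ("closepath", 0),
    "L": ("lineto", 2),
    "H": ("hlineto", 1),
    "V": ("vlineto", 1),
    "C": ("ccurveto", 6),
    "S": ("scurveto", 4),
    "Q": ("qcurveto", 4),
    "T": ("tcurveto", 2),
    "A": ("arcto", 7),
}


def symbol_for(letter: str) -> str:
    """Proposed symbol for an absolute (uppercase) or relative (lowercase) command letter."""
    return COMMANDS[letter.upper()][0]


# The short names are derivable from the command letter itself.
for letter in "CSQT":
    assert symbol_for(letter) == letter.lower() + "curveto"
for letter in "HV":
    assert symbol_for(letter) == letter.lower() + "lineto"

print(symbol_for("q"))  # qcurveto
```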


## comma_wsp => cw

I actually like `comma_wsp` a lot: it is reasonably short and descriptive. However, it is used so often throughout the grammar, sometimes several times in a single rule, that it prevents writing one-liners, which I believe are essential for readability (vertical alignment between rules, etc.). I think renaming it to `cw` is a good tradeoff. Admittedly, the name alone makes it completely non-obvious what it is, but you only have to look it up once, and once you've learned it, everything else becomes much more concise.
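For reference, here is a minimal Python sketch of the `wsp` and `cw` rules showing which separators `cw` accepts. The `is_cw` helper is a hypothetical name. Note that `cw` never matches the empty string, which is why the rules write `cw?` wherever the separator is optional.

```python
import re

# Illustrative sketch; is_cw is a hypothetical helper.
# wsp ::= #x9 | #xA | #xC | #xD | #x20
WSP = r"[\t\n\f\r ]"
# cw ::= (wsp+ ","? wsp*) | ("," wsp*)
CW = re.compile(rf"{WSP}+,?{WSP}*|,{WSP}*")


def is_cw(sep: str) -> bool:
    """True if the whole string `sep` is a valid cw separator."""
    return CW.fullmatch(sep) is not None


# Accepted: " ", ",", " , ", "\n,\t". Rejected: ",," (two commas) and "" (empty).
for sep in [" ", ",", " , ", "\n,\t", ",,", ""]:
    print(repr(sep), is_cw(sep))
```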


## Pairs, doubles, triplets, and sequences of coordinates

The name `coordinate` is nice, but `coord` is just as nice: twice as short, and as easily understood.

Then, a "sequence of things" is simply, well, "thingS", so `coordinate_sequence` can simply become `coords`.

Then we have pairs, doubles, and triplets. While adding semantic meaning to symbol names might seem good for readability, the truth is that there is no syntactic difference at all between "a sequence of 4 coordinates" and "a double of pairs of coordinates". In each case, it's just four coords separated by optional commas/whitespace. If the syntax were something like `[a; b; c; d]` vs `{(a, b), (c, d)}`, then it would make sense to name them differently. But since they really are syntactically equivalent, it is a mistake to hide this equivalence from the reader by choosing different names.

Therefore, I advocate instead for:

```
coord   ::= ...
coord_2 ::= coord cw? coord
coord_4 ::= coord cw? coord cw? coord cw? coord
coord_6 ::= coord cw? coord cw? coord cw? coord cw? coord cw? coord
```

This already removes three terms (pair, double, triplet) from our cognitive load, while being instantly understood.

(Oh, and see? You're already falling in love with `cw`: it removes so much noise and clutter from the structure!)
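One way to see that pairs, doubles and triplets add nothing syntactically is to build the draft's nested rules and the flat `coord_N` rules as regular-expression strings and compare them. In this sketch the regexes and the `coord_n` helper are my own transcription, not spec text. The expansions come out character-for-character identical.

```python
import re

# Illustrative sketch; the regexes and coord_n are my own transcription of the rules.
WSP = r"[\t\n\f\r ]"
CW = rf"(?:{WSP}+,?{WSP}*|,{WSP}*)"
COORD = r"(?:[+-]?(?:[0-9]*\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?)"


def coord_n(n: int) -> str:
    """Flat rule: n coords separated by optional cw."""
    return f"{CW}?".join([COORD] * n)


# The draft's nested rules, built from pairs ...
coordinate_pair = f"{COORD}{CW}?{COORD}"
coordinate_pair_double = f"{coordinate_pair}{CW}?{coordinate_pair}"
coordinate_pair_triplet = f"{CW}?".join([coordinate_pair] * 3)

# ... expand to exactly the same patterns as the flat coord_N rules.
assert coordinate_pair == coord_n(2)
assert coordinate_pair_double == coord_n(4)
assert coordinate_pair_triplet == coord_n(6)

# The "pairing" is invisible to the syntax: both groupings are the same coord_4.
print(re.fullmatch(coord_n(4), "1,2 3,4") is not None)  # True
print(re.fullmatch(coord_n(4), "1 2,3 4") is not None)  # True
```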

We still need to do something about the `*_sequence` symbols. In the current draft, there are 6 of them:

```
1. coordinate_sequence
2. coordinate_pair_sequence
3. curveto_coordinate_sequence
4. smooth_curveto_coordinate_sequence
5. quadratic_bezier_curveto_coordinate_sequence
6. elliptical_arc_argument_sequence
```

First of all, 4 and 5 have the exact same definition, so we don't need both. Then, apart from 6, these complicated names are all just sequences of `coord` or of the `coord_x` symbols we just defined. So we can just do:

```
1. coordinate_sequence                           -> coords
2. coordinate_pair_sequence                      -> coord_2s
3. curveto_coordinate_sequence                   -> coord_6s
4. smooth_curveto_coordinate_sequence            -> coord_4s
5. quadratic_bezier_curveto_coordinate_sequence  -> coord_4s
6. elliptical_arc_argument_sequence              -> arcto_args
```

Yeah, there's nothing smart we can really do about 6, aside from just shortening the name.
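All the `_s` symbols come from the same right-recursive pattern `x_s ::= x | (x cw? x_s)`, which describes the same strings as `x (cw? x)*`. Here is a small Python sketch of that construction, using hypothetical `seq` and `coord_n` helpers. It also shows that a sequence of `coord_N` only accepts a multiple of N coordinates. `arcto_args` would be the same construction applied to `arcto_arg`.

```python
import re

# Illustrative sketch; seq and coord_n are hypothetical helpers.
WSP = r"[\t\n\f\r ]"
CW = rf"(?:{WSP}+,?{WSP}*|,{WSP}*)"
COORD = r"(?:[+-]?(?:[0-9]*\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?)"


def coord_n(n: int) -> str:
    return f"{CW}?".join([COORD] * n)


def seq(x: str) -> str:
    """Right-recursive x_s ::= x | (x cw? x_s), i.e. x (cw? x)*."""
    return f"(?:{x})(?:{CW}?(?:{x}))*"


def accepts(rule: str, text: str) -> bool:
    return re.fullmatch(rule, text) is not None


coords = seq(COORD)
coord_2s = seq(coord_n(2))
coord_4s = seq(coord_n(4))
coord_6s = seq(coord_n(6))

print(accepts(coord_4s, "1 2 3 4 5 6 7 8"))  # True: two coord_4
print(accepts(coord_4s, "1 2 3 4 5 6"))      # False: 6 is not a multiple of 4
print(accepts(coord_6s, "1 2 3 4 5 6"))      # True: one coord_6
```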


## Signed and unsigned numbers

In the draft, a signed number is called a "coordinate", and an unsigned number is called a "number". The latter is a terrible mistake in my opinion, given that:

1. The general assumption in programming and markup languages is that a number can be negative unless specified otherwise.

2. In the path data grammar specifically, unsigned numbers are the exception: they only appear twice in the grammar (three times if you count their use in defining signed numbers).

3. The [`<number>`](https://www.w3.org/TR/css3-values/#numbers) type in the CSS3 specification does allow signed numbers.

Therefore, in my opinion the SVG 1.1 choice of `nonnegative-number` was much better than the current SVG 2 draft's. It makes the exceptional case jump out at the reader, ensuring that this important, subtle detail is not missed. Even better, we can use `unsigned_number`, which is shorter and more precise: it makes it somewhat clearer that `+1` isn't allowed either. And since there is no `unsigned_integer`, we can simply use `unsigned` without risk of confusion. The important bit is to explicitly let the reader know that a sign isn't allowed.

I don't have a strong opinion on whether `coord` should be renamed `number` (matching the CSS3 definition), so I'll leave it as `coord` here: it saves 1 character, so why not.
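To make the distinction concrete, here are the `unsigned` and `coord` rules written as Python regular expressions. This is my own transcription of `frac`, `exp`, `unsigned` and `coord`, checked against a few test strings. `+1` is rejected by `unsigned`, and `5.` is rejected by both, since `frac` requires a digit after the decimal point.

```python
import re

# Illustrative sketch; the regexes are my own transcription of the grammar rules.
# unsigned ::= frac exp?   frac ::= (digit* "." digit+) | digit+   exp ::= ("e" | "E") sign? digit+
UNSIGNED = re.compile(r"(?:[0-9]*\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?")
# coord ::= sign? unsigned
COORD = re.compile(r"[+-]?(?:[0-9]*\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?")

CASES = {
    # text:  (is unsigned, is coord)
    "12":    (True,  True),
    ".5":    (True,  True),
    "1e-3":  (True,  True),
    "+1":    (False, True),   # "+" is rejected too, not just "-"
    "-0.5":  (False, True),
    "5.":    (False, False),  # frac needs a digit after the "."
}

for text, expected in CASES.items():
    got = (UNSIGNED.fullmatch(text) is not None, COORD.fullmatch(text) is not None)
    assert got == expected, text
    print(f"{text!r:8} unsigned={got[0]} coord={got[1]}")
```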


## Name change proposal

```
svg_path                                       -> path
drawto_command                                 -> command
moveto                                         -> moveto
closepath                                      -> closepath
lineto                                         -> lineto
horizontal_lineto                              -> hlineto
vertical_lineto                                -> vlineto
curveto                                        -> ccurveto
smooth_curveto                                 -> scurveto
quadratic_bezier_curveto                       -> qcurveto
smooth_quadratic_bezier_curveto                -> tcurveto
elliptical_arc                                 -> arcto
curveto_coordinate_sequence                    -> coord_6s
smooth_curveto_coordinate_sequence             -> coord_4s
quadratic_bezier_curveto_coordinate_sequence   -> coord_4s   (duplicate)
elliptical_arc_argument_sequence               -> arcto_args
elliptical_arc_argument                        -> arcto_arg
coordinate_sequence                            -> coords
coordinate_pair                                -> coord_2
coordinate_pair_double                         -> coord_4
coordinate_pair_triplet                        -> coord_6
coordinate_pair_sequence                       -> coord_2s
coordinate                                     -> coord
sign                                           -> sign
exponent                                       -> exp
fractional-constant (sic: '-')                 -> frac
number                                         -> unsigned
flag                                           -> flag
comma_wsp                                      -> cw
wsp                                            -> wsp
digit                                          -> digit
```
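Since the proposal is purely a renaming, it can be applied mechanically. As a rough sketch, this Python snippet renames whole symbol names in a single pass. The `RENAMES` dict and `shorten` function are hypothetical, built from the table above minus the unchanged names. Because only whole symbols match, `curveto` is renamed without touching `smooth_curveto`. Applied to the draft, it still leaves two identical `coord_4s` rules, which have to be merged by hand.

```python
import re

# Illustrative sketch; RENAMES and shorten are hypothetical names.
# Draft name -> proposed name (unchanged names omitted). Two draft rules map to coord_4s.
RENAMES = {
    "svg_path": "path",
    "drawto_command": "command",
    "horizontal_lineto": "hlineto",
    "vertical_lineto": "vlineto",
    "curveto": "ccurveto",
    "smooth_curveto": "scurveto",
    "quadratic_bezier_curveto": "qcurveto",
    "smooth_quadratic_bezier_curveto": "tcurveto",
    "elliptical_arc": "arcto",
    "curveto_coordinate_sequence": "coord_6s",
    "smooth_curveto_coordinate_sequence": "coord_4s",
    "quadratic_bezier_curveto_coordinate_sequence": "coord_4s",
    "elliptical_arc_argument_sequence": "arcto_args",
    "elliptical_arc_argument": "arcto_arg",
    "coordinate_sequence": "coords",
    "coordinate_pair": "coord_2",
    "coordinate_pair_double": "coord_4",
    "coordinate_pair_triplet": "coord_6",
    "coordinate_pair_sequence": "coord_2s",
    "coordinate": "coord",
    "exponent": "exp",
    "fractional-constant": "frac",
    "number": "unsigned",
    "comma_wsp": "cw",
}

# Match whole symbol names only: not preceded or followed by a word character or "-".
_SYMBOL = re.compile(
    r"(?<![\w-])("
    + "|".join(re.escape(old) for old in sorted(RENAMES, key=len, reverse=True))
    + r")(?![\w-])"
)


def shorten(grammar: str) -> str:
    """Rename every draft symbol in one pass; replacement text is never rescanned."""
    return _SYMBOL.sub(lambda m: RENAMES[m.group(1)], grammar)


print(shorten("coordinate_pair::= coordinate comma_wsp? coordinate"))
print(shorten("number::= fractional-constant exponent?"))
```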


Please view or discuss this issue at https://github.com/w3c/svgwg/issues/751 using your GitHub account

Received on Monday, 18 November 2019 18:27:07 UTC