- From: Boris Dalstein via GitHub <sysbot+gh@w3.org>
- Date: Mon, 18 Nov 2019 18:27:05 +0000
- To: public-svg-issues@w3.org
dalboris has just created a new issue for https://github.com/w3c/svgwg:

== Use shorter symbol names in SVG path data grammar ==

# TL;DR

I propose the following grammar. It is exactly the same as the current one, but with shorter symbol names so that each rule fits on a single line, which improves readability.

```
path       ::= wsp* moveto? (moveto command*)?
command    ::= moveto | closepath | lineto | hlineto | vlineto | ccurveto | scurveto | qcurveto | tcurveto | arcto

moveto     ::= ("M" | "m") wsp* coord_2s
closepath  ::= ("Z" | "z")
lineto     ::= ("L" | "l") wsp* coord_2s
hlineto    ::= ("H" | "h") wsp* coords
vlineto    ::= ("V" | "v") wsp* coords
ccurveto   ::= ("C" | "c") wsp* coord_6s
scurveto   ::= ("S" | "s") wsp* coord_4s
qcurveto   ::= ("Q" | "q") wsp* coord_4s
tcurveto   ::= ("T" | "t") wsp* coord_2s
arcto      ::= ("A" | "a") wsp* arcto_args

coord      ::= sign? unsigned
coord_2    ::= coord cw? coord
coord_4    ::= coord cw? coord cw? coord cw? coord
coord_6    ::= coord cw? coord cw? coord cw? coord cw? coord cw? coord
arcto_arg  ::= unsigned cw? unsigned cw? coord cw flag cw? flag cw? coord_2

coords     ::= coord | (coord cw? coords)
coord_2s   ::= coord_2 | (coord_2 cw? coord_2s)
coord_4s   ::= coord_4 | (coord_4 cw? coord_4s)
coord_6s   ::= coord_6 | (coord_6 cw? coord_6s)
arcto_args ::= arcto_arg | (arcto_arg cw? arcto_args)

unsigned   ::= frac exp?
frac       ::= (digit* "." digit+) | digit+
exp        ::= ("e" | "E") sign? digit+
cw         ::= (wsp+ ","? wsp*) | ("," wsp*)
sign       ::= "+" | "-"
flag       ::= "0" | "1"
wsp        ::= #x9 | #xA | #xC | #xD | #x20
digit      ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
```

# Long Version

I would like to suggest purely stylistic changes to the path data grammar to make it more readable. Most importantly, I believe the current symbol names are way too long, and that this is the most significant barrier to readability.

For comparison, here is the currently published grammar draft, as of 2019-11-18: https://svgwg.org/svg2-draft/paths.html#PathDataBNF

```
svg_path::= wsp* moveto? (moveto drawto_command*)?
drawto_command::= moveto | closepath | lineto | horizontal_lineto | vertical_lineto | curveto | smooth_curveto | quadratic_bezier_curveto | smooth_quadratic_bezier_curveto | elliptical_arc
moveto::= ( "M" | "m" ) wsp* coordinate_pair_sequence
closepath::= ("Z" | "z")
lineto::= ("L"|"l") wsp* coordinate_pair_sequence
horizontal_lineto::= ("H"|"h") wsp* coordinate_sequence
vertical_lineto::= ("V"|"v") wsp* coordinate_sequence
curveto::= ("C"|"c") wsp* curveto_coordinate_sequence
curveto_coordinate_sequence::= coordinate_pair_triplet | (coordinate_pair_triplet comma_wsp? curveto_coordinate_sequence)
smooth_curveto::= ("S"|"s") wsp* smooth_curveto_coordinate_sequence
smooth_curveto_coordinate_sequence::= coordinate_pair_double | (coordinate_pair_double comma_wsp? smooth_curveto_coordinate_sequence)
quadratic_bezier_curveto::= ("Q"|"q") wsp* quadratic_bezier_curveto_coordinate_sequence
quadratic_bezier_curveto_coordinate_sequence::= coordinate_pair_double | (coordinate_pair_double comma_wsp? quadratic_bezier_curveto_coordinate_sequence)
smooth_quadratic_bezier_curveto::= ("T"|"t") wsp* coordinate_pair_sequence
elliptical_arc::= ( "A" | "a" ) wsp* elliptical_arc_argument_sequence
elliptical_arc_argument_sequence::= elliptical_arc_argument | (elliptical_arc_argument comma_wsp? elliptical_arc_argument_sequence)
elliptical_arc_argument::= number comma_wsp? number comma_wsp? number comma_wsp flag comma_wsp? flag comma_wsp? coordinate_pair
coordinate_pair_double::= coordinate_pair comma_wsp? coordinate_pair
coordinate_pair_triplet::= coordinate_pair comma_wsp? coordinate_pair comma_wsp? coordinate_pair
coordinate_pair_sequence::= coordinate_pair | (coordinate_pair comma_wsp? coordinate_pair_sequence)
coordinate_sequence::= coordinate | (coordinate comma_wsp? coordinate_sequence)
coordinate_pair::= coordinate comma_wsp? coordinate
coordinate::= sign? number
sign::= "+"|"-"
exponent::= ("e" | "E") sign? digit+
fractional-constant::= (digit* "." digit+) | digit+
number::= fractional-constant exponent?
flag::= ("0" | "1")
comma_wsp::= (wsp+ ","? wsp*) | ("," wsp*)
wsp::= (#x9 | #x20 | #xA | #xC | #xD)
digit::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
```

Don't get me wrong: when designing an API, or choosing variable names when coding, I'm 100% in favor of long descriptive names. They make code self-documenting (for the poor maintainer who comes across your function name or implementation out of context ten years later), and they potentially avoid name conflicts.

However, none of these reasons is relevant here. Anyone bothering to read the SVG path data grammar is most likely already familiar with the concepts each symbol of the grammar represents. In other words, there is no need for self-documentation: we've already read the prose documentation and mastered the concepts. We just want to look at the intricate details of the grammar (where commas, whitespace, and signs are allowed, etc.). These details are captured by the rules themselves, not by the symbol names, and longer names just get in the way of mentally parsing the rules.

So what we need are the **shortest possible names which are still instantly recognizable for anyone familiar with the concepts**. I'll give a few specific examples and justifications below, then propose a list of name changes; you can see the result in the TL;DR at the top of this post.

## Command names

A pet peeve of mine is the inconsistency in name length and pattern between `[smooth_]curveto` and `[smooth_]quadratic_bezier_curveto`. The only difference between these concepts is that one is a degree-3 polynomial while the other is a degree-2 polynomial, but they look like completely different beasts from their names. In this regard, the [SVG 1.1 DOM spec](https://www.w3.org/TR/SVG11/paths.html#InterfaceSVGPathSeg) is at least consistent, if lengthy: `PATHSEG_CURVETO_CUBIC_ABS` vs `PATHSEG_CURVETO_QUADRATIC_ABS`.
But in the case of the grammar, once you add the `_coordinate_sequence` suffix, even getting rid of the useless `bezier` gives something like `[smooth_]quadratic_curveto_coordinate_sequence`, which makes it impossible to write rules as one-liners and therefore makes them harder to parse mentally. Since anyone reading the grammar already knows the list of existing commands and what they do, we can in fact shorten this:

```
drawto_command::= moveto | closepath | lineto | horizontal_lineto | vertical_lineto | curveto | smooth_curveto | quadratic_bezier_curveto | smooth_quadratic_bezier_curveto | elliptical_arc
```

to:

```
drawto_command::= moveto | closepath | lineto | hlineto | vlineto | ccurveto | scurveto | qcurveto | tcurveto | arcto
```

See how all the `curveto`s are now beautifully aligned, and how this hints that they will have very similar syntax. Also, I've removed `elliptical`, which is 100% noise, but I've added the `to` suffix, which improves consistency. Short names are important, but consistency is also important, and in this case consistency only costs 2 characters, so it's a no-brainer.

## comma_wsp => cw

I actually do like `comma_wsp` a lot. It is reasonably short and descriptive. However, it is used so often throughout the grammar, sometimes multiple times in a single rule, that it makes one-liners impossible, and I believe one-liners are essential for readability (vertical alignment between rules, etc.). I think renaming it to `cw` is a good tradeoff. Admittedly, the short name makes it completely non-obvious what it is, but you only have to look it up once, and once you've learned it, it makes everything else much more concise.

## Pairs, doubles, triplets, and sequences of coordinates

The name `coordinate` is nice, but `coord` is just as nice: half as long, and as easily understood. Then, a "sequence of things" is simply, well, "thingS", so `coordinate_sequence` can simply become `coords`. Then we have pairs, doubles, and triplets.
While adding semantic meaning to symbol names might seem like a good idea for readability, the truth is that there is no syntactic difference at all between "a sequence of 4 coordinates" and "a double of pairs of coordinates". In each case, it's just four coords separated by optional commas/whitespace. If the syntax were something like `[a; b; c; d]` vs `{(a, b), (c, d)}`, then it would make sense to name them differently. But since they really are syntactically equivalent, it is a mistake to hide this equivalence from the reader by choosing different names. Therefore, I would rather advocate:

```
coord   ::= ...
coord_2 ::= coord cw? coord
coord_4 ::= coord cw? coord cw? coord cw? coord
coord_6 ::= coord cw? coord cw? coord cw? coord cw? coord cw? coord
```

This already removes three terms from our cognitive load (pair, double, triplet), while being instantly understood. (Oh, and see, you're already falling in love with `cw`: it removes so much noise and clutter from the structure!)

We still need to do something about the `*_sequence` symbols. In the current draft, there are 6 of them:

```
1. coordinate_sequence
2. coordinate_pair_sequence
3. curveto_coordinate_sequence
4. smooth_curveto_coordinate_sequence
5. quadratic_bezier_curveto_coordinate_sequence
6. elliptical_arc_argument_sequence
```

First of all, 4 and 5 have the exact same definition, so we don't need both. Then, all of these complicated names, apart from 6, are just sequences of the `coord_x` we just defined. So we can just do:

```
1. coordinate_sequence -> coords
2. coordinate_pair_sequence -> coord_2s
3. curveto_coordinate_sequence -> coord_6s
4. smooth_curveto_coordinate_sequence -> coord_4s
5. quadratic_bezier_curveto_coordinate_sequence -> coord_4s
6. elliptical_arc_argument_sequence -> arcto_args
```

Yeah, there's nothing smart we can really do about 6, aside from just shortening the name.
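
As a rough illustration of the equivalence argued above (not part of the proposal itself), here is a minimal Python sketch. It assumes a hand-written translation of the `wsp`, `cw`, `unsigned`, and `coord` rules into regular expressions; the Python names and the `seq` helper are hypothetical. It builds "four coords" and "a double of pairs" separately and checks that they expand to the very same pattern.

```python
import re

# Regex fragments hand-transcribed from the proposed grammar.
# The Python names below are illustrative assumptions, not spec terms.
WSP = r"[\t\n\f\r ]"                                          # wsp
CW = rf"(?:{WSP}+,?{WSP}*|,{WSP}*)"                           # cw
UNSIGNED = r"(?:[0-9]*\.[0-9]+|[0-9]+)(?:[eE][+-]?[0-9]+)?"   # frac exp?
COORD = rf"[+-]?{UNSIGNED}"                                   # sign? unsigned


def seq(item: str, n: int) -> str:
    """Return the pattern for n copies of `item` separated by `cw?`."""
    return f"{CW}?".join([item] * n)


coord_2 = seq(COORD, 2)
coord_4 = seq(COORD, 4)        # "a sequence of 4 coordinates"
pair_double = seq(coord_2, 2)  # "a double of pairs of coordinates"

# Both constructions expand to the same pattern string.
print(coord_4 == pair_double)                      # True
print(bool(re.fullmatch(coord_4, "10,20 30-40")))  # True
```

Since the two patterns are character-for-character identical, any input accepted by one is accepted by the other, which is the point of using a single name such as `coord_4`.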
## Signed and unsigned numbers

In the draft, a signed number is called a "coordinate", and an unsigned number is called a "number". The latter is a terrible mistake in my opinion, given that:

1. The general assumption in all programming languages and markup languages is that a number can be negative unless specified otherwise.
2. In the path data grammar specifically, unsigned numbers are the exception: they only appear twice in the grammar (three times if you count their use to define signed numbers).
3. The [`<number>`](https://www.w3.org/TR/css3-values/#numbers) type in the CSS3 specification does allow signed numbers.

Therefore, my opinion is that the SVG 1.1 choice of `nonnegative-number` was much better than the current SVG 2 draft. It makes the exceptional case jump out at the reader, making sure this important, subtle detail is not missed. Even better, we can use `unsigned_number`, which is shorter and more precise: it makes it somewhat clearer that `+1` isn't allowed either. And since there is no `unsigned_integer`, we can also simply use `unsigned` without risk of confusion. The important bit is really to explicitly let the reader know that a sign isn't allowed.

As for whether `coord` should be renamed `number` (matching the CSS3 definition), I do not have a strong opinion, so I'll leave it at `coord` here: we gain 1 character, why not.
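
To make the `+1` remark concrete, here is a small Python sketch, assuming a direct regex translation of the proposed `frac`, `exp`, `unsigned`, and `coord` rules. The Python names and helper functions are hypothetical and only meant as an illustration.

```python
import re

# Hand-transcribed from the proposed grammar; the Python names are
# illustrative assumptions.
FRAC = r"(?:[0-9]*\.[0-9]+|[0-9]+)"             # frac ::= (digit* "." digit+) | digit+
EXP = r"(?:[eE][+-]?[0-9]+)"                    # exp ::= ("e" | "E") sign? digit+
UNSIGNED = re.compile(FRAC + EXP + "?")         # unsigned ::= frac exp?
COORD = re.compile("[+-]?" + UNSIGNED.pattern)  # coord ::= sign? unsigned


def is_unsigned(text: str) -> bool:
    """True if the whole string matches the `unsigned` rule."""
    return UNSIGNED.fullmatch(text) is not None


def is_coord(text: str) -> bool:
    """True if the whole string matches the `coord` rule."""
    return COORD.fullmatch(text) is not None


for text in ["1", ".5", "1e-3", "+1", "-1"]:
    print(f"{text!r}: unsigned={is_unsigned(text)}, coord={is_coord(text)}")
```

Under these assumptions, `+1` and `-1` are both rejected by `unsigned` and accepted by `coord`, which is the distinction the name is meant to signal.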
## Name change proposal

```
svg_path -> path
drawto_command -> command
moveto -> moveto
closepath -> closepath
lineto -> lineto
horizontal_lineto -> hlineto
vertical_lineto -> vlineto
curveto -> ccurveto
smooth_curveto -> scurveto
quadratic_bezier_curveto -> qcurveto
smooth_quadratic_bezier_curveto -> tcurveto
elliptical_arc -> arcto
curveto_coordinate_sequence -> coord_6s
smooth_curveto_coordinate_sequence -> coord_4s
quadratic_bezier_curveto_coordinate_sequence -> coord_4s (duplicate)
elliptical_arc_argument_sequence -> arcto_args
elliptical_arc_argument -> arcto_arg
coordinate_sequence -> coords
coordinate_pair -> coord_2
coordinate_pair_double -> coord_4
coordinate_pair_triplet -> coord_6
coordinate_pair_sequence -> coord_2s
coordinate -> coord
sign -> sign
exponent -> exp
fractional-constant (sic: '-') -> frac
number -> unsigned
flag -> flag
comma_wsp -> cw
wsp -> wsp
digit -> digit
```

Please view or discuss this issue at https://github.com/w3c/svgwg/issues/751 using your GitHub account
Received on Monday, 18 November 2019 18:27:07 UTC