Re: [svgwg] SVG2 path data coordinates are currently spec'd as integers.

Step 2: new SVG 2 grammar
------------------------------
```
svg_path                        ::= wsp* moveto wsp* (command wsp*)*

command                         ::= closepath
                                    | moveto
                                    | lineto
                                    | horizontal_lineto
                                    | vertical_lineto
                                    | curveto
                                    | smooth_curveto
                                    | quadratic_bezier_curveto
                                    | smooth_quadratic_bezier_curveto
                                    | elliptical_arc
                                    | bearing

closepath                       ::= [Zz]
moveto                          ::= [Mm] wsp* moveto_argument (delimiter? lineto_argument)*
lineto                          ::= [Ll] wsp* coordpair_singlet_sequence
horizontal_lineto               ::= [Hh] wsp* number_sequence
vertical_lineto                 ::= [Vv] wsp* number_sequence
curveto                         ::= [Cc] wsp* coordpair_triplet_sequence
smooth_curveto                  ::= [Ss] wsp* coordpair_doublet_sequence
quadratic_bezier_curveto        ::= [Qq] wsp* coordpair_doublet_sequence
smooth_quadratic_bezier_curveto ::= [Tt] wsp* coordpair_singlet_sequence
elliptical_arc                  ::= [Aa] wsp* arc_argument_sequence
bearing                         ::= [Bb] wsp* number_sequence


number_sequence                 ::= number (delimiter? number)*
coordpair_singlet_sequence      ::= coordpair (delimiter? coordpair)*
                                    | closepath
coordpair_doublet_sequence      ::= coordpair_doublet (delimiter? coordpair_doublet)*
                                    (delimiter? incomplete_coordpair_doublet)?
                                    | incomplete_coordpair_doublet
                                    | closepath
coordpair_triplet_sequence      ::= coordpair_triplet (delimiter? coordpair_triplet)*
                                    (delimiter? incomplete_coordpair_triplet)?
                                    | incomplete_coordpair_triplet
                                    | closepath
arc_argument_sequence           ::= arc_argument (delimiter? arc_argument)*
                                    (delimiter? incomplete_arc_argument)?
                                    | incomplete_arc_argument


moveto_argument                 ::= coordpair
lineto_argument                 ::= coordpair
coordpair                       ::= number delimiter? number

coordpair_doublet               ::= coordpair delimiter? coordpair
incomplete_coordpair_doublet    ::= coordpair wsp* closepath

coordpair_triplet               ::= coordpair delimiter? coordpair delimiter? coordpair
incomplete_coordpair_triplet    ::= ( coordpair_doublet | coordpair ) wsp* closepath

arc_argument                    ::= number delimiter? number delimiter? number delimiter
                                    flag delimiter? flag delimiter? coordpair
incomplete_arc_argument         ::= number delimiter? number delimiter? number delimiter
                                    flag delimiter? flag wsp* closepath

delimiter                       ::= wsp+ comma_wsp? | comma_wsp
comma_wsp                       ::= "," wsp*
flag                            ::= [01]
number                          ::= sign? fraction exponent?
fraction                        ::= digits ( dot digits? )? | dot digits
exponent                        ::= [Ee] sign? digits
sign                            ::= "+" | "-"
digits                          ::= [0-9]+
dot                             ::= "."
wsp                             ::= [#x9#xA#xD#x20]
```
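
As a rough illustration (not part of the proposal), the `number` and `delimiter` productions above can be transcribed into regular expressions. This is a minimal Python sketch: the names `NUMBER`, `DELIMITER`, `is_number`, `is_delimiter` and `split_numbers` are hypothetical, and the greedy left-to-right scan in `split_numbers` is an assumption about how a tokeniser would apply the grammar, not something the grammar itself states.

```python
import re

# Transcription of the `number` production:
#   number   ::= sign? fraction exponent?
#   fraction ::= digits ( dot digits? )? | dot digits
#   exponent ::= [Ee] sign? digits
NUMBER = r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[Ee][+-]?[0-9]+)?"

# Transcription of the `delimiter` production:
#   delimiter ::= wsp+ comma_wsp? | comma_wsp
#   comma_wsp ::= "," wsp*
#   wsp       ::= [#x9#xA#xD#x20]
WSP = r"[\t\n\r ]"
DELIMITER = rf"(?:{WSP}+(?:,{WSP}*)?|,{WSP}*)"


def is_number(text):
    """True if the whole string matches the `number` production."""
    return re.fullmatch(NUMBER, text) is not None


def is_delimiter(text):
    """True if the whole string matches the `delimiter` production."""
    return re.fullmatch(DELIMITER, text) is not None


def split_numbers(text):
    """Greedy left-to-right scan for numbers (assumed tokenising strategy)."""
    return re.findall(NUMBER, text)
```

Under these assumptions `is_number` accepts strings such as `1.`, `.5` and `-2e3`, rejects a lone `.` or `1e`, and `split_numbers("1.5.5-2e3")` yields three numbers because every `delimiter` between numbers is optional.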

- I've followed the 2.0 EBNF in using underscores, rather than hyphens, to divide identifiers.

- I have zero idea what's going on with `svg_path` in the extant 2.0 grammar, where it's defined as:
  ```
  svg_path::= wsp* moveto? (moveto drawto_command*)?
  ```
  1. I can't understand why it begins with an optional `moveto` command and then has a mandatory one in the bracket. The optional one seems redundant, so I've eliminated it. But maybe I've missed something.
  2. There's no support for trailing whitespace, so `"M0,0  "` is invalid.
  3. And the spaces in `"M0,0 L1,1"` and `"M0,0L1,1 T2,2"` are illegal, as there's no trailing or leading whitespace on the `drawto_command`.

  I've fixed all that and renamed `drawto_command` to `command` as it no longer excludes `moveto`, as was the case in 1.1.

- While we're on the subject of whitespace, the existing 2.0 grammar doesn't allow whitespace between most coordinates and the final `closepath`; e.g. `M0,0C0,0 Z` is illegal.

  The exception is arcs, where a "closing" command can have spaces and commas before the "Z"; e.g. `M0,0 A0,0,0,0,0,Z`.

  I've settled on allowing whitespace only before the Z. That seems (to me) most consistent with established practice. That said, it makes the EBNF more complicated.

- And on the subject of arcs and commas, the existing 2.0 grammar doesn't allow commas before the closing arc command. So the comma between the 17 and 21 in `M0,0 A11 12 13 1 1 16 17,21 22 23 1 1 26 27` is currently illegal. Obviously, I've fixed that.

- I've not given each command its own argument type. The existing 2.0 grammar doesn't do it and it was making the grammar convoluted. Moreover, as a dev, I just want to be able to read out of the grammar the number of arguments for each command, and this approach facilitates that.

  Exceptions are made for elliptical arcs, because they have a bespoke argument type, and `moveto`, because I wanted to stress that the first coordinate is a `moveto` and the rest are `lineto`.

- In the same spirit of simplification, `coordinate` and `coordinate_sequence` have been eliminated in favour of `number` and `number_sequence`.

- When I started trying to unravel the grammar I found myself confusing `coordinate_pair_sequence` and `coordinate_sequence`. The preceding changes may have helped with that. But I've also fused `coordinate_pair` into `coordpair` -- making it a single "type" that's modified by its suffixes, and I've introduced the term "singlet" to regularise and further eliminate confusion. (I toyed with calling things `coordpair_2-tuple_list`, so you got off lightly.)

- I switched `comma_wsp` for `delimiter`. I think it's quicker to absorb a single word than a phrase. It also makes it less easy to confuse with `wsp`.

- I've shortened `elliptical_arc_xxx` to `arc_xxx`, retaining only the top level `elliptical_arc` to keep us in sync with the text. I did this because these strings were getting tooooo loooonnnnggggg. According to Wikipedia, all our curves (when not flat) are arcs, but I imagine people know what we mean.

   (`quadratic_bezier_curveto` is another name I would love to shorten. When it's a cubic bezier we call it `curveto`. So why do we stress the "bezierness" of quadratic beziers? Couldn't we just call them `quadratic_curveto`? But that's out of scope as it's a problem in the main text.)

- I've included the `closepath` only where it has to be present to terminate the sequence. So, for example, it's not on `moveto` because `MZ` is illegal, and `M0,0Z` and `M0,0,1,1Z` are permitted without special magic. (The Z is just a regular `closepath` command.) Whereas `M0,0LZ` has to be specially coded into the grammar; otherwise the user could write `M0,0 L M1,1`. (Aside: there's no reason we couldn't allow this -- defaulting the missing arguments and inferring a closed path when it happens.)

- And a related insight: we could allow every command to have an optional `closepath` and do away with it at the top level. I think this would make the grammar simpler but, as it would bury the existence of `closepath`, I've stuck with the 1.1 precedent.

- And, on the technical front, I think this grammar is unambiguous. (It's very easy to create a grammar where one string can match two different productions.) In particular, the `|`-separated clauses are genuine disjunctions: if one matches, it's certain no other will. (There have been times when that wasn't the case and my parser had to test every branch and pick the longest.)

- However, the grammar could be written more compactly. For example, `coordpair_doublet_sequence` could be phrased:

   ```
   coordpair_doublet_sequence   ::= ( coordpair_doublet delimiter?)*
                                    ( coordpair_doublet | incomplete_coordpair_doublet )
                                    | closepath
   incomplete_coordpair_doublet ::= coordpair wsp* closepath
   ```
    I don't think the above is ambiguous. (The regex `/(\d*)(\d)/.exec( "1" )` will put "1" in the second capture.) But my parser can't handle it (it assigns all the `coordpair_doublet`s to the first bracket and leaves none for the second), and I'm unwilling to use this form until I've added it to my EBNF parser and had it check there are no ambiguities. However, that does leave me guilty of letting implementation decisions shape the grammar, much as was happening with 1.1. So it goes.
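
To make the regex analogy above concrete, here is a small Python sketch contrasting a backtracking matcher with a greedy one that never gives anything back. Both function names are hypothetical, and `greedy_split` is only my guess at the kind of behaviour described for the parser, not its actual implementation.

```python
import re


def backtracking_split(text):
    # Python counterpart of the JavaScript /(\d*)(\d)/.exec(text),
    # restricted to ASCII digits. The regex engine backtracks, so the
    # first group gives up its last digit to let the second group match.
    m = re.search(r"([0-9]*)([0-9])", text)
    return m.groups() if m else None


def greedy_split(text):
    # Assumed non-backtracking strategy: the first group swallows every
    # leading digit and never returns one.
    i = 0
    while i < len(text) and "0" <= text[i] <= "9":
        i += 1
    first = text[:i]
    # The second group needs one more digit, but none can be left.
    if i < len(text) and "0" <= text[i] <= "9":
        return (first, text[i])
    return None
```

With the input `"1"`, `backtracking_split` leaves the first group empty and puts the digit in the second, whereas `greedy_split` fails outright. That is the same shape of problem as the `( coordpair_doublet delimiter?)*` bracket consuming every doublet before the final alternative gets a chance.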

-- 
GitHub Notification of comment by fuchsia
Please view or discuss this issue at https://github.com/w3c/svgwg/issues/335#issuecomment-325377260 using your GitHub account

Received on Monday, 28 August 2017 14:56:20 UTC