- From: Boris Dalstein <dalboris@gmail.com>
- Date: Sun, 17 Nov 2019 18:16:05 +0100
- To: www-svg@w3.org
- Message-ID: <39580370-b8ec-fbce-c3b8-6dd15122d4e2@gmail.com>
And by the way, I'd propose to improve the readability of the syntax. Just renaming the grammar identifiers from the SVG 1.1 spec, we have:

    unsigned: int | float
    number:   (sign? int) | (sign? float)
    int:      digits
    float:    (frac exp?) | (digits exp)
    frac:     (digits? "." digits) | (digits ".")
    exp:      ("e" | "E") sign? digits
    sign:     "+" | "-"
    digits:   digit | digit digits
    digit:    "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

which we could simplify into the following syntax, which is equivalent but much more concise and readable:

    number:   sign? unsigned
    unsigned: ((digits "."?) | (digits? "." digits)) exp?
    exp:      ("e" | "E") sign? digits
    sign:     "+" | "-"
    digits:   digit+
    digit:    "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

From this, we can more easily derive the following regex, which we could also add to the spec (note that the sign of the exponent is optional, as in the "exp" rule):

    [+\-]?(([0-9]+\.?)|([0-9]*\.[0-9]+))([eE][+\-]?[0-9]+)?

Any thoughts?

Best regards,
Boris

PS: Below is the proof of the equivalence of the two grammars.

First, since the grammar notation states that the symbol `+` means "one or more", instead of:

    digit | digit digits

we can simply write:

    digit+

Also, the following:

    number: (sign? int) | (sign? float)

can be factorized into:

    number: sign? (int | float)

So the rules now look like this:

    unsigned: int | float
    number:   sign? (int | float)
    int:      digits
    float:    (frac exp?) | (digits exp)
    frac:     (digits? "." digits) | (digits ".")
    exp:      ("e" | "E") sign? digits
    sign:     "+" | "-"
    digits:   digit+
    digit:    "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

Then there is the (int | float) repetition, which can be avoided by simply defining signed numbers in terms of unsigned numbers:

    number:   sign? unsigned
    unsigned: int | float
    int:      digits
    float:    (frac exp?) | (digits exp)
    frac:     (digits? "." digits) | (digits ".")
    exp:      ("e" | "E") sign? digits
    sign:     "+" | "-"
    digits:   digit+
    digit:    "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

Then, since "int" is just an alias for "digits", we can simply remove it:

    number:   sign? unsigned
    unsigned: digits | float
    float:    (frac exp?) | (digits exp)
    frac:     (digits? "." digits) | (digits ".")
    exp:      ("e" | "E") sign? digits
    sign:     "+" | "-"
    digits:   digit+
    digit:    "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

Then there is "exp", which appears twice. We can be smarter and make it appear only once, which makes it easier to build a regex. For example, the following two rules:

    unsigned: digits | float
    float:    (frac exp?) | (digits exp)

can be more simply rewritten as one rule:

    unsigned: digits | (frac exp?) | (digits exp)

from which it becomes clear that it is in fact:

    unsigned: (digits exp?) | (frac exp?)

and even more simply:

    unsigned: (digits | frac) exp?

So all the rules now become:

    number:   sign? unsigned
    unsigned: (digits | frac) exp?
    frac:     (digits? "." digits) | (digits ".")
    exp:      ("e" | "E") sign? digits
    sign:     "+" | "-"
    digits:   digit+
    digit:    "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

But since frac only appears in one rule, let's just substitute it:

    number:   sign? unsigned
    unsigned: (digits | (digits? "." digits) | (digits ".")) exp?
    exp:      ("e" | "E") sign? digits
    sign:     "+" | "-"
    digits:   digit+
    digit:    "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

Oh, but now there are more simplifications we can do! If you look at the following:

    digits | (digits? "." digits) | (digits ".")

you can see that the third alternative can be merged into the first by simply making the "." optional:

    (digits "."?) | (digits? "." digits)

So here we are: the quite unreadable rules we started with are in fact equivalent to these much more readable ones:

    number:   sign? unsigned
    unsigned: ((digits "."?) | (digits? "." digits)) exp?
    exp:      ("e" | "E") sign? digits
    sign:     "+" | "-"
    digits:   digit+
    digit:    "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

On 17/11/2019 17:14, Boris Dalstein wrote:
> Bouncing back on this as I'm currently writing a parser.
>
> The grammar on the SVG 1.1 spec includes fractional numbers and
> numbers with an exponent part:
>
> https://www.w3.org/TR/SVG11/paths.html#PathDataBNF
>
>     number:
>         sign? integer-constant
>         | sign? floating-point-constant
>     integer-constant:
>         digit-sequence
>     floating-point-constant:
>         fractional-constant exponent?
>         | digit-sequence exponent
>     fractional-constant:
>         digit-sequence? "." digit-sequence
>         | digit-sequence "."
>     exponent:
>         ( "e" | "E" ) sign? digit-sequence
>     sign:
>         "+" | "-"
>     digit-sequence:
>         digit
>         | digit digit-sequence
>     digit:
>         "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
>
> While the grammar on the latest SVG 2 CR only contains integers:
>
> https://www.w3.org/TR/2018/CR-SVG2-20181004/paths.html#PathDataBNF
>
>     number ::= ([0-9])+
>
> This sounds like an important omission.
>
> Best regards,
> Boris
>
> On 29/04/2017 19:26, Jirka Kosek wrote:
>> On 29.4.2017 18:57, Paul LeBeau wrote:
>>> Paths of the form that I presented do exist and are actually common. I
>>> wasn't around when the grammar was originally written, so I don't know the
>>> reason why it was written the way it was.
>> Seems that grammar is only illustrational because there are other issues
>> with it -- for example grammar accepts only integers not decimal numbers.
>>
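For anyone wanting to double-check the equivalence mechanically, a minimal Python sketch along these lines should do. It transcribes the SVG 1.1 rules one by one into a regex (using the renamed identifiers from this thread), then compares it with the simplified regex, with the exponent sign optional as `exp: ("e" | "E") sign? digits` requires, on every string of up to six characters over the alphabet `0.eE+-`. One digit symbol is enough because neither grammar distinguishes between digits. The helper names `is_number` and `find_mismatches` are hypothetical, and exhaustive testing of short strings is a sanity check, not a substitute for the proof in the PS.

```python
import itertools
import re

# Each nonterminal is wrapped in a non-capturing group (?:...) so that a
# following "?" always means "optional" and never becomes a lazy quantifier.
DIGITS = r"(?:[0-9]+)"                              # digits: digit+
SIGN = r"(?:[+-])"                                  # sign: "+" | "-"
EXP = rf"(?:[eE]{SIGN}?{DIGITS})"                   # exp: ("e" | "E") sign? digits

# SVG 1.1 grammar, transcribed rule by rule.
FRAC = rf"(?:{DIGITS}?\.{DIGITS}|{DIGITS}\.)"       # frac: (digits? "." digits) | (digits ".")
FLOAT = rf"(?:{FRAC}{EXP}?|{DIGITS}{EXP})"          # float: (frac exp?) | (digits exp)
INT = DIGITS                                        # int: digits
NUMBER_SVG11 = rf"(?:{SIGN}?{INT}|{SIGN}?{FLOAT})"  # number: (sign? int) | (sign? float)

# Simplified grammar: sign? ((digits "."?) | (digits? "." digits)) exp?
NUMBER_SIMPLE = r"[+\-]?(([0-9]+\.?)|([0-9]*\.[0-9]+))([eE][+\-]?[0-9]+)?"

RE_SVG11 = re.compile(NUMBER_SVG11)
RE_SIMPLE = re.compile(NUMBER_SIMPLE)


def is_number(text: str) -> bool:
    """True if the whole string is a number under the simplified grammar."""
    return RE_SIMPLE.fullmatch(text) is not None


def find_mismatches(max_len: int = 6) -> list:
    """Enumerate every string up to max_len over a small alphabet and return
    the ones on which the SVG 1.1 regex and the simplified regex disagree."""
    alphabet = "0.eE+-"
    mismatches = []
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            s = "".join(chars)
            in_svg11 = RE_SVG11.fullmatch(s) is not None
            in_simple = RE_SIMPLE.fullmatch(s) is not None
            if in_svg11 != in_simple:
                mismatches.append(s)
    return mismatches


if __name__ == "__main__":
    for s in ["10", "1.", ".5", "-.5e-3", "1e5", "1E+5", ".", "e5", "1e", "1.2.3"]:
        print(f"{s!r}: {is_number(s)}")
    print("mismatches:", find_mismatches())
```

Running it should report True for 10, 1., .5, -.5e-3, 1e5 and 1E+5, False for ".", e5, 1e and 1.2.3, and an empty mismatch list. If the `?` after the exponent's `[+\-]` is dropped, strings such as `0e0` show up as mismatches, since the SVG 1.1 grammar accepts an unsigned exponent.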
Received on Sunday, 17 November 2019 17:16:12 UTC