- From: Boris Dalstein <dalboris@gmail.com>
- Date: Sun, 17 Nov 2019 18:16:05 +0100
- To: www-svg@w3.org
- Message-ID: <39580370-b8ec-fbce-c3b8-6dd15122d4e2@gmail.com>
By the way, I'd also propose making the syntax more readable.
After merely renaming the grammar identifiers from the SVG 1.1 spec, we have:
unsigned: int | float
number: (sign? int) | (sign? float)
int: digits
float: (frac exp?) | (digits exp)
frac: (digits? "." digits) | (digits ".")
exp: ("e" | "E") sign? digits
sign: "+" | "-"
digits: digit | digit digits
digit: "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
Which we could simplify to the following syntax, which is equivalent but
much more concise and readable:
number: sign? unsigned
unsigned: ((digits "."?) | (digits? "." digits)) exp?
exp: ("e" | "E") sign? digits
sign: "+" | "-"
digits: digit+
digit: "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
From this we can more easily derive the following regex, which
we could also add to the spec:
[+\-]?(([0-9]+\.?)|([0-9]*\.[0-9]+))([eE][+\-]?[0-9]+)?
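As a quick sanity check, here is a minimal Python sketch that tries the regex on a few sample strings. The helper name is_number and the sample lists are only illustrative assumptions, and the pattern writes the exponent sign as optional ([+\-]?), following the "exp" rule of the grammar.

```python
import re

# Regex derived from the simplified grammar. The exponent sign is optional,
# matching the rule: exp: ("e" | "E") sign? digits
NUMBER_RE = re.compile(r"[+\-]?(([0-9]+\.?)|([0-9]*\.[0-9]+))([eE][+\-]?[0-9]+)?")


def is_number(text):
    """Return True if the whole string matches the number grammar."""
    return NUMBER_RE.fullmatch(text) is not None


# Hypothetical samples chosen to exercise each branch of the grammar.
accepted = ["0", "-12", "+3.", ".5", "1.25", "1e5", "2.E-3", "-.5e+10"]
rejected = ["", ".", "e5", "1e", "1e+", "+-1", "1.2.3"]

for text in accepted:
    assert is_number(text), text
for text in rejected:
    assert not is_number(text), text
```

Note that fullmatch is used rather than match, so that trailing garbage such as the second "." in "1.2.3" is rejected instead of being silently ignored.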
Any thoughts?
Best regards,
Boris
PS: below is the proof of the equivalence of the two grammars.
First, since the spec indicates that the symbol `+` in the grammar means
"one or more", instead of:
digit | digit digits
we can simply write:
digit+
Also, the following:
number: (sign? int) | (sign? float)
can be factored into:
number: sign? (int | float)
So the rules now look like this:
unsigned: int | float
number: sign? (int | float)
int: digits
float: (frac exp?) | (digits exp)
frac: (digits? "." digits) | (digits ".")
exp: ("e" | "E") sign? digits
sign: "+" | "-"
digits: digit+
digit: "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
Then, there is the (int | float) repetition, which can be avoided by simply
defining signed numbers in terms of unsigned numbers:
number: sign? unsigned
unsigned: int | float
int: digits
float: (frac exp?) | (digits exp)
frac: (digits? "." digits) | (digits ".")
exp: ("e" | "E") sign? digits
sign: "+" | "-"
digits: digit+
digit: "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
Then, since "int" is just an alias for "digits", we can just remove it:
number: sign? unsigned
unsigned: digits | float
float: (frac exp?) | (digits exp)
frac: (digits? "." digits) | (digits ".")
exp: ("e" | "E") sign? digits
sign: "+" | "-"
digits: digit+
digit: "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
Then, "exp" appears twice. We can be smarter and make it appear only
once, so that it's easier to build a regexp. For example, the following two
rules:
unsigned: digits | float
float: (frac exp?) | (digits exp)
Can be more simply rewritten as one rule:
unsigned: digits | (frac exp?) | (digits exp)
Here the first and third alternatives combine into (digits exp?), so it
becomes clear that this is in fact:
unsigned: (digits exp?) | (frac exp?)
And even more simply:
unsigned: (digits | frac) exp?
So all the rules now become:
number: sign? unsigned
unsigned: (digits | frac) exp?
frac: (digits? "." digits) | (digits ".")
exp: ("e" | "E") sign? digits
sign: "+" | "-"
digits: digit+
digit: "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
But since frac only appears in one rule, let's just substitute it:
number: sign? unsigned
unsigned: (digits | (digits? "." digits) | (digits ".")) exp?
exp: ("e" | "E") sign? digits
sign: "+" | "-"
digits: digit+
digit: "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
Oh, but now there are more simplifications we can do! If you look at
the following:
digits | (digits? "." digits) | (digits ".")
You can see that the third OR clause can be integrated into the first,
by simply making the "." optional:
(digits "."?) | (digits? "." digits)
So here we are: the rather unreadable rules we started with are in fact
equivalent to these much more readable ones:
number: sign? unsigned
unsigned: ((digits "."?) | (digits? "." digits)) exp?
exp: ("e" | "E") sign? digits
sign: "+" | "-"
digits: digit+
digit: "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
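The equivalence can also be checked mechanically, as a complement to the derivation above. The following is only a sketch, assuming a direct transcription of both rule sets into Python regular expressions and a brute-force comparison over all short strings on a reduced alphabet (two digits stand in for all ten, since the rules treat every digit alike). The names ORIGINAL, SIMPLIFIED and disagreements are made up for this example.

```python
import re
from itertools import product

DIGITS = r"[0-9]+"
SIGN = r"[+-]"
EXP = rf"(?:[eE]{SIGN}?{DIGITS})"

# Original SVG 1.1 rules, transcribed one by one.
FRAC = rf"(?:(?:{DIGITS})?\.{DIGITS}|{DIGITS}\.)"
FLOAT = rf"(?:{FRAC}{EXP}?|{DIGITS}{EXP})"
ORIGINAL = re.compile(rf"(?:{SIGN}?{DIGITS}|{SIGN}?{FLOAT})")

# Simplified rules.
UNSIGNED = rf"(?:(?:{DIGITS}\.?|(?:{DIGITS})?\.{DIGITS}){EXP}?)"
SIMPLIFIED = re.compile(rf"{SIGN}?{UNSIGNED}")


def disagreements(max_len=6, alphabet="09.eE+-"):
    """List every string up to max_len on which the two grammars differ."""
    found = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            text = "".join(chars)
            in_original = ORIGINAL.fullmatch(text) is not None
            in_simplified = SIMPLIFIED.fullmatch(text) is not None
            if in_original != in_simplified:
                found.append(text)
    return found


print(disagreements())
```

An empty list is what the proof predicts. This is of course not a proof in itself, since it only covers strings up to the chosen length, but it would catch a slip in any of the rewriting steps.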
On 17/11/2019 17:14, Boris Dalstein wrote:
> Bouncing back on this as I'm currently writing a parser.
>
> The grammar on the SVG 1.1 spec includes fractional numbers and
> numbers with an exponent part:
>
> https://www.w3.org/TR/SVG11/paths.html#PathDataBNF
>
> number:
> sign? integer-constant
> | sign? floating-point-constant
> integer-constant:
> digit-sequence
> floating-point-constant:
> fractional-constant exponent?
> | digit-sequence exponent
> fractional-constant:
> digit-sequence? "." digit-sequence
> | digit-sequence "."
> exponent:
> ( "e" | "E" ) sign? digit-sequence
> sign:
> "+" | "-"
> digit-sequence:
> digit
> | digit digit-sequence
> digit:
> "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
>
> While the grammar on the latest SVG 2 CR only contains integers:
>
> https://www.w3.org/TR/2018/CR-SVG2-20181004/paths.html#PathDataBNF
>
> number ::= ([0-9])+
>
> This sounds like an important omission.
>
> Best regards,
> Boris
>
> On 29/04/2017 19:26, Jirka Kosek wrote:
>> On 29.4.2017 18:57, Paul LeBeau wrote:
>>> Paths of the form that I presented do exist and are actually common. I
>>> wasn't around when the grammar was originally written, so I don't know the
>>> reason why it was written the way it was.
>> Seems that grammar is only illustrational because there are other issues
>> with it -- for example grammar accepts only integers not decimal numbers.
>>
>
Received on Sunday, 17 November 2019 17:16:12 UTC