Re: [svgwg] Subtleties of the path data grammar (#752)

From the programming perspective it's generally irrelevant. You pattern search for a letter that isn't an 'e' in the path, then you split the data at that point. All numbers and non-letter (except 'e') symbols between that letter and the next letter are part of that command. You then parse that command individually. You get the command operands by running a greedy regex match across the data and finding everything that matches the generally accepted float regex: `[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?`
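
A minimal Python sketch of that split-then-greedy-match approach. The names `split_path`, `COMMAND_RE`, and `FLOAT_RE` are made up for illustration, and excluding both `e` and `E` from the command letters is an assumption (the float regex accepts either as an exponent marker).

```python
import re

# The float regex quoted above.
FLOAT_RE = re.compile(r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?")

# Assumption: a command is any ASCII letter except e/E; everything up to
# the next such letter is that command's data.
COMMAND_RE = re.compile(r"([a-df-zA-DF-Z])([^a-df-zA-DF-Z]*)")


def split_path(d):
    """Return a list of (command_letter, [operands as floats])."""
    return [
        (letter, [float(tok) for tok in FLOAT_RE.findall(data)])
        for letter, data in COMMAND_RE.findall(d)
    ]
```

Nothing here knows which command it is looking at; `split_path("L 1.5e1-2 .5.5")` relies only on the regex to find the four numbers.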

Or at least I thought that until I ran into `M200,120 h-25 a25,25 0 1125,25 z` and raised issue #755. The truth is that regex is so ubiquitous that it's going to be how most everybody parses that grammar, though throwing it into LEX could certainly be a thing.

For the most part, since I'm going to feed it all to a greedy regex anyway and cast it to float, the syntax doesn't matter that much, just that it's compact and easy to feed to well-established parsers. My objection to the arc flags is that they break that pattern. There are valid SVG path patterns that I'm not going to rewrite everything just to support, because you can legally remove some spaces and suddenly the context matters. Because I'm in an A command at the 4th position, `1125` is parsed as `1 1 25` rather than `1125`, which is what it would be everywhere else. That sort of context-sensitive parsing is horrible.
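
To make the difference concrete, here is a rough Python sketch comparing the greedy match with a position-aware arc reader. `greedy_operands` and `arc_operands` are hypothetical names, and treating slots 3 and 4 of every group of seven as single-character flags is my reading of the arc rule, not code from any real parser.

```python
import re

FLOAT_RE = re.compile(r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?")
SEP_RE = re.compile(r"[\s,]*")   # optional whitespace/comma separators
FLAG_RE = re.compile(r"[01]")    # an arc flag is exactly one character


def greedy_operands(data):
    """Context-free: grab every float-looking token."""
    return [float(t) for t in FLOAT_RE.findall(data)]


def arc_operands(data):
    """Context-sensitive: which pattern applies depends on the slot index."""
    out, pos = [], 0
    while True:
        pos = SEP_RE.match(data, pos).end()
        if pos >= len(data):
            return out
        # Slots 3 and 4 (0-based) of each 7-operand arc are flags.
        pattern = FLAG_RE if len(out) % 7 in (3, 4) else FLOAT_RE
        m = pattern.match(data, pos)
        if m is None:
            raise ValueError(f"bad arc operand at offset {pos}")
        out.append(float(m.group()))
        pos = m.end()
```

For the data `25,25 0 1125,25` the greedy version yields five operands with `1125` as one number, while the arc-aware version yields the seven operands `25 25 0 1 1 25 25`.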

1) `M 12 34` could be taken as `M 1,2 L 3,4` or `M 12,34` without greedy parsing. Solving `M 123 L...` would be impossible, and we'd have to detect the exactly-two-characters cases out of the gate. If we have `M12` as our path already and append a point to that path, we shouldn't have to care where we came from. But without greedy parsing we'd need to know that we didn't perform one of these disambiguating corrections. Also, it's a bit out there, but if somebody wanted to adapt pathing to 3D or 1D, not discretizing the operands would cause ambiguity. `M 12 34` in a 1D path makes sense: go to position (12), then to position (34); it's a line (as are all 1D things). `D1 M12 34` (with D being a hypothetical dimensionalization command) would be unambiguous. Also, the general grammar is so nice that fiddling with it like that seems to be a non-starter.

2) Greedy parsing is better parsing, because then I don't need to consider the context. I simply slurp up any logically reasonable character and I'm set. If you have more data, that's fine; it won't change the info I've already gobbled up.

3) No. Grammars should not reflect intent. They should be the expression of what is and isn't permitted. Whether something is grammatically valid ("Time flies like an arrow. Fruit flies like a banana.") is a different question from whether it makes actual sense ("When is the palace starting?" "Who murdered the wind?"). I think excluding things through mere grammar is a bit strange. You are saying this element is unparsable rather than this element is malformed. `M0,0A -25, 25 0 1 1 50,0` as a path is technically unparsable. The `-` sign there in front of the radius does not compute with the grammar. We are told exactly how to deal with it, namely take the absolute value of the radius. But according to the grammar we're not supposed to know what that says; a minus there is a syntax error. And I think basically everybody reads it and deals with it correctly.
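
A small Python sketch of what "parse first, fix the range afterwards" looks like for that path. `normalize_arc` is a hypothetical helper; the only rule it applies is the absolute-value-of-radius one mentioned above.

```python
import re

FLOAT_RE = re.compile(r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?")


def normalize_arc(data):
    """Read one arc's operands greedily, then repair out-of-range radii."""
    ops = [float(t) for t in FLOAT_RE.findall(data)]
    if len(ops) != 7:
        raise ValueError("expected exactly 7 arc operands")
    rx, ry, rotation, large_arc, sweep, x, y = ops
    # The sign on a radius is simply dropped (absolute value).
    return [abs(rx), abs(ry), rotation, large_arc, sweep, x, y]
```

The minus sign is tokenized like any other number and only becomes interesting at the point where the operands are given meaning.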

I would actually prefer if the grammar was made to be *more* general. Namely:

> wsp::= (#x20 | #x9 | #xD | #xA | comment)
> comment::= \([^\)]*\)
> non-e-letter::= [abcdfghijklmnopqrstuvwxyzABCDFGHIJKLMNOPQRSTUVWXYZ]
> css-number::= [-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?
> path::= (non-e-letter wsp* (wsp* css-number wsp*)*)*

Anything inside parentheses is excluded from parsing: parsed `(` unparsed `)`.

Then specify that the letter is a command and the numbers are operands, and here's what commands SVG supports, what range of values is supported, and how to map out-of-range values into in-range values. If another command is expressed, it's an error unless some other namespace supports it.
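
Roughly, a parser for that looser grammar could look like the Python sketch below. Everything in it is illustrative: `parse_general` and `SVG_COMMANDS` are made-up names, the comment pattern assumes a comment is any run of non-`)` characters inside parentheses, and both `e` and `E` are assumed to be reserved for exponents.

```python
import re

COMMENT_RE = re.compile(r"\([^)]*\)")
FLOAT_RE = re.compile(r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?")
COMMAND_RE = re.compile(r"([a-df-zA-DF-Z])([^a-df-zA-DF-Z]*)")

# Letters that SVG path data defines as commands.
SVG_COMMANDS = set("MmZzLlHhVvCcSsQqTtAa")


def parse_general(d):
    """Tokenize with the loose grammar, then reject unknown commands."""
    # Comments count as whitespace, so drop them before looking for letters.
    d = COMMENT_RE.sub(" ", d)
    segments = []
    for letter, data in COMMAND_RE.findall(d):
        if letter not in SVG_COMMANDS:
            # Grammatically fine, but not a command this consumer supports.
            raise ValueError(f"unsupported command {letter!r}")
        segments.append((letter, [float(t) for t in FLOAT_RE.findall(data)]))
    return segments
```

The grammar only decides how the text splits into letters and numbers; whether `X 1 2` means anything is a separate, later check.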

Now, more to your point. The documentation should outline a preferred export syntax. With regard to robustness, we should be liberal in what we accept and conservative in what we produce. There I would suggest that all point pairs be separated by a comma, all other operands be explicitly separated by spaces, we exclusively use absolute positioning, and L be preferred over V and H (in absolute they get confusing). T and S should be used wherever possible.

d="
(draw a closed semicircle)
M 0,0 (move to origin)
A 25 25 0 1 1 50,0 (semicircle arc)
Z (close to 0,0)
"

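On the producing side, a conservative writer along those lines might look like this Python sketch. `serialize` and `fmt` are hypothetical names, the input is assumed to be already-absolute segments using only M, L, C, S, Q, T, A and Z, and the `g` number format (six significant digits) is just a placeholder choice.

```python
POINT_COMMANDS = {"M", "L", "C", "S", "Q", "T"}


def fmt(x):
    """Compact number formatting: 25.0 -> '25', 1.5 -> '1.5'."""
    return f"{x:g}"


def serialize(segments):
    """Write absolute segments with comma-joined points, spaces elsewhere."""
    parts = []
    for letter, ops in segments:
        if letter == "Z":
            parts.append("Z")
        elif letter == "A":
            if len(ops) % 7:
                raise ValueError("arc operands must come in groups of 7")
            for i in range(0, len(ops), 7):
                rx, ry, rot, large, sweep, x, y = ops[i:i + 7]
                parts.append(
                    f"A {fmt(rx)} {fmt(ry)} {fmt(rot)} "
                    f"{int(large)} {int(sweep)} {fmt(x)},{fmt(y)}"
                )
        elif letter in POINT_COMMANDS:
            if len(ops) % 2:
                raise ValueError("point operands must come in pairs")
            pairs = [f"{fmt(x)},{fmt(y)}" for x, y in zip(ops[0::2], ops[1::2])]
            parts.append(f"{letter} " + " ".join(pairs))
        else:
            raise ValueError(f"unsupported command {letter!r}")
    return " ".join(parts)
```

Every flag gets its own space-separated slot, so the output never produces a run like `1125` that a reader has to split by context.
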
-- 
GitHub Notification of comment by tatarize
Please view or discuss this issue at https://github.com/w3c/svgwg/issues/752#issuecomment-559314580 using your GitHub account

Received on Thursday, 28 November 2019 02:21:45 UTC