Re: completed A-635-03: parsing colons

I like 1a.

- It makes a:b:c, a:*:c, and *:b:c behave consistently (key expressions a:b, a:*, and *:b)
- It doesn’t introduce the unwanted side-effects that you mentioned
- It appears to be a low-risk, low-effort change (it would only involve a small change to the grammar)
- Lots of valid queries are easier to read with insignificant whitespace added, and I see map{a:b:c} as another case of this.
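
To make the first bullet concrete, here is a rough Python sketch of the outcome that option 1 (quoted below) is aiming for. It is a simplified model, not the spec's tokenizer: it assumes an ASCII-only approximation of NCName, and it treats "a:*" and "*:b" as single tokens so that a plain longest-match rule picks up the first three characters. That is only the intended effect of 1a, not its grammar. The names tokenize and map_key are made up for illustration.

```python
import re

# ASCII-only approximation of NCName (assumption; the real definition is wider).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"

# Candidate tokens. At each position the longest match wins.
TOKEN_PATTERNS = [
    ("QName", re.compile(NCNAME + ":" + NCNAME)),
    # Modelled as one token: the effect option 1 wants for "a:*" and "*:b".
    ("Wildcard", re.compile(NCNAME + r":\*|\*:" + NCNAME)),
    ("NCName", re.compile(NCNAME)),
    ("*", re.compile(r"\*")),
    (":", re.compile(":")),
]


def tokenize(text):
    """Split text into (kind, text) pairs using longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_kind, best_text = None, ""
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)
            if m and len(m.group()) > len(best_text):
                best_kind, best_text = kind, m.group()
        if best_kind is None:
            raise ValueError(f"no token at position {pos} in {text!r}")
        tokens.append((best_kind, best_text))
        pos += len(best_text)
    return tokens


def map_key(entry):
    """Return the key expression of a one-entry map body such as 'a:*:c'."""
    tokens = tokenize(entry)
    if len(tokens) < 2 or tokens[1][0] != ":":
        raise ValueError(f"expected key ':' value in {entry!r}")
    return tokens[0][1]


for entry in ("a:b:c", "a:*:c", "*:b:c"):
    print(entry, "-> key", map_key(entry))
```

Under these assumptions the three entries yield the keys a:b, a:* and *:b, which is the consistency I mean above.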

Thanks,
Josh

> On Mar 22, 2016, at 2:07 PM, Michael Dyck <jmdyck@ibiblio.org> wrote:
> 
>> ACTION A-635-03: MDyck to propose a solution to Bug 29501 re parsing colons.
> 
> ref: https://www.w3.org/Bugs/Public/show_bug.cgi?id=29501
> 
> Although the action item tells me to propose *a* solution, this post will give 4 or 5 alternatives, as I think the WG needs to mull over the implications of the various choices (which I've tried to work out here).
> 
> I'll use Josh's examples of
>    map{a:b:c}
>    map{a:*:c}
>    map{*:b:c}
> but mostly leave out the map{} for the rest of this post.
> 
> I'll use "~~" to mean "is parsed the same as".
> 
> I agree with Mike Kay's analysis in comment 3, that according to the current spec,
>    *:b:c   ~~   *  :  b:c
>    a:*:c   ~~   a  :  *:c
> 
> (But note that
>    a:b:c   ~~   a:b : c
> because QName a:b is, at the starting position, the longest valid token.)
> 
> -------------------------------------------------------------------------
> (1)
> Say we want these expressions to stay legal, but the ones involving '*' to be parsed differently:
>    *:b:c   ~~   *:b  :  c
>    a:*:c   ~~   a:*  :  c
> 
> In comment 6, Mike suggested two ways to accomplish this, both by altering the Wildcard production such that the "longest possible match" rule forces the parser to interpret the first 3 characters as a Wildcard. One way (1a) is to change
>    ... | NCName ":" "*" | "*" ":" NCName | ...
> to
>    ... | NCName ":*"    | "*:" NCName    | ...
> 
> And the other way (1b) is to make Wildcard a terminal symbol (i.e., move the production from A.1 to A.2.1).
> 
> One side-effect of 1b is that, with Wildcard a terminal symbol, it would have to be added to either the list of delimiting or non-delimiting terminals. Presumably we'd make it non-delimiting, since it can start or end with a letter. But consider the examples:
>    a:*div 3
>    $x union*:b
> I believe these are currently syntactically legal, and have been since XPath 1.0 or 2.0, because the "*" terminal is delimiting and so doesn't require spacing from the keyword. But if Wildcard is made a non-delimiting terminal, these examples would become syntax errors (because successive non-delimiting terminals require intervening spacing).
> 
> Although I'm doubtful that anyone would mind such examples becoming syntax errors, it *is* backwards-incompatible, so we might want to choose the first solution (1a) instead (and make ":*" and "*:" delimiting, which is no problem, since the symbols ":" and "*" are already delimiting).
> 
> Alternatively (1b'), we could replace the delimiting/non-delimiting rule with a rule that allows Wildcard to be a terminal without making those examples illegal. (Though I'm not exactly sure how that would look.)
> 
> -------------------------------------------------------------------------
> (2)
> On the other hand, we might want all three of the problem cases:
>    map{a:b:c}
>    map{a:*:c}
>    map{*:b:c}
> to be illegal (syntax errors). (Note that here, such a change is *not* backwards-incompatible, because MapConstructors are new in 3.1.)
> 
> E.g., we might somehow require the user to insert at least one space, to make their intent clear:
>    a :b:c    or    a: b:c    or    a:b :c    or    a:b: c
>    a :*:c    or    a: *:c    or    a:* :c    or    a:*: c
>    * :b:c    or    *: b:c    or    *:b :c    or    *:b: c
> 
> In comment 6, Mike Kay suggested adding another special case to the very end of A.2.2 Terminal Delimitation, though he wasn't entirely certain what it should be. Here's one possibility (2a):
>    Also, when a QName or NCName or "*" is followed by a ":",
>    an intervening separator is required.
> (This would cause the 2nd and 4th columns above to be illegal, but the user would still have columns 1 and 3 to express the two different semantics.)
> 
> Or the other way around (2b), the rule could target ":" followed by QName or NCName or "*", which would make columns 1 and 3 illegal, 2 and 4 okay.
> 
> (In English at least, it's more common to put a space after a colon than before, so I think people would be more inclined to write columns 2 and 4, which argues for 2b over 2a.)
> 
> However, each rule has possibly unwanted side-effects. 2a would disallow examples such as:
>    map{a:1}
>    map{*:/b}
> and 2b would disallow:
>    map{1:b}
>    map{"a":b}
> All of these seem reasonable to me and not worthy of a syntax error, though I guess I'd get used to always putting in the space.
> 
> Editorially, with either 2a or 2b, I think we'd have to make it clearer that the ":" in the rule is not the ":" in a Wildcard or QName. E.g., we could make it clearer that the whole delimiting/non-delimiting mess only applies in cases when ws:explicit doesn't apply.
> 
> -Michael
> 

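For reference, the column and side-effect claims for 2a and 2b in the quoted message can be checked with a small Python sketch. This is only a model under stated assumptions: NCName is approximated in ASCII, the colon inside a QName or Wildcard is absorbed into that token (so the rules only see the separating ":"), a Wildcard counts as name-like because it ends in an NCName or "*", and only the few token kinds needed for the examples are recognised. The function names are hypothetical.

```python
import re

# ASCII-only approximation of NCName (assumption).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"

# Longest match wins. "name" covers QName, Wildcard, NCName and bare "*".
PATTERNS = [
    ("name", re.compile(NCNAME + ":" + NCNAME)),         # QName
    ("name", re.compile(NCNAME + r":\*|\*:" + NCNAME)),  # a:* or *:b
    ("name", re.compile(NCNAME)),                        # NCName
    ("name", re.compile(r"\*")),                         # bare "*"
    (":", re.compile(":")),
    ("other", re.compile(r'\d+|"[^"]*"|/')),             # just enough for the examples
]


def tokenize(text):
    """Return (kind, preceded_by_whitespace) pairs for the map body text."""
    tokens, pos, spaced = [], 0, False
    while pos < len(text):
        if text[pos].isspace():
            pos, spaced = pos + 1, True
            continue
        kind, length = None, 0
        for candidate, pattern in PATTERNS:
            m = pattern.match(text, pos)
            if m and m.end() - pos > length:
                kind, length = candidate, m.end() - pos
        if kind is None:
            raise ValueError(f"no token at position {pos} in {text!r}")
        tokens.append((kind, spaced))
        pos, spaced = pos + length, False
    return tokens


def violates_2a(text):
    """2a: a name or "*" directly followed by ":" needs a separator."""
    toks = tokenize(text)
    return any(a[0] == "name" and b[0] == ":" and not b[1]
               for a, b in zip(toks, toks[1:]))


def violates_2b(text):
    """2b: a ":" directly followed by a name or "*" needs a separator."""
    toks = tokenize(text)
    return any(a[0] == ":" and b[0] == "name" and not b[1]
               for a, b in zip(toks, toks[1:]))


for body in ("a :b:c", "a: b:c", "a:b :c", "a:b: c", "a:1", "*:/b", "1:b", '"a":b'):
    print(f"{body!r}: 2a error={violates_2a(body)}, 2b error={violates_2b(body)}")
```

In this model 2a rejects columns 2 and 4 plus a:1 and *:/b, while 2b rejects columns 1 and 3 plus 1:b and "a":b, which matches the description in the quoted message.
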
Received on Tuesday, 22 March 2016 21:37:56 UTC