- From: Josh Spiegel <josh.spiegel@oracle.com>
- Date: Tue, 22 Mar 2016 14:37:17 -0700 (PDT)
- To: Michael Dyck <jmdyck@ibiblio.org>
- Cc: Public Joint XSLT XQuery XPath <public-xsl-query@w3.org>
I like 1a.
- It makes a:b:c, a:*:c, and *:b:c behave consistently (key expressions a:b, a:*, and *:b)
- It doesn’t introduce the unwanted side-effects that you mentioned
- It appears to be a low-risk, low-effort change (it would only involve a small change to the grammar)
- Lots of valid queries are made easier to read by adding insignificant whitespace, and I see map{a:b:c} as another case of this.
Thanks,
Josh
> On Mar 22, 2016, at 2:07 PM, Michael Dyck <jmdyck@ibiblio.org> wrote:
>
>> ACTION A-635-03: MDyck to propose a solution to Bug 29501 re parsing colons.
>
> ref: https://www.w3.org/Bugs/Public/show_bug.cgi?id=29501
>
> Although the action item tells me to propose *a* solution, this post will give 4 or 5 alternatives, as I think the WG needs to mull over the implications of the various choices (which I've tried to work out here).
>
> I'll use Josh's examples of
> map{a:b:c}
> map{a:*:c}
> map{*:b:c}
> but mostly leave out the map{} for the rest of this post.
>
> I'll use "~~" to mean "is parsed the same as".
>
> I agree with Mike Kay's analysis in comment 3, that according to the current spec,
> *:b:c ~~ * : b:c
> a:*:c ~~ a : *:c
>
> (But note that
> a:b:c ~~ a:b : c
> because QName a:b is, at the starting position, the longest valid token.)
>
> -------------------------------------------------------------------------
> (1)
> Say we want these expressions to stay legal, but the ones involving '*' to be parsed differently:
> *:b:c ~~ *:b : c
> a:*:c ~~ a:* : c
>
> In comment 6, Mike suggested two ways to accomplish this, both by altering the Wildcard production such that the "longest possible match" rule forces the parser to interpret the first 3 characters as a Wildcard. One way (1a) is to change
> ... | NCName ":" "*" | "*" ":" NCName | ...
> to
> ... | NCName ":*" | "*:" NCName | ...
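A rough sketch of how the longest-match rule plays out for the three examples, with and without the 1a change. Everything here is a simplification for illustration: the NCName pattern is ASCII-only, the terminal-set names are hypothetical, the input must be whitespace-free, and the rule that only an NCName may follow "*:" is an assumed stand-in for "longest match consistent with the EBNF".

```python
import re

# ASCII-only stand-in for NCName; the real production allows far more characters.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"

# Hypothetical terminal sets. QName here also covers an unprefixed name.
CURRENT = [("QName", NCNAME + "(?::" + NCNAME + ")?"), ("*", r"\*"), (":", ":")]
OPTION_1A = CURRENT + [("*:", r"\*:"), (":*", r":\*")]

def tokenize(text, terminals):
    """Greedy longest-match tokenization of a whitespace-free string."""
    tokens, pos, prev = [], 0, None
    while pos < len(text):
        # Assumed context rule: after "*:" the grammar admits only an NCName,
        # so a prefixed QName is not a candidate there.
        candidates = [("NCName", NCNAME)] if prev == "*:" else terminals
        best_name, best_text = None, ""
        for name, pattern in candidates:
            m = re.compile(pattern).match(text, pos)
            if m and len(m.group()) > len(best_text):
                best_name, best_text = name, m.group()
        if best_name is None:
            raise ValueError(f"no terminal matches at offset {pos}")
        tokens.append(best_text)
        pos += len(best_text)
        prev = best_name
    return tokens

for entry in ("a:b:c", "a:*:c", "*:b:c"):
    print(entry, tokenize(entry, CURRENT), tokenize(entry, OPTION_1A))
```

Under these assumptions, the 1a terminal set tokenizes a:*:c as a, ":*", ":", c and *:b:c as "*:", b, ":", c, so in each case a lone ":" is left over to separate key from value, just as it already is for a:b:c.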
>
> And the other way (1b) is to make Wildcard a terminal symbol (i.e., move the production from A.1 to A.2.1).
>
> One side-effect of 1b is that, with Wildcard a terminal symbol, it would have to be added to either the list of delimiting or non-delimiting terminals. Presumably we'd make it non-delimiting, since it can start or end with a letter. But consider the examples:
> a:*div 3
> $x union*:b
> I believe these are currently syntactically legal and have been since XPath 1.0 or 2.0, because the "*" terminal is delimiting so doesn't require spacing from the keyword. But if Wildcard is made a non-delimiting terminal, these examples would become syntax errors (because successive non-delimiting terminals require intervening spacing).
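To make the side-effect concrete, here is a minimal sketch of the main delimitation rule only (two adjacent non-delimiting terminals need intervening whitespace). The token lists are written out by hand rather than produced by a real tokenizer, the names Tok and spacing_ok are hypothetical, and the other special cases in A.2.2 are ignored.

```python
from typing import NamedTuple

class Tok(NamedTuple):
    text: str
    delimiting: bool            # True for delimiting terminals such as "*", ":", "$"
    space_before: bool = False  # True if whitespace precedes this token

def spacing_ok(tokens):
    """Return False if two non-delimiting terminals touch without whitespace."""
    for left, right in zip(tokens, tokens[1:]):
        if not left.delimiting and not right.delimiting and not right.space_before:
            return False
    return True

# "a:*div 3" with "*" as its own delimiting terminal (current grammar)
current_1 = [Tok("a", False), Tok(":", True), Tok("*", True),
             Tok("div", False), Tok("3", False, True)]
# Same text with Wildcard as a single non-delimiting terminal (1b)
option_1b_1 = [Tok("a:*", False), Tok("div", False), Tok("3", False, True)]

# "$x union*:b" under the same two readings
current_2 = [Tok("$", True), Tok("x", False), Tok("union", False, True),
             Tok("*", True), Tok(":", True), Tok("b", False)]
option_1b_2 = [Tok("$", True), Tok("x", False), Tok("union", False, True),
               Tok("*:b", False)]

for toks in (current_1, option_1b_1, current_2, option_1b_2):
    print("".join((" " if t.space_before else "") + t.text for t in toks),
          spacing_ok(toks))
```

In this model both examples pass with "*" as a delimiting terminal and fail once the Wildcard is a single non-delimiting terminal sitting directly next to the keyword.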
>
> Although I'm doubtful that anyone would mind such examples becoming syntax errors, it *is* backwards-incompatible, so we might want to choose the first solution (1a) instead (and make ":*" and "*:" delimiting, which is no problem, since the symbols ":" and "*" are already delimiting).
>
> Alternatively (1b'), we could replace the delimiting/non-delimiting rule with a rule that allows Wildcard to be a terminal without making those examples illegal. (Though I'm not exactly sure how that would look.)
>
> -------------------------------------------------------------------------
> (2)
> On the other hand, we might want all three of the problem cases:
> map{a:b:c}
> map{a:*:c}
> map{*:b:c}
> to be illegal (syntax errors). (Note that here, such a change is *not* backwards-incompatible, because MapConstructors are new in 3.1.)
>
> E.g., we might somehow require the user to insert at least one space, to make their intent clear:
> a :b:c or a: b:c or a:b :c or a:b: c
> a :*:c or a: *:c or a:* :c or a:*: c
> * :b:c or *: b:c or *:b :c or *:b: c
>
> In comment 6, Mike Kay suggested adding another special case to the very end of A.2.2 Terminal Delimitation, though he wasn't entirely certain what it should be. Here's one possibility (2a):
> Also, when a QName or NCName or "*" is followed by a ":",
> an intervening separator is required.
> (This would cause the 2nd and 4th columns above to be illegal, but the user would still have columns 1 and 3 to express the two different semantics.)
>
> Or the other way around (2b), the rule could target ":" followed by QName or NCName or "*", which would make columns 1 and 3 illegal, 2 and 4 okay.
>
> (In English at least, it's more common to put a space after a colon than before, so I think people would be more inclined to write columns 2 and 4, which argues for 2b over 2a.)
>
> However, each rule has possibly unwanted side-effects. 2a would disallow examples such as:
> map{a:1}
> map{*:/b}
> and 2b would disallow:
> map{1:b}
> map{"a":b}
> All of these seem reasonable to me and not worthy of a syntax error, though I guess I'd get used to always putting in the space.
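A small sketch of how rules 2a and 2b would sort these cases, assuming the entry has already been split into tokens by hand, with " " standing for any separator. The function names and the ASCII-only name pattern are hypothetical, and a colon inside a QName is treated as part of that token, not as the ":" the rule talks about.

```python
import re

# "*" or an ASCII-only approximation of NCName / QName
NAME_OR_STAR = re.compile(
    r"\*|[A-Za-z_][A-Za-z0-9_.-]*(?::[A-Za-z_][A-Za-z0-9_.-]*)?"
)

def is_name_or_star(token):
    return NAME_OR_STAR.fullmatch(token) is not None

def violates_2a(tokens):
    """2a: a QName, NCName or "*" directly followed by ":" is an error."""
    return any(is_name_or_star(l) and r == ":" for l, r in zip(tokens, tokens[1:]))

def violates_2b(tokens):
    """2b: a ":" directly followed by a QName, NCName or "*" is an error."""
    return any(l == ":" and is_name_or_star(r) for l, r in zip(tokens, tokens[1:]))

examples = {
    "a:1":    ["a", ":", "1"],
    "*:/b":   ["*", ":", "/", "b"],
    "1:b":    ["1", ":", "b"],
    '"a":b':  ['"a"', ":", "b"],
    "a :b:c": ["a", " ", ":", "b:c"],   # column 1
    "a: b:c": ["a", ":", " ", "b:c"],   # column 2
}
for text, toks in examples.items():
    print(text, "2a error:", violates_2a(toks), "2b error:", violates_2b(toks))
```

With these hand-tokenized inputs, a:1 and *:/b trip only 2a, while 1:b and "a":b trip only 2b, which matches the lists of side-effects above.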
>
> Editorially, with either 2a or 2b, I think we'd have to make it clearer that the ":" in the rule is not the ":" in a Wildcard or QName. E.g., we could make it clearer that the whole delimiting/non-delimiting mess only applies in cases when ws:explicit doesn't apply.
>
> -Michael
>
Received on Tuesday, 22 March 2016 21:37:56 UTC