- From: Josh Spiegel <josh.spiegel@oracle.com>
- Date: Tue, 22 Mar 2016 14:37:17 -0700 (PDT)
- To: Michael Dyck <jmdyck@ibiblio.org>
- Cc: Public Joint XSLT XQuery XPath <public-xsl-query@w3.org>
I like 1a.

- It makes a:b:c, a:*:c, and *:b:c behave consistently (key expressions a:b, a:*, and *:b)
- It doesn’t introduce the unwanted side-effects that you mentioned
- It appears to be a low risk, low effort change (would only involve a small change to the grammar)
- Lots of valid queries are made easier to read by adding insignificant whitespace and I see map{a:b:c} as another case of this.

Thanks,
Josh

> On Mar 22, 2016, at 2:07 PM, Michael Dyck <jmdyck@ibiblio.org> wrote:
>
>> ACTION A-635-03: MDyck to propose a solution to Bug 29501 re parsing colons.
>
> ref: https://www.w3.org/Bugs/Public/show_bug.cgi?id=29501
>
> Although the action item tells me to propose *a* solution, this post will give 4 or 5 alternatives, as I think the WG needs to mull over the implications of the various choices (which I've tried to work out here).
>
> I'll use Josh's examples of
>     map{a:b:c}
>     map{a:*:c}
>     map{*:b:c}
> but mostly leave out the map{} for the rest of this post.
>
> I'll use "~~" to mean "is parsed the same as".
>
> I agree with Mike Kay's analysis in comment 3, that according to the current spec,
>     *:b:c ~~ * : b:c
>     a:*:c ~~ a : *:c
>
> (But note that
>     a:b:c ~~ a:b : c
> because QName a:b is, at the starting position, the longest valid token.)
>
> -------------------------------------------------------------------------
> (1)
> Say we want these expressions to stay legal, but the ones involving '*' to be parsed differently:
>     *:b:c ~~ *:b : c
>     a:*:c ~~ a:* : c
>
> In comment 6, Mike suggested two ways to accomplish this, both by altering the Wildcard production such that the "longest possible match" rule forces the parser to interpret the first 3 characters as a Wildcard. One way (1a) is to change
>     ... | NCName ":" "*" | "*" ":" NCName | ...
> to
>     ... | NCName ":*" | "*:" NCName | ...
>
> And the other way (1b) is to make Wildcard a terminal symbol (i.e., move the production from A.1 to A.2.1).
>
> One side-effect of 1b is that, with Wildcard a terminal symbol, it would have to be added to either the list of delimiting or non-delimiting terminals. Presumably we'd make it non-delimiting, since it can start or end with a letter. But consider the examples:
>     a:*div 3
>     $x union*:b
> I believe these are currently syntactically legal and have been since XPath 1.0 or 2.0, because the "*" terminal is delimiting so doesn't require spacing from the keyword. But if Wildcard is made a non-delimiting terminal, these examples would become syntax errors (because successive non-delimiting terminals require intervening spacing).
>
> Although I'm doubtful that anyone would mind such examples becoming syntax errors, it *is* backwards-incompatible, so we might want to choose the first solution (1a) instead (and make ":*" and "*:" delimiting, which is no problem, since the symbols ":" and "*" are already delimiting).
>
> Alternatively (1b'), we could replace the delimiting/non-delimiting rule with a rule that allows Wildcard to be a terminal without making those examples illegal. (Though I'm not exactly sure how that would look.)
>
> -------------------------------------------------------------------------
> (2)
> On the other hand, we might want all three of the problem cases:
>     map{a:b:c}
>     map{a:*:c}
>     map{*:b:c}
> to be illegal (syntax errors). (Note that here, such a change is *not* backwards-incompatible, because MapConstructors are new in 3.1.)
>
> E.g., we might somehow require the user to insert at least one space, to make their intent clear:
>     a :b:c    or    a: b:c    or    a:b :c    or    a:b: c
>     a :*:c    or    a: *:c    or    a:* :c    or    a:*: c
>     * :b:c    or    *: b:c    or    *:b :c    or    *:b: c
>
> In comment 6, Mike Kay suggested adding another special case to the very end of A.2.2 Terminal Delimitation, though he wasn't entirely certain what it should be. Here's one possibility (2a):
>     Also, when a QName or NCName or "*" is followed by a ":",
>     an intervening separator is required.
> (This would cause the 2nd and 4th columns above to be illegal, but the user would still have columns 1 and 3 to express the two different semantics.)
>
> Or the other way around (2b), the rule could target ":" followed by QName or NCName or "*", which would make columns 1 and 3 illegal, 2 and 4 okay.
>
> (In English at least, it's more common to put a space after a colon than before, so I think people would be more inclined to write columns 2 and 4, which argues for 2b over 2a.)
>
> However, each rule has possibly unwanted side-effects. 2a would disallow examples such as:
>     map{a:1}
>     map{*:/b}
> and 2b would disallow:
>     map{1:b}
>     map{"a":b}
> All of these seem reasonable to me and not worthy of a syntax error, though I guess I'd get used to always putting in the space.
>
> Editorially, with either 2a or 2b, I think we'd have to make it clearer that the ":" in the rule is not the ":" in a Wildcard or QName. E.g., we could make it clearer that the whole delimiting/non-delimiting mess only applies in cases when ws:explicit doesn't apply.
>
> -Michael
>
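To make the tokenization difference behind option 1a concrete, here is a toy Python sketch (not part of the thread). It assumes a heavily simplified lexer: ASCII-only NCNames, only the terminals relevant to these examples, and the "longest possible match" rule approximated by plain maximal munch plus one hand-coded special case (only an NCName, not a QName, is accepted right after "*:"). The function name `tokenize`, the `option_1a` flag, and the token kind names are hypothetical. With `option_1a=False`, a:*:c yields the separate tokens NCName, ":", "*", ":", NCName, and the choice between a:* : c and a : *:c is left to the parser, which this sketch does not model.

```python
import re

# Simplified, ASCII-only stand-in for the XML NCName production (assumption).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"


def tokenize(src, option_1a):
    """Toy maximal-munch lexer for XQuery-like map-key text.

    Hypothetical model: recognizes only QName, NCName, "*", ":" and, when
    option_1a is true, the two-character terminals ":*" and "*:".
    Returns a list of (kind, text) pairs.
    """
    patterns = [("QName", rf"{NCNAME}:{NCNAME}"), ("NCName", NCNAME)]
    if option_1a:
        patterns += [("ColonStar", r":\*"), ("StarColon", r"\*:")]
    patterns += [("Star", r"\*"), ("Colon", r":")]
    compiled = [(kind, re.compile(pat)) for kind, pat in patterns]

    tokens = []
    pos = 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1  # whitespace separates tokens but is not itself a token
            continue
        # Under 1a the grammar only allows an NCName after "*:", so a longer
        # QName match there would not be consistent with the grammar.
        after_star_colon = bool(tokens) and tokens[-1][0] == "StarColon"
        best = None
        for kind, regex in compiled:
            if kind == "QName" and after_star_colon:
                continue
            m = regex.match(src, pos)
            # Keep the longest match; on a tie the earlier pattern wins.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (kind, m.group())
        if best is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


if __name__ == "__main__":
    for text in ["a:b:c", "a:*:c", "*:b:c"]:
        print(text)
        print("  current spec:", tokenize(text, option_1a=False))
        print("  option 1a:   ", tokenize(text, option_1a=True))
```

In this model, *:b:c lexes as `*:` `b` `:` `c` with option 1a but as `*` `:` `b:c` without it, matching the two readings discussed above, while a:b:c lexes as `a:b` `:` `c` either way.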
Received on Tuesday, 22 March 2016 21:37:56 UTC