- From: Michael Dyck <jmdyck@ibiblio.org>
- Date: Tue, 22 Mar 2016 17:07:27 -0400
- To: Public Joint XSLT XQuery XPath <public-xsl-query@w3.org>
> ACTION A-635-03: MDyck to propose a solution to Bug 29501 re parsing colons.

ref: https://www.w3.org/Bugs/Public/show_bug.cgi?id=29501

Although the action item tells me to propose *a* solution, this post will give 4 or 5 alternatives, as I think the WG needs to mull over the implications of the various choices (which I've tried to work out here).

I'll use Josh's examples of

    map{a:b:c}
    map{a:*:c}
    map{*:b:c}

but mostly leave out the map{} for the rest of this post.

I'll use "~~" to mean "is parsed the same as".

I agree with Mike Kay's analysis in comment 3 that, according to the current spec,

    *:b:c  ~~  * : b:c
    a:*:c  ~~  a : *:c

(But note that a:b:c ~~ a:b : c, because QName a:b is, at the starting position, the longest valid token.)

-------------------------------------------------------------------------

(1) Say we want these expressions to stay legal, but the ones involving '*' to be parsed differently:

    *:b:c  ~~  *:b : c
    a:*:c  ~~  a:* : c

In comment 6, Mike suggested two ways to accomplish this, both by altering the Wildcard production such that the "longest possible match" rule forces the parser to interpret the first 3 characters as a Wildcard.

One way (1a) is to change

    ... | NCName ":" "*" | "*" ":" NCName | ...

to

    ... | NCName ":*" | "*:" NCName | ...

And the other way (1b) is to make Wildcard a terminal symbol (i.e., move the production from A.1 to A.2.1).

One side-effect of 1b is that, with Wildcard a terminal symbol, it would have to be added to either the list of delimiting or the list of non-delimiting terminals. Presumably we'd make it non-delimiting, since it can start or end with a letter. But consider the examples:

    a:*div 3
    $x union*:b

I believe these are currently syntactically legal, and have been since XPath 1.0 or 2.0, because the "*" terminal is delimiting and so doesn't require spacing from the adjacent keyword.
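To make the delimiting argument concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical token set (not the spec's lexer) and plain greedy longest-match tokenization that, unlike a real XPath lexer, never consults the grammar. The hypothetical flag `wildcard_terminal` switches between the current grammar and option 1b, and the hypothetical helper `spacing_errors` applies the rule that successive non-delimiting terminals need intervening whitespace, treating names, keywords, integers and (under 1b) Wildcard as non-delimiting.

```python
import re

# Hypothetical, simplified token set covering only the examples in this
# thread; it is NOT the XPath 3.1 lexer. It picks the longest match without
# consulting the grammar.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"

# Assumed classification: names, keywords and integers are non-delimiting;
# "*", ":" and "$" are delimiting. Under option 1b, Wildcard joins this set.
NON_DELIMITING = {"QNAME", "NCNAME", "INTEGER", "WILDCARD"}


def token_patterns(wildcard_terminal):
    """Return (kind, compiled regex) pairs for the chosen grammar variant."""
    pats = [
        ("QNAME", rf"{NCNAME}:{NCNAME}"),
        ("NCNAME", NCNAME),  # also stands in for keywords such as div, union
        ("INTEGER", r"[0-9]+"),
        ("STAR", r"\*"),
        ("COLON", r":"),
        ("DOLLAR", r"\$"),
        ("SPACE", r"\s+"),
    ]
    if wildcard_terminal:
        # Option 1b: the whole Wildcard (a:* or *:b) is a single terminal.
        pats.append(("WILDCARD", rf"{NCNAME}:\*|\*:{NCNAME}"))
    return [(kind, re.compile(p)) for kind, p in pats]


def tokenize(text, wildcard_terminal=False):
    """Greedy longest-match tokenizer returning (kind, lexeme) pairs."""
    patterns = token_patterns(wildcard_terminal)
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for kind, rx in patterns:
            m = rx.match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (kind, m.group())
        if best is None:
            raise SyntaxError(f"no token at offset {pos} in {text!r}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


def spacing_errors(text, wildcard_terminal=False):
    """Adjacent non-delimiting terminals with no whitespace between them."""
    toks = tokenize(text, wildcard_terminal)
    return [(s1, s2)
            for (k1, s1), (k2, s2) in zip(toks, toks[1:])
            if k1 in NON_DELIMITING and k2 in NON_DELIMITING]


print(tokenize("a:b:c"))  # [('QNAME', 'a:b'), ('COLON', ':'), ('NCNAME', 'c')]
print(spacing_errors("a:*div 3"), spacing_errors("a:*div 3", True))  # [] [('a:*', 'div')]
```

With the flag off, a:*div 3 and $x union*:b produce no spacing errors (and a:b:c yields the QName a:b first, as in the comment 3 analysis); with the flag on, each yields one adjacent pair of non-delimiting terminals.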
But if Wildcard is made a non-delimiting terminal, these examples would become syntax errors (because successive non-delimiting terminals require intervening spacing). Although I'm doubtful that anyone would mind such examples becoming syntax errors, it *is* backwards-incompatible, so we might want to choose the first solution (1a) instead (and make ":*" and "*:" delimiting, which is no problem, since the symbols ":" and "*" are already delimiting).

Alternatively (1b'), we could replace the delimiting/non-delimiting rule with a rule that allows Wildcard to be a terminal without making those examples illegal. (Though I'm not exactly sure how that would look.)

-------------------------------------------------------------------------

(2) On the other hand, we might want all three of the problem cases:

    map{a:b:c}
    map{a:*:c}
    map{*:b:c}

to be illegal (syntax errors). (Note that here, such a change is *not* backwards-incompatible, because MapConstructors are new in 3.1.)

E.g., we might somehow require the user to insert at least one space, to make their intent clear:

    a :b:c  or  a: b:c  or  a:b :c  or  a:b: c
    a :*:c  or  a: *:c  or  a:* :c  or  a:*: c
    * :b:c  or  *: b:c  or  *:b :c  or  *:b: c

In comment 6, Mike Kay suggested adding another special case to the very end of A.2.2 Terminal Delimitation, though he wasn't entirely certain what it should be. Here's one possibility (2a):

    Also, when a QName or NCName or "*" is followed by a ":", an
    intervening separator is required.

(This would make the 2nd and 4th columns above illegal, but the user would still have columns 1 and 3 to express the two different semantics.)

Or, the other way around (2b), the rule could target ":" followed by a QName or NCName or "*", which would make columns 1 and 3 illegal and leave 2 and 4 okay. (In English at least, it's more common to put a space after a colon than before, so I think people would be more inclined to write columns 2 and 4, which argues for 2b over 2a.)
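Rules 2a and 2b are both lexical checks on adjacent tokens, so they can be prototyped in a few lines. The Python below is a minimal sketch under stated assumptions: the token set is hypothetical and covers only these examples, tokenization is plain greedy longest match, and Wildcard and QName are lexed as single tokens so that a bare ":" token always means the map-entry separator rather than a colon inside a name. The helper names `tokenize` and `violations` are illustrative only.

```python
import re

# Hypothetical, simplified token set for the examples in this thread; NOT the
# XPath 3.1 lexer. Wildcard and QName are lexed as single tokens, so every
# COLON token that remains is the map-entry separator, never the ":" inside a
# Wildcard or QName.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"
PATTERNS = [(kind, re.compile(p)) for kind, p in [
    ("WILDCARD", rf"{NCNAME}:\*|\*:{NCNAME}"),
    ("QNAME", rf"{NCNAME}:{NCNAME}"),
    ("NCNAME", NCNAME),
    ("STAR", r"\*"),
    ("COLON", r":"),
    ("INTEGER", r"[0-9]+"),
    ("STRING", r'"[^"]*"'),
    ("SLASH", r"/"),
    ("SPACE", r"\s+"),
]]

# Tokens that begin or end with a QName, an NCName or "*".
NAME_LIKE = {"WILDCARD", "QNAME", "NCNAME", "STAR"}


def tokenize(text):
    """Greedy longest-match tokenizer returning (kind, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        matches = []
        for kind, rx in PATTERNS:
            m = rx.match(text, pos)
            if m:
                matches.append((kind, m.group()))
        if not matches:
            raise SyntaxError(f"no token at offset {pos} in {text!r}")
        kind, lexeme = max(matches, key=lambda km: len(km[1]))
        tokens.append((kind, lexeme))
        pos += len(lexeme)
    return tokens


def violations(text, rule):
    """Adjacent token pairs (no whitespace between) that break rule 2a or 2b.

    2a: a QName, NCName or "*" immediately followed by the separator ":".
    2b: the separator ":" immediately followed by a QName, NCName or "*".
    """
    toks = tokenize(text)
    bad = []
    for (k1, s1), (k2, s2) in zip(toks, toks[1:]):
        if rule == "2a" and k1 in NAME_LIKE and k2 == "COLON":
            bad.append((s1, s2))
        elif rule == "2b" and k1 == "COLON" and k2 in NAME_LIKE:
            bad.append((s1, s2))
    return bad


for example in ["a:b:c", "a :b:c", "a: b:c", "a:b :c", "a:b: c"]:
    print(f"{example!r:10} 2a={violations(example, '2a')} "
          f"2b={violations(example, '2b')}")
```

Run over the twelve spaced variants in the table above, this sketch's "2a" check flags exactly columns 2 and 4 and its "2b" check flags exactly columns 1 and 3, while the unspaced a:b:c, a:*:c and *:b:c are flagged under both.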
However, each rule has possibly unwanted side-effects. 2a would disallow examples such as:

    map{a:1}
    map{*:/b}

and 2b would disallow:

    map{1:b}
    map{"a":b}

All of these seem reasonable to me and not worthy of a syntax error, though I guess I'd get used to always putting in the space.

Editorially, with either 2a or 2b, I think we'd have to make it clearer that the ":" in the rule is not the ":" in a Wildcard or QName. E.g., we could make it clearer that the whole delimiting/non-delimiting mess only applies in cases when ws:explicit doesn't apply.

-Michael
Received on Tuesday, 22 March 2016 21:07:58 UTC