completed A-635-03: parsing colons

> ACTION A-635-03: MDyck to propose a solution to Bug 29501 re parsing colons.

ref: https://www.w3.org/Bugs/Public/show_bug.cgi?id=29501

Although the action item tells me to propose *a* solution, this post will 
give 4 or 5 alternatives, as I think the WG needs to mull over the 
implications of the various choices (which I've tried to work out here).

I'll use Josh's examples of
     map{a:b:c}
     map{a:*:c}
     map{*:b:c}
but mostly leave out the map{} for the rest of this post.

I'll use "~~" to mean "is parsed the same as".

I agree with Mike Kay's analysis in comment 3, that according to the current 
spec,
     *:b:c   ~~   *  :  b:c
     a:*:c   ~~   a  :  *:c

(But note that
     a:b:c   ~~   a:b : c
because QName a:b is, at the starting position, the longest valid token.)
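
For anyone who wants to poke at the "longest valid token" behaviour, here is a
minimal Python sketch of a context-free longest-match scanner. It is an
assumption-laden simplification: the NCNAME pattern is ASCII-only, the names
TERMINALS and tokenize are mine, only the four terminals needed for these
examples are modelled, and the real spec defines tokenization relative to the
EBNF rather than as a standalone scanner.

```python
import re

# ASCII-only approximation of NCName (the real production allows far more).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"

# Terminals needed for the examples. Wildcard is deliberately absent: in the
# current grammar it is a nonterminal assembled from these pieces.
TERMINALS = [
    re.compile(rf"{NCNAME}:{NCNAME}"),  # QName
    re.compile(NCNAME),                 # NCName
    re.compile(r"\*"),
    re.compile(r":"),
]

def tokenize(text):
    """Split text into tokens, always taking the longest terminal that matches."""
    tokens, pos = [], 0
    while pos < len(text):
        candidates = [m.group() for p in TERMINALS if (m := p.match(text, pos))]
        if not candidates:
            raise ValueError(f"no terminal matches at offset {pos}")
        longest = max(candidates, key=len)
        tokens.append(longest)
        pos += len(longest)
    return tokens

for example in ("a:b:c", "*:b:c", "a:*:c"):
    print(example, "->", tokenize(example))
```

Under those assumptions, a:b:c starts with the QName token a:b, whereas *:b:c
starts with a bare "*" and then picks up b:c as a QName. For a:*:c the scanner
yields five single tokens, so nothing at the lexical level forces the first
three characters together; the grouping is left to the grammar.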

-------------------------------------------------------------------------
(1)
Say we want these expressions to stay legal, but the ones involving '*' to 
be parsed differently:
     *:b:c   ~~   *:b  :  c
     a:*:c   ~~   a:*  :  c

In comment 6, Mike suggested two ways to accomplish this, both by altering 
the Wildcard production such that the "longest possible match" rule forces 
the parser to interpret the first 3 characters as a Wildcard. One way (1a) 
is to change
     ... | NCName ":" "*" | "*" ":" NCName | ...
to
     ... | NCName ":*"    | "*:" NCName    | ...

And the other way (1b) is to make Wildcard a terminal symbol (i.e., move the 
production from A.1 to A.2.1).
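
To illustrate the effect of 1b, here is a rough Python sketch of a longest-match
scanner that can be run with or without Wildcard as a terminal. The names
make_terminals and tokenize, and the ASCII-only NCNAME pattern, are my own
assumptions for illustration; it models 1b directly and does not attempt to
model the compound ":*" and "*:" tokens of 1a.

```python
import re

NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"  # ASCII-only approximation of NCName

def make_terminals(wildcard_is_terminal):
    """Return the terminal patterns, optionally including Wildcard (option 1b)."""
    terminals = [
        re.compile(rf"{NCNAME}:{NCNAME}"),  # QName
        re.compile(NCNAME),                 # NCName
        re.compile(r"\*"),
        re.compile(r":"),
    ]
    if wildcard_is_terminal:
        # NCName ":" "*"  |  "*" ":" NCName, treated as one lexical unit
        terminals.append(re.compile(rf"{NCNAME}:\*|\*:{NCNAME}"))
    return terminals

def tokenize(text, terminals):
    """Longest-match scan: at each offset take the longest matching terminal."""
    tokens, pos = [], 0
    while pos < len(text):
        candidates = [m.group() for p in terminals if (m := p.match(text, pos))]
        if not candidates:
            raise ValueError(f"no terminal matches at offset {pos}")
        longest = max(candidates, key=len)
        tokens.append(longest)
        pos += len(longest)
    return tokens

with_wildcard = make_terminals(wildcard_is_terminal=True)
for example in ("*:b:c", "a:*:c", "a:b:c"):
    print(example, "->", tokenize(example, with_wildcard))
```

With the Wildcard terminal enabled, the first token of *:b:c becomes *:b and
the first token of a:*:c becomes a:*, which is the parse wanted in (1); a:b:c
is unaffected and still starts with the QName a:b.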

One side-effect of 1b is that, with Wildcard a terminal symbol, it would 
have to be added to either the list of delimiting or non-delimiting 
terminals. Presumably we'd make it non-delimiting, since it can start or end 
with a letter. But consider the examples:
     a:*div 3
     $x union*:b
I believe these are currently syntactically legal and have been since XPath 
1.0 or 2.0, because the "*" terminal is delimiting so doesn't require 
spacing from the keyword. But if Wildcard is made a non-delimiting terminal, 
these examples would become syntax errors (because successive non-delimiting 
terminals require intervening spacing).
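
The adjacency problem can be sketched in Python as follows. This is only an
illustration under my own assumptions: keywords such as "div" and "union" are
lexed as NCName, the terminal set is cut down to what the two examples need,
NCNAME is ASCII-only, and adjacency_errors is a hypothetical helper, not
anything from the spec.

```python
import re

NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"  # ASCII-only approximation of NCName

# (pattern, is_delimiting). Keywords like "div" and "union" lex as NCName here.
BASE = [
    (rf"{NCNAME}:{NCNAME}", False),  # QName
    (NCNAME, False),                 # NCName and keywords
    (r"[0-9]+", False),              # IntegerLiteral
    (r"[$:*]", True),                # single-character delimiting symbols
]
# Option 1b: Wildcard as a terminal, assumed to be non-delimiting.
WILDCARD = (rf"{NCNAME}:\*|\*:{NCNAME}", False)

def adjacency_errors(text, wildcard_is_terminal):
    """Return pairs of non-delimiting terminals that touch with no whitespace."""
    terminals = BASE + ([WILDCARD] if wildcard_is_terminal else [])
    errors, pos, prev = [], 0, None  # prev: (token, is_delimiting) ending at pos
    while pos < len(text):
        if text[pos].isspace():
            pos, prev = pos + 1, None  # whitespace separates terminals
            continue
        matches = [(m.group(), delim) for pat, delim in terminals
                   if (m := re.compile(pat).match(text, pos))]
        if not matches:
            raise ValueError(f"no terminal matches at offset {pos}")
        token, delim = max(matches, key=lambda t: len(t[0]))  # longest match
        if prev is not None and not prev[1] and not delim:
            errors.append((prev[0], token))
        pos, prev = pos + len(token), (token, delim)
    return errors

for example in ("a:*div 3", "$x union*:b"):
    print(example, adjacency_errors(example, False), adjacency_errors(example, True))
```

In this model both examples scan cleanly while "*" is a delimiting terminal,
and each reports one offending pair, ("a:*", "div") and ("union", "*:b"), once
Wildcard is a non-delimiting terminal.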

Although I'm doubtful that anyone would mind such examples becoming syntax 
errors, it *is* backwards-incompatible, so we might want to choose the first 
solution (1a) instead (and make ":*" and "*:" delimiting, which is no 
problem, since the symbols ":" and "*" are already delimiting).

Alternatively (1b'), we could replace the delimiting/non-delimiting rule 
with a rule that allows Wildcard to be a terminal without making those 
examples illegal. (Though I'm not exactly sure how that would look.)

-------------------------------------------------------------------------
(2)
On the other hand, we might want all three of the problem cases:
     map{a:b:c}
     map{a:*:c}
     map{*:b:c}
to be illegal (syntax errors). (Note that here, such a change is *not* 
backwards-incompatible, because MapConstructors are new in 3.1.)

E.g., we might somehow require the user to insert at least one space, to 
make their intent clear:
     a :b:c    or    a: b:c    or    a:b :c    or    a:b: c
     a :*:c    or    a: *:c    or    a:* :c    or    a:*: c
     * :b:c    or    *: b:c    or    *:b :c    or    *:b: c

In comment 6, Mike Kay suggested adding another special case to the very end 
of A.2.2 Terminal Delimitation, though he wasn't entirely certain what it 
should be. Here's one possibility (2a):
     Also, when a QName or NCName or "*" is followed by a ":",
     an intervening separator is required.
(This would cause the 2nd and 4th columns above to be illegal, but the user 
would still have columns 1 and 3 to express the two different semantics.)

Or the other way around (2b), the rule could target ":" followed by QName or 
NCName or "*", which would make columns 1 and 3 illegal, 2 and 4 okay.

(In English at least, it's more common to put a space after a colon than 
before, so I think people would be more inclined to write columns 2 and 4, 
which argues for 2b over 2a.)

However, each rule has possibly unwanted side-effects. 2a would disallow 
examples such as:
     map{a:1}
     map{*:/b}
and 2b would disallow:
     map{1:b}
     map{"a":b}
All of these seem reasonable to me and not worthy of a syntax error, though 
I guess I'd get used to always putting in the space.
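
To double-check the column claims and the side-effects, here is a small Python
sketch of rules 2a and 2b. It rests on an assumption that goes beyond the rule
text as drafted: QNames and Wildcards (including a bare "*") are first
recognised as single "name" units, so that only a free-standing ":" is tested.
The helper names kinds and violates are hypothetical, NCNAME is ASCII-only, and
the input is the content between the braces of map{}.

```python
import re

NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"  # ASCII-only approximation of NCName

# (pattern, kind). "name" covers QName, Wildcard, NCName and a bare "*";
# alternatives are ordered longest-first so the colon inside them is absorbed.
TERMINALS = [
    (rf"{NCNAME}:{NCNAME}|{NCNAME}:\*|\*:{NCNAME}|{NCNAME}|\*", "name"),
    (r":", "colon"),
    (r'[0-9]+|"[^"]*"|/', "other"),  # integer, string literal, "/"
    (r"\s+", "space"),
]

def kinds(text):
    """Return the sequence of token kinds for text."""
    out, pos = [], 0
    while pos < len(text):
        for pattern, kind in TERMINALS:
            m = re.compile(pattern).match(text, pos)
            if m:
                out.append(kind)
                pos = m.end()
                break
        else:
            raise ValueError(f"no terminal matches at offset {pos}")
    return out

def violates(text, rule):
    """2a: name immediately before ':'.  2b: ':' immediately before name."""
    ks = kinds(text)
    target = ("name", "colon") if rule == "2a" else ("colon", "name")
    return target in zip(ks, ks[1:])

for example in ("a :*:c", "a: *:c", "a:* :c", "a:*: c", 'a:1', '"a":b'):
    print(repr(example), "2a:", violates(example, "2a"), "2b:", violates(example, "2b"))
```

Under that assumption the sketch agrees with the text: 2a rejects columns 2
and 4 plus a:1 and *:/b, 2b rejects columns 1 and 3 plus 1:b and "a":b, and
the three unspaced problem cases are rejected by both rules.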

Editorially, with either 2a or 2b, I think we'd have to make it clearer that 
the ":" in the rule is not the ":" in a Wildcard or QName. E.g., we could 
make it clearer that the whole delimiting/non-delimiting mess only applies 
in cases when ws:explicit doesn't apply.

-Michael

Received on Tuesday, 22 March 2016 21:07:58 UTC