Re: The longest token rule

I agree and also fail your test. 

In A.2.2 it currently says this:

"It is customary to separate consecutive terminal symbols by whitespace and Comments, but this is required only when otherwise two non-delimiting symbols would be adjacent to each other. There are two exceptions to this, that of "." and "-", which do require a symbol separator if they follow a QName or NCName. Also, "." requires a separator if it precedes or follows a numeric literal."

Should this be extended so that a separator is also required if ":" follows an NCName? If we did this, then these cases would raise XPST0003:

   let $m := map{'a':1} return map:size(map{$m?a:true()})
 
   map{a: b}
   
   map{a:b}
  
But these would be OK:

   map{a :b}

   map{a:b:c}

   map{a:b:c:d}

That assumes ":" is a delimiting symbol. I think ":" should be a delimiting symbol, but it isn't in the list currently.
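
To make the proposal concrete, here is a rough Python sketch of a longest-match tokenizer with the extra separator check bolted on. Everything in it is my own illustration rather than anything from the spec or from Saxon: the names tokenize, rejected and the XPST0003 exception class are hypothetical, NCNames are reduced to ASCII letters, digits and underscore, and the only grammatical context modelled is "no prefixed QName directly after ?", which is Mike's reading of the longest token rule. It checks tokens only; there is no parser behind it.

```python
import re

# Simplified NCName (assumption: ASCII letters, digits, underscore only).
NCNAME = r"[A-Za-z_][A-Za-z0-9_]*"

# Candidate token patterns tried at each position; the longest match wins.
PATTERNS = [
    ("QName", re.compile(NCNAME + ":" + NCNAME)),   # prefixed name only
    ("NCName", re.compile(NCNAME)),
    ("String", re.compile(r"'[^']*'")),
    ("Integer", re.compile(r"[0-9]+")),
    ("Symbol", re.compile(r":=|[{}()$?:,]")),
]
WHITESPACE = re.compile(r"\s+")


class XPST0003(Exception):
    """Hypothetical stand-in for the static error XPST0003."""


def tokenize(expr):
    """Return (kind, text) tokens, raising XPST0003 under the proposed rule."""
    tokens = []
    pos = 0
    separated = True  # True when whitespace precedes the current position
    while pos < len(expr):
        ws = WHITESPACE.match(expr, pos)
        if ws:
            pos = ws.end()
            separated = True
            continue
        best = None
        for kind, pattern in PATTERNS:
            # Minimal context: directly after "?" a prefixed QName is not
            # "consistent with the EBNF", so it is not a candidate.
            if kind == "QName" and tokens and tokens[-1] == ("Symbol", "?"):
                continue
            m = pattern.match(expr, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise XPST0003("unexpected character at offset %d" % pos)
        kind, m = best
        text = m.group()
        # Proposed rule: ":" needs a separator if it follows an NCName.
        if text == ":" and not separated and tokens and tokens[-1][0] == "NCName":
            raise XPST0003("':' directly follows NCName at offset %d" % pos)
        tokens.append((kind, text))
        pos = m.end()
        separated = False
    return tokens


def rejected(expr):
    """True if the sketch raises XPST0003 while tokenizing expr."""
    try:
        tokenize(expr)
    except XPST0003:
        return True
    return False
```

With this sketch, rejected() is true for the MapConstructor-025 expression and for map{a: b}, and false for map{a :b}, map{a:b:c} and map{a:b:c:d}. Note that map{a:b} is not caught here: it tokenizes as a single QName a:b, so I would expect the error to come from the grammar (a map entry with no ":" separator) rather than from the separator check.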

By the way, there are other cases where text "consistent with the EBNF" raises XPST0003, e.g.

  /*5

See xgc:leading-lone-slash.

Thanks,
Josh


> On Feb 23, 2016, at 1:54 AM, Michael Kay <mike@saxonica.com> wrote:
> 
> We have for many years had the rule in A.2
> 
> "When tokenizing, the longest possible match that is consistent with the EBNF is used."
> 
> and I have often wondered if there were cases where the phrase "that is consistent with the EBNF" actually affected the outcome. It suggests that the tokenization is sensitive to the grammatical context, which is a considerable complication.
> 
> I have submitted a test case MapConstructor-025 which does this:
> 
> let $m := map{'a':1} return map:size(map{$m?a:true()})
> 
> Although Saxon can't handle this, I believe it is permitted according to this rule. After the "?", an NCName is consistent with the EBNF but a QName containing a colon is not, so the longest token "consistent with the EBNF" is "a" rather than "a:true".
> 
> Any views on whether this is a correct interpretation of the rules?
> 
> Michael Kay
> Saxonica
> 

Received on Tuesday, 23 February 2016 15:53:29 UTC