Re: The longest token rule

On 16-02-23 04:54 AM, Michael Kay wrote:
> We have for many years had the rule in A.2
>
>   "When tokenizing, the longest possible match that is consistent with
>   the EBNF is used."

(It used to say "... that is valid in the current context", but we decided
to change it at meeting #541 (2013-05-21), based on discussion prompted by a
message from you:
https://lists.w3.org/Archives/Member/w3c-xsl-query/2013Feb/0059.html
)

> and I have often wondered if there were cases where the phrase "that is
> consistent with the EBNF" actually affected the outcome. It suggests that
> the tokenization is sensitive to the grammatical context, which is a
> considerable complication.

For XQuery, tokenization has always had to be sensitive to grammatical
context. E.g., consider:
     let $t := <title>let it be</title> ...
The way that you 'tokenize' the three characters 'l', 'e', 't' differs
depending on the grammatical context.
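
To make that concrete, here is a rough Python sketch. It is not taken from
any spec or real implementation: the function name tokenize_at and the two
context labels are invented for illustration, and the regular expressions
are heavily simplified. It only shows the same three characters coming out
as different tokens depending on what the parser says the context is.

```python
import re

# Hypothetical, heavily simplified patterns (assumptions for illustration).
KEYWORD_LET = re.compile(r"let\b")       # 'let' starting a FLWOR clause
CHAR_DATA = re.compile(r"[^<{}&]+")      # plain text inside element content


def tokenize_at(text, pos, context):
    """Return (token_kind, lexeme) at text[pos:], given the parser's context.

    `context` is a made-up label supplied by the caller (the parser):
    "expression" or "element_content".
    """
    if context == "expression":
        m = KEYWORD_LET.match(text, pos)
        if m:
            return ("KEYWORD", m.group())
    elif context == "element_content":
        m = CHAR_DATA.match(text, pos)
        if m:
            return ("CHAR_DATA", m.group())
    return None


query = "let $t := <title>let it be</title>"

# The first 'let' is seen where an expression may start.
first = tokenize_at(query, 0, "expression")

# The second 'let' is seen just after the '>' that closes the start tag.
second = tokenize_at(query, query.index(">") + 1, "element_content")
```

Here 'first' is a keyword token for just the three characters, while
'second' swallows them as part of a longer run of character data.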

And "Building a Tokenizer for XPath or XQuery" is complicated precisely
*because* tokenization has to be sensitive to grammatical context.
[https://www.w3.org/TR/xquery-xpath-parsing/]


> I have submitted a test case MapConstructor-025 which does this:
>
> let $m := map{'a':1} return map:size(map{$m?a:true()})
>
> Although Saxon can't handle this, I believe it is permitted according to
> this rule. After the "?", an NCName is consistent with the EBNF but a
> QName containing a colon is not, so the longest token "consistent with
> the EBNF" is "a" rather than "a:true".
>
> Any views on whether this is a correct interpretation of the rules?

Sounds correct to me.
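
As a rough illustration of that reading (a sketch only, in Python; the
names longest_token and TOKEN_PATTERNS are made up, the NCName pattern is
a simplification of the real production, and the set of token kinds the
grammar allows after "?" is reduced to NCName alone for brevity):

```python
import re

# Simplified NCName: the real production allows many more characters.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"

# Hypothetical table of token kinds (an assumption, not from the spec).
TOKEN_PATTERNS = {
    "QName": re.compile(NCNAME + ":" + NCNAME),
    "NCName": re.compile(NCNAME),
}


def longest_token(text, pos, allowed):
    """Pick the longest match at text[pos:] among the token kinds that the
    grammar permits at this point (`allowed` is supplied by the parser)."""
    best = None
    for kind in allowed:
        m = TOKEN_PATTERNS[kind].match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (kind, m.group())
    return best


src = "$m?a:true()"
after_question = src.index("?") + 1

# Ignoring grammatical context, the longest match would be the QName.
context_free = longest_token(src, after_question, ["QName", "NCName"])

# Restricted to what the EBNF allows after "?", only the NCName survives.
after_lookup = longest_token(src, after_question, ["NCName"])
```

With no restriction the tokenizer would take "a:true"; restricted to the
kinds consistent with the EBNF after "?", it takes "a", leaving ":true()"
for the map constructor.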

-Michael

Received on Saturday, 27 February 2016 23:48:19 UTC