Re: The longest token rule

You're quite right. I scanned the list (about three times) looking for ":".

Michael Kay
Saxonica

> On 23 Feb 2016, at 16:41, Benito van der Zander <benito@benibela.de> wrote:
> 
> 
> Hi Michael,
> isn't it listed there as (colon)?
> 
> [Definition: The delimiting terminal symbols are: S, "!", "!=", StringLiteral, "#", "#)", "$", "%", "(", "(#", ")", "*", "+", (comma), "-", "-->", (dot), "..", "/", "//", "/>", (colon), "::", ":=", (semi-colon), "<", "<!--", "<![CDATA[", "</", "<<", "<=", "<?", "=", "=>", ">", ">=", ">>", "?", "?>", "@", BracedURILiteral, "[", "]", "]]>", "]``", "``[", "`{", "{", "|", "||", "}", "}`" ]
> 
> Cheers,
> Benito 
> 
> 
> On 23.02.2016 17:20, Michael Kay wrote:
>> Indeed, that reminds me that ":" is neither in the list of delimiting terminal symbols nor in the list of non-delimiting terminal symbols. It clearly needs to be mentioned in A.2.2 somewhere, and it does seem to have similar status to "-" and "." in that it is sometimes part of a name and sometimes not.
>> 
>> Michael Kay
>> Saxonica
>> 
>>> On 23 Feb 2016, at 15:52, Josh Spiegel <josh.spiegel@oracle.com> wrote:
>>> 
>>> I agree and also fail your test. 
>>> 
>>> In A.2.2 it currently says this:
>>> 
>>> "It is customary to separate consecutive terminal symbols by whitespace and Comments, but this is required only when otherwise two non-delimiting symbols would be adjacent to each other. There are two exceptions to this, that of "." and "-", which do require a symbol separator if they follow a QName or NCName. Also, "." requires a separator if it precedes or follows a numeric literal."
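The A.2.2 rule quoted above can be read as a predicate over two adjacent tokens. What follows is a minimal sketch under stated assumptions: tokens are pre-classified into hypothetical kind strings (`"NCName"`, `"QName"`, `"NumericLiteral"`, or a symbol standing for itself), and the `NON_DELIMITING` set is deliberately abbreviated rather than the specification's full list.

```python
# Hypothetical, abbreviated classification: the real A.2.2 list of
# non-delimiting terminal symbols is longer (keywords, other literals, ...).
NON_DELIMITING = {"NCName", "QName", "NumericLiteral"}
NAMES = {"NCName", "QName"}


def needs_separator(left: str, right: str) -> bool:
    """Return True if whitespace or a comment must separate `left` and `right`.

    `left` and `right` are token kinds; symbols such as "." and "-" stand
    for themselves.
    """
    # General rule: two non-delimiting symbols may not be adjacent.
    if left in NON_DELIMITING and right in NON_DELIMITING:
        return True
    # Exception: "." and "-" need a separator after a QName or NCName,
    # otherwise they would be absorbed into the name.
    if left in NAMES and right in {".", "-"}:
        return True
    # Exception: "." needs a separator before or after a numeric literal.
    if "." in (left, right) and "NumericLiteral" in (left, right):
        return True
    return False
```

For instance, `needs_separator("NCName", "-")` is true because `a-b` would be read as a single name, whereas `needs_separator("-", "NCName")` is false.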
>>> 
>>> Should this be extended so that a separator is required if ":" follows an NCName? If we did this, then these cases would raise XPST0003:
>>> 
>>>   let $m := map{'a':1} return map:size(map{$m?a:true()})
>>> 
>>>   map{a: b}
>>> 
>>>   map{a:b}
>>> 
>>> But these would be OK:
>>> 
>>>   map{a :b}
>>> 
>>>   map{a:b:c}
>>> 
>>>   map{a:b:c:d}
>>> 
>>> (This assumes ":" is a delimiting symbol.) I think ":" should be a delimiting symbol, but it isn't in the list currently.
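The proposed check can be tried against a toy tokenizer. This is a sketch under loud assumptions: the lexical rules are a tiny, hypothetical ASCII-only subset (names, integers, single-quoted strings, a few symbols), tokenization is pure longest match (QName tried before NCName), and `tokenize` and `colon_after_ncname` are illustrative names, not part of any real processor.

```python
import re

# Toy longest-match lexer for a hypothetical subset of XQuery. QName is tried
# before NCName so that "a:b" becomes one token, as the longest-match rule says.
TOKEN_RE = re.compile(
    r"""
    (?P<WS>\s+)
  | (?P<String>'[^']*')
  | (?P<Integer>\d+)
  | (?P<QName>[A-Za-z_][A-Za-z0-9_.-]*:[A-Za-z_][A-Za-z0-9_.-]*)
  | (?P<NCName>[A-Za-z_][A-Za-z0-9_.-]*)
  | (?P<Sym>::|:=|[:{}()?$,])
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split `text` into (kind, value) pairs using longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def colon_after_ncname(tokens: list[tuple[str, str]]) -> list[int]:
    """Proposed rule (hypothetical): indexes of ":" tokens that directly
    follow an NCName with no whitespace in between."""
    return [
        i for i in range(1, len(tokens))
        if tokens[i] == ("Sym", ":") and tokens[i - 1][0] == "NCName"
    ]
```

With pure longest match, `map{a:b:c}` tokenizes as `map`, `{`, `a:b`, `:`, `c`, `}`, so the colon follows a QName and passes; `map{a: b}` is flagged; `map{a :b}` passes. Note that in this toy `map{a:b}` and `$m?a:true()` never reach the check: they yield the QNames `a:b` and `a:true`, and fail later in the parser.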
>>> 
>>> By the way, there are other cases where text "consistent with the EBNF" raises XPST0003, e.g.:
>>> 
>>>  /*5
>>> 
>>> See xgc:leading-lone-slash.
>>> 
>>> Thanks,
>>> Josh
>>> 
>>> 
>>>> On Feb 23, 2016, at 1:54 AM, Michael Kay <mike@saxonica.com> wrote:
>>>> 
>>>> We have for many years had the rule in A.2:
>>>> 
>>>> "When tokenizing, the longest possible match that is consistent with the EBNF is used."
>>>> 
>>>> and I have often wondered if there were cases where the phrase "that is consistent with the EBNF" actually affected the outcome. It suggests that the tokenization is sensitive to the grammatical context, which is a considerable complication.
>>>> 
>>>> I have submitted a test case MapConstructor-025 which does this:
>>>> 
>>>> let $m := map{'a':1} return map:size(map{$m?a:true()})
>>>> 
>>>> Although Saxon can't handle this, I believe it is permitted according to this rule. After the "?", an NCName is consistent with the EBNF but a QName containing a colon is not, so the longest token "consistent with the EBNF" is "a" rather than "a:true".
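The two readings of the rule can be contrasted directly on the text after "?". This is a minimal sketch assuming simplified ASCII-only name patterns (real NCNames allow many more characters) and a hypothetical `qname_allowed` flag that models the grammatical context: in the KeySpecifier after "?", the EBNF admits an NCName but not a prefixed QName.

```python
import re

# Simplified ASCII-only patterns (assumption, not the full NCName production).
NCNAME = r"[A-Za-z_][A-Za-z0-9_.-]*"
NCNAME_RE = re.compile(NCNAME)
QNAME_RE = re.compile(NCNAME + ":" + NCNAME)


def next_name_token(text: str, pos: int, qname_allowed: bool) -> str:
    """Return the longest name token starting at `pos`.

    When `qname_allowed` is False (a context where only an NCName is
    consistent with the EBNF), the QName alternative is not considered.
    """
    if qname_allowed:
        m = QNAME_RE.match(text, pos)
        if m:
            return m.group()
    m = NCNAME_RE.match(text, pos)
    if m is None:
        raise SyntaxError(f"no name at position {pos}")
    return m.group()


expr = "map{$m?a:true()}"
start = expr.index("?") + 1
print(next_name_token(expr, start, qname_allowed=True))   # a:true
print(next_name_token(expr, start, qname_allowed=False))  # a
```

Choosing `a` leaves `:true()` to be read as the key/value separator followed by the value, which is the reading argued for above; context-free longest match picks `a:true` instead.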
>>>> 
>>>> Any views on whether this is a correct interpretation of the rules?
>>>> 
>>>> Michael Kay
>>>> Saxonica
>>>> 
>>> 
>> 
>> 
> 

Received on Tuesday, 23 February 2016 16:47:53 UTC