Re: Range conformance

In XPath the expression
45 to 2
is not considered an error; it simply evaluates to the empty sequence. That said, an XPath expression is a simpler case than a range in a grammar.
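
For anyone who wants to see the XPath behaviour concretely, here is a rough model of it in Python. The function name xpath_to is made up for illustration and is not part of any XPath library; the sketch only assumes the documented rule that a range expression whose first operand is greater than its second yields the empty sequence.

```python
def xpath_to(start: int, end: int) -> list[int]:
    """Hypothetical model of XPath's `start to end` on integers.

    When start > end the result is empty rather than an error.
    """
    # range() is already empty when start > end, which mirrors the XPath rule.
    return list(range(start, end + 1))


print(xpath_to(2, 5))   # [2, 3, 4, 5]
print(xpath_to(45, 2))  # []
```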

> On 24 Feb 2022, at 15:42, C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> wrote:
> 
> 
> Steven Pemberton writes:
> 
>> The spec says:
>>    A range matches any character in the range from the start
>>    character to the end, inclusive, using the Unicode ordering
>> 
>> It doesn't require the start character to be earlier in the ordering
>> than the end character. This means that
>> 
>>    ["z"-"a"]
>> is the same as
>>    [ ]
>> 
>> Do we care? Should it be an error, or a warning?
> 
> I think a warning makes sense only if there is some meaning that can be
> attached to the range as specified. We can
> 
> (a) allow the arguments in either order and say that the range matches any
>  character between the two points specified, inclusive;
> 
> (b) say that it matches the starting code point, the ending code point,
>  and any characters matching code points between the two (so ["z" -
>  "a"] is equivalent to ["za"]);
> 
> (c) invent some other meaning for ["z"-"a"]; or
> 
> (d) call it an error.
> 
> I don't know offhand what the other regular expression notations people
> may be familiar with do.  Looking it up, I find that XSD defines the meaning
> of a range this way:
> 
>    A ·character range· in the form s-e identifies the set of characters
>    with UCS code points greater than or equal to the code point of s,
>    but not greater than the code point of e.
> 
> This seems to fall in class (c).  Since no characters have UCS code
> points p with (p ≥ 122) ∧ (p ≤ 97), that means that in XSD regular
> expressions (and, I guess, XPath 3 regular expressions), [z-a] matches
> nothing and is thus equivalent to ixml [].
> 
> I am agnostic, but in the abstract I would lean towards (a) or (d).
> 
> Since a processor has to check the order either way, I don't think (a)
> imposes any new cost on the processor, and it does make ixml processors
> less fussy.  (And using Earley parsing pretty much says off the bat that
> high performance in parsing is not a goal for ixml.)
> 
> But unless we find some regular expression syntax that is reasonably
> widely used that uses interpretation (a), I think the principle of least
> surprise would lead us to (d).
> 
> -- 
> C. M. Sperberg-McQueen
> Black Mesa Technologies LLC
> http://blackmesatech.com
> 
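
To make the candidate readings in the quoted message easier to compare, here is a small Python sketch of options (a), (b) and (d) next to the XSD-style reading. It is only an illustration under my own assumptions: every function name below is hypothetical, none of this comes from the ixml spec or any ixml processor, and characters are treated simply as single Unicode code points.

```python
def code_points(lo: int, hi: int) -> set[str]:
    """All characters with code points lo..hi inclusive (empty if lo > hi)."""
    return {chr(p) for p in range(lo, hi + 1)}


def xsd_style(s: str, e: str) -> set[str]:
    """XSD-style reading: code points >= that of s and <= that of e."""
    return code_points(ord(s), ord(e))


def option_a(s: str, e: str) -> set[str]:
    """Option (a): accept the endpoints in either order."""
    lo, hi = sorted((ord(s), ord(e)))
    return code_points(lo, hi)


def option_b(s: str, e: str) -> set[str]:
    """Option (b): both endpoints, plus anything from s up to e."""
    return {s, e} | code_points(ord(s), ord(e))


def option_d(s: str, e: str) -> set[str]:
    """Option (d): a reversed range is an error."""
    if ord(s) > ord(e):
        raise ValueError(f"range start {s!r} comes after range end {e!r}")
    return code_points(ord(s), ord(e))


print(sorted(xsd_style("z", "a")))  # []
print(len(option_a("z", "a")))      # 26
print(sorted(option_b("z", "a")))   # ['a', 'z']
```

All four agree on a well-ordered range such as "a"-"z"; they differ only when the endpoints are reversed, which is the case under discussion.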

Received on Thursday, 24 February 2022 15:47:13 UTC