Re: Range conformance

Steven Pemberton writes:

> The spec says:
>  A range matches any character in the range from the start
>  character to the end, inclusive, using the Unicode ordering
>
> It doesn't require the start character to be earlier in the ordering
> than the end character. This means that
>
>  ["z"-"a"]
> is the same as
>  [ ]
>
> Do we care? Should it be an error, or a warning?

I think a warning makes sense only if there is some meaning that can be
attached to the range as specified. We can

(a) allow the arguments in either order and say that the range matches any
  character between the two points specified, inclusive;

(b) say that it matches the starting code point, the ending code point,
  and any characters matching code points between the two (so ["z" -
  "a"] is equivalent to ["za"]);

(c) invent some other meaning for ["z"-"a"]; or

(d) call it an error.
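
To make the difference between (a), (b), and (d) concrete, here is a
minimal sketch in Python.  The function names are invented for
illustration; they come from neither the ixml spec nor any processor.
Characters are compared by code point using ord(), and (c) is left out
because it is open-ended.

```python
def range_a(start: str, end: str) -> set:
    """Option (a): accept the endpoints in either order."""
    lo, hi = sorted((ord(start), ord(end)))
    return {chr(p) for p in range(lo, hi + 1)}


def range_b(start: str, end: str) -> set:
    """Option (b): both endpoints, plus anything from start up to end."""
    # range() is empty when start > end, so only the endpoints remain.
    between = {chr(p) for p in range(ord(start), ord(end) + 1)}
    return {start, end} | between


def range_d(start: str, end: str) -> set:
    """Option (d): a reversed range is an error."""
    if ord(start) > ord(end):
        raise ValueError(f"reversed range: {start!r}-{end!r}")
    return {chr(p) for p in range(ord(start), ord(end) + 1)}
```

Under these assumptions, range_a("z", "a") yields the 26 lowercase
letters, range_b("z", "a") yields just "z" and "a", and
range_d("z", "a") raises an exception.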

I don't know offhand how the other regular expression notations people
may be familiar with handle this.  Looking it up, I find that XSD defines the meaning
of a range this way:

    A ·character range· in the form s-e identifies the set of characters
    with UCS code points greater than or equal to the code point of s,
    but not greater than the code point of e.

This seems to fall in class (c).  Since "z" is code point 122 and "a" is
code point 97, and no character has a UCS code point p with (p ≥ 122) ∧
(p ≤ 97), in XSD regular expressions (and, I guess, XPath 3 regular
expressions) [z-a] matches nothing and is thus equivalent to ixml [].
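
A literal reading of the quoted XSD sentence can be sketched in a few
lines of Python.  The name xsd_range is hypothetical, and the sketch
models only that one sentence, not any other constraint XSD may place on
ranges.

```python
def xsd_range(s: str, e: str) -> set:
    """Characters whose code point p satisfies ord(s) <= p <= ord(e)."""
    return {chr(p) for p in range(ord(s), ord(e) + 1)}


print(len(xsd_range("a", "z")))  # 26
print(xsd_range("z", "a"))       # set()
```

With the endpoints reversed, no p can satisfy both inequalities, so the
set is empty.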

I am agnostic, but in the abstract I would lean towards (a) or (d).

Since a processor has to check the order either way, I don't think (a)
imposes any new cost on the processor, and it does make ixml processors
less fussy.  (And using Earley parsing pretty much says off the bat that
high performance in parsing is not a goal for ixml.)

But unless we find some regular expression syntax that is reasonably
widely used that uses interpretation (a), I think the principle of least
surprise would lead us to (d).
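
As a single data point, and not a survey of regular expression
notations, the behavior of Python's re module can be checked directly.
The helper name compiles is invented for this sketch.

```python
import re


def compiles(pattern: str) -> bool:
    """Report whether Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles("[a-z]"))  # True
print(compiles("[z-a]"))  # False
```

Python rejects the reversed range as a bad character range, which
corresponds to (d); this says nothing about what other notations do.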

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Thursday, 24 February 2022 15:42:41 UTC