[Bug 5251] [FO] Minimal match in starts-with(), ends-with()

http://www.w3.org/Bugs/Public/show_bug.cgi?id=5251

------- Comment #5 from mike@saxonica.com  2008-02-13 16:23 -------
The proposal was discussed at the telcon on 2008-02-12. There were two main
reservations expressed:

(a) the use of compare(substring(xxx)) was not equivalent to the current
behaviour because the substring function operates without knowledge of a
collation; thus this might find a match where the current spec would not. (See
the minutes for an example.)

(b) Jim Melton felt unease about whether this was really a bug: were we sure
the status quo wasn't what the WG intended?

I would like to revise the proposal to make it a very simple change. In
starts-with() and ends-with(), (a) change "minimal match" to "match", and (b)
add the examples proposed in comment #4.

The relevant definitions from Unicode are:

DS2. There is a match according to C for P within Q[s,e] if and only if C
generates the same sort key for P as for Q[s,e], and the offsets s and e meet
the condition B.

DS4. The match is minimal if for all positive i and j, there is no match at
Q[s+i,e-j]. In such a case, we also say that P minimally matches at Q[s,e].

Here C is the collation, B in our case is true so long as we are on a boundary
between collation units (we don't say this very clearly, but it's the best
interpretation), P is our second argument, and Q is the first argument. s and e
are character positions.

Note that DS4 as written is incorrect: it should require only one of i and j
to be positive; the other can be zero. This bug has been reported and accepted.

The current rule (for starts-with) requires a minimal match at the start of the
arg1 string. This means that if "-" is ignorable (as it often is), then
starts-with('-1', '-1') is false: the possible matches are on '-1' and '1'; the
first match doesn't count because it is not minimal, and the second doesn't
count because it is not at the start of the string. I find it impossible to
believe that the WG intended this: it's surely a reasonable expectation that
every string starts with itself and ends with itself.
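
To make the argument above concrete, here is a small Python sketch of the two
rules. It is only an illustration under stated assumptions: the "collation" is
a toy one whose sort key simply drops "-", condition B is taken to hold at
every character boundary, and minimality uses the corrected reading of DS4
(i and j non-negative, not both zero). The function names are hypothetical and
do not come from the spec or from any implementation.

```python
# Toy model: "-" is the only ignorable character (an assumption for illustration).
IGNORABLE = {"-"}


def sort_key(text):
    """Toy sort key: the string with ignorable characters removed."""
    return "".join(ch for ch in text if ch not in IGNORABLE)


def matches(q, p):
    """All (s, e) such that q[s:e] has the same sort key as p (DS2, with B always true)."""
    key = sort_key(p)
    return {
        (s, e)
        for s in range(len(q) + 1)
        for e in range(s, len(q) + 1)
        if sort_key(q[s:e]) == key
    }


def is_minimal(q, p, s, e):
    """Corrected DS4: no match at q[s+i:e-j] for i, j >= 0 that are not both zero."""
    found = matches(q, p)
    for i in range(e - s + 1):
        for j in range(e - s - i + 1):
            if (i, j) != (0, 0) and (s + i, e - j) in found:
                return False
    return True


def starts_with_minimal(q, p):
    """Current rule: a minimal match must begin at the start of q."""
    return any(s == 0 and is_minimal(q, p, s, e) for (s, e) in matches(q, p))


def starts_with_match(q, p):
    """Proposed rule: any match beginning at the start of q is enough."""
    return any(s == 0 for (s, e) in matches(q, p))


print(sorted(matches("-1", "-1")))      # [(0, 2), (1, 2)]
print(starts_with_minimal("-1", "-1"))  # False
print(starts_with_match("-1", "-1"))    # True
```

In this model the match on the whole string (0, 2) is rejected as non-minimal
because (1, 2) also matches, and (1, 2) does not begin at the start of the
string, so only the proposed rule lets '-1' start with itself.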

Received on Wednesday, 13 February 2008 16:23:57 UTC