[Bug 5251] [FO] Minimal match in starts-with(), ends-with()

http://www.w3.org/Bugs/Public/show_bug.cgi?id=5251





------- Comment #1 from mike@saxonica.com  2008-01-15 00:03 -------
There are a couple of further complications here.

Firstly, the definition of "minimal match" in Unicode UTS #10 appears to be
incorrect. It states "The match is minimal if for all positive i and j, there
is no match at Q[s+i,e-j]. In such a case, we also say that P minimal matches at
Q[s,e]." which would imply that a match at Q[2,4] is minimal even if there is a
match at Q[3,4] or at Q[2,3] - it should allow either i or j (but not both) to
be zero. I have raised this as a Unicode error report. I'll get around this by
not referring normatively to the Unicode definition.
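
A small Python sketch of the problem, for illustration only. Nothing here comes
from UTS #10: the set MATCHES and both function names are hypothetical, and the
set simply assumes that P matches at Q[2,4] and at Q[3,4].

```python
# Hypothetical set of intervals (s, e) at which pattern P matches in Q.
MATCHES = {(2, 4), (3, 4)}

def minimal_as_worded(s, e, matches=MATCHES):
    """Wording as quoted: no match at Q[s+i, e-j] for all POSITIVE i and j."""
    if (s, e) not in matches:
        return False
    width = e - s + 1
    return not any((s + i, e - j) in matches
                   for i in range(1, width + 1)
                   for j in range(1, width + 1))

def minimal_as_intended(s, e, matches=MATCHES):
    """Intended reading: i and j are non-negative and not both zero."""
    if (s, e) not in matches:
        return False
    width = e - s + 1
    return not any((s + i, e - j) in matches
                   for i in range(0, width + 1)
                   for j in range(0, width + 1)
                   if (i, j) != (0, 0))

print(minimal_as_worded(2, 4))    # True: the shorter match at (3, 4) is never checked
print(minimal_as_intended(2, 4))  # False: (3, 4) is found with i=1, j=0
print(minimal_as_intended(3, 4))  # True
```

Under the quoted wording the match at Q[2,4] counts as minimal because the
candidate Q[3,4] requires j=0, which the wording excludes.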

Secondly, the definition of minimal match is parameterized by a "boundary
condition" B, which we have not specified. An example of such a boundary
condition is a requirement that the match occur at a grapheme boundary, word
boundary, or sentence boundary. I believe it's our intention that there should
be no boundary constraints on the match.

Thirdly, F+O section 7.5 states: "In the definitions below, we say that $arg1
contains $arg2 at positions m through n if the collation units corresponding to
characters in positions m to n of $arg1 are the same as the collation units
corresponding to all the characters of $arg2 modulo ignorable collation units.
In the simple case of the Unicode code point collation, the collation units are
the same as the characters of the string. See [Unicode Collation Algorithm] for
a detailed discussion of substring matching." However, the text below never
relies on this definition; instead it refers directly to the concept of
"minimal match".

PROPOSAL

1. Replace the cited definition in section 7.5 by: 

"[Definition: following [Unicode Collation Algorithm], we say that $arg1
*matches* $arg2 at positions M through M+L-1 under collation C if
fn:compare(fn:substring($arg1, M, L), $arg2, C) = 0, where M>=1 and
M+L-1<=fn:string-length($arg1)] [Definition: we say that $arg1 *minimally
matches* $arg2 at positions M through N if $arg1 *matches* $arg2 at positions M
through N and there are no non-negative integers i and j, not both equal to
zero, such that $arg1 *matches* $arg2 at positions M+i through N-j.]"
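
To make the two definitions concrete, here is a rough Python sketch. It is only
an illustration under stated assumptions: the "collation" is a toy one
(case-insensitive, with "-" treated as ignorable), positions are 1-based with
an inclusive end position, and the names key, matches and minimally_matches are
made up for this sketch rather than taken from F+O.

```python
IGNORABLE = {"-"}  # assumption: the toy collation treats "-" as ignorable

def key(s):
    """Toy collation key: case-insensitive, ignorable characters dropped."""
    return "".join(ch.lower() for ch in s if ch not in IGNORABLE)

def matches(arg1, arg2, m, n):
    """True if arg1 matches arg2 at 1-based positions m..n (inclusive)."""
    if m < 1 or n > len(arg1) or n < m - 1:
        return False
    return key(arg1[m - 1:n]) == key(arg2)

def minimally_matches(arg1, arg2, m, n):
    """True if there is a match at m..n that cannot be shrunk from either end."""
    if not matches(arg1, arg2, m, n):
        return False
    span = n - m + 1
    return not any(matches(arg1, arg2, m + i, n - j)
                   for i in range(span + 1)
                   for j in range(span + 1)
                   if (i, j) != (0, 0))

# "a-bc-d" has positions a=1, -=2, b=3, c=4, -=5, d=6.
print(matches("a-bc-d", "BC", 2, 5))            # True: "-bc-" compares equal to "BC"
print(minimally_matches("a-bc-d", "BC", 2, 5))  # False: it can be shrunk to 3..4
print(minimally_matches("a-bc-d", "BC", 3, 4))  # True
```

The point of the minimal match is visible in the last two lines: the ignorable
characters on either side of "bc" are part of a match but not of the minimal
match.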

2. In all five functions, delete the note referring to "minimal match".

3. In contains(), replace the Summary with: 

"Summary: Returns an xs:boolean indicating whether or not the value of $arg1
contains the value of $arg2 (at the beginning, at the end, or anywhere within).

The result is true if and only if there is some pair of values M, N such that
$arg1 matches $arg2 at positions M through N under the requested collation."

4. In starts-with(), replace the Summary with:

"Summary: Returns an xs:boolean indicating whether or not the value of $arg1
starts with the value of $arg2.

The result is true if and only if there is some value N such that $arg1 matches
$arg2 at positions 1 through N under the requested collation."

5. In ends-with(), replace the Summary with:

"Summary: Returns an xs:boolean indicating whether or not the value of $arg1
ends with the value of $arg2.

The result is true if and only if there is some value M such that $arg1 matches
$arg2 at positions M through N under the requested collation, where N is equal
to fn:string-length($arg1)."

6. In substring-before(), replace the Summary with:

"Summary: Returns the substring of the value of $arg1 that precedes the first
occurrence of $arg2.

If fn:contains($arg1, $arg2) is true then the function returns
fn:substring($arg1, 1, M-1), where M is the smallest integer such that $arg1
minimally matches $arg2 at positions M through N for some N. Otherwise the
function returns the zero-length string."

Delete the now-redundant sentence "If the value of $arg1 does not contain a
string that is equal to the value of $arg2, then the function returns the
zero-length string."

7. In substring-after(), replace the Summary with:

"Summary: Returns the substring of the value of $arg1 that follows the first
occurrence of $arg2.

If fn:contains($arg1, $arg2) is true then the function returns
fn:substring($arg1, N+1), where N is the smallest integer such that $arg1
minimally matches $arg2 at positions M through N for some M. Otherwise the
function returns the zero-length string."

Delete the now-redundant sentence "If the value of $arg1 does not contain a
string that is equal to the value of $arg2, then the function returns the
zero-length string."

8. Add examples to starts-with() and ends-with() indicating the effect if $arg1
starts/ends with an ignorable.
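
As a sanity check on items 3 to 8, here is a hedged Python sketch of how the
five functions might behave under the proposed wording. The assumptions are the
sketch's own, not F+O's: a toy collation (case-insensitive, "-" ignorable),
1-based positions with an inclusive end, and helper names (key, matches, spans,
minimal_matches) invented for illustration. substring_before here returns the
characters strictly before position M.

```python
IGNORABLE = {"-"}  # assumption: "-" is ignorable in this toy collation

def key(s):
    """Toy collation key: case-insensitive, ignorable characters dropped."""
    return "".join(ch.lower() for ch in s if ch not in IGNORABLE)

def matches(arg1, arg2, m, n):
    """True if arg1 matches arg2 at 1-based positions m..n (inclusive)."""
    return 1 <= m and m - 1 <= n <= len(arg1) and key(arg1[m - 1:n]) == key(arg2)

def spans(arg1):
    """All candidate (m, n) pairs, including empty spans where n == m - 1."""
    size = len(arg1)
    return [(m, n) for m in range(1, size + 2) for n in range(m - 1, size + 1)]

def minimal_matches(arg1, arg2):
    """Matches that contain no other match (shrunk from either or both ends)."""
    found = [(m, n) for m, n in spans(arg1) if matches(arg1, arg2, m, n)]
    return [(m, n) for m, n in found
            if not any((a, b) != (m, n) and a >= m and b <= n for a, b in found)]

def contains(arg1, arg2):
    return any(matches(arg1, arg2, m, n) for m, n in spans(arg1))

def starts_with(arg1, arg2):
    return any(matches(arg1, arg2, 1, n) for n in range(0, len(arg1) + 1))

def ends_with(arg1, arg2):
    return any(matches(arg1, arg2, m, len(arg1)) for m in range(1, len(arg1) + 2))

def substring_before(arg1, arg2):
    found = minimal_matches(arg1, arg2)
    if not found:
        return ""
    m = min(start for start, _ in found)  # smallest start of a minimal match
    return arg1[:m - 1]

def substring_after(arg1, arg2):
    found = minimal_matches(arg1, arg2)
    if not found:
        return ""
    n = min(end for _, end in found)      # smallest end of a minimal match
    return arg1[n:]

# $arg1 starts and ends with an ignorable:
print(starts_with("-abc-", "A"))         # True
print(ends_with("-abc-", "c"))           # True
print(starts_with("-abc-", "b"))         # False
print(substring_before("a-bc-d", "BC"))  # a-
print(substring_after("a-bc-d", "BC"))   # -d
```

In this sketch a leading or trailing ignorable in $arg1 does not prevent
starts_with or ends_with from succeeding, while the minimal match keeps the
ignorables adjacent to the match in the results of substring_before and
substring_after.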

Received on Tuesday, 15 January 2008 00:03:43 UTC