- From: <bugzilla@wiggum.w3.org>
- Date: Tue, 15 Jan 2008 00:03:39 +0000
- To: public-qt-comments@w3.org
- CC:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5251

------- Comment #1 from mike@saxonica.com 2008-01-15 00:03 -------

There are a couple of further complications here.

Firstly, the definition of "minimal match" in Unicode UTS #10 appears to be incorrect. It states "The match is minimal if for all positive i and j, there is no match at Q[s+i,e-j]. In such a case, we also say that P minimal matchs at Q[s,e]." which would imply that a match at Q[2,4] is minimal even if there is a match at Q[3,4] or at Q[2,3] - it should allow either i or j (but not both) to be zero. I have raised this as a Unicode error report. I'll get around this by not referring normatively to the Unicode definition.

Secondly, the definition of minimal match is parameterized by a "boundary condition" B, which we have not specified. An example of such a boundary condition is a requirement that the match occur at a grapheme boundary, word boundary, or sentence boundary. I believe it's our intention that there should be no boundary constraints on the match.

Thirdly, F+O section 7.5 states:

"In the definitions below, we say that $arg1 contains $arg2 at positions m through n if the collation units corresponding to characters in positions m to n of $arg1 are the same as the collation units corresponding to all the characters of $arg2 modulo ignorable collation units. In the simple case of the Unicode code point collation, the collation units are the same as the characters of the string. See [Unicode Collation Algorithm] for a detailed discussion of substring matching."

However, the text below never relies on this definition; instead it refers directly to the concept of "minimal match".

PROPOSAL

1. Replace the cited definition in section 7.5 by:

"[Definition: following [Unicode Collation Algorithm], we say that $arg1 *matches* $arg2 at positions M through M+L under collation C if fn:compare(fn:substring($arg1, M, L), $arg2, C) = 0, where M>=0 and M+L<=fn:string-length($arg1)]

[Definition: we say that $arg1 *minimally matches* $arg2 at positions M through N if $arg1 *matches* $arg2 at positions M through N and if there are no non-negative integers i and j, not both equal to zero, such that $arg1 *matches* $arg2 at positions M+i through N-j.]"

2. In all five functions, delete the note referring to "minimal match".

3. In contains(), replace the Summary with:

"Summary: Returns an xs:boolean indicating whether or not the value of $arg1 contains the value of $arg2 (at the beginning, at the end, or anywhere within). The result is true if and only if there is some pair of values M, N such that $arg1 matches $arg2 at positions M through N under the requested collation."

4. In starts-with(), replace the Summary with:

"Summary: Returns an xs:boolean indicating whether or not the value of $arg1 starts with the value of $arg2. The result is true if and only if there is some value N such that $arg1 matches $arg2 at positions 1 through N under the requested collation."

5. In ends-with(), replace the Summary with:

"Summary: Returns an xs:boolean indicating whether or not the value of $arg1 ends with the value of $arg2. The result is true if and only if there is some value M such that $arg1 matches $arg2 at positions M through N under the requested collation, where N is equal to fn:string-length($arg2)."

6. In substring-before, replace the Summary with:

"Summary: Returns the substring of the value of $arg1 that precedes the first occurrence of $arg2. If fn:contains($arg1, $arg2) is true then the function returns fn:substring($arg1, 1, M), where M is the smallest integer such that $arg1 minimally matches $arg2 at positions M through N for some N. Otherwise the function returns the zero-length string."

Delete the now-redundant sentence "If the value of $arg1 does not contain a string that is equal to the value of $arg2, then the function returns the zero-length string."

7. In substring-after, replace the Summary with:

"Summary: Returns the substring of the value of $arg1 that follows the first occurrence of $arg2. If fn:contains($arg1, $arg2) is true then the function returns substring($arg1, N+1) where N is the smallest integer such that $arg1 minimally matches $arg2 at positions M through N for some M. Otherwise the function returns the zero-length string."

Delete the now-redundant sentence "If the value of $arg1 does not contain a string that is equal to the value of $arg2, then the function returns the zero-length string."

8. Add examples to starts-with() and ends-with() indicating the effect if $arg1 starts/ends with an ignorable.
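
To make the proposed definitions concrete, here is a rough Python sketch of "matches" and "minimally matches" and of the five functions built on them. It is only an illustration under several assumptions that are not part of the proposal: the collation is a toy one that treats "-" as ignorable (the names IGNORABLE and key are hypothetical), positions M through N are read as 1-based and inclusive, ends-with is read as requiring N to be the last position of $arg1, and substring-before is read as returning the characters strictly before position M.

```python
IGNORABLE = {"-"}  # assumption: the toy collation ignores hyphens

def key(s):
    """Toy collation key: the string with ignorable characters dropped."""
    return "".join(ch for ch in s if ch not in IGNORABLE)

def matches(arg1, arg2, m, n):
    """True if arg1 matches arg2 at 1-based inclusive positions m..n.
    n == m - 1 denotes an empty span."""
    if m < 1 or n > len(arg1) or n < m - 1:
        return False
    return key(arg1[m - 1:n]) == key(arg2)

def minimally_matches(arg1, arg2, m, n):
    """A match that cannot be shrunk from either end (i, j not both zero)."""
    if not matches(arg1, arg2, m, n):
        return False
    span = n - m + 1
    for i in range(span + 1):
        for j in range(span + 1 - i):
            if (i, j) != (0, 0) and matches(arg1, arg2, m + i, n - j):
                return False
    return True

def spans(arg1):
    """All candidate (m, n) pairs, in order of increasing m, then n."""
    size = len(arg1)
    return [(m, n) for m in range(1, size + 2) for n in range(m - 1, size + 1)]

def contains(arg1, arg2):
    return any(matches(arg1, arg2, m, n) for m, n in spans(arg1))

def starts_with(arg1, arg2):
    return any(matches(arg1, arg2, 1, n) for n in range(0, len(arg1) + 1))

def ends_with(arg1, arg2):
    last = len(arg1)
    return any(matches(arg1, arg2, m, last) for m in range(1, last + 2))

def substring_before(arg1, arg2):
    starts = [m for m, n in spans(arg1) if minimally_matches(arg1, arg2, m, n)]
    return arg1[:min(starts) - 1] if starts else ""

def substring_after(arg1, arg2):
    ends = [n for m, n in spans(arg1) if minimally_matches(arg1, arg2, m, n)]
    return arg1[min(ends):] if ends else ""
```

With this toy collation, "a-bc-" matches "bc" at positions 2 through 4, 3 through 5, 2 through 5 and 3 through 4, but only 3 through 4 is a minimal match. For point 8, starts_with("-abc", "a") and ends_with("abc-", "c") both come out true, because the leading or trailing ignorable is absorbed into the matched span.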
Received on Tuesday, 15 January 2008 00:03:43 UTC