[F&O] IBM-FO-104: Description of substring matching should account for ignorable collation units

[My apologies that these comments are coming in after the end of the Last 
Call comment period.]

Section 7.5

According to the sixth paragraph of this section, "In the definitions 
below, we say that $arg1 contains $arg2 at positions m through n if the 
collation units corresponding to characters in positions m to n of $arg1 
are the same as the collation units corresponding to all the characters of 
$arg2."

This definition is not sufficiently precise in the presence of ignorable 
collation units. The rules should be based on 
http://www.unicode.org/unicode/reports/tr10/#Searching, which distinguishes, 
for example, minimal and maximal matches (a match at Q[s,e] is maximal if, 
for all positive i and j, there is no match at Q[s-i,e+j]).

For example, '-' is ignorable for some collations. It is not clear whether 
substring-before("a-b", "b") returns "a" or "a-".  The specification needs 
to state which result is intended.  If the result is 
implementation-dependent or implementation-defined, it should say so 
explicitly.
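
To make the ambiguity concrete, here is a minimal sketch in Python. It 
assumes a toy collation in which '-' is the only ignorable character and 
every other character is its own collation unit; the names IGNORABLE, 
units, matches and substring_before are hypothetical and are not taken 
from F&O or from any collation library. It is meant only to illustrate 
the two readings, not to suggest how the function should be defined.

```python
# Assumption: a toy collation in which '-' is ignorable and every other
# character maps to itself as a single collation unit.
IGNORABLE = {"-"}


def units(s):
    """Collation units of s under the toy collation (ignorables contribute none)."""
    return [ch for ch in s if ch not in IGNORABLE]


def matches(text, pattern):
    """All half-open spans (s, e) of text whose collation units equal the pattern's."""
    target = units(pattern)
    return [
        (s, e)
        for s in range(len(text) + 1)
        for e in range(s, len(text) + 1)
        if units(text[s:e]) == target
    ]


def substring_before(text, pattern, mode):
    """Text before the first match, using 'minimal' or 'maximal' matching."""
    spans = matches(text, pattern)
    if mode == "minimal":
        # Keep spans that do not contain any other matching span.
        chosen = [a for a in spans
                  if not any(b != a and a[0] <= b[0] and b[1] <= a[1] for b in spans)]
    else:
        # Keep spans that are not contained in any other matching span.
        chosen = [a for a in spans
                  if not any(b != a and b[0] <= a[0] and a[1] <= b[1] for b in spans)]
    if not chosen:
        return ""
    start, _ = min(chosen)
    return text[:start]


print(repr(substring_before("a-b", "b", "minimal")))
print(repr(substring_before("a-b", "b", "maximal")))
```

Under these assumptions both "b" and "-b" have the same collation units 
as the pattern, so minimal matching yields "a-" and maximal matching 
yields "a", which are exactly the two candidate results above.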


Thanks,

Henry
[Speaking on behalf of reviewers from IBM.]
------------------------------------------------------------------
Henry Zongaro      Xalan development
IBM SWS Toronto Lab   T/L 969-6044;  Phone +1 905 413-6044
mailto:zongaro@ca.ibm.com

Received on Tuesday, 17 February 2004 20:43:27 UTC