W3C home > Mailing lists > Public > public-qt-comments@w3.org > February 2004

[F&O] IBM-FO-104: Description of substring matching should account for ignorable collations units

From: Henry Zongaro <zongaro@ca.ibm.com>
Date: Tue, 17 Feb 2004 20:43:20 -0500
To: public-qt-comments@w3.org
Message-ID: <OFE2CB6873.C0E2DE5D-ON85256E3B.007B2760-85256E3E.0009763B@ca.ibm.com>

[My apologies that these comments are coming in after the end of the Last 
Call comment period.]

Section 7.5

According to the sixth paragraph of this section, "In the definitions 
below, we say that $arg1 contains $arg2 at positions m through n if the 
collation units corresponding to characters in positions m to n of $arg1 
are the same as the collation units corresponding to all the characters of 
$arg2."

This definition is not sufficiently precise in the presence of ignorable 
collation units. The rules should be based on 
http://www.unicode.org/unicode/reports/tr10/#Searching (e.g. minimal or 
maximal. For all positive i and j, there is no match at Q[s-i,e+j].)

For example, '-' is ignorable for some collations. It is not clear whether 
substring-before("a-b", "b") returns "a" or "a-".  This needs to be 
clearly specified.  If it is implementation-dependent or 
implementation-defined, that should be clearly specified.


Thanks,

Henry
[Speaking on behalf of reviewers from IBM.]
------------------------------------------------------------------
Henry Zongaro      Xalan development
IBM SWS Toronto Lab   T/L 969-6044;  Phone +1 905 413-6044
mailto:zongaro@ca.ibm.com
Received on Tuesday, 17 February 2004 20:43:27 UTC

This archive was generated by hypermail 2.3.1 : Wednesday, 5 February 2014 07:13:57 UTC