[Bug 3698] [FT] Interaction between FTDiacriticsOption and collation unclear

http://www.w3.org/Bugs/Public/show_bug.cgi?id=3698

           Summary: [FT] Interaction between FTDiacriticsOption and
                    collation unclear
           Product: XPath / XQuery / XSLT
           Version: Working drafts
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Full Text
        AssignedTo: jim.melton@acm.org
        ReportedBy: doerre@de.ibm.com
         QAContact: public-qt-comments@w3.org


Editorial

Some of the entries of the Diacritics Matrix in 3.2.2 do not clearly describe
what the intended comparison operation for the given case should be. In
particular:
 - the entry for UCC / "insensitive", which states "compare as if with and without"
(with and without what?);
 - the 4 entries for UCC+CDS / "with" + "without diacritics", which use an
example query.
The reader has no clue how to interpret those example queries. Even if
they are meant to show how to reduce the "with" and "without" options to the
other options, there are several problems with those queries.

E.g., the entry for CDS / "with diacritics" gives the query:

  "resume diacritics insensitive" not in  "resume"

(i) is syntactically not what was intended (probably: "resume" diacritics
insensitive not in "resume");
(ii) depends on diacritics options higher up in the query tree, or on a specified
default for the diacritics option (note that the second "resume" term is matched
according to that diacritics setting);
and (iii) can never have a match in the default case, where the second "resume"
is matched insensitively as well.
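To make point (iii) concrete, here is a minimal Python sketch. It assumes single-token terms, models diacritics insensitivity as comparing strings after NFD decomposition with nonspacing marks (Unicode category Mn) removed, and reduces "not in" to "token positions matched by the left operand but not by the right operand". These are simplifications for illustration only; the helper names (`fold`, `matches`, `not_in`) are hypothetical and not taken from the draft.

```python
import unicodedata

def fold(s):
    # Assumed insensitive comparison key: NFD, then drop nonspacing marks (Mn).
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if unicodedata.category(c) != "Mn")

def matches(token, term, insensitive):
    # Hypothetical single-token matcher for one diacritics setting.
    if insensitive:
        return fold(token) == fold(term)
    return unicodedata.normalize("NFC", token) == unicodedata.normalize("NFC", term)

def not_in(tokens, left, right):
    # Simplified 'not in': positions matched by left but not by right.
    lterm, lins = left
    rterm, rins = right
    return [i for i, t in enumerate(tokens)
            if matches(t, lterm, lins) and not matches(t, rterm, rins)]

tokens = ["resume", "r\u00e9sum\u00e9", "resum\u00e9"]
# Right-hand "resume" inherits the (insensitive) default: nothing survives.
print(not_in(tokens, ("resume", True), ("resume", True)))   # []
# Right-hand "resume" explicitly sensitive: only the accented forms remain.
print(not_in(tokens, ("resume", True), ("resume", False)))  # [1, 2]
```

In this model the left and right operands coincide whenever both are insensitive, so the query is unsatisfiable unless the inner term is explicitly made sensitive.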

So perhaps this query should be:

  "resume" diacritics insensitive not in  "resume" diacritics sensitive

(which would indeed be an equivalent rewrite for "resume" with diacritics,
because the term "resume" is deliberately spelled without diacritics in the
second subquery), but then what would the rewrite be for "without diacritics"?
Also, the rewriting relies on our having control over whether the query term
itself contains diacritics, and on knowing how it would need to be transposed if
it did. In general, however, we cannot assume this. E.g., consider the query:
 $node ftcontains $term with diacritics
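The dependence on the query term's own spelling can be sketched in the same simplified single-token model. Assuming (our reading, not the draft's wording) that "with diacritics" means "equal to the term modulo diacritics, and the token itself carries at least one diacritic", the hypothetical functions below agree when the term is "resume" but disagree once the term is bound to an accented spelling, as `$term` could be at run time.

```python
import unicodedata

def fold(s):
    # Assumed insensitive key: NFD, then drop nonspacing marks (category Mn).
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if unicodedata.category(c) != "Mn")

def nfc(s):
    return unicodedata.normalize("NFC", s)

def has_diacritics(s):
    # True if the NFD form contains any nonspacing mark.
    return any(unicodedata.category(c) == "Mn"
               for c in unicodedata.normalize("NFD", s))

def with_diacritics(token, term):
    # Assumed intended semantics of "$term with diacritics".
    return fold(token) == fold(term) and has_diacritics(token)

def rewritten(token, term):
    # '$term diacritics insensitive not in $term diacritics sensitive'
    return fold(token) == fold(term) and nfc(token) != nfc(term)

for term in ["resume", "r\u00e9sum\u00e9"]:
    for token in ["resume", "r\u00e9sum\u00e9"]:
        print(ascii(term), ascii(token),
              with_diacritics(token, term), rewritten(token, term))
```

For the term "resume" both columns agree; for the term "résumé" they are inverted, which is why the rewrite cannot serve as a general definition when the term is a variable.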

/jochen

Received on Monday, 11 September 2006 16:40:24 UTC