Re: [Bug 6809] New: [FT] Test Suite - Thesaurus Queries

On Tue, 14 Apr 2009 11:17:34 -0700, Jim Melton <jim.melton@oracle.com> wrote:

> Christian,

>> [1] ft-3.4.3-examples-q1
>>
>> The usability.xml thesaurus file returns the synonym "tasks" for the query
>> input "duties" - but the queried document node includes only the word in
>> singular ("task" instead of "tasks"). Is this intended?
>
> I would say "no" because I don't believe that
> thesauri are expected to *ALSO* do stemming.  But
> Pat or Mary will have more authoritative responses.

I think this comes down to the match option application order:
is stemming applied before or after thesaurus expansion?
It looks like this test assumes after. IIRC, at one point
we had an explicit ordering, which we have since relaxed.
So I think we should try to make the test independent of
the order if we can.
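
To make the order dependence concrete, here is a minimal sketch in Python. It is only a toy model: the thesaurus entry mirrors the "duties" -> "tasks" case above, while the stemmer, the function names, and the matching logic are hypothetical stand-ins, not what the spec or any implementation actually does.

```python
# Toy model only. The single thesaurus entry mirrors the case discussed
# above; naive_stem is a hypothetical stand-in for a real stemmer.
THESAURUS = {"duties": ["tasks"]}


def naive_stem(word):
    """Strip one trailing 's' (illustrative, not a real stemming algorithm)."""
    return word[:-1] if word.endswith("s") else word


def expand(word):
    """Return the word itself plus any synonyms listed in the thesaurus."""
    return [word] + THESAURUS.get(word, [])


def thesaurus_then_stem(word):
    """Expand first, then stem every resulting term."""
    return {naive_stem(term) for term in expand(word)}


def stem_then_thesaurus(word):
    """Stem first, then look the stem up in the thesaurus."""
    return set(expand(naive_stem(word)))


# The queried node contains only the singular form.
document_stems = {naive_stem(token) for token in ["task"]}

# Expansion first: "duties" -> "tasks" -> stem "task", which matches.
hit_after = bool(thesaurus_then_stem("duties") & document_stems)

# Stemming first: the stem of "duties" is not a thesaurus key, so no match.
hit_before = bool(stem_then_thesaurus("duties") & document_stems)
```

Under these assumptions the query matches only when the thesaurus is applied before stemming, which is the order the test appears to rely on.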

>> [3] ft-3.4.3-examples-q3.xq
>>
>> In this query, words similar to "Merrygould" are to be found. As "case
>> insensitive" is the default option, the term is converted to "merrygould"
>> in my tests - so the thesaurus doesn't return any result.
>
> This is something that you've done incorrectly,
> I'm sorry to say.  If you look into the Unicode
> rules for comparing character strings, you'll
> find that "case insensitive" explicitly does NOT
> mean "put everything into lowercase (or
> uppercase) and then do the comparison".  While
> that sometimes (almost always, in fact) works for
> languages that use the simple Latin script (a/k/a
> "ASCII"), it begins to break when moving into
> Eastern European scripts.  You should at least
> consider whether you should implement the Unicode
> "case insensitive" comparison rules.
>
> Aside from that, I would expect that thesauri
> searches should be done with case-insensitive
> comparisons, in which case the thesaurus search
> would properly find "Merrygould".  Pat and Mary
> will be more authoritative than I, however.

Well, again, it depends on whether the case option applies
before or after the thesaurus option. It looks like the test
assumes a particular order of application here too, so we
need to fix up the test/thesaurus, or allow an alternative result.
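
The same point as a minimal Python sketch. The synonym "Marigold" and both lookup functions are hypothetical (the thesaurus contents are not quoted in this thread); str.casefold() is Python's built-in Unicode case folding, used here as one possible way to compare without simply lowercasing.

```python
# Toy thesaurus. The key "Merrygould" is from the query under discussion;
# the synonym "Marigold" is a made-up placeholder.
THESAURUS = {"Merrygould": ["Marigold"]}


def lookup_exact(term):
    """Case-sensitive lookup: the term must equal the thesaurus key."""
    return THESAURUS.get(term, [])


def lookup_case_insensitive(term):
    """Compare case-folded forms instead of rewriting the query term."""
    key = term.casefold()
    return [
        synonym
        for entry, synonyms in THESAURUS.items()
        if entry.casefold() == key
        for synonym in synonyms
    ]


# Case option applied first by lowercasing, then an exact thesaurus lookup:
lowercased = "Merrygould".lower()
miss = lookup_exact(lowercased)            # no entry for "merrygould"

# Thesaurus lookup done with a case-insensitive comparison:
hit = lookup_case_insensitive(lowercased)  # finds the "Merrygould" entry

# Lowercasing is not the same as case folding: these two spellings differ
# after lower() but are equal after casefold().
same_lowered = "Straße".lower() == "STRASSE".lower()
same_folded = "Straße".casefold() == "STRASSE".casefold()
```

Either the thesaurus lookup compares case-insensitively, or the test has to accept both outcomes; the sketch only shows that the two orders can disagree.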

>> [6] ft-3.4.3-expressions-q5
>>
>> ..references the missing file "TechnicalThesaurus.xml".
>
> The test suite catalog has this element:
>
>      <thesaurus ID="technical"
>                 uri="http://bstore1.example.com/TechnicalThesaurus.xml"
>                 FileName="TestSources/intentionally-missing.xml"
>                 Creator="Full-Text Task Force">
>        <description last-mod="2009-01-09">(Missing) technical thesaurus</description>
>      </thesaurus>
>
> From the FileName and from the description, I
> believe it's evident that the file is INTENDED to
> be missing.  However, looking at the catalog entry for the test:
>
>      <test-case is-XPath2="false"
>                 name="ft-3.4.3-expressions-q5"
>                 FilePath="Expressions/Operators/CompExpr/FTContainsExpr/FTSelection/MatchOptions/FTThesaurus/"
>                 scenario="standard" Creator="Full-Text Task Force">
>        <description>WIth thesaurus level query. Find infrastructure at the 2nd level of narrower terms in a thesaurus.</description>
>        <spec-citation spec="XQueryFullText" section-number="3.4.3"
>                       section-title="Thesaurus Option" section-pointer="ftthesaurusoption"/>
>        <query name="ftthesaurus-q5" date="2008-11-28"/>
>        <aux-URI role="thesaurus">usability</aux-URI>
>        <input-file role="principal-data" variable="input-context">ftusecases</input-file>
>        <output-file role="principal" compare="Fragment">ftthesaurus-results-q5.txt</output-file>
>      </test-case>

It looks to me like the XQuery references the wrong thesaurus.

//Mary

Received on Tuesday, 14 April 2009 18:58:24 UTC