- From: <bugzilla@wiggum.w3.org>
- Date: Mon, 24 Nov 2008 20:51:18 +0000
- To: public-qt-comments@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=6195
Mary Holstege <holstege@mathling.com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |holstege@mathling.com
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
--- Comment #1 from Mary Holstege <holstege@mathling.com> 2008-11-24 20:51:17 ---
The WG discussed this issue and agreed we need to augment the
testsuite. Please note that we have not yet completely implemented
the use of this new system throughout the testsuite. If you are satisfied
with this resolution, please mark the bug as closed.
Please note the following addition to the instructions:
<quote>
Special Sources: Stop Word List, Thesaurus, and Stemming Dictionary
The stopwords, thesaurus, and stemming-dictionary sources are not intended
to be used directly in the form in which they are given, but to provide
information to those running the test suite about the expectations a
particular test has about various implementation-specific aspects of the
execution context. Implementations are expected to provide equivalent
information to the query, but in whatever form is appropriate in their
context. A stopwords source is a plain text file containing a list of stop
words, one per line. When a query references this stop word list, the
implementation is expected to provide that list of stop words to the
query. A thesaurus source is an XML document defined against the
thesaurus.xsd XML Schema. When a query references this thesaurus, the
implementation is expected to provide equivalent thesaurus information to
the query. The stemming-dictionary is a plain text file containing lines
of whitespace-separated tokens. Each token on the line should stem to the
first token on the line. When the catalog entry for a query references a
stemming dictionary, the implementation is expected to provide stemming
equivalent to the rules given in the stemming dictionary.
</quote>
The basic idea is that there are three new kinds of sources:
a stop word list, which is just a text file, one stop word per line;
a thesaurus, which is an XML file as per the schema; and a stemming
dictionary, which is a text file with one stem per line, each stem
followed by the tokens that stem to it.
The catalog descriptions for stop word lists and thesauri include a URI
that matches up with the one in the query. This is similar to the
handling of schemas. The stemming dictionary has no URI: it is the resource
ID that matters and it is used to define the relevant stem equivalents
when it makes a difference for stemmed search.
** Changes to XQFTTSCatalog.xsd/xml:
Add three new kinds of source roles: stopwords, thesaurus, and
stemming-dictionary, and corresponding elements in the sources part of
the catalog. Add an aux-URI element to the test-case itself.
Queries that use a URI for a stop word list should have an aux-URI with
role="stopwords"; queries that use a URI for a thesaurus should have an
aux-URI with role="thesaurus". Queries that rely on particular stemming
behaviour should have an aux-URI with role="stemming-dictionary".
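To illustrate how a test harness might match an aux-URI to its source
entry, here is a minimal Python sketch. The catalog fragment is a
simplified stand-in, not the real XQFTTSCatalog.xml: the <catalog> and
<sources> wrapper names, the absence of a namespace, and the function
name resolve_aux_sources are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified catalog fragment. The real catalog uses a
# namespace and more attributes; only the pieces needed here are kept.
CATALOG = """
<catalog>
  <sources>
    <stopwords ID="stopwords1"
               uri="http://bstore1.example.com/StopWordList.xml"
               FileName="stopwords.txt"/>
    <stemming-dictionary ID="english-stems" FileName="english-stems.txt"/>
  </sources>
  <test-case name="stopwords-1">
    <aux-URI role="stopwords">stopwords1</aux-URI>
  </test-case>
</catalog>
"""


def resolve_aux_sources(catalog_xml, test_name):
    """Return (role, FileName, uri) for each aux-URI of the named test case."""
    root = ET.fromstring(catalog_xml)
    sources = root.find("sources")
    resolved = []
    for case in root.iter("test-case"):
        if case.get("name") != test_name:
            continue
        for aux in case.findall("aux-URI"):
            role = aux.get("role")
            source_id = (aux.text or "").strip()
            # The role names the kind of source element; the text is its ID.
            for src in sources:
                if src.tag == role and src.get("ID") == source_id:
                    resolved.append((role, src.get("FileName"), src.get("uri")))
    return resolved


print(resolve_aux_sources(CATALOG, "stopwords-1"))
```

For a stemming dictionary the uri slot comes back as None, which matches
the note above that only the resource ID matters for that kind of source.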
** Examples:
* Stop words:
TestSources/stopwords.txt:
and
the
then
it
of
in
Catalog description:
<stopwords ID="stopwords1"
uri="http://bstore1.example.com/StopWordList.xml" FileName="stopwords.txt"
Creator="Full-Text Task Force">
<description last-mod="2008-11-10">Stop word list for use
cases</description>
</stopwords>
Query description using stopwords
(with stop words at "http://bstore1.example.com/StopWordList.xml"):
<test-case is-XPath2="true" name="stopwords-1"
FilePath="Expressions/Operators/CompExpr/FTContainsExpr/FTSelection/MatchOptions/FTStopWord/"
scenario="standard" Creator="Full-Text Task Force">
<description>Example using stop words</description>
<spec-citation spec="XQueryFullText" section-number="3.4.7"
section-title="Stop Word Option" section-pointer="ftstopwordoption"/>
<query name="stopwords-1" date="2008-11-10"/>
<aux-URI role="stopwords">stopwords1</aux-URI>
<input-file role="principal-data"
variable="input-context">ftusecases</input-file>
<output-file role="principal"
compare="XML">stopwords-1.xml</output-file>
</test-case>
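As a rough illustration of how a harness could read the stop word source
before handing it to the implementation under test, consider the Python
sketch below. The function names are hypothetical, and lower-casing the
words is an assumption of the sketch. Flagging tokens is only a stand-in
for whatever form the implementation needs the list in; it does not
model the actual stop word semantics of the full-text specification.

```python
def load_stop_words(text):
    """Parse a stopwords source: plain text, one stop word per line."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def flag_stop_words(tokens, stop_words):
    """Pair each token with True if it appears in the stop word list."""
    return [(token, token.lower() in stop_words) for token in tokens]


# Contents of TestSources/stopwords.txt as given in the example above.
STOPWORDS_TXT = "and\nthe\nthen\nit\nof\nin\n"

stop_words = load_stop_words(STOPWORDS_TXT)
print(flag_stop_words(["the", "usability", "of", "software"], stop_words))
```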
* Thesaurus: (Schema is TestSources/thesaurus.xsd)
TestSources/soundex.xml:
<thesaurus xmlns="http://www.w3.org/xqftts/thesarus">
<entry>
<term>Marigold</term>
<synonym>
<term>Merrygould</term>
<relationship>sounds like</relationship>
</synonym>
</entry>
</thesaurus>
Catalog description:
<thesaurus ID="soundex"
uri="http://bstore1.example.com/UsabilitySoundex.xml"
FileName="soundex.xml"
Creator="Full-Text Task Force">
<description last-mod="2008-11-10">Soundex thesaurus for
examples</description>
</thesaurus>
Query using thesaurus:
(with thesaurus at "http://bstore1.example.com/UsabilitySoundex.xml"):
<test-case is-XPath2="true" name="thesaurus-1"
FilePath="Expressions/Operators/CompExpr/FTContainsExpr/FTSelection/MatchOptions/FTThesaurus/"
scenario="standard" Creator="Full-Text Task Force">
<description>Example using thesaurus</description>
<spec-citation spec="XQueryFullText" section-number="3.4.3"
section-title="Thesaurus Option" section-pointer="ftthesaurusoption"/>
<query name="thesaurus-1" date="2008-11-10"/>
<aux-URI role="thesaurus">soundex</aux-URI>
<input-file role="principal-data"
variable="input-context">ftusecases</input-file>
<output-file role="principal"
compare="XML">thesaurus-1.xml</output-file>
</test-case>
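A thesaurus source like the one above could be turned into a lookup
table along the following lines. This is a Python sketch under stated
assumptions: it relies only on the element names visible in the example
(entry, term, synonym, relationship), takes the namespace from the
document element rather than hard-coding it, and the function name
load_thesaurus is hypothetical. The full thesaurus.xsd may allow
structure that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# The example thesaurus from this message, namespace copied verbatim.
THESAURUS_XML = """<thesaurus xmlns="http://www.w3.org/xqftts/thesarus">
  <entry>
    <term>Marigold</term>
    <synonym>
      <term>Merrygould</term>
      <relationship>sounds like</relationship>
    </synonym>
  </entry>
</thesaurus>"""


def load_thesaurus(xml_text):
    """Map each entry term to a list of (synonym term, relationship)."""
    root = ET.fromstring(xml_text)
    # Reuse whatever namespace the document element declares.
    ns = root.tag[1:root.tag.index("}")] if root.tag.startswith("{") else ""

    def q(name):
        return "{%s}%s" % (ns, name) if ns else name

    table = {}
    for entry in root.findall(q("entry")):
        term = entry.findtext(q("term"))
        for syn in entry.findall(q("synonym")):
            pair = (syn.findtext(q("term")), syn.findtext(q("relationship")))
            table.setdefault(term, []).append(pair)
    return table


print(load_thesaurus(THESAURUS_XML))
```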
* Stemming
TestSources/english-stems.txt
improve improves improving improved
dog dogs
cat cats
train trains training trained
error errors
Catalog description:
<stemming-dictionary ID="english-stems" FileName="english-stems.txt"
Creator="Full-Text Task Force">
<description last-mod="2008-11-10">English stems</description>
</stemming-dictionary>
Query using stemming:
(with stemming)
<test-case is-XPath2="true" name="stemming-1"
FilePath="Expressions/Operators/CompExpr/FTContainsExpr/FTSelection/MatchOptions/FTStemming/"
scenario="standard" Creator="Full-Text Task Force">
<description>Example using stemming</description>
<spec-citation spec="XQueryFullText" section-number="3.4.4"
section-title="Stemming Option" section-pointer="ftstemoption"/>
<query name="stemming-1" date="2008-11-10"/>
<aux-URI role="stemming-dictionary">english-stems</aux-URI>
<input-file role="principal-data"
variable="input-context">ftusecases</input-file>
<output-file role="principal"
compare="XML">stemming-1.xml</output-file>
</test-case>
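The stemming dictionary format (each token on a line stems to the first
token on that line) can be read with a few lines of Python. The sketch
below is illustrative only: the function names are hypothetical, and
treating a token that is missing from the dictionary as its own stem is
an assumption of the sketch, not something the instructions state.

```python
def load_stemming_dictionary(text):
    """Map every token on a line to the first token on that line."""
    stems = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        stem = tokens[0]
        for token in tokens:
            stems[token] = stem
    return stems


def same_stem(a, b, stems):
    """True if both tokens stem to the same form (unknown tokens stem to themselves)."""
    return stems.get(a, a) == stems.get(b, b)


# Contents of TestSources/english-stems.txt as given in the example above.
ENGLISH_STEMS = """improve improves improving improved
dog dogs
cat cats
train trains training trained
error errors
"""

stems = load_stemming_dictionary(ENGLISH_STEMS)
print(stems["training"], same_stem("dogs", "dog", stems))
```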
Received on Monday, 24 November 2008 20:51:29 UTC