Re: [FT] Meaning of FTST0008 and FTST0018

On Sat, 11 Dec 2010 06:20:48 -0800, Paul J. Lucas <paul@lucasmail.org>
wrote:

> On Jun 25, 2010, at 11:16 AM, Mary Holstege wrote:
>
>> On Fri, 25 Jun 2010 09:00:19 -0700, Paul J. Lucas <paul@lucasmail.org>  
>> wrote:
>>
>>> From the specification section 3.4.3:
>>>
>>>> If the URI specifies a thesaurus that is not found in the statically  
>>>> known thesauri, an error is raised [err:FTST0018].
>>>
>>> I don't understand why URIs for these things would be "statically  
>>> known."  If the user wants to specify some arbitrary URI that points  
>>> to a valid thesaurus ..., why shouldn't that "just work?"
>>>
>>> Does "statically known" mean that the implementation has a set of  
>>> hard-coded URIs that the user can only select from?  If not, then what  
>>> does "statically known" mean here?
>>
>> "Statically known" means that it is part of the static context when the  
>> query is analyzed.  It is up to the implementation to decide what URIs  
>> it will consider as part of the static context, and whether it is a  
>> fixed list or more open.  ... So the idea here is that if an  
>> implementation wants to restrict the set of ... thesaurus expansions  
>> used to those that it actually indexes with, it can do that.   If it  
>> wants to apply everything in real time on the fly, it can do that too,
>> by declaring every dereferenceable URI as "statically known".
>
> The W3C Full Text test suite contains thesaurus tests that have  
> thesaurus URIs like:
>
>  http://bstore1.example.com/UsabilityThesaurus.xml
>  http://bstore1.example.com/TechnicalThesaurus.xml
>  http://bstore1.example.com/UsabilitySoundex.xml
>
> I believe your response above implies that, if an implementation wants  
> to pass these tests, it somehow has to incorporate these URIs into the  
> query's static context.
>
> Maybe I'm missing something obvious, but I don't see anything in either
> the XQuery base language or the full text extension that gives a query
> author the ability to do that.
>
> There's also no apparent way to bind a URI to some piece of  
> implementation code.  For example, in our XQuery implementation written  
> in C++, we're allowing the user to use Princeton's WordNet.  In order to  
> implement this, we're using a WordNet C++ API to access the WordNet  
> database, hence there is custom C++ code just for WordNet.  If at some  
> point we also wish to allow other thesauri, we'd imagine that more  
> custom C++ code would have to be written to access that thesaurus'  
> database since every thesaurus is likely to be in a different format  
> with a different C++ API.
>
> I don't see any way for an XQuery author to tell an implementation  
> something like:
>
>  declare thesaurus "http://wordnet.princeton.edu" as "wordnet";
>
> Hence, our implementation "knows" about the WordNet URI  
> "http://wordnet.princeton.edu" and internally maps it to libwordnet.so  
> by hard-coding that mapping.
>
> The URIs used in the full text test suite are just made-up URIs, so how
> is an implementation wishing to pass these tests supposed to "statically  
> know" about these URIs and how is it supposed to map them to some piece  
> of implementation code?
>
> When the full text specification was written, its authors must have  
> envisioned at least one way in which an implementation would do that.   
> What is that way?  Hard-coding those made-up URIs into an implementation  
> just to pass the tests would seem strange.
>
> - Paul
>

Speaking for myself:

Whatever mechanism the implementation uses to hook in thesaurus processing,
that mechanism should be used in the test driver as well. It is expected
that an implementation will have some kind of internal data structure that
is initialized through various implementation-specific means and then
mapped to URIs in some way. This is an implementation-specific configuration
issue, and therefore out of scope of the query language proper. There may
or may not be a URI resolver or mapper interposed between the thesaurus URI
in the query and the URI known to the implementation (or the internal
identifier used by the implementation). The expectation is not that the URI
maps to some special code, but that it maps to some specific data structure
that is processed by some general thesaurus-handling code. So, whatever
those means are: use them in your test driver.

I don't think we expect the mapping to be hardcoded in the implementation
in general, but if that is how your implementation works, then I guess
that to run and report on those particular tests with something more
interesting than the unknown/unsupported thesaurus error, you'll have to
hardcode those URIs into the implementation somehow, or use some substitute
thesaurus that you do know about and that encodes the same or equivalent
relations. The point of 'statically known' is to say 'statically known
to the query' -- this doesn't mean that it is statically fixed in the
implementation, just that the binding is outside the scope of the query
itself.
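As a rough illustration of that configuration-side binding, here is a
minimal C++ sketch. Every name in it (Thesaurus, ThesaurusRegistry,
StaticError, configureForTestSuite, the "local:" keys) is hypothetical and
not part of the spec or of any real library. It only assumes that loaded
thesauri live in some in-memory structure keyed by an internal identifier,
that an optional resolver maps the URI written in the query onto that
identifier, and that a URI with no registered thesaurus raises
err:FTST0018 during static analysis.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical in-memory thesaurus: for each (term, relationship) pair,
// the set of related terms. How it gets populated (XML file, WordNet
// database, ...) is implementation configuration, not part of the query.
struct Thesaurus {
  std::map<std::pair<std::string, std::string>, std::set<std::string>> relations;

  void add(const std::string& term, const std::string& rel,
           const std::string& related) {
    relations[{term, rel}].insert(related);
  }

  std::set<std::string> lookup(const std::string& term,
                               const std::string& rel) const {
    auto it = relations.find({term, rel});
    if (it == relations.end()) return {};
    return it->second;
  }
};

// Raised when the query names a thesaurus that is not statically known.
struct StaticError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical registry standing in for the "statically known thesauri".
class ThesaurusRegistry {
 public:
  using Resolver = std::function<std::string(const std::string&)>;

  void setResolver(Resolver r) { resolver_ = std::move(r); }

  void registerThesaurus(const std::string& key, Thesaurus t) {
    assert(!key.empty());  // keys are internal identifiers, never empty
    known_[key] = std::move(t);
  }

  // Called while the query is analyzed: map the query's URI through the
  // optional resolver, then look the result up among registered thesauri.
  const Thesaurus& resolve(const std::string& queryUri) const {
    std::string key = resolver_ ? resolver_(queryUri) : queryUri;
    auto it = known_.find(key);
    if (it == known_.end()) {
      throw StaticError("err:FTST0018: thesaurus not statically known: " +
                        queryUri);
    }
    return it->second;
  }

 private:
  std::map<std::string, Thesaurus> known_;
  Resolver resolver_;
};

// Hypothetical test-driver setup: a catalog maps the suite's made-up URIs
// onto keys of thesauri the driver loaded itself; unknown URIs pass through.
void configureForTestSuite(ThesaurusRegistry& reg,
                           std::map<std::string, std::string> catalog) {
  reg.setResolver([catalog = std::move(catalog)](const std::string& uri) {
    auto it = catalog.find(uri);
    return it == catalog.end() ? uri : it->second;
  });
}
```

In a test driver this would mean loading equivalent thesaurus data,
registering it under an internal key, and passing a catalog entry such as
"http://bstore1.example.com/UsabilityThesaurus.xml" -> "local:usability";
the query text itself stays unchanged, and the implementation contains no
hardcoded test URIs.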

//Mary

Received on Tuesday, 14 December 2010 18:37:41 UTC