Re: [FT] Meaning of FTST0008 and FTST0018

On Sat, 11 Dec 2010 06:20:48 -0800, Paul J. Lucas <paul@lucasmail.org>
wrote:

> On Jun 25, 2010, at 11:16 AM, Mary Holstege wrote:
>
>> On Fri, 25 Jun 2010 09:00:19 -0700, Paul J. Lucas <paul@lucasmail.org>  
>> wrote:
>>
>>> From the specification section 3.4.3:
>>>
>>>> If the URI specifies a thesaurus that is not found in the statically  
>>>> known thesauri, an error is raised [err:FTST0018].
>>>
>>> I don't understand why URIs for these things would be "statically  
>>> known."  If the user wants to specify some arbitrary URI that points  
>>> to a valid thesaurus ..., why shouldn't that "just work?"
>>>
>>> Does "statically known" mean that the implementation has a set of  
>>> hard-coded URIs that the user can only select from?  If not, then what  
>>> does "statically known" mean here?
>>
>> "Statically known" means that it is part of the static context when the  
>> query is analyzed.  It is up to the implementation to decide what URIs  
>> it will consider as part of the static context, and whether it is a  
>> fixed list or more open.  ... So the idea here is that if an  
>> implementation wants to restrict the set of ... thesaurus expansions  
>> used to those that it actually indexes with, it can do that.   If it  
>> wants to apply everything in real time on the fly, it can do that too,  
>> by declaring every dereferenceable URI as "statically known".
>
> The W3C Full Text test suite contains thesaurus tests that have  
> thesaurus URIs like:
>
>  http://bstore1.example.com/UsabilityThesaurus.xml
>  http://bstore1.example.com/TechnicalThesaurus.xml
>  http://bstore1.example.com/UsabilitySoundex.xml
>
> I believe your response above implies that, if an implementation wants  
> to pass these tests, it somehow has to incorporate these URIs into the  
> query's static context.
>
> Maybe I'm missing something obvious, but I don't see anything in either  
> the XQuery base language or the full text extension that allows a query  
> author the ability to do that.
>
> There's also no apparent way to bind a URI to some piece of  
> implementation code.  For example, in our XQuery implementation written  
> in C++, we're allowing the user to use Princeton's WordNet.  In order to  
> implement this, we're using a WordNet C++ API to access the WordNet  
> database, hence there is custom C++ code just for WordNet.  If at some  
> point we also wish to allow other thesauri, we'd imagine that more  
> custom C++ code would have to be written to access that thesaurus'  
> database since every thesaurus is likely to be in a different format  
> with a different C++ API.
>
> I don't see any way for an XQuery author to tell an implementation  
> something like:
>
>  declare thesaurus "http://wordnet.princeton.edu" as "wordnet";
>
> Hence, our implementation "knows" about the WordNet URI  
> "http://wordnet.princeton.edu" and internally maps it to libwordnet.so  
> by hard-coding that mapping.
>
> The URIs used in the full text test suite are just made-up URIs, so how  
> is an implementation wishing to pass these tests supposed to "statically  
> know" about these URIs and how is it supposed to map them to some piece  
> of implementation code?
>
> When the full text specification was written, its authors must have  
> envisioned at least one way in which an implementation would do that.   
> What is that way?  Hard-coding those made-up URIs into an implementation  
> just to pass the tests would seem strange.
>
> - Paul
>

Speaking for myself:

Whatever mechanism the implementation uses to hook in thesaurus processing,
that mechanism should be used in the testdriver as well.  It is expected
that an implementation will have some kind of internal data structure that
is initialized through various implementation-specific means and then
mapped to URIs in some way.  This is an implementation-specific configuration
issue, and therefore out of scope of the query language proper. There may
or may not be a URI resolver or mapper interposed between the thesaurus URI
in the query and the URI known to the implementation (or to the internal
identifier used by the implementation).  The expectation is not that the URI
maps to some special code, but that the URI maps to some specific data
structure that is processed by some general thesaurus-handling code.
So, whatever those means are: use them in your test driver.  I don't think
we expect that to be hardcoded in the implementation in general, but if
that is how your implementation works, then I guess that to run and report
on those particular tests with something more interesting than the
unknown/unsupported thesaurus error, you'll have to have them hardcoded
into the implementation somehow, or use some substitute thesaurus that
encodes the same/equivalent relations that you do know about.  The point
of 'statically known' is to say 'statically known to the query' -- this
doesn't mean that it is statically fixed in the implementation; just
that the binding is outside the scope of the query itself.
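
To make that concrete, here is a minimal C++ sketch of one possible
arrangement, not a description of any actual implementation.  Every name
in it (Thesaurus, ThesaurusRegistry, UnknownThesaurus, expand) is
hypothetical, and the term-to-related-terms table is an assumed stand-in
for whatever data structure an implementation really uses.  It only
illustrates a URI mapping to data that is handled by general code, with
the binding made outside the query.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// All names below are hypothetical; none come from the Full Text
// specification or from a real implementation.

// One thesaurus as plain data: term -> related terms.
using Thesaurus = std::map<std::string, std::set<std::string>>;

// Raised when a query names a URI that was never registered.
struct UnknownThesaurus : std::runtime_error {
    explicit UnknownThesaurus(const std::string& uri)
        : std::runtime_error("err:FTST0018: unknown thesaurus " + uri) {}
};

// Plays the role of the "statically known thesauri".  It is filled by
// configuration or by a test driver before the query is analyzed,
// never by the query itself.
class ThesaurusRegistry {
public:
    void add(const std::string& uri, Thesaurus data) {
        assert(!uri.empty());  // an empty URI is a caller bug
        known_[uri] = std::move(data);
    }

    const Thesaurus& resolve(const std::string& uri) const {
        auto it = known_.find(uri);
        if (it == known_.end()) throw UnknownThesaurus(uri);
        return it->second;
    }

private:
    std::map<std::string, Thesaurus> known_;
};

// General thesaurus-handling code: the same function serves every URI.
// Returns the term itself plus any related terms the thesaurus lists.
std::set<std::string> expand(const ThesaurusRegistry& registry,
                             const std::string& uri,
                             const std::string& term) {
    const Thesaurus& thesaurus = registry.resolve(uri);
    std::set<std::string> result{term};
    auto it = thesaurus.find(term);
    if (it != thesaurus.end())
        result.insert(it->second.begin(), it->second.end());
    return result;
}
```

In a setup like this, a test driver would call add() once for each
thesaurus URI used by the test suite, loading the data from wherever the
suite keeps it, before running the queries.  A URI that was never added
ends up in the FTST0018 case.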

//Mary

Received on Tuesday, 14 December 2010 18:38:41 UTC