Re: [FT] Meaning of FTST0008 and FTST0018

On Jun 25, 2010, at 11:16 AM, Mary Holstege wrote:

> On Fri, 25 Jun 2010 09:00:19 -0700, Paul J. Lucas <paul@lucasmail.org> wrote:
> 
>> From the specification section 3.4.3:
>> 
>>> If the URI specifies a thesaurus that is not found in the statically known thesauri, an error is raised [err:FTST0018].
>> 
>> I don't understand why URIs for these things would be "statically known."  If the user wants to specify some arbitrary URI that points to a valid thesaurus ..., why shouldn't that "just work?"
>> 
>> Does "statically known" mean that the implementation has a set of hard-coded URIs that the user can only select from?  If not, then what does "statically known" mean here?
> 
> "Statically known" means that it is part of the static context when the query is analyzed.  It is up to the implementation to decide what URIs it will consider as part of the static context, and whether it is a fixed list or more open.  ... So the idea here is that if an implementation wants to restrict the set of ... thesaurus expansions used to those that it actually indexes with, it can do that.   If it wants to apply everything in real time on the fly, it can do that too, by declaring every dereferenceable URI as "statically known".

The W3C Full Text test suite contains thesaurus tests that have thesaurus URIs like:

	http://bstore1.example.com/UsabilityThesaurus.xml
	http://bstore1.example.com/TechnicalThesaurus.xml
	http://bstore1.example.com/UsabilitySoundex.xml

I believe your response above implies that, if an implementation wants to pass these tests, it somehow has to incorporate these URIs into the query's static context.

Maybe I'm missing something obvious, but I don't see anything in either the XQuery base language or the full text extension that gives a query author the ability to do that.

There's also no apparent way to bind a URI to some piece of implementation code.  For example, in our XQuery implementation written in C++, we're allowing the user to use Princeton's WordNet.  In order to implement this, we're using a WordNet C++ API to access the WordNet database, hence there is custom C++ code just for WordNet.  If at some point we also wish to allow other thesauri, we'd expect that more custom C++ code would have to be written for each one, since every thesaurus is likely to be in a different format with a different C++ API.

I don't see any way for an XQuery author to tell an implementation something like:

	declare thesaurus "http://wordnet.princeton.edu" as "wordnet";

Hence, our implementation "knows" about the WordNet URI "http://wordnet.princeton.edu" and internally maps it to libwordnet.so by hard-coding that mapping.
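
To make the hard-coding concrete, a mapping of that kind might look roughly like the sketch below.  This is only an illustration, not our actual code: the names Thesaurus, WordNetThesaurus, StaticError, known_thesauri and resolve_thesaurus are all hypothetical, and the lookup body is a placeholder rather than a call into the real WordNet API.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical interface: one subclass per thesaurus back end.
struct Thesaurus {
    virtual ~Thesaurus() = default;
    virtual std::vector<std::string> lookup(const std::string& word) const = 0;
};

// Placeholder for the custom code that would wrap a WordNet library.
struct WordNetThesaurus : Thesaurus {
    std::vector<std::string> lookup(const std::string& word) const override {
        return {word};  // a real back end would return related terms
    }
};

// Hypothetical error type that carries the specification's error code.
struct StaticError : std::runtime_error {
    explicit StaticError(const std::string& code) : std::runtime_error(code) {}
};

using Factory = std::function<std::unique_ptr<Thesaurus>()>;

// The "statically known thesauri": a fixed table from URI to factory.
const std::map<std::string, Factory>& known_thesauri() {
    static const std::map<std::string, Factory> table = {
        {"http://wordnet.princeton.edu",
         [] { return std::unique_ptr<Thesaurus>(new WordNetThesaurus); }},
    };
    return table;
}

// Resolve a URI from the query; unknown URIs raise err:FTST0018.
std::unique_ptr<Thesaurus> resolve_thesaurus(const std::string& uri) {
    auto it = known_thesauri().find(uri);
    if (it == known_thesauri().end())
        throw StaticError("err:FTST0018");
    return it->second();
}
```

With a table like this, a query naming "http://wordnet.princeton.edu" resolves, while any of the test suite URIs above would raise err:FTST0018 unless an entry for it were added to the table by hand.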

The URIs used in the full text test suite are just made-up URIs, so how is an implementation wishing to pass these tests supposed to "statically know" about these URIs and how is it supposed to map them to some piece of implementation code?

When the full text specification was written, its authors must have envisioned at least one way in which an implementation would do that.  What is that way?  Hard-coding those made-up URIs into an implementation just to pass the tests would seem strange.

- Paul

Received on Saturday, 11 December 2010 14:21:26 UTC