Re: [FT] FTWildCardOption observations and question

Paul,

We discussed your comment in the XML Query/XSL meeting today.

We agree that you are correct: if the tokenizer strips punctuation, searching will be limited as you describe.

Pat Case, Library of Congress

>>> On 6/13/2010 at 12:04 PM, in message <4E0EF6CF-32D3-4677-B4CD-10EBC92A174E@lucasmail.org>, "Paul J. Lucas" <paul@lucasmail.org> wrote:
Section 3.4.2 of the spec says:

> A question mark, asterisk, plus sign, or left curly brace that is not immediately preceded by a period is not treated as a qualifier.
> 
> When "wildcards" is used, any character in a query string can be "escaped" by immediately preceding it with a backslash, "\". That is, a backslash immediately followed by any character represents that character literally, preventing any special interpretation that the "wildcards" option might otherwise attach to it. In particular:
> 
> 1. Escaping a period prevents its interpretation as a wildcard.
> 2. Escaping a question mark, asterisk, plus sign, or left curly brace ensures that it is not interpreted as a qualifier.
> 3. An escaped backslash ("\\") represents a literal backslash.
> 4. If a query string is terminated by an unescaped backslash, an error is raised: [err:FTDY0020].
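The four escape rules quoted above can be made concrete with a small model of how a "wildcards" query string might be classified character by character. This is a minimal Python sketch, not the spec's normative algorithm or any implementation's code. The function "classify", the "FTDY0020" exception class, and the (text, kind) representation are hypothetical. The sketch also assumes that only an unescaped period can take a qualifier, and it rejects an unterminated "{" with a ValueError as its own choice.

```python
class FTDY0020(Exception):
    """Hypothetical stand-in for the dynamic error err:FTDY0020."""


def classify(query: str) -> list[tuple[str, str]]:
    """Split a "wildcards" query string into (text, kind) pairs.

    kind is "literal", "wildcard" (an unescaped period) or "qualifier"
    (?, *, +, or a {n,m} range immediately following an unescaped period).
    """
    items: list[tuple[str, str]] = []
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\":
            # Rule 4: a query string may not end in an unescaped backslash.
            if i + 1 == len(query):
                raise FTDY0020("query string ends with an unescaped backslash")
            # Rules 1-3: whatever follows the backslash is literal.
            items.append((query[i + 1], "literal"))
            i += 2
            continue
        after_wildcard = bool(items) and items[-1][1] == "wildcard"
        if ch == ".":
            items.append((ch, "wildcard"))
        elif ch in "?*+" and after_wildcard:
            items.append((ch, "qualifier"))
        elif ch == "{" and after_wildcard:
            end = query.find("}", i)
            if end == -1:
                # Sketch's own choice: reject an unterminated {n,m} range.
                raise ValueError("unterminated {n,m} qualifier")
            items.append((query[i : end + 1], "qualifier"))
            i = end + 1
            continue
        else:
            # Everything else, including ?, *, + and { not preceded
            # by a period, is an ordinary literal character.
            items.append((ch, "literal"))
        i += 1
    return items


if __name__ == "__main__":
    print(classify(r"foo.\?bar")[3:5])  # [('.', 'wildcard'), ('?', 'literal')]
    print(classify("foo.?bar")[3:5])    # [('.', 'wildcard'), ('?', 'qualifier')]
```

The escape check runs before anything else, so an escaped "?" right after a period never becomes a qualifier, which is exactly what rule 2 requires.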

If a given implementation's tokenizer ordinarily strips all punctuation characters (that is, when wildcards are not used), then #2 above is practically irrelevant, since presumably nobody would write such a query. For example, nobody would write:

$x contains text "anybody there\?" using wildcards

since they could just write:

$x contains text "anybody there?" using wildcards

without the backslash. Additionally, I assume that:

$x contains text "foo.\?bar" using wildcards

would be tokenized as:

"foo." "bar"

since the \? represents a literal '?', which would be stripped by the tokenizer. True?
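The assumed tokenization can be checked against a toy model. The following Python sketch is a hypothetical tokenizer, not any real implementation's. It assumes that a literal character (escaped, or unescaped but not wildcard syntax) that is not a letter or digit ends the current token and is discarded, while unescaped periods and the qualifiers that follow them stay inside the token. The name "tokenize" and the decision to treat an unterminated "{" as ordinary punctuation are assumptions of the sketch.

```python
def tokenize(query: str) -> list[str]:
    """Hypothetical tokenizer for a "wildcards" query that strips punctuation."""
    tokens: list[str] = []
    current = ""

    def flush() -> None:
        # End the current token (if any) and start a new one.
        nonlocal current
        if current:
            tokens.append(current)
        current = ""

    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\":
            if i + 1 == len(query):
                # Mirrors rule 4 (err:FTDY0020).
                raise ValueError("FTDY0020: query ends with an unescaped backslash")
            escaped = query[i + 1]
            i += 2
            # An escaped character is literal: letters and digits stay,
            # punctuation such as "?" is stripped and ends the token.
            if escaped.isalnum():
                current += escaped
            else:
                flush()
            continue
        # Literal punctuation never enters `current`, so a trailing "."
        # here is always an unescaped wildcard.
        after_wildcard = current.endswith(".")
        if ch == ".":
            current += ch  # wildcard: kept as part of the pattern
        elif ch in "?*+" and after_wildcard:
            current += ch  # qualifier on the preceding wildcard
        elif ch == "{" and after_wildcard and "}" in query[i:]:
            end = query.index("}", i)
            current += query[i : end + 1]  # {n,m} qualifier
            i = end + 1
            continue
        elif ch.isalnum():
            current += ch
        else:
            # Whitespace or literal punctuation: strip it and end the token.
            flush()
        i += 1
    flush()
    return tokens


if __name__ == "__main__":
    print(tokenize(r"anybody there\?"))  # ['anybody', 'there']
    print(tokenize("anybody there?"))    # ['anybody', 'there']
    print(tokenize(r"foo.\?bar"))        # ['foo.', 'bar']
```

Under these assumptions the escaped and unescaped "anybody there" queries produce identical tokens, and "foo.\?bar" splits into "foo." and "bar". By contrast, "foo.?bar" (no backslash) stays a single pattern token, because there the "?" qualifies the wildcard.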

- Paul

Received on Tuesday, 6 July 2010 16:31:55 UTC