- From: Pat Case <PCASE@crs.loc.gov>
- Date: Tue, 06 Jul 2010 12:31:10 -0400
- To: "Paul J. Lucas" <paul@lucasmail.org>, <public-qt-comments@w3.org>
- Message-Id: <4C33220E.7FBB.00F0.0@crs.loc.gov>
Paul,
We discussed your comment in the XML Query/XSL meeting today.
We agree that you are correct: if the tokenizer strips punctuation, searching will be limited as you describe.
Pat Case, Library of Congress
>>> On 6/13/2010 at 12:04 PM, in message <4E0EF6CF-32D3-4677-B4CD-10EBC92A174E@lucasmail.org>, "Paul J. Lucas" <paul@lucasmail.org> wrote:
Section 3.4.2 of the spec says:
> A question mark, asterisk, plus sign, or left curly brace that is not immediately preceded by a period is not treated as a qualifier.
>
> When "wildcards" is used, any character in a query string can be "escaped" by immediately preceding it with a backslash, "\". That is, a backslash immediately followed by any character represents that character literally, preventing any special interpretation that the "wildcards" option might otherwise attach to it. In particular:
>
> 1. Escaping a period prevents its interpretation as a wildcard.
> 2. Escaping a question mark, asterisk, plus sign, or left curly brace ensures that it is not interpreted as a qualifier.
> 3. An escaped backslash ("\\") represents a literal backslash.
> 4. If a query string is terminated by an unescaped backslash, an error is raised: [err:FTDY0020].
Assuming a given implementation's tokenizer ordinarily strips all punctuation characters (that is, when wildcards are not used), #2 above is practically irrelevant, since presumably nobody would write such a query. For example, nobody would write:
$x contains text "anybody there\?" using wildcards
since they could just write:
$x contains text "anybody there?" using wildcards
without the backslash. Additionally, I assume that:
$x contains text "foo.\?bar" using wildcards
would be tokenized as:
"foo." "bar"
since the \? represents a literal '?', which the tokenizer would strip. True?
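To make the reading above concrete, here is a rough Python sketch of how a punctuation-stripping tokenizer might treat the quoted escaping rules. It comes from neither the spec nor any real implementation: the names parse_units, tokenize, and TrailingBackslashError are hypothetical, and it assumes every literal non-alphanumeric character acts as a token separator.

```python
class TrailingBackslashError(ValueError):
    """Hypothetical stand-in for err:FTDY0020."""


def parse_units(query):
    """Split a query string into (char, escaped) pairs.

    A backslash followed by any character yields that character with
    escaped=True; a trailing unescaped backslash is an error.
    """
    units = []
    i = 0
    while i < len(query):
        if query[i] == "\\":
            if i + 1 >= len(query):
                raise TrailingBackslashError("err:FTDY0020")
            units.append((query[i + 1], True))
            i += 2
        else:
            units.append((query[i], False))
            i += 1
    return units


def tokenize(query):
    """Tokenize under the ASSUMPTION that literal punctuation is stripped.

    An unescaped period is kept as a wildcard, together with an unescaped
    qualifier (?, *, +, or a {...} range) that immediately follows it.
    Every other non-alphanumeric character is treated as a separator.
    """
    units = parse_units(query)
    tokens, current = [], ""
    i = 0
    while i < len(units):
        ch, escaped = units[i]
        if ch == "." and not escaped:
            current += "."
            i += 1
            if i < len(units) and not units[i][1]:
                if units[i][0] in "?*+":
                    current += units[i][0]
                    i += 1
                elif units[i][0] == "{":
                    # Copy the range qualifier through its closing brace.
                    while i < len(units):
                        current += units[i][0]
                        i += 1
                        if units[i - 1][0] == "}":
                            break
            continue
        if ch.isalnum():
            current += ch
        elif current:
            # Literal punctuation or whitespace ends the current token.
            tokens.append(current)
            current = ""
        i += 1
    if current:
        tokens.append(current)
    return tokens


print(tokenize(r"foo.\?bar"))        # ['foo.', 'bar']
print(tokenize(r"anybody there\?"))  # ['anybody', 'there']
print(tokenize("anybody there?"))    # ['anybody', 'there']
```

Under these assumptions the escaped and unescaped forms of "anybody there?" produce the same tokens, which is the sense in which rule #2 has little practical effect, while "foo.?bar" without the backslash stays a single token with a qualified wildcard.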
- Paul
Received on Tuesday, 6 July 2010 16:31:55 UTC