- From: Pat Case <PCASE@crs.loc.gov>
- Date: Tue, 06 Jul 2010 12:31:10 -0400
- To: "Paul J. Lucas" <paul@lucasmail.org>,<public-qt-comments@w3.org>
- Message-Id: <4C33220E.7FBB.00F0.0@crs.loc.gov>
Paul,

We discussed your comment in the XML Query/XSL meeting today. We agree that you are correct, that if the tokenizer strips punctuation, searching will be limited as you specify.

Pat Case, Library of Congress

>>> On 6/13/2010 at 12:04 PM, in message <4E0EF6CF-32D3-4677-B4CD-10EBC92A174E@lucasmail.org>, "Paul J. Lucas" <paul@lucasmail.org> wrote:

Section 3.4.2 of the spec says:

> A question mark, asterisk, plus sign, or left curly brace that is not immediately preceded by a period is not treated as a qualifier.
>
> When "wildcards" is used, any character in a query string can be "escaped" by immediately preceding it with a backslash, "\". That is, a backslash immediately followed by any character represents that character literally, preventing any special interpretation that the "wildcards" option might otherwise attach to it. In particular:
>
> 1. Escaping a period prevents its interpretation as a wildcard.
> 2. Escaping a question mark, asterisk, plus sign, or left curly brace ensures that it is not interpreted as a qualifier.
> 3. An escaped backslash ("\\") represents a literal backslash.
> 4. If a query string is terminated by an unescaped backslash, an error is raised: [err:FTDY0020].

Assuming a given implementation's tokenizer ordinarily strips all punctuation characters (that is, when wildcards are not used), then #2 above is practically irrelevant since nobody would presumably write such a query, e.g., nobody would write:

    $x contains text "anybody there\?" using wildcards

since they could just write:

    $x contains text "anybody there?" using wildcards

without the backslash instead.

Additionally, I assume that:

    $x contains text "foo.\?bar" using wildcards

would be tokenized as:

    "foo." "bar"

since the \? represents a literal '?' which would be stripped by the tokenizer. True?

- Paul
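
For readers following the thread, the behaviour discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the spec's algorithm or any implementation's tokenizer: the names `parse_wildcards`, `tokenize`, and the `FTDY0020` exception class are hypothetical, the tokenizer is assumed to split on every literal character that is not alphanumeric, and only the `{n,m}` form of the curly-brace qualifier is recognised.

```python
import re

class FTDY0020(Exception):
    """Hypothetical stand-in for the error code err:FTDY0020."""

# A period, optionally followed by one qualifier: ?, *, + or {n,m}.
WILDCARD = re.compile(r"\.(?:[?*+]|\{\d+,\d+\})?")

def parse_wildcards(query):
    """Split a query string into ('lit', char) and ('wild', text) items."""
    items = []
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\":
            # Rule 4: a trailing unescaped backslash is an error.
            if i + 1 == len(query):
                raise FTDY0020("query string ends with an unescaped backslash")
            # The escaped character is taken literally (rules 1 to 3).
            items.append(("lit", query[i + 1]))
            i += 2
        elif ch == ".":
            # An unescaped period is a wildcard; a qualifier counts only
            # when it immediately follows the period.
            m = WILDCARD.match(query, i)
            items.append(("wild", m.group(0)))
            i = m.end()
        else:
            # Includes ?, *, + and { that do not follow a period.
            items.append(("lit", ch))
            i += 1
    return items

def tokenize(query):
    """Assumed tokenizer: literal non-alphanumeric characters end a token."""
    tokens, current = [], ""
    for kind, text in parse_wildcards(query):
        if kind == "wild" or text.isalnum():
            current += text
        else:
            if current:
                tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return tokens
```

Under these assumptions, `tokenize("foo.\\?bar")` yields the two tokens `foo.` and `bar`, while `"anybody there\\?"` and `"anybody there?"` tokenize identically, which is the point raised in the message.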
Received on Tuesday, 6 July 2010 16:31:55 UTC