- From: Michael Dyck <MichaelDyck@home.com>
- Date: Thu, 21 Jun 2001 18:16:25 -0700
- To: David Loy <de_loy@yahoo.com>
- CC: www-ql@w3.org
David Loy wrote:
>
> I have not kept up with this list. Sorry if this has
> been asked before.
>
> Does the standard provide a mechanism for searching
> individual words within a tag using some form of
> Boolean and proximity operation?
>
> Example:
> searching for 'wind' AND 'willow' in <title>
You could say:
//title[contains(.,'wind') and contains(.,'willow')]
but this just finds "wind" and "willow" as substrings of the title, not as
*words*. So, for instance, it would select
<title>dwindling willows</title>
which is not what you wanted. You could prevent such matches by padding the
strings with blanks:
//title[contains(.,' wind ') and contains(.,' willow ')]
but then it *won't* select titles where the desired words are delimited by
something other than blanks, e.g.
<title>the wind, the willow, and the wardrobe</title>
or where a word falls at the very start or end of the title, e.g.
<title>wind and willow</title>
And I'm not even getting into the problems of case-insensitive search, or
word-form search.
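To make the substring-versus-word distinction concrete, here is a minimal
Python sketch that does the word test outside XPath. It is not something the
standard provides. The <books> wrapper element, the helper names `words` and
`titles_with_all_words`, and the ASCII-only notion of a "word" are all
assumptions made for illustration. It lowercases the text, so it covers simple
case-insensitive search, but it does nothing about word forms.

```python
import re
import xml.etree.ElementTree as ET

# The three titles come from the examples above; the <books> root is hypothetical.
DOC = """<books>
  <title>dwindling willows</title>
  <title>the wind, the willow, and the wardrobe</title>
  <title>wind and willow</title>
</books>"""


def words(text):
    """Split text into lowercase words; any run of non-letters is a delimiter.

    Assumption: a "word" is a run of ASCII letters, which is too crude for
    real-world text but enough to show the idea.
    """
    return re.findall(r"[a-z]+", text.lower())


def titles_with_all_words(root, wanted):
    """Return the text of every <title> containing all of `wanted` as whole words."""
    wanted = {w.lower() for w in wanted}
    hits = []
    for title in root.iter("title"):
        text = "".join(title.itertext())
        if wanted <= set(words(text)):
            hits.append(text)
    return hits


root = ET.fromstring(DOC)
print(titles_with_all_words(root, ["wind", "willow"]))
# → ['the wind, the willow, and the wardrobe', 'wind and willow']
```

Note that "dwindling willows" is rejected, while the comma-delimited title and
the title with words at its edges are both accepted.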
__________
> searching for 'blue' before 'sky' in <title>
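I don't think there's a direct operator for that. The closest I can see is
to nest substring-after():
//title[contains(substring-after(.,'blue'),'sky')]
which only tests for 'sky' after the *first* 'blue', and has all the same
substring caveats as above.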
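If the ordering test can be done in a host language rather than in the query
itself, a whole-word version is short. The following is only a sketch under
that assumption: the function name `occurs_before` and the sample strings are
hypothetical, and a "word" is again taken to be a run of ASCII letters.

```python
import re


def occurs_before(text, first, second):
    """True if the whole word `first` appears somewhere before the whole word `second`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    first, second = first.lower(), second.lower()
    if first not in tokens:
        return False
    start = tokens.index(first)  # earliest occurrence of `first`
    # Any later occurrence of `second` counts, however far away it is.
    return second in tokens[start + 1:]


print(occurs_before("a blue and cloudless sky", "blue", "sky"))  # → True
print(occurs_before("sky blue", "blue", "sky"))                  # → False
```

Starting from the earliest occurrence of the first word is enough here, since
any later match of the second word would also follow that earliest occurrence.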
__________
> searching for phrase 'blue sky' in <title>
You could say:
//title[contains(.,'blue sky')]
but with all the same caveats as above.
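For comparison, a phrase test on whole words can be sketched in Python as a
search for consecutive tokens. The name `contains_phrase` and the sample
strings are hypothetical, and the tokenizer assumes ASCII letters only. Because
punctuation is discarded, "blue, sky" counts as the phrase, which may or may
not be what you want.

```python
import re


def contains_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively, as whole words, in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    target = re.findall(r"[a-z]+", phrase.lower())
    n = len(target)
    if n == 0:
        return False
    # Slide a window of n tokens across the text and compare it to the phrase.
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


print(contains_phrase("Blue Sky Mining", "blue sky"))   # → True
print(contains_phrase("the blue skyline", "blue sky"))  # → False
```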
__________
You should take a look at "XML Use Cases from the Library of Congress"
(http://lists.w3.org/Archives/Public/www-xml-query-comments/2001May/0000.html)
which probably covers your cases. It doesn't look like the XML Query WG has
responded yet, though.
-Michael Dyck
Received on Thursday, 21 June 2001 21:21:24 UTC