- From: Michael Dyck <MichaelDyck@home.com>
- Date: Thu, 21 Jun 2001 18:16:25 -0700
- To: David Loy <de_loy@yahoo.com>
- CC: www-ql@w3.org
David Loy wrote:
>
> I have not kept up with this list. Sorry if this has
> been asked before.
>
> Does the standard provide a mechanism for searching
> individual words within a tag using some form of
> Boolean and proximity operation?
>
> Example:
> searching for 'wind' AND 'willow' in <title>

You could say
    //title[contains(.,'wind') and contains(.,'willow')]
but this just finds "wind" and "willow" as substrings of the title,
not as *words*. So, for instance, it would select
    <title>dwindling willows</title>
which is not what you wanted.

You could prevent such matches by padding the strings with blanks:
    //title[contains(.,' wind ') and contains(.,' willow ')]
but then it *won't* select titles where the desired words occur,
but are delimited by something other than blanks, e.g.
    <title>the wind, the willow, and the wardrobe</title>
or
    <title>wind and willow</title>

And I'm not even getting into the problems of case-insensitive search,
or word-form search.

__________
> searching for 'blue' before 'sky' in <title>

I don't think there's any way to do that.

__________
> searching for phrase 'blue sky' in <title>

You could say:
    //title[contains(.,'blue sky')]
but with all the same caveats as above.

__________

You should take a look at
"XML Use Cases from the Library of Congress"
(http://lists.w3.org/Archives/Public/www-xml-query-comments/2001May/0000.html)
which probably covers your cases. It doesn't look like the XML Query WG
has responded yet, though.

-Michael Dyck
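
For comparison, here is a rough sketch of the three searches done outside
XPath, in Python: select the <title> elements with a plain path, then do the
word matching in the host language. This is not something the query standard
provides. The sample document and the helper names (words, has_all_words,
word_before, has_phrase) are made up for illustration, and "word" is assumed
to mean a run of ASCII letters, compared case-insensitively.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data, built from the titles discussed above.
DOC = """<books>
  <title>dwindling willows</title>
  <title>the wind, the willow, and the wardrobe</title>
  <title>wind and willow</title>
  <title>sky blue</title>
  <title>a blue, blue sky</title>
</books>"""


def words(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def has_all_words(text, *wanted):
    """Boolean AND: every wanted word occurs as a whole word."""
    found = set(words(text))
    return all(w.lower() in found for w in wanted)


def word_before(text, first, second):
    """Proximity: some occurrence of `first` precedes one of `second`."""
    ws = words(text)
    first, second = first.lower(), second.lower()
    return first in ws and second in ws[ws.index(first) + 1:]


def has_phrase(text, phrase):
    """Phrase: the words of `phrase` occur consecutively, in order."""
    ws, target = words(text), words(phrase)
    n = len(target)
    return any(ws[i:i + n] == target for i in range(len(ws) - n + 1))


titles = [t.text for t in ET.fromstring(DOC).iter("title")]
wind_and_willow = [t for t in titles if has_all_words(t, "wind", "willow")]
blue_before_sky = [t for t in titles if word_before(t, "blue", "sky")]
blue_sky_phrase = [t for t in titles if has_phrase(t, "blue sky")]
```

Splitting on non-letters handles the comma and start-of-string cases that the
blank-padding trick misses, and lowercasing covers case-insensitive search.
Word-form search is still not handled: "willows" does not match "willow".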
Received on Thursday, 21 June 2001 21:21:24 UTC