
Re: word searching

From: Michael Dyck <MichaelDyck@home.com>
Date: Thu, 21 Jun 2001 18:16:25 -0700
Message-ID: <3B329C69.8BBF7066@home.com>
To: David Loy <de_loy@yahoo.com>
CC: www-ql@w3.org
David Loy wrote:
> 
> I have not kept up with this list. Sorry if this has
> been asked before.
> 
> Does the standard provide a mechanism for searching
> individual words within a tag using some form of
> Boolean and proximity operation?
> 
> Example:
> searching for 'wind' AND 'willow' in <title>

You could say
    //title[contains(.,'wind') and contains(.,'willow')]
but this just finds "wind" and "willow" as substrings of the title, not as
*words*. So, for instance, it would select
    <title>dwindling willows</title>
which is not what you wanted. You could prevent such matches by padding the
strings with blanks:
    //title[contains(.,' wind ') and contains(.,' willow ')]
but then it *won't* select titles where the desired words occur, but are
delimited by something other than blanks, e.g.
    <title>the wind, the willow, and the wardrobe</title>
or
    <title>wind and willow</title>

And I'm not even getting into the problems of case-insensitive search, or
word-form search. 
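To make the three behaviours concrete outside of XPath, here is a minimal Python sketch. It assumes a "word" is any run of letters/digits (so commas and other punctuation act as delimiters) and compares case-insensitively. The helper names (words, substring_match, padded_match, word_match) are hypothetical and are not part of any query language or library:

```python
import re

def words(text):
    # Hypothetical tokenizer (an assumption): any run of letters/digits is a
    # word; everything else (blanks, commas, ...) is a delimiter. Lowercased.
    return re.findall(r"\w+", text.casefold())

def substring_match(title, *terms):
    # Mirrors contains(., 'term'): plain substring test, no word boundaries.
    return all(term in title for term in terms)

def padded_match(title, *terms):
    # Mirrors contains(., ' term '): term must have a blank on each side.
    return all(f" {term} " in title for term in terms)

def word_match(title, *terms):
    # Word-level AND: every term must occur as a whole word, ignoring case.
    ws = set(words(title))
    return all(term.casefold() in ws for term in terms)

titles = [
    "dwindling willows",
    "the wind, the willow, and the wardrobe",
    "wind and willow",
]
for t in titles:
    print(f"{t!r}: substring={substring_match(t, 'wind', 'willow')}, "
          f"padded={padded_match(t, 'wind', 'willow')}, "
          f"word={word_match(t, 'wind', 'willow')}")
```

For these three titles the substring test is True for all of them, the padded test is False for all of them, and the word test gives False, True, True. Word-form (stemming) matching is still not handled.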
__________

> searching for 'blue' before 'sky' in <title>

I don't think there's any way to do that.
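If the titles are pulled out and processed in a host language instead, an ordered (and optionally proximity-limited) test is straightforward. This is only a sketch in Python, under the same assumed tokenizer as above. The function name before and its max_gap parameter are hypothetical illustrations, not anything the query language offers:

```python
import re

def words(text):
    # Hypothetical tokenizer (an assumption): runs of letters/digits, lowercased.
    return re.findall(r"\w+", text.casefold())

def before(title, first, second, max_gap=None):
    # True if some occurrence of `first` precedes some occurrence of `second`.
    # If max_gap is given, at most max_gap words may lie between them,
    # which gives a simple proximity test.
    ws = words(title)
    first, second = first.casefold(), second.casefold()
    for i, w in enumerate(ws):
        if w != first:
            continue
        for j in range(i + 1, len(ws)):
            if ws[j] == second and (max_gap is None or j - i - 1 <= max_gap):
                return True
    return False

print(before("blue and cloudless sky", "blue", "sky"))             # True
print(before("blue and cloudless sky", "blue", "sky", max_gap=1))  # False
print(before("sky so blue", "blue", "sky"))                        # False
```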
__________

> searching for phrase 'blue sky' in <title>

You could say:
    //title[contains(.,'blue sky')]
but with all the same caveats as above.
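A word-level phrase test avoids those caveats: contains(.,'blue sky') also matches "true blue skyline" and misses "Blue  Sky" (different case, two blanks). A minimal Python sketch, again assuming the hypothetical tokenizer from above, checks that the phrase's words occur consecutively and in order:

```python
import re

def words(text):
    # Hypothetical tokenizer (an assumption): runs of letters/digits, lowercased.
    return re.findall(r"\w+", text.casefold())

def has_phrase(title, phrase):
    # True if the words of `phrase` appear consecutively, in order, in `title`.
    ws, ps = words(title), words(phrase)
    n = len(ps)
    return n > 0 and any(ws[i:i + n] == ps for i in range(len(ws) - n + 1))

print(has_phrase("Under a Blue  Sky", "blue sky"))  # True
print(has_phrase("true blue skyline", "blue sky"))  # False
print(has_phrase("sky blue", "blue sky"))           # False
```

Because punctuation is treated as a delimiter, this sketch also accepts "blue, sky" as the phrase; a stricter definition would need to keep the delimiters.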
__________

You should take a look at "XML Use Cases from the Library of Congress"
(http://lists.w3.org/Archives/Public/www-xml-query-comments/2001May/0000.html)
which probably covers your cases. It doesn't look like the XML Query WG has
responded yet, though.

-Michael Dyck
Received on Thursday, 21 June 2001 21:21:24 GMT
