
Re: word searching

From: Michael Dyck <MichaelDyck@home.com>
Date: Thu, 21 Jun 2001 18:16:25 -0700
Message-ID: <3B329C69.8BBF7066@home.com>
To: David Loy <de_loy@yahoo.com>
CC: www-ql@w3.org

David Loy wrote:
> I have not kept up with this list. Sorry if this has
> been asked before.
> Does the standard provide a mechanism for searching
> individual words within a tag using some form of
> Boolean and proximity operation?
> Example:
> searching for 'wind' AND 'willow' in <title>

You could say
    //title[contains(.,'wind') and contains(.,'willow')]
but this just finds "wind" and "willow" as substrings of the title, not as
*words*. So, for instance, it would select
    <title>dwindling willows</title>
which is not what you wanted. You could prevent such matches by padding the
strings with blanks:
    //title[contains(.,' wind ') and contains(.,' willow ')]
but then it *won't* select titles where the desired words are delimited by
something other than blanks (punctuation, or the start or end of the
string), e.g.
    <title>the wind, the willow, and the wardrobe</title>
    <title>wind and willow</title>

And I'm not even getting into the problems of case-insensitive search, or
word-form search. 
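
To make the substring-versus-word distinction concrete outside XPath, here
is a minimal sketch in Python. It assumes the title text has already been
extracted from the document, and that a "word" is any run of \w characters.
The names words() and has_all_words() are made up for illustration. It
handles case by lowercasing, but it does nothing about word forms, so
"willows" still does not match "willow".

```python
import re

def words(text):
    """Split text into lowercase words; any run of non-word characters delimits."""
    return re.findall(r"\w+", text.lower())

def has_all_words(text, *wanted):
    """True if every wanted word occurs in text as a whole word (Boolean AND)."""
    present = set(words(text))
    return all(w.lower() in present for w in wanted)

titles = [
    "dwindling willows",
    "the wind, the willow, and the wardrobe",
    "wind and willow",
    "The Wind in the Willow",
]

# Keep only titles containing both 'wind' and 'willow' as words.
matches = [t for t in titles if has_all_words(t, "wind", "willow")]
```

Here "dwindling willows" is rejected, while the titles delimited by commas
or by the ends of the string are accepted.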

> searching for 'blue' before 'sky' in <title>

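I don't think there's a direct way to do that at the word level. At the
substring level you could try
    //title[contains(substring-after(.,'blue'),'sky')]
which has the same substring-versus-word problem as the examples above.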
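
At the word level, the ordering test is straightforward once the title text
is tokenised outside XPath. A minimal Python sketch, assuming the same
loose definition of a word (a run of \w characters, compared without regard
to case); word_before() is a hypothetical helper name:

```python
import re

def word_before(text, first, second):
    """True if a whole-word occurrence of `first` precedes one of `second`."""
    tokens = re.findall(r"\w+", text.lower())
    first, second = first.lower(), second.lower()
    if first not in tokens:
        return False
    start = tokens.index(first)          # earliest occurrence of `first`
    return second in tokens[start + 1:]  # any later occurrence of `second`
```

Checking only the earliest occurrence of the first word is enough, because
any occurrence of the second word that follows a later one also follows
the earliest.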

> searching for phrase 'blue sky' in <title>

You could say:
    //title[contains(.,'blue sky')]
but with all the same caveats as above.
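
For comparison, a word-level phrase test can be sketched in Python by
looking for the phrase's words as consecutive tokens. This is only an
illustration under the same assumptions as before (words are runs of \w
characters, case is ignored); has_phrase() is a made-up name. Note that
punctuation between the words is ignored, so "blue, sky" counts as a match,
which may or may not be what you want.

```python
import re

def has_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively, as whole words, in text."""
    tokens = re.findall(r"\w+", text.lower())
    target = re.findall(r"\w+", phrase.lower())
    n = len(target)
    if n == 0:
        return False
    # Slide a window of n tokens across the title and compare it to the phrase.
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))
```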

You should take a look at "XML Use Cases from the Library of Congress"
which probably covers your cases. It doesn't look like the XML Query WG has
responded yet, though.

-Michael Dyck
Received on Thursday, 21 June 2001 21:21:24 UTC
