XML Query Use Case TEXT

XML Query Use Cases
W3C Working Draft 15 February 2001

Use Case TEXT

-------------------
1.6.1 vs all queries

"In this use case, searches for company names are to be interpreted
as word-based searches. The words in a company name may be in any
case and may be separated by any kind of white space."

The Solutions in XQuery use "contains()", but nothing in XPath section 4.2
indicates that this function is case-insensitive, or that it treats all
runs of white space as equivalent. As written, the Solutions don't
implement the word-based matching described above.
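
To illustrate the gap, here is a minimal sketch in Python (not XQuery).
Python's "in" operator on strings behaves like XPath's contains(): an exact,
case-sensitive substring test. The helper word_match() is hypothetical; it
only shows roughly what a word-based comparison would have to do, and it
ignores punctuation.

```python
def substring_match(text, phrase):
    # Analogue of XPath contains(): exact, case-sensitive substring test.
    return phrase in text

def word_match(text, phrase):
    # Hypothetical word-based search: fold case and treat any run of
    # white space as a single separator, then compare word sequences.
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Made-up text with odd case and a line break inside the company name.
sample = "Shares of foobar\n   CORPORATION rose today"
print(substring_match(sample, "Foobar Corporation"))  # False
print(word_match(sample, "Foobar Corporation"))       # True
```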

-----------------
1.6.3 Sample Data

/news/news_item[1]/content/par[1] contains a spelling mistake:
    corparation
should be at least
    corporation
and preferably
    Corporation
This mistake causes queries 3 and 5 to return unexpected results.

----------
1.6.4.1 Q1

The informal query asks for particular news items, but the Solution in
XQuery only yields the titles of those items. So
    //news_item/title[contains(./text(), "Foobar Corporation")]
should be changed to
    //news_item[title[contains(./text(), "Foobar Corporation")]]
or
    //news_item[contains(title/text(), "Foobar Corporation")]
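
The difference is easy to see in a small Python sketch using
xml.etree.ElementTree. The two-item document is made up for illustration;
only the element names come from the use case.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item document; element names follow the use case.
doc = ET.fromstring(
    "<news>"
    "<news_item><title>Foobar Corporation releases product</title>"
    "<content><par>Details.</par></content></news_item>"
    "<news_item><title>Other news</title>"
    "<content><par>Nothing.</par></content></news_item>"
    "</news>"
)

phrase = "Foobar Corporation"

# Analogue of the original Solution: the result is the title elements.
titles = [t for t in doc.iter("title") if phrase in (t.text or "")]

# Analogue of the corrected query: the result is the news_item elements.
items = [
    item for item in doc.iter("news_item")
    if phrase in (item.findtext("title") or "")
]

print([e.tag for e in titles])  # ['title']
print([e.tag for e in items])   # ['news_item']
```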

-----------------------
1.6.4.3 Q3 & 1.6.4.6 Q6

contains_in_same_sentence() and contains_stems_in_same_sentence():
These seem too ad hoc to be built-in functions. Are they supposed to be
user-defined functions whose definitions haven't been written yet?

Also, if "." designates the end of a sentence, then the "." in
"YouNameItWeIntegrateIt.com" splits that name across two sentences, so the
full name will never be deemed to appear in a sentence. (This is a problem
for Q6.)
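
A minimal Python sketch of the sentence problem, assuming the functions
split on every "." (the draft doesn't say how they find sentences, so the
helpers and the sample text here are hypothetical):

```python
def sentences(text):
    # Naive assumption: every "." ends a sentence.
    return [s.strip() for s in text.split(".") if s.strip()]

def in_same_sentence(text, *phrases):
    # True if some single sentence contains all of the phrases.
    return any(all(p in s for p in phrases) for s in sentences(text))

# Made-up sentence, not taken from the sample data.
text = "Foobar Corporation acquires YouNameItWeIntegrateIt.com today."

# The company name is cut in two at its own ".".
print(sentences(text))
print(in_same_sentence(text, "Foobar Corporation",
                       "YouNameItWeIntegrateIt.com"))  # False
```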

----------
1.6.4.5 Q5

(1)
The Solution in XQuery uses
    string( ($item//par)[1] )
but this doesn't reproduce the <quote> element in the Expected Result.
Instead, use
    ($item//par)[1]/node()

(2)
The Expected Result's whitespace doesn't match that of the news document.
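
For point (1), a rough Python analogue using xml.etree.ElementTree: taking
the string value of a paragraph flattens it to text, while taking its child
nodes keeps the markup. The paragraph content is made up; only the par and
quote element names come from the use case.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph with an embedded <quote>.
par = ET.fromstring(
    "<par>The CEO said <quote>we are pleased</quote> today.</par>"
)

# Analogue of string(par): concatenated text, markup lost.
as_string = "".join(par.itertext())

# Analogue of par/node(): child nodes are kept, so <quote> survives.
# ET.tostring() on a child also emits the text that follows it (its tail).
as_nodes = (par.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in par
)

print(as_string)  # The CEO said we are pleased today.
print(as_nodes)   # The CEO said <quote>we are pleased</quote> today.
```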

----------
1.6.4.6 Q6

(1)
In the Solution in XQuery, "para" should be "par".

(2)
The construct
    $item_title IN $item/title,
    $item_para IN $item//par
is bad, because if the $item has no par elements (which is allowed by the
DTD), the FOR generates no bindings for that item, so the item is dropped
even if it should have been a hit on its title.

(3)
The construct
    WHERE different_companies AND title_mentions OR para_mentions
needs parentheses:
    WHERE different_companies AND ( title_mentions OR para_mentions )

(4)
The function call
    distinct($item)
is useless, because $item is always just a single node. Instead, you could
pass the result of the whole FLWR expression to distinct(). But really, you
don't need distinct(), because
    FOR $item IN //news_item
generates distinct items.
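
Points (2) and (3) can be checked with a small Python sketch. It assumes
that a FOR over two variables iterates over pairs, like itertools.product(),
and it relies on "and" binding more tightly than "or", which holds in Python
as it does in XPath 1.0. The sample values are made up.

```python
from itertools import product

# Point (2): iterating over pairs means an empty second sequence
# yields no iterations at all, whatever the first sequence holds.
titles = ["Foobar Corporation acquires Gorilla Corporation"]  # hypothetical
pars = []  # an item with no par elements
pairs = list(product(titles, pars))
print(len(pairs))  # 0

# Point (3): without parentheses, the OR branch alone can satisfy
# the condition even when different_companies is false.
different_companies = False
title_mentions = False
para_mentions = True
unparenthesized = different_companies and title_mentions or para_mentions
parenthesized = different_companies and (title_mentions or para_mentions)
print(unparenthesized, parenthesized)  # True False
```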

Pulling all these together, I suggest:
   LET $companies := ...
   FOR $item IN //news_item
   LET $places := $item/title UNION $item//par
   WHERE
       SOME $c1 IN $companies SATISFIES
         SOME $c2 IN $companies SATISFIES
           ( $c1 != $c2 AND
             contains_stems_in_same_sentence(
                $places/text(), $c1, $c2, "acquire") )
   RETURN $item

(It would be nice if the two quantifications could be written
    SOME $c1 IN $companies, $c2 IN $companies SATISFIES ... )
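
For comparison, a rough Python sketch of the same logic. Everything in it is
an assumption for illustration: in_same_sentence() is a stand-in with no
stemming and a naive split on ". ", each item is reduced to a list of its
title and par texts, and each text is tested separately rather than passed
as one sequence. A single any() over product() plays the role of the
combined quantification.

```python
from itertools import product

def in_same_sentence(text, *phrases):
    # Stand-in for contains_stems_in_same_sentence(): no stemming,
    # and sentences are split naively on ". ".
    return any(all(p in s for p in phrases) for s in text.split(". "))

def mentions_acquisition(places, companies):
    # SOME $c1 IN $companies, $c2 IN $companies SATISFIES ...
    return any(
        c1 != c2 and in_same_sentence(text, c1, c2, "acquire")
        for c1, c2 in product(companies, repeat=2)
        for text in places
    )

companies = ["Foobar Corporation", "Gorilla Corporation"]  # hypothetical

# Each item is the list of its title and par texts; the first has no par.
items = [
    ["Foobar Corporation to acquire Gorilla Corporation"],
    ["Quiet day", "Foobar Corporation reported earnings."],
]

hits = [places for places in items if mentions_acquisition(places, companies)]
print(len(hits))  # 1
```

The item with no par is still found through its title, which is the case
that the two-variable FOR would miss.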

-Michael Dyck

Received on Monday, 30 April 2001 01:38:57 UTC