[Bug 3596] second order aspect of scoring expressions from bugzilla@wiggum.w3.org on 2006-08-13 (public-qt-comments@w3.org from August 2006)

From: <bugzilla@wiggum.w3.org>
Date: Sun, 13 Aug 2006 08:37:32 +0000
To: public-qt-comments@w3.org
CC:
Message-Id: <E1GCBTQ-0006bq-O2@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=3596

           Summary: second order aspect of scoring expressions
           Product: XPath / XQuery / XSLT
           Version: Working drafts
          Platform: Macintosh
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Full Text
        AssignedTo: sihem@research.att.com
        ReportedBy: martin@x-hive.com
         QAContact: public-qt-comments@w3.org


I've posted this to the list but found out later that it might be better to
submit a bug report.

The full text specification extends the XQuery processing model to allow for a
second-order aspect of functions, and it appears to me that score values
somewhat cheat around the normal flow of XDM instances in XQuery using this
mechanism. This seems a bit strange, as it does not fit well with the XQuery
spec. Also, there seem to be some holes, e.g. what is the score here:
> for $x score $score in //book[title ftcontains "hello"]/para[. ftcontains "world"] return $score
Is it the score of the title, or the score of the para? I think this problem
occurs because the score values sneak around the normal XQuery evaluation order.

Now I wonder whether this couldn't be greatly simplified by providing just two
full text keywords, e.g. "ftmatches" returning an xs:boolean and "ftscore"
returning an xs:double in [0, 1]. "ftmatches" could be used for boolean conditions:
> //book[. ftmatches "hello" && "world"]
And "ftscore" if the user needs more control over relevance:
> for $b in //book
> let $score := $b ftscore "hello" && "world"
> where $score > 0.5
> order by $score descending
> return $b
Which score counts as a "match" could be controlled by an option, e.g.
> declare option fts:match-score "0.5";
Or it could be left completely arbitrary and application-defined (as in the
current spec, I think).
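A sketch of how the two keywords could fit together, assuming (hypothetically) that a node "matches" when its score is at or above the declared threshold; whether that comparison should be >= or > is left open here. Both keywords are only proposed, so this is proposal syntax rather than something a processor accepts, and the namespace URI is a placeholder. Note that an XQuery option declaration takes a string literal as its value:
> declare namespace fts = "http://example.com/hypothetical-fts";
> declare option fts:match-score "0.5";
> (: under the assumed threshold semantics, this boolean filter... :)
> //book[. ftmatches "hello" && "world"],
> (: ...would select the same books as this explicit score comparison :)
> //book[(. ftscore "hello" && "world") >= 0.5]
Treating "ftmatches" as shorthand for a score comparison in this way would mean the scoring model only has to be defined once.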

As this only adds completely normal XQuery expressions returning XDM instances,
I think it would greatly simplify the processing model, usage for the user, and
implementation for vendors (which is of course why I write this, I'm
lazy :-)).
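Applied to the earlier title/para example, the proposal would make each score an ordinary value bound to a variable, so the user states explicitly which node and which query it refers to. A sketch in the proposed syntax (the variable names are illustrative):
> for $p in //book[title ftmatches "hello"]/para[. ftmatches "world"]
> (: the score is explicitly that of this para against "world" :)
> let $score := $p ftscore "world"
> return $score
If the title's relevance were wanted instead, the title would simply be scored explicitly in the same way.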

I can't quite come up with a limitation of this approach compared with the one
based on special score keywords, functions, etc. Am I missing something?

Received on Sunday, 13 August 2006 08:37:39 UTC