- From: Orri Erling <erling@xs4all.nl>
- Date: Wed, 6 May 2009 08:44:42 +0200
- To: "'Simon Schenk'" <sschenk@uni-koblenz.de>, <public-rdf-dawg@w3.org>
As concerns full text, I would just add the following:

- Order - For ranking on text hit score, we would have an explicit order by in the query. It is true that a text index can be built with the order in the index itself, but if one has a composite ranking combining the text hit score with the document page rank or equivalent, plus word proximities, one does end up having to sort the results. So preserving order is not a big deal as I see it, since an application will in practice have an order by clause with limit and offset in the query.

- Filtering - The SQL/MM full text feature, for example, is expressed in terms of filtering; the eventual existence of an index is hardly mentioned in the spec. The index is crucial for using the feature, but not for defining it.

- Syntax - I do not care whether the full text match looks like a triple pattern or something else. The important part is that it ought to be able to bind a score variable and possibly other "offband" variables, for example for purposes of locating the text hit in the document, or for purposes of fetching other information colocated with the text index. I would not expect the standard to mention anything but a score, but it could have placeholders for other things.

- Symmetry - It seems that joins involving full text matches will in practice not be commutative: when making an execution plan, a text index lookup can only go to a place where the text expression is bound. If the text match binds other variables, anything depending on these variables can be evaluated only after the text match.

- Due to the above, full text is not quite surface syntax. But in many places it will look like such.

- In practice, we do not allow contains in SQL or SPARQL inside an OR. If one wishes an OR, one can write it in the text pattern. The text pattern language has the connectives and/or/not, plus phrase and proximity.
A negated contains is also not allowed, although one could do this with a negated exists subquery. We have never suffered any inconvenience from these limitations. But we see that a purist might call these restrictions arbitrary and ad hoc.

Thus, if full text is treated as a filter, it must be specified as such, as XPath and SQL have done. Then implementations will have to deal with it inside ORs, NOTs, etc., sometimes using a text index and sometimes not. The score is the only thing that can be returned.

I would prefer text search as a pattern, analogous to a SQL table-valued function or derived table. This can bind many variables and by its nature does not occur in expressions. If one wishes to OR or negate these, one uses a union or not exists.

Orri
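
As an illustration of the pattern-style reading and of the ordering point, here is a minimal Python sketch. It is only a toy under stated assumptions, not how any particular store implements this: the corpus `DOCS`, the static rank table `DOC_RANK`, and the functions `text_match` and `ranked` are all hypothetical names made up for the example, and the hit score is simply a term count. The text match behaves like a table-valued function that binds a document and a score, and the composite rank is sorted explicitly before limit and offset are applied.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy corpus: document id -> text.
DOCS: Dict[str, str] = {
    "d1": "full text search of text",
    "d2": "full text index",
    "d3": "graph query language",
}

# Hypothetical static rank per document (a stand-in for page rank).
DOC_RANK: Dict[str, float] = {"d1": 0.2, "d2": 0.9, "d3": 0.5}


def text_match(word: str) -> Iterator[Tuple[str, int]]:
    """Pattern-style match: yields (doc, score) bindings, much like a
    table-valued function. The score is just the term frequency."""
    for doc, text in DOCS.items():
        score = text.split().count(word)
        if score > 0:
            yield doc, score


def ranked(word: str, limit: int, offset: int = 0) -> List[Tuple[str, float]]:
    """Join the match bindings with DOC_RANK, then sort explicitly.

    The composite rank uses a value that lives outside the text index,
    so whatever order the index delivers is not the final order."""
    rows = [(doc, score * DOC_RANK[doc]) for doc, score in text_match(word)]
    rows.sort(key=lambda row: row[1], reverse=True)  # explicit order by
    return rows[offset:offset + limit]               # limit / offset
```

For the word "text", the hit score alone puts d1 ahead of d2, while the composite rank puts d2 first, which is why the explicit sort is needed regardless of index order.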
Received on Wednesday, 6 May 2009 06:46:14 UTC