- From: Kjetil Kjernsmo <Kjetil.Kjernsmo@computas.com>
- Date: Mon, 4 May 2009 13:04:38 +0200
- To: public-rdf-dawg@w3.org
On Friday 01 May 2009 06:27:54 Lee Feigenbaum wrote:
> * Full text. The survey indicated strong support for standardizing
> the syntax and semantics for full text search in SPARQL. While I believe
> that this is one of the top interoperability stumbling blocks for
> SPARQL, the wide-open design space (both for syntax and semantics) of
> the problem worries me.

Indeed it has a very open design space, but I think we should look at how full-text search is actually used and let that guide the implementation now.

I think it would be very unfortunate not to have any standardised full-text capability in SPARQL, as it signals that "if you have a search box on your site that is used extensively by your users, then SPARQL is not suitable for you". Even if there are extensions that do free-text search, this is a message that I, for one, would be very concerned about, since such sites make up most of the current web.

We sometimes match strings with regular expressions, but never by exact string match. Regular expressions are far too flexible to be useful in many contexts. Everything we have used so far can be summarised as follows:

1) Terms shorter than three characters are ignored.
2) A single term is matched exactly against a whole word.
3) A single term ending in an asterisk is matched against words beginning with the term.
4) Multiple terms with AND match all the words, in any order.
5) Multiple terms with OR match any of the words, in any order.
6) Multiple terms without an operator match all the words, in the given order.

At some point we had phrase search too, which is a nice feature, but I think we dropped it. There is no XQuery here, only a small subset of what Lucene does, and no advanced stemming: just plain string matching with some permutations of terms. Yet, in our experience, it covers most of what people do.
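The six rules above are small enough to sketch in a few lines. The Python below is a hypothetical illustration only, not Computas' code and not a proposed SPARQL function; the names `matches`, `parse_query`, `term_matches` and `tokenize` are made up for this sketch. It assumes case-insensitive matching, words as runs of `\w` characters, operators written in upper case, at most one kind of operator per query, a three-character minimum counted without the asterisk, and that "in the given order" does not require the words to be adjacent (adjacency would be phrase search, which was dropped).

```python
import re

MIN_TERM_LENGTH = 3  # rule 1: shorter terms are ignored (asterisk not counted; assumption)


def tokenize(text):
    """Split text into lowercase words (assumption: case-insensitive, \\w+ words)."""
    return re.findall(r"\w+", text.lower())


def term_matches(term, word):
    """Rule 2: exact whole-word match; rule 3: prefix match if term ends in '*'."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


def parse_query(query):
    """Return (operator, terms); operator is 'AND', 'OR' or None (no operator)."""
    tokens = query.split()
    operators = {t for t in tokens if t in ("AND", "OR")}  # upper case only (assumption)
    if len(operators) > 1:
        raise ValueError("mixing AND and OR is not handled in this sketch")
    operator = operators.pop() if operators else None
    terms = [t.lower() for t in tokens
             if t not in ("AND", "OR") and len(t.rstrip("*")) >= MIN_TERM_LENGTH]
    return operator, terms


def matches(query, text):
    """Return True if text satisfies the query under rules 1-6."""
    operator, terms = parse_query(query)
    words = tokenize(text)
    if not terms:
        return False  # assumption: a query with no usable terms matches nothing
    if operator == "AND":  # rule 4: all terms, any order
        return all(any(term_matches(t, w) for w in words) for t in terms)
    if operator == "OR":  # rule 5: any term, any order
        return any(any(term_matches(t, w) for w in words) for t in terms)
    # Rule 6 (and rules 2/3 for a single term): all terms, in the given order,
    # each term consuming a later word than the previous one.
    position = 0
    for t in terms:
        while position < len(words) and not term_matches(t, words[position]):
            position += 1
        if position == len(words):
            return False
        position += 1
    return True
```

For example, `matches("full search", "Full text search")` is true while `matches("search full", "Full text search")` is not, and `matches("spar*", "SPARQL endpoints")` is true. A greedy left-to-right scan is enough for rule 6, because taking the earliest matching word for each term leaves the most room for the terms that follow.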
Also, forward compatibility can be kept by defining different functions for different matching rules: we could have a simple contains function now, and SPARQL 1.2 could adopt ftcontains in addition if they so wish.

In summary, the design space can be constrained to something small. SPARQL does not need a very elaborate free-text matching system, but it needs something, and much of it is already there; it is mostly just a matter of naming a function or predicate normatively.

Kind regards

Kjetil Kjernsmo
-- 
Senior Knowledge Engineer
Mobile: +47 986 48 234
Email: kjetil.kjernsmo@computas.com
Web: http://www.computas.com/
| SHARE YOUR KNOWLEDGE |
Computas AS PO Box 482, N-1327 Lysaker | Phone:+47 6783 1000 | Fax:+47 6783 1001
Received on Monday, 4 May 2009 11:05:09 UTC