- From: Kjetil Kjernsmo <Kjetil.Kjernsmo@computas.com>
- Date: Mon, 4 May 2009 18:31:22 +0200
- To: public-rdf-dawg@w3.org
John,

Thank you very much for your support!

On Monday 04 May 2009 16:23:12 Clark, John wrote:
> I agree, and I think it's a useful exercise to try to standardize "general
> text search", perhaps even for consumption by technologies other than
> SPARQL.

Possibly, but I care first and foremost about SPARQL :-) If anybody else has any use for it, I'd say fine.

> > All we have used so far can be summarised as follows:
> > 1) Terms shorter than three characters are ignored.
>
> So, with this feature, query string "Amazon S3" would be equivalent to
> "Amazon" and query string "theorems about ?" would be equivalent to
> "theorems about", correct? This makes me uneasy.

Yeah, it has some drawbacks, clearly. I think it is mostly a practical matter, as far as I know, this restriction exists in LARQ, Virtuoso, MySQL to name a few I've worked with. It is painful at times, but I guess that it is simply too time-consuming to create an index that will match any two-letter combinations?

> > 2) a single terms is matched exactly against a whole word.
> > 3) a single term ending in asterisk is matched against words beginning
> > with the term.
> > 4) multiple terms with AND matches all words in any order.
> > 5) multiple terms with OR matches any words in any order.
> > 6) multiple terms without an operator matches all words in the given
> > order.
> >
> > At some point, we had phrase search too, which is a nice feature but I
> > think we dropped it.
>
> I think this is a reasonable set, but I'd also like to approach it slightly
> differently and try to standardize what already exists (and thus is
> reasonably "well understood" by users).

Thank you!

> That is, I'd suggest standardizing
> generalized text search as "what Google does",

Well, some of what "what Google does" could be
http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=136861
and indeed, I think some of that is quite reasonable, but I don't know if it is right for us.
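
As an illustration of rules 1) to 6) quoted above, here is a minimal Python sketch. It is not taken from LARQ, Virtuoso or MySQL. The function names (`tokenize`, `parse`, `matches`), the case-insensitive comparison, the `\w+` tokenizer, measuring term length without the trailing asterisk, allowing only one operator kind per query, and reading "in the given order" as "in order but not necessarily adjacent" are all assumptions made for the example.

```python
import re

MIN_TERM_LENGTH = 3  # rule 1: shorter terms are ignored


def tokenize(text):
    """Split text into lowercase words (assumed tokenizer: runs of \\w)."""
    return re.findall(r"\w+", text.lower())


def term_matches(term, word):
    """Rule 3: 'foo*' matches words beginning with 'foo'. Rule 2: exact match."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


def parse(query):
    """Return (mode, terms); mode is 'and', 'or' or 'ordered'."""
    tokens = query.split()
    if "AND" in tokens:
        mode = "and"        # rule 4
    elif "OR" in tokens:
        mode = "or"         # rule 5
    else:
        mode = "ordered"    # rule 6 (and the single-term case)
    terms = [t.lower() for t in tokens if t not in ("AND", "OR")]
    # Rule 1, with the asterisk not counted towards the length (assumption).
    terms = [t for t in terms if len(t.rstrip("*")) >= MIN_TERM_LENGTH]
    return mode, terms


def matches(query, text):
    """Return True if text satisfies query under the assumptions above."""
    mode, terms = parse(query)
    words = tokenize(text)
    if not terms:
        return False  # assumption: a query with no usable terms matches nothing
    if mode == "and":
        return all(any(term_matches(t, w) for w in words) for t in terms)
    if mode == "or":
        return any(any(term_matches(t, w) for w in words) for t in terms)
    # Ordered: each term must match a word located after the previous match.
    pos = 0
    for t in terms:
        for i in range(pos, len(words)):
            if term_matches(t, words[i]):
                pos = i + 1
                break
        else:
            return False
    return True
```

Under these assumptions, `matches("Amazon S3", "Amazon web services")` is true because "S3" is dropped by rule 1, which is exactly the behaviour questioned in the quoted reply.
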

> including phrase search with
> quotes, term negation, and query extensions with syntax like "loc:
> cleveland, ohio" (e.g. in Google maps).

Hmmm, I think we might end up standardising a bit too much of CQL (which is quite nice and a nice complement to SPARQL in many situations):
http://www.loc.gov/standards/sru/specs/cql.html

Also, I don't think loc: would belong in the object, since that is a predicate for us, and I feel that such specific things belong in an application layer that translates to SPARQL. Also, with property paths, we might be able to say stuff like "geo:location or any sub properties".

Anyway, I hope we can discuss this a bit further on Wednesday. My agenda here is to constrain the feature so that it is a useful feature, yet something that will not take a lot of WG time and not a lot of time for implementers.

Kind regards

Kjetil Kjernsmo
--
Senior Knowledge Engineer
Mobile: +47 986 48 234
Email: kjetil.kjernsmo@computas.com
Web: http://www.computas.com/

| SHARE YOUR KNOWLEDGE |

Computas AS PO Box 482, N-1327 Lysaker | Phone:+47 6783 1000 | Fax:+47 6783 1001

Received on Monday, 4 May 2009 16:31:51 UTC