Re: More fulltext advocacy (was Re: Lee's feature proposal)

Kjetil, John,

Can your proposal/discussion be summarized (without going into
extensions such as loc: ...) by saying that what you want is a
simpler full-text "surface syntax for regular expressions"?
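
To make the question concrete, here is a rough sketch (in Python, purely
illustrative and not part of the proposal) of how the six rules quoted
below might be translated into regular expressions. The names matches,
term_pattern and MIN_LEN are made up for this example. It assumes
case-insensitive matching, plain alphanumeric terms, only one kind of
operator per query, and that "in the given order" allows other words in
between; none of that is settled in the thread.

```python
import re

MIN_LEN = 3  # rule 1: terms shorter than three characters are ignored


def term_pattern(term):
    """Rules 2 and 3: whole-word match, or prefix match for a trailing '*'."""
    if term.endswith("*"):
        return r"\b" + re.escape(term[:-1]) + r"\w*"
    return r"\b" + re.escape(term) + r"\b"


def matches(query, text):
    """Return True if text satisfies query under the assumed reading of rules 1-6."""
    tokens = query.split()
    if "AND" in tokens and "OR" in tokens:
        # Assumption: operator precedence is not defined, so mixing is rejected.
        raise ValueError("mixed AND/OR is not covered by the six rules")
    op = "AND" if "AND" in tokens else "OR" if "OR" in tokens else None

    # Drop the operators and apply rule 1 (the trailing '*' does not count).
    terms = [t for t in tokens
             if t not in ("AND", "OR") and len(t.rstrip("*")) >= MIN_LEN]
    if not terms:
        return False  # assumption: a query with no usable terms matches nothing

    patterns = [term_pattern(t) for t in terms]
    flags = re.IGNORECASE
    if op == "AND":   # rule 4: all terms, any order
        return all(re.search(p, text, flags) for p in patterns)
    if op == "OR":    # rule 5: any term, any order
        return any(re.search(p, text, flags) for p in patterns)
    # rule 6: all terms in the given order, other words may sit in between
    ordered = ".*?".join(patterns)
    return re.search(ordered, text, flags | re.DOTALL) is not None
```

With this reading, matches("Amazon S3", text) behaves exactly like
matches("Amazon", text), which is the effect of rule 1 that John
questions below.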

The proposal of the basic expression set looks reasonable and I think 
this could fly as a part of SurfaceSyntax, if we find agreement on that.
Opinions?

Axel

Kjetil Kjernsmo wrote:
> John,
> 
> Thank you very much for your support!
> 
> On Monday 04 May 2009 16:23:12 Clark, John wrote:
>> I agree, and I think it's a useful exercise to try to standardize "general
>> text search", perhaps even for consumption by technologies other than
>> SPARQL.
> 
> Possibly, but I care first and foremost about SPARQL :-) If anybody else has 
> any use for it, I'd say fine.
> 
>>> All we have used so far can be summarised as follows:
>>> 1) Terms shorter than three characters are ignored.
>> So, with this feature, query string "Amazon S3" would be equivalent to
>> "Amazon" and query string "theorems about ?" would be equivalent to
>> "theorems about", correct? This makes me uneasy.
> 
> Yeah, it has some drawbacks, clearly. I think it is mostly a practical matter:
> as far as I know, this restriction exists in LARQ, Virtuoso and MySQL, to name a
> few I've worked with. It is painful at times, but I guess that it is simply
> too time-consuming to create an index that will match any two-letter
> combination?
> 
>>> 2) a single term is matched exactly against a whole word.
>>> 3) a single term ending in asterisk is matched against words beginning
>>> with the term.
>>> 4) multiple terms with AND matches all words in any order.
>>> 5) multiple terms with OR matches any words in any order.
>>> 6) multiple terms without an operator matches all words in the given
>>> order.
>>>
>>> At some point, we had phrase search too, which is a nice feature but I
>>> think we dropped it.
>> I think this is a reasonable set, but I'd also like to approach it slightly
>> differently and try to standardize what already exists (and thus is
>> reasonably "well understood" by users).
> 
> Thank you! 
> 
>> That is, I'd suggest standardizing 
>> generalized text search as "what Google does", 
> 
> Well, some of "what Google does" could be
> http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=136861
> and indeed, I think some of that is quite reasonable, but I don't know if it 
> is right for us.
> 
>> including phrase search with 
>> quotes, term negation, and query extensions with syntax like "loc:
>> cleveland, ohio" (e.g. in Google maps).
> 
> Hmmm, I think we might end up standardising a bit too much of CQL (which is 
> quite nice and a nice complement to SPARQL in many situations):
> http://www.loc.gov/standards/sru/specs/cql.html
> Also, I don't think loc: would belong in the object, since that is a predicate 
> for us, and I feel that such specific things belong in an application layer
> that translates to SPARQL. Also, with property paths, we might be able to say 
> stuff like "geo:location or any sub properties". 
> 
> Anyway, I hope we can discuss this a bit further on Wednesday. My agenda here 
> is to constrain the feature so that it is a useful feature, yet something 
> that will not take a lot of WG time and not a lot of time for implementers.
> 
> Kind regards 
> 
> Kjetil Kjernsmo


-- 
Dr. Axel Polleres
Digital Enterprise Research Institute, National University of Ireland, 
Galway
email: axel.polleres@deri.org  url: http://www.polleres.net/

Received on Monday, 4 May 2009 19:35:14 UTC