RE: More fulltext advocacy (was Re: Lee's feature proposal)



> -----Original Message-----
> From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
> On Behalf Of Axel Polleres
> Sent: 04 May 2009 20:35
> To: Kjetil Kjernsmo
> Cc: public-rdf-dawg@w3.org
> Subject: Re: More fulltext advocacy (was Re: Lee's feature proposal)
> 
> Kjetil, John,
> 
> Can your proposal/discussion be summarized (without going into
> extensions such as loc: ...) in that
> what you want is a simpler full-text "surface syntax for Regular
> expressions"?
> 
> The proposal of the basic expression set looks reasonable and I think
> this could fly as a part of SurfaceSyntax, if we find agreement on that.
> Opinions?

In my opinion, this is not surface syntax: it is not a convenient syntax for a feature, or set of features, that can already be done some other way in SPARQL.

Experience with LARQ shows that users want access to other features as well, such as term weighting and scoring, and language-specific stemming. It also shows that there are different models for index use, depending on whether what is indexed is the literals in the graph or the content of externally held resources (e.g. PDFs, where the index lookup returns the URI for the PDF).

 Andy

> 
> Axel
> 
> Kjetil Kjernsmo wrote:
> > John,
> >
> > Thank you very much for your support!
> >
> > On Monday 04 May 2009 16:23:12 Clark, John wrote:
> >> I agree, and I think it's a useful exercise to try to standardize
> >> "general text search", perhaps even for consumption by technologies
> >> other than SPARQL.
> >
> > Possibly, but I care first and foremost about SPARQL :-) If anybody else
> > has any use for it, I'd say fine.
> >
> >>> All we have used so far can be summarised as follows:
> >>> 1) Terms shorter than three characters are ignored.
> >> So, with this feature, query string "Amazon S3" would be equivalent to
> >> "Amazon" and query string "theorems about ?" would be equivalent to
> >> "theorems about", correct? This makes me uneasy.
> >
> > Yeah, it has some drawbacks, clearly. I think it is mostly a practical
> > matter: as far as I know, this restriction exists in LARQ, Virtuoso and
> > MySQL, to name a few I've worked with. It is painful at times, but I
> > guess that it is simply too time-consuming to create an index that will
> > match any two-letter combination?
> >
> >>> 2) a single term is matched exactly against a whole word.
> >>> 3) a single term ending in asterisk is matched against words beginning
> >>> with the term.
> >>> 4) multiple terms with AND matches all words in any order.
> >>> 5) multiple terms with OR matches any words in any order.
> >>> 6) multiple terms without an operator matches all words in the given
> >>> order.
> >>>
> >>> At some point, we had phrase search too, which is a nice feature but I
> >>> think we dropped it.
> >> I think this is a reasonable set, but I'd also like to approach it
> >> slightly differently and try to standardize what already exists (and
> >> thus is reasonably "well understood" by users).
> >
> > Thank you!
> >
> >> That is, I'd suggest standardizing
> >> generalized text search as "what Google does",
> >
> > Well, some of what "what Google does" could be
> > http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=136861
> > and indeed, I think some of that is quite reasonable, but I don't know
> > if it is right for us.
> >
> >> including phrase search with
> >> quotes, term negation, and query extensions with syntax like "loc:
> >> cleveland, ohio" (e.g. in Google maps).
> >
> > Hmmm, I think we might end up standardising a bit too much of CQL (which
> > is quite nice and a nice complement to SPARQL in many situations):
> > http://www.loc.gov/standards/sru/specs/cql.html
> >
> > Also, I don't think loc: would belong in the object, since that is a
> > predicate for us, and I feel that such specific things belong in an
> > application layer that translates to SPARQL. Also, with property paths,
> > we might be able to say stuff like "geo:location or any sub properties".
> >
> > Anyway, I hope we can discuss this a bit further on Wednesday. My agenda
> > here is to constrain the feature so that it is a useful feature, yet
> > something that will not take a lot of WG time and not a lot of time for
> > implementers.
> >
> > Kind regards
> >
> > Kjetil Kjernsmo
> 
> 
> --
> Dr. Axel Polleres
> Digital Enterprise Research Institute, National University of Ireland, Galway
> email: axel.polleres@deri.org  url: http://www.polleres.net/
> 

Received on Monday, 4 May 2009 20:28:39 UTC
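
The six matching rules in Kjetil's quoted summary can be sketched in Python, purely as an illustration. This is not how LARQ, Virtuoso or MySQL implement text search. The function names (words, term_matches, in_order, matches) are hypothetical, and several behaviours are assumptions that the thread does not settle: matching is case-insensitive, operators are written as uppercase AND/OR, a query may not mix the two operators, and rule 6 is read as "all words, in the given order, but not necessarily adjacent" (since phrase search was dropped).

```python
import re

MIN_TERM_LENGTH = 3  # rule 1: terms shorter than three characters are ignored


def words(text):
    """Split text into lowercase words (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def term_matches(term, word):
    """Rule 2: exact whole-word match. Rule 3: trailing '*' is a prefix match."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


def in_order(terms, doc_words):
    """Rule 6: every term matches some word, at strictly increasing positions."""
    position = 0
    for term in terms:
        # Advance to the earliest remaining word that this term matches.
        while position < len(doc_words) and not term_matches(term, doc_words[position]):
            position += 1
        if position == len(doc_words):
            return False
        position += 1
    return True


def matches(query, text):
    """Return True if text satisfies query under the six rules (as read above)."""
    tokens = query.split()
    operators = {t for t in tokens if t in ("AND", "OR")}
    if len(operators) > 1:
        raise ValueError("mixing AND and OR is outside this sketch")
    # Rule 1: drop short terms; a trailing '*' does not count towards the length.
    terms = [t.lower() for t in tokens
             if t not in ("AND", "OR") and len(t.rstrip("*")) >= MIN_TERM_LENGTH]
    doc_words = words(text)
    if not terms:
        return False  # assumption: a query with no usable terms matches nothing
    if "AND" in operators:  # rule 4: all terms, any order
        return all(any(term_matches(t, w) for w in doc_words) for t in terms)
    if "OR" in operators:   # rule 5: any term, any order
        return any(any(term_matches(t, w) for w in doc_words) for t in terms)
    return in_order(terms, doc_words)  # rules 2, 3 and 6
```

Under these assumptions, matches("Amazon S3", "Amazon EC2 pricing") is true, because "S3" is dropped by rule 1. That is the behaviour John says makes him uneasy. It does not address the features Andy lists (weighting, scoring, stemming, external resources).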