RE: Paging, filtering, and sorting from Denenberg, Ray on 2015-04-16 (public-annotation@w3.org from April 2015)

From: Denenberg, Ray <rden@loc.gov>
Date: Thu, 16 Apr 2015 08:40:40 -0400
To: Frederick Hirsch <w3c@fjhirsch.com>
CC: Web Annotation <public-annotation@w3.org>
Message-ID: <5483534C5FA8464B881ED2184D98C0F61446BA5754@LCXCLMB03.LCDS.LOC.GOV>
 From: Frederick Hirsch [mailto:w3c@fjhirsch.com]
> is there open source query language processing plugin/code etc?

Here's a place to start: http://www.loc.gov/standards/sru/resources/products.html
(unfortunately, it hasn't been updated lately).

Ray

> 
> regards, Frederick
> 
> On Apr 15, 2015, at 5:39 PM, Denenberg, Ray <rden@loc.gov> wrote:
> 
> > Thanks, Frederick. I likely oversimplified, so let me elaborate on and clarify a few
> > points.
> >
> > The SRU search URL is composed of a base URL, followed by a question
> > mark ('?'), followed by a list of parameter name/value pairs separated by
> > ampersands ('&'), where each parameter name and value are separated by
> > an equals sign ('='). This is all as in the URI standard.
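To make that layout concrete, here is a minimal Python sketch that assembles an SRU-style search URL from a base URL and a set of name/value pairs using the standard library. The endpoint `http://example.com/annotations/sru` and the helper name `sru_search_url` are hypothetical; `query`, `startRecord`, and `maximumRecords` are the SRU parameter names that appear elsewhere in this thread.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, params: dict) -> str:
    """Join a base URL and name=value pairs with '?' and '&' (hypothetical helper)."""
    # urlencode percent-encodes reserved characters inside each value, so an
    # '=' or a space inside the CQL query cannot be confused with URL syntax.
    return base_url + "?" + urlencode(params)


url = sru_search_url(
    "http://example.com/annotations/sru",  # hypothetical endpoint
    {"query": "oa.motivation=reviewing", "startRecord": 1, "maximumRecords": 100},
)
print(url)
# http://example.com/annotations/sru?query=oa.motivation%3Dreviewing&startRecord=1&maximumRecords=100
```

Because the query value is percent-encoded, the '=' inside it becomes `%3D`, so only the '=' between `query` and its value acts as a URL separator.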
> >
> > But the SRU parameter names are strictly defined in the SRU standard; you
> > can't make them up as you go along. What you DO get to do is make up
> > your own index names. That is, you can define a namespace of index names.
> > For SRU/CQL we call such a namespace a "context set".
> >
> > So in my example where I say
> >
> > query=oa.motivation=reviewing
> >
> > 'oa' would refer to the oa context set (and actually you could omit the
> > prefix and declare oa to be the default, but doing so causes other
> > complications that I won't go into now). The query string is defined to
> > be a list of search clauses separated by Boolean operators (with spaces on
> > each side), where each search clause is an index and a value separated by
> > a relation, the most common of which is '='.
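The clause structure just described can be illustrated with a deliberately tiny splitter. This is a sketch, not a CQL parser: it assumes a flat query of `index relation value` clauses written without internal spaces, joined by AND/OR/NOT, and it ignores parentheses, relation modifiers, quoted terms, and spaced relations, all of which real CQL allows. The name `split_cql` is hypothetical.

```python
import re

# Assumption: a tiny CQL subset. Each clause is index + relation + value with no
# spaces inside it; Boolean operators stand alone with spaces on each side.
CLAUSE = re.compile(r"^(?P<index>[\w.]+)(?P<relation>=|<>|<=|>=|<|>)(?P<value>\S+)$")
BOOLEANS = {"AND", "OR", "NOT"}


def split_cql(query: str):
    """Return (clauses, operators) for a flat query; clauses are (index, relation, value)."""
    clauses, operators = [], []
    for token in query.split():
        if token.upper() in BOOLEANS:
            operators.append(token.upper())
            continue
        m = CLAUSE.match(token)
        if m is None:
            raise ValueError(f"unsupported clause: {token!r}")
        clauses.append((m["index"], m["relation"], m["value"]))
    return clauses, operators


print(split_cql("title=cat AND publisher=dog"))
# ([('title', '=', 'cat'), ('publisher', '=', 'dog')], ['AND'])
```

The dotted prefix stays part of the index name (`oa.motivation`), which is where a server would look up the context set.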
> >
> > I concede there is a bit of awkwardness in the syntax where "=" is
> > used to mean different things, as in
> >
> > query=oa.motivation=reviewing
> >
> > but you can always quote the query string if it makes you more
> > comfortable:
> >
> > query="oa.motivation=reviewing"
> >
> > and in fact you HAVE to quote it if there are embedded spaces:
> >
> > query="title=cat AND publisher=dog"
> >
> > (Note the AND, not ampersand, because we are not separating URL
> > parameters but rather CQL search clauses.)
> >
> >
> > But back to my point: when you say:
> >
> > http://example.com/annotations?target=boston.com&match=contains
> >
> > instead you would say:
> >
> > http://example.com/annotations?query="oa.target=boston.com AND oa.match=contains"
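The translation from ad hoc URL parameters to a single CQL query string is mechanical, as this sketch suggests. It assumes every filter maps to an index of the same name in a context set with the dotted prefix `oa`, as in the earlier `oa.motivation` example, and that clauses are simply ANDed; `params_to_cql` is a hypothetical helper.

```python
from urllib.parse import urlencode


def params_to_cql(filters: dict, prefix: str = "oa") -> str:
    """Turn {name: value} filters into prefixed CQL clauses joined by AND (hypothetical)."""
    return " AND ".join(f"{prefix}.{name}={value}" for name, value in filters.items())


query = params_to_cql({"target": "boston.com", "match": "contains"})
url = "http://example.com/annotations?" + urlencode({"query": query})
print(query)  # oa.target=boston.com AND oa.match=contains
print(url)    # http://example.com/annotations?query=oa.target%3Dboston.com+AND+oa.match%3Dcontains
```

Once encoded, the whole filter travels as one `query` parameter; `urlencode` writes each space as `+`.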
> >
> >
> > And yes, all (or most) of the logic is in the query string.
> >
> > Ray
> >
> >
> >> -----Original Message-----
> >> From: Frederick Hirsch [mailto:w3c@fjhirsch.com]
> >> Sent: Wednesday, April 15, 2015 4:18 PM
> >> To: Denenberg, Ray
> >> Cc: Web Annotation
> >> Subject: Re: Paging, filtering, and sorting
> >>
> >> Thanks Ray, the concept of sorted result sets seems very relevant.
> >>
> >> How hard would it be for me to make the following query:
> >>
> >> Search/Filter the annotations stored on my web site (example.com) for
> >> the target domain boston.com (or *.boston.com) posted on the date 1
> >> April 2015 sorted by most recent first and limited to the first 200?
> >>
> >> My naive approach might be to simply store annotations with ids I
> >> create and perhaps index them by target domain without other fields
> >> (e.g. a table with an id, the domain as a text string, and a text
> >> column holding the arbitrary JSON of the annotation). This means I would
> >> have a server that could return an annotation by id or by domain, or
> >> iterate over them, but other choices might be more difficult in terms
> >> of parsing JSON etc.
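The table described above can be sketched with Python's built-in `sqlite3`. The schema (`id`, `domain`, `body`), the helper names, and the sample annotation are all assumptions for illustration. Note that any field other than the domain lives only inside the JSON text, so filtering on it would mean parsing every body.

```python
import json
import sqlite3

# Hypothetical schema for the "naive" store: id, target domain, raw JSON body.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotations (id INTEGER PRIMARY KEY, domain TEXT, body TEXT)")
conn.execute("CREATE INDEX idx_domain ON annotations(domain)")


def add(annotation: dict, domain: str) -> int:
    """Store an annotation as JSON text; return the generated id."""
    cur = conn.execute(
        "INSERT INTO annotations (domain, body) VALUES (?, ?)",
        (domain, json.dumps(annotation)),
    )
    return cur.lastrowid


def by_id(ann_id: int):
    """Return the decoded annotation, or None if the id is unknown."""
    row = conn.execute("SELECT body FROM annotations WHERE id = ?", (ann_id,)).fetchone()
    return json.loads(row[0]) if row else None


def by_domain(domain: str) -> list:
    """Exact-match lookup on the indexed domain column, in insertion order."""
    rows = conn.execute(
        "SELECT body FROM annotations WHERE domain = ? ORDER BY id", (domain,)
    ).fetchall()
    return [json.loads(body) for (body,) in rows]


first_id = add({"motivation": "reviewing", "target": "http://boston.com/story"}, "boston.com")
```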
> >>
> >> I might think I have the following URLs:
> >>
> >> http://example.com/annotations/ ; (container)
> >>
> >> http://example.com/annotations/ids/ ; e.g. GET
> >> http://example.com/annotations/ids/3 to get annotation #3
> >>
> >> http://example.com/annotations/targets/ ; e.g. GET
> >> http://example.com/annotations/targets/boston.com to get all
> >> annotations for the boston.com domain (exact match)
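In this path-based layout, each kind of lookup is a distinct path segment. A minimal router over those three hypothetical URL shapes might look like the following; the function name `route` and the returned tuple labels are assumptions.

```python
from urllib.parse import urlsplit


def route(url: str):
    """Map a request URL onto the three hypothetical resources (container, id, target)."""
    parts = [p for p in urlsplit(url).path.split("/") if p]
    if parts == ["annotations"]:
        return ("container", None)
    if len(parts) == 3 and parts[0] == "annotations" and parts[1] in ("ids", "targets"):
        # "ids" -> "id", "targets" -> "target"
        return (parts[1][:-1], parts[2])
    raise LookupError(f"no resource at {url}")


print(route("http://example.com/annotations/ids/3"))                # ('id', '3')
print(route("http://example.com/annotations/targets/boston.com"))   # ('target', 'boston.com')
```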
> >>
> >> I think you are suggesting that all logic is in the query string, so
> >> to get all matches containing boston.com, it might be
> >>
> >> GET
> >> http://example.com/annotations?target=boston.com&match=contains
> >>
> >> where 'contains' is a string that would have to be well defined.
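As a sketch of why "contains" needs a precise definition, here are three hypothetical match modes for a target domain. The mode names and the helper `target_matches` are assumptions, not part of any specification. Plain substring matching accepts `notboston.com`, which the `*.boston.com` interpretation would reject.

```python
def target_matches(target_domain: str, wanted: str, match: str = "exact") -> bool:
    """Hypothetical semantics for a 'match' parameter on target domains."""
    host, wanted = target_domain.lower(), wanted.lower()
    if match == "exact":
        return host == wanted
    if match == "contains":  # plain substring test
        return wanted in host
    if match == "subdomain":  # boston.com or *.boston.com
        return host == wanted or host.endswith("." + wanted)
    raise ValueError(f"unknown match mode: {match!r}")


print(target_matches("notboston.com", "boston.com", "contains"))   # True
print(target_matches("notboston.com", "boston.com", "subdomain"))  # False
```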
> >>
> >> I'm probably missing something related to the resources but am
> >> thinking I might be interested in all targets as well...
> >>
> >> regards, Frederick
> >>
> >> Frederick Hirsch
> >>
> >> www.fjhirsch.com
> >> @fjhirsch
> >>
> >>> On Apr 15, 2015, at 2:02 PM, Denenberg, Ray <rden@loc.gov> wrote:
> >>>
> >>> At this morning’s call we discussed paging, filtering, and sorting
> >>> of annotations.
> >>>
> >>> A container may have a large number of annotations, and a client may
> >>> want to specify that it wants only 100, then another 100 on the next
> >>> request, and so on. That would be straight paging, as the annotations
> >>> are going to be supplied in no particular order.
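Straight paging reduces to slicing, as this minimal sketch shows. It assumes a 1-based start position, matching the `startRecord=1` and `startRecord=101` values used later in this thread; the function name `page` and the sample data are hypothetical.

```python
def page(items, start_record: int = 1, maximum_records: int = 100):
    """Return one page; start_record is 1-based, like SRU's startRecord."""
    if start_record < 1 or maximum_records < 0:
        raise ValueError("startRecord must be >= 1 and maximumRecords >= 0")
    begin = start_record - 1
    return items[begin:begin + maximum_records]


annotations = [f"anno-{i}" for i in range(1, 251)]  # hypothetical container contents
first = page(annotations, 1, 100)     # anno-1 .. anno-100
second = page(annotations, 101, 100)  # anno-101 .. anno-200
third = page(annotations, 201, 100)   # anno-201 .. anno-250 (only 50 left)
```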
> >>
> >>>
> >>> But the client may be interested only in annotations with (for
> >>> example) a specific Motivation, or meeting some other criteria. That
> >>> is going to require pre-filtering, and it may still require paging,
> >>> because the set of annotations meeting the criteria might still be
> >>> large. So this brings into the conversation the concept of a result
> >>> set (where, for “straight” paging, the result set is the entire set
> >>> of annotations).
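The filter-then-page idea can be sketched in a few lines: the result set is whatever survives the filter, and paging applies to that set rather than to the whole container. The `Annotation` fields, the helper names, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Annotation:  # hypothetical, minimal fields
    id: int
    motivation: str


def result_set(annotations, motivation):
    """Pre-filter: the result set is the subset meeting the criteria."""
    return [a for a in annotations if a.motivation == motivation]


def page(items, start_record=1, maximum_records=100):
    """1-based paging over any list, including a result set."""
    return items[start_record - 1:start_record - 1 + maximum_records]


# 500 annotations; odd ids are "reviewing", even ids are "commenting".
store = [Annotation(i, "reviewing" if i % 2 else "commenting") for i in range(1, 501)]
reviews = result_set(store, "reviewing")  # 250 annotations, still needs paging
last_page = page(reviews, 201, 100)       # the final 50 of them
```

With no filter at all, the result set is simply `store`, which is the straight-paging case.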
> >>>
> >>> Further, the client may want the results supplied in some specified
> >>> order, for example, most recent first. That brings into play sorting
> >>> the result set.
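Sorting applies to the result set before it is paged. The sketch below sorts most recent first and, as an assumed design choice, breaks ties on `id` (descending) so that two requests for the same pages see the same order. The record layout and helper name are hypothetical.

```python
from datetime import date


def sort_most_recent_first(result_set):
    """Sort by date descending; ties broken by id (descending) for a stable page order."""
    return sorted(result_set, key=lambda a: (a["date"], a["id"]), reverse=True)


result_set = [
    {"id": 1, "date": date(2015, 3, 30)},
    {"id": 2, "date": date(2015, 4, 1)},
    {"id": 3, "date": date(2015, 3, 31)},
    {"id": 4, "date": date(2015, 4, 1)},
]
ordered = sort_most_recent_first(result_set)
print([a["id"] for a in ordered])  # [4, 2, 3, 1]
```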
> >>>
> >>> If we are going to come up with a querying mechanism, it would make
> >>> sense to build support for result sets and sorting into it.
> >>> Alternatively, we could use an existing search protocol that already
> >>> supports all of this.
> >>>
> >>> So I’d like to offer for consideration developing a profile of the
> >>> SRU protocol (http://www.loc.gov/standards/sru/). I suggest that you
> >>> NOT bother reading the spec, and instead let me try to describe how
> >>> simple it really can be if profiled for our purposes. (As to the
> >>> status of this protocol, it is an OASIS standard and is being
> >>> fast-tracked in ISO.)
> >>>
> >>> Here is a rough outline of the suggested approach:
> >>> _________________________________________________
> >>>
> >>> I have a resource:
> >>> http://example.com/rays-resources/resource1
> >>>
> >>> I create an annotation container for it:
> >>> http://example.com/rays-resources/resource1/annotations
> >>>
> >>> I create an SRU endpoint for it:
> >>> http://example.com/rays-resources/resource1/annotations/sru
> >>>
> >>> This URL:
> >>>
> >>> http://example.com/rays-resources/resource1/annotations/sru?query="oa.motivation=reviewing sortBy=oa.date/descending"&startRecord=1&maximumRecords=100
> >>>
> >>> (you might have to percent-encode “/” and the space)
> >>>
> >>> says:
> >>> Search http://example.com/rays-resources/resource1/annotations/
> >>> · For annotations whose Motivation is “reviewing”
> >>> · Sort the results by date, most recent first
> >>> · Return 100 annotations, beginning with the first
> >>> Within the response, there will be a resultSetId. Let’s say it’s
> >>> “resultsXYZ”.
> >>>
> >>> The following URL gets the next 100 annotations:
> >>>
> >>> http://example.com/rays-resources/resource1/annotations/sru?query=resultSetId=resultsXYZ&startRecord=101&maximumRecords=100
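One way a server could honor a resultSetId is to keep each filtered and sorted result set under a generated id and serve later pages from it without re-running the query. This is a hypothetical in-memory sketch. The class name, the id format, and the lack of expiry are assumptions; a real server would need to expire or persist sets.

```python
import uuid


class ResultSetStore:
    """Hypothetical in-memory cache of result sets keyed by resultSetId."""

    def __init__(self):
        self._sets = {}

    def create(self, records) -> str:
        """Freeze an (already filtered and sorted) result set; return its id."""
        result_set_id = "results" + uuid.uuid4().hex[:8]
        self._sets[result_set_id] = list(records)
        return result_set_id

    def fetch(self, result_set_id, start_record=1, maximum_records=100):
        """Serve one 1-based page; raises KeyError for an unknown id."""
        records = self._sets[result_set_id]
        begin = start_record - 1
        return records[begin:begin + maximum_records]


store = ResultSetStore()
rs_id = store.create(range(1, 251))    # stand-in for 250 matching annotations
next_page = store.fetch(rs_id, 101, 100)  # records 101 .. 200
```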
> >>>
> >>>
> >>>
> >>> OK, there’s handwaving here and it needs elaboration, but it is nearly
> >>> as simple as this. Don’t be scared by the complexity of the
> >>> specification; it can be profiled into a specification nearly as
> >>> simple as I have described.
> >>>
> >>>
> >>> Ray
> >
Received on Thursday, 16 April 2015 12:41:12 UTC