Re: Paging, filtering, and sorting from Robert Sanderson on 2015-04-16 (public-annotation@w3.org from April 2015)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Thu, 16 Apr 2015 09:51:16 -0700
To: Frederick Hirsch <w3c@fjhirsch.com>
Cc: "Denenberg, Ray" <rden@loc.gov>, Web Annotation <public-annotation@w3.org>
Message-ID: <CABevsUHynboQs+=+YrkLqFMPwFNRZ4xvQxAC59ZBVNc2HUoaAA@mail.gmail.com>
The most recent version of my python implementation is:

https://github.com/cheshire3/cheshire3/blob/develop/cheshire3/cqlParser.py

It should be usable standalone, outside of the Cheshire3 system.

Rob




On Thu, Apr 16, 2015 at 9:48 AM, Frederick Hirsch <w3c@fjhirsch.com> wrote:

> thanks, was looking for Python and  see the name azaroth next to it :)
>
> regards, Frederick
>
> Frederick Hirsch
> Co-Chair, W3C Web Annotation WG
>
> www.fjhirsch.com
> @fjhirsch
>
> > On Apr 16, 2015, at 8:40 AM, Denenberg, Ray <rden@loc.gov> wrote:
> >
> > From: Frederick Hirsch [mailto:w3c@fjhirsch.com]
> >> is there open source query language processing plugin/code etc?
> >
> > Here's a place to start,
> > http://www.loc.gov/standards/sru/resources/products.html
> > (unfortunately it hasn't been updated lately.)
> >
> > Ray
> >
> >
> >
> >
> >
> >>
> >> regards, Frederick
> >>
> >> On Apr 15, 2015, at 5:39 PM, Denenberg, Ray <rden@loc.gov> wrote:
> >>
> >>> Thanks, Frederick. I likely oversimplified so let me elaborate/clarify a
> >>> few points.
> >>>
> >>> The SRU search URL is composed of a base URL, followed by a question
> >>> mark ('?') followed by a list of parameter name/value pairs separated by
> >>> ampersands ('&') where the parameter name and value are separated by an
> >>> equal sign ('='). This is all as in the URI standard.
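A minimal sketch of the URL composition described above, using only the Python standard library. The endpoint path is a hypothetical example; the parameter names are ones that appear elsewhere in this thread. Note that the '=' inside the CQL query is percent-encoded so that it is not confused with the '=' that separates a parameter name from its value.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sru_url(base_url, params):
    """Join a base URL and name/value pairs as base?name=value&name=value."""
    # urlencode percent-encodes reserved characters such as '=' and spaces
    # inside each value, so the whole CQL query stays a single parameter.
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
url = build_sru_url(
    "http://example.com/annotations/sru",
    {"query": "oa.motivation=reviewing", "maximumRecords": "100"},
)
print(url)

# Round trip: a server can recover the same name/value pairs.
recovered = parse_qs(urlsplit(url).query)
print(recovered["query"][0])
```
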
> >>>
> >>> But the SRU parameter names are strictly defined in the SRU standard;
> >>> you can't make them up as you go along. But what you DO get to do is
> >>> make up your own index names. That is, you can define a namespace of
> >>> index names. For SRU/CQL we call such a namespace a "context set".
> >>>
> >>> So in my example where I say
> >>>
> >>> query=oa.motivation=reviewing
> >>>
> >>> 'oa' would refer to the oa context set (and actually you could omit the
> >>> prefix and declare oa to be the default, which if you do, causes other
> >>> complications, but I won't go into that now). The query string is defined
> >>> to be a list of search clauses separated by Boolean operators (spaces on
> >>> each side) where each search clause is an index and value, separated by a
> >>> relator, the most common of which is '='.
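To make the clause structure concrete, here is a rough sketch that splits a very simple query into its search clauses and Boolean operators. It is deliberately not a full CQL parser: it assumes only the '=' relator and unquoted values without spaces, and the index name `oa.creator` is made up for the example.

```python
import re

# Boolean operators with whitespace on each side, as described above.
_BOOLEAN = re.compile(r"\s+(AND|OR|NOT)\s+")


def split_clauses(query):
    """Split a simple query into (index, relator, value) clauses plus
    the list of Boolean operators that sit between them.

    Assumption: only '=' is used as the relator and values are unquoted.
    """
    # With a capture group, re.split keeps the operators in the result:
    # [clause, operator, clause, operator, clause, ...]
    parts = _BOOLEAN.split(query.strip())
    clauses = [clause.partition("=") for clause in parts[0::2]]
    operators = parts[1::2]
    return clauses, operators


# 'oa.creator' is a hypothetical index name used only for illustration.
clauses, operators = split_clauses("oa.motivation=reviewing AND oa.creator=ray")
print(clauses)
print(operators)
```
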
> >>>
> >>> I concede there is a bit of awkwardness in the syntax where "=" is
> >>> used to mean different things, as in
> >>>
> >>> query=oa.motivation=reviewing
> >>>
> >>> but you can always quote the query string if it makes you more
> >>> comfortable:
> >>>
> >>> query="oa.motivation=reviewing"
> >>>
> >>> and in fact you HAVE to quote it if there are embedded spaces:
> >>>
> >>> query="title=cat AND publisher=dog"
> >>>
> >>> (Note the AND, not ampersand, because we are not separating URL
> >>> parameters but rather CQL search clauses.)
> >>>
> >>>
> >>> But back to my point;  when you say:
> >>>
> >>> http://example.com/annotations?target=boston.com&match=contains
> >>>
> >>> instead you would say:
> >>>
> >>> http://example.com/annotations?query="oa.target=boston.com AND
> >>> oa.match=contains"
> >>>
> >>>
> >>> And yes, all (or most) of the logic is in the query string.
> >>>
> >>> Ray
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Frederick Hirsch [mailto:w3c@fjhirsch.com]
> >>>> Sent: Wednesday, April 15, 2015 4:18 PM
> >>>> To: Denenberg, Ray
> >>>> Cc: Web Annotation
> >>>> Subject: Re: Paging, filtering, and sorting
> >>>>
> >>>> Thanks Ray, the concept of sorted result sets seems very relevant.
> >>>>
> >>>> How hard would it be for me to make the following query:
> >>>>
> >>>> Search/Filter the annotations stored on my web site (example.com) for
> >>>> the target domain boston.com (or *.boston.com) posted on the date 1
> >>>> April 2015 sorted by most recent first and limited to the first 200?
> >>>>
> >>>> My naive approach might be to simply store annotations with ids I
> >>>> create and perhaps index by target domain without other fields (e.g.
> >>>> think of a table with id, domain as text string, and text holding
> >>>> arbitrary JSON of the annotation). This means I would have a server
> >>>> that could return an annotation by id, or by domain, or iterate, but
> >>>> other choices might be more difficult in terms of parsing JSON etc.
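A rough sketch of the storage layout described above, using an in-memory SQLite database. The table name, column names, and the JSON fields (`motivation`, `created`) are all assumptions made for illustration. Lookup by id or by domain is a plain indexed query, while filtering on anything else means parsing each JSON body, which is the difficulty the paragraph points at.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: id, target domain as text, annotation as JSON text.
conn.execute(
    "CREATE TABLE annotations (id INTEGER PRIMARY KEY, domain TEXT, body TEXT)"
)
conn.execute("CREATE INDEX idx_domain ON annotations (domain)")

# Made-up sample annotations; the JSON field names are assumptions.
samples = [
    (1, "boston.com", {"motivation": "reviewing", "created": "2015-04-01"}),
    (2, "boston.com", {"motivation": "commenting", "created": "2015-03-30"}),
    (3, "example.org", {"motivation": "reviewing", "created": "2015-04-01"}),
]
conn.executemany(
    "INSERT INTO annotations VALUES (?, ?, ?)",
    [(row_id, domain, json.dumps(body)) for row_id, domain, body in samples],
)


def by_domain(domain):
    """Exact-match lookup on the indexed domain column."""
    rows = conn.execute(
        "SELECT id, body FROM annotations WHERE domain = ? ORDER BY id",
        (domain,),
    )
    return [(row_id, json.loads(body)) for row_id, body in rows]


def by_domain_and_date(domain, created):
    """Filtering on a non-indexed field requires parsing every JSON body."""
    return [
        row_id
        for row_id, annotation in by_domain(domain)
        if annotation.get("created") == created
    ]


print(by_domain_and_date("boston.com", "2015-04-01"))
```
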
> >>>>
> >>>> I might think I have the following URLs:
> >>>>
> >>>> http://example.com/annotations/ ; (container)
> >>>>
> >>>> http://example.com/annotations/ids/ ;  e.g. GET
> >>>> http://example.com/annotations/ids/3 to get annotation #3
> >>>>
> >>>> http://example.com/annotations/targets/ ;  e.g. GET
> >>>> http://example.com/annotations/targets/boston.com to get all
> >>>> annotations for the boston.com domain (exact match)
> >>>>
> >>>> I think you are suggesting that all logic is in the query string, so
> >>>> to get all matches containing boston.com, it might be
> >>>>
> >>>> GET
> >>>> http://example.com/annotations?target=boston.com&match=contains
> >>>>
> >>>> where 'contains' is a string that would have to be well defined.
> >>>>
> >>>> I'm probably missing something related to the resources but am
> >>>> thinking I might be interested in all targets as well...
> >>>>
> >>>> regards, Frederick
> >>>>
> >>>> Frederick Hirsch
> >>>>
> >>>> www.fjhirsch.com
> >>>> @fjhirsch
> >>>>
> >>>>> On Apr 15, 2015, at 2:02 PM, Denenberg, Ray <rden@loc.gov> wrote:
> >>>>>
> >>>>> At this morning’s call we discussed paging, filtering, and sorting of
> >>>>> annotations.
> >>>>>
> >>>>> A container may have a large number of annotations, and a client may want
> >>>>> to specify that it wants only 100, then another 100 on the next request,
> >>>>> and so on. That would be straight paging, as the annotations are going to
> >>>>> be supplied in random order.
> >>>>
> >>>>>
> >>>>> But the client may be interested only in annotations with (for example) a
> >>>>> specific Motivation, or meeting some other criteria. Then that’s going to
> >>>>> require pre-filtering, and it still may require paging in addition because
> >>>>> the set of annotations meeting the criteria might still be large. So this
> >>>>> brings into the conversation the concept of a result set (where for
> >>>>> “straight” paging, the result set is the entire set of annotations).
> >>>>>
> >>>>> Further, the client may want the results supplied in some specified order,
> >>>>> for example, most recent first. That brings into play sorting the result
> >>>>> set.
> >>>>>
> >>>>> If we are going to come up with a querying mechanism it would make sense
> >>>>> to build into it support for result sets and sorting. Alternatively we
> >>>>> could use an existing search protocol that already supports all of this.
> >>>>>
> >>>>> So I’d like to offer for consideration developing a profile of the SRU
> >>>>> protocol http://www.loc.gov/standards/sru/. I suggest that you NOT bother
> >>>>> reading the spec and instead let me try to describe how simple it really
> >>>>> can be if profiled for our purposes. (As to the status of this protocol,
> >>>>> it is an OASIS standard, and is being fast-tracked in ISO.)
> >>>>>
> >>>>> Here is a rough outline of the suggested approach:
> >>>>> _________________________________________________
> >>>>>
> >>>>> I have a resource:
> >>>>> http://example.com/rays-resources/resource1
> >>>>>
> >>>>> I create an annotation container for it:
> >>>>> http://example.com/rays-resources/resource1/annotations
> >>>>>
> >>>>> I create an SRU endpoint for it:
> >>>>> http://example.com/rays-resources/resource1/annotations/sru
> >>>>>
> >>>>> this URL …..
> >>>>>
> >>>>> http://example.com/rays-resources/resource1/annotations/sru?
> >>>>> query="oa.motivation=reviewing sortBy=oa.date/descending"
> >>>>> &startRecord=1&maximumRecords=100
> >>>>>
> >>>>> (might have to percent encode “/” and space)
> >>>>>
> >>>>> …….  Says:
> >>>>> Search http://example.com/rays-resources/resource1/annotations/
> >>>>> ·         For annotations whose Motivation is “reviewing”
> >>>>> ·         Sort the results by date, most recent first
> >>>>> ·         Return 100 annotations, beginning with the first
> >>>>>
> >>>>> Within the response, there will be a resultSetId. Let’s say it’s
> >>>>> “resultsXYZ”.
> >>>>>
> >>>>>  The following URL gets the next 100 annotations:
> >>>>>
> >>>>> http://example.com/rays-resources/resource1/annotations/sru?
> >>>>> query=resultSetId=resultsXYZ&startRecord=101&maximumRecords=100
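The paging pattern above can be sketched as a small generator that produces one URL per page. This is only an illustration under stated assumptions: the total number of matching records is assumed to be known already (for example from the first response), and for simplicity every page here uses the same query string, whereas the outline above switches to the resultSetId query only after the first request.

```python
from urllib.parse import urlencode


def page_urls(endpoint, query, total, page_size=100):
    """Yield one search URL per page using startRecord/maximumRecords."""
    # startRecord is 1-based in the example above (1, then 101, ...).
    for start in range(1, total + 1, page_size):
        params = {
            "query": query,
            "startRecord": start,
            "maximumRecords": page_size,
        }
        yield endpoint + "?" + urlencode(params)


# Endpoint and result set id are the example values from the message above;
# the total of 250 records is a made-up figure.
urls = list(
    page_urls(
        "http://example.com/rays-resources/resource1/annotations/sru",
        "resultSetId=resultsXYZ",
        total=250,
    )
)
for url in urls:
    print(url)
```
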
> >>>>>
> >>>>>
> >>>>>
> >>>>> Ok there’s handwaving here, it needs elaboration, but it is nearly as
> >>>>> simple as this. Don’t be scared by the complexity of the specification, it
> >>>>> can be profiled into a specification nearly as simple as I have described.
> >>>>>
> >>>>>
> >>>>> Ray
> >>>
> >
>
>
>


-- 
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305
Received on Thursday, 16 April 2015 16:51:44 UTC