Re: Paging, filtering, and sorting from Frederick Hirsch on 2015-04-16 (public-annotation@w3.org from April 2015)

From: Frederick Hirsch <w3c@fjhirsch.com>
Date: Wed, 15 Apr 2015 20:17:25 -0400
To: "Denenberg, Ray" <rden@loc.gov>
Cc: Web Annotation <public-annotation@w3.org>
Message-Id: <5B396B21-4041-4640-93F0-A3AF9049724B@fjhirsch.com>
Ray,

Many thanks to you and Rob for the explanations.

I need to better understand how this fits with the REST approach, but I can see the benefits of the generality and extensibility of the approach.

Is there an open source query language processing plugin, code, etc.?

regards, Frederick

On Apr 15, 2015, at 5:39 PM, Denenberg, Ray <rden@loc.gov> wrote:

> Thanks, Frederick.     I likely oversimplified so let me elaborate/clarify a few points. 
> 
> The SRU search URL is composed of a base URL, followed by a question mark ('?'), followed by a list of parameter name/value pairs separated by ampersands ('&'), where the parameter name and value are separated by an equal sign ('='). This is all as in the URI standard.
> 
> But the SRU parameter names are strictly defined in the SRU standard; you can't make them up as you go along. What you DO get to do is make up your own index names.  That is, you can define a namespace of index names.  For SRU/CQL we call such a namespace a "context set".
> 
> So in my example where I say
> 
> query=oa.motivation=reviewing
> 
> 'oa' would refer to the oa context set  (and actually you could omit the prefix and declare oa to be the default, which if you do, causes other complications, but I won't go into that now).    The query string is defined to be a list of search clauses separated by Boolean operators (spaces on each side) where each search clause is an index and value, separated by a relator, the most common of which is '='.
> 
> I concede there is a bit of awkwardness in the syntax where "=" is used to mean different things, as in
> 
> query=oa.motivation=reviewing
> 
> but you can always quote the query string if it makes you more comfortable:
> 
> query="oa.motivation=reviewing"
> 
> and in fact you HAVE to quote it if there are embedded spaces:
> 
> query="title=cat AND publisher=dog"
> 
> (Note the AND, not ampersand, because we are not separating URL parameters but rather CQL search clauses.)
> 
> 
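A minimal sketch of the URL composition described above, using Python's standard urllib.parse. The endpoint path and the helper name sru_search_url are hypothetical. The sketch percent-encodes the CQL string instead of wrapping it in quotes, which is an assumption about what a server would accept, not something the SRU text above states.

```python
from urllib.parse import urlencode, quote


def sru_search_url(base_url, cql_query, start_record=1, maximum_records=100):
    """Join a base URL and SRU parameters with '?', '&' and '='.

    Hypothetical helper: only the three parameter names used in this
    thread (query, startRecord, maximumRecords) are emitted.
    """
    params = {
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # quote_via=quote encodes spaces as %20 (not '+') and the inner '='
    # of the CQL clause as %3D, so it cannot be confused with the '='
    # that separates a URL parameter name from its value.
    return base_url + "?" + urlencode(params, quote_via=quote)


# Assumed endpoint, for illustration only.
url = sru_search_url(
    "http://example.com/annotations/sru",
    "title=cat AND publisher=dog",
)
print(url)
```

Encoding the whole CQL string as one parameter value keeps the Boolean AND inside the query, separate from the ampersands that divide URL parameters.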
> But back to my point;  when you say:
> 
> http://example.com/annotations?target=boston.com&match=contains
> 
> instead you would say:
> 
> http://example.com/annotations?query="oa.target=boston.com AND oa.match=contains"
> 
> 
> And yes, all (or most) of the logic is in the query string.    
> 
> Ray
> 
> 
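The rewrite from separate URL parameters to a single CQL query string could be sketched as follows. The function name params_to_cql is hypothetical, the 'oa' prefix follows the dotted form used in oa.motivation, and the index names target and match are simply the ones from the example, not defined indexes. Wrapping values that contain spaces in double quotes is an assumption about CQL term quoting.

```python
def params_to_cql(params, prefix="oa"):
    """Turn name/value pairs into CQL search clauses joined by AND.

    Hypothetical helper: each pair becomes '<prefix>.<name>=<value>',
    i.e. an index, the '=' relator, and a value.
    """
    clauses = []
    for name, value in params.items():
        # Assumption: a value with embedded spaces needs double quotes.
        term = f'"{value}"' if " " in value else value
        clauses.append(f"{prefix}.{name}={term}")
    # Boolean operators are written with a space on each side.
    return " AND ".join(clauses)


cql = params_to_cql({"target": "boston.com", "match": "contains"})
print(cql)
```

The result is the value that would go in the single query parameter; everything the server needs to interpret is inside that one string.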
>> -----Original Message-----
>> From: Frederick Hirsch [mailto:w3c@fjhirsch.com]
>> Sent: Wednesday, April 15, 2015 4:18 PM
>> To: Denenberg, Ray
>> Cc: Web Annotation
>> Subject: Re: Paging, filtering, and sorting
>> 
>> Thanks Ray, the concept of sorted result sets seems very relevant.
>> 
>> How hard would it be for me to make the following query:
>> 
>> Search/Filter the annotations stored on my web site (example.com) for the
>> target domain boston.com (or *.boston.com) posted on the date 1 April 2015
>> sorted by most recent first and limited to the first 200?
>> 
>> My naive approach might be to simply store annotations with ids I create and
>> perhaps index by target domain without other fields (e.g. think of a table
>> with id, domain as text string, and text holding arbitrary JSON of the
>> annotation). This means I would have a server that could return an
>> annotation by id, or by domain, or iterate, but other choices might be more
>> difficult in terms of parsing JSON etc.
>> 
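The naive storage layout described above (id, domain as a text string, and the annotation JSON as text) might look like this with Python's built-in sqlite3 module. Table, column, and function names are all hypothetical, and the sample annotation is made up for illustration.

```python
import json
import sqlite3

# In-memory database for illustration; a real server would use a file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotations ("
    "id INTEGER PRIMARY KEY, domain TEXT NOT NULL, body TEXT NOT NULL)"
)
# Only the target domain is indexed; the JSON body is opaque text.
conn.execute("CREATE INDEX idx_annotations_domain ON annotations (domain)")


def store(conn, domain, annotation):
    """Insert an annotation and return the id assigned to it."""
    cur = conn.execute(
        "INSERT INTO annotations (domain, body) VALUES (?, ?)",
        (domain, json.dumps(annotation)),
    )
    return cur.lastrowid


def get_by_id(conn, annotation_id):
    """Return the annotation with this id, or None if there is none."""
    row = conn.execute(
        "SELECT body FROM annotations WHERE id = ?", (annotation_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def get_by_domain(conn, domain):
    """Return all annotations whose domain matches exactly."""
    rows = conn.execute(
        "SELECT body FROM annotations WHERE domain = ? ORDER BY id", (domain,)
    )
    return [json.loads(r[0]) for r in rows]


new_id = store(conn, "boston.com", {"motivation": "reviewing"})
print(get_by_id(conn, new_id), get_by_domain(conn, "boston.com"))
```

Lookups by id or by exact domain are cheap here, while any other criterion (motivation, date) would require parsing every JSON body, which is the difficulty noted above.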
>> I might think I have the following URLs:
>> 
>> http://example.com/annotations/ ; (container)
>> 
>> http://example.com/annotations/ids/ ;  e.g. GET
>> http://example.com/annotations/ids/3 to get annotation #3
>> 
>> http://example.com/annotations/targets/ ;  e.g. GET
>> http://example.com/annotations/targets/boston.com to get all annotations
>> for the boston.com domain (exact match)
>> 
>> I think you are suggesting that all logic is in the query string, so to get all
>> matches containing boston.com, it might be
>> 
>> GET
>> http://example.com/annotations?target=boston.com&match=contains
>> 
>> where 'contains' is a string that would have to be well defined.
>> 
>> I'm probably missing something related to the resources but am thinking I
>> might be interested in all targets as well...
>> 
>> regards, Frederick
>> 
>> Frederick Hirsch
>> 
>> www.fjhirsch.com
>> @fjhirsch
>> 
>>> On Apr 15, 2015, at 2:02 PM, Denenberg, Ray <rden@loc.gov> wrote:
>>> 
>>> At this morning’s call we discussed paging, filtering, and sorting of
>>> annotations.
>>> 
>>> A container may have a large number of annotations, and a client may want
>>> to specify that it wants only 100, then another 100 on the next request,
>>> and so on.   That would be straight paging, as the annotations are going
>>> to be supplied in random order.
>>> 
>>> But the client may be interested only in annotations with (for example) a
>>> specific Motivation, or meeting some other criteria.  Then that’s going to
>>> require pre-filtering, and it still may require paging in addition because
>>> the set of annotations meeting the criteria might still be large.   So this
>>> brings into the conversation the concept of a result set (where for
>>> “straight” paging, the result set is the entire set of annotations).
>>> 
>>> Further, the client may want the results supplied in some specified order,
>>> for example, most recent first.  That brings into play sorting the result
>>> set.
>>> 
>>> If we are going to come up with a querying mechanism, it would make
>>> sense to build into it support for result sets and sorting.  Alternatively,
>>> we could use an existing search protocol that already supports all of this.
>>> 
>>> So I’d like to offer for consideration developing a profile of the SRU
>>> protocol (http://www.loc.gov/standards/sru/). I suggest that you NOT bother
>>> reading the spec and instead let me try to describe how simple it really
>>> can be if profiled for our purposes.   (As to the status of this protocol,
>>> it is an OASIS standard, and is being fast-tracked in ISO.)
>>> 
>>> Here is a rough outline of the suggested approach:
>>> _________________________________________________
>>> 
>>> I have a resource:
>>> http://example.com/rays-resources/resource1
>>> 
>>> I create an annotation container for it:
>>> http://example.com/rays-resources/resource1/annotations
>>> 
>>> I create an SRU endpoint for it:
>>> http://example.com/rays-resources/resource1/annotations/sru
>>> 
>>> This URL ...
>>> 
>>> http://example.com/rays-resources/resource1/annotations/sru?
>>> query="oa.motivation=reviewing sortBy=oa.date/descending"
>>> &startRecord=1&maximumRecords=100
>>> 
>>> (might have to percent encode “/” and space)
>>> 
>>> ... says:
>>> Search http://example.com/rays-resources/resource1/annotations/
>>> ·         For annotations whose Motivation is “reviewing”
>>> ·         Sort the results by date, most recent first
>>> ·         Return 100 annotations, beginning with the first
>>> 
>>> Within the response, there will be a resultSetId.  Let’s say it’s  “resultsXYZ”
>>> 
>>>   The following URL gets the next 100 annotations:
>>> 
>>> http://example.com/rays-resources/resource1/annotations/sru?
>>> query=resultSetId=resultsXYZ&startRecord=101&maximumRecords=100
>>> 
>>> 
>>> 
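Paging through a result set by its id could be sketched as below. The helper names are hypothetical, the query spelling resultSetId=... is copied from the example above and not checked against the SRU specification, and the total number of records is assumed to be known from the first response.

```python
from urllib.parse import urlencode, quote


def result_set_page_url(endpoint, result_set_id, start_record,
                        maximum_records=100):
    """Build the URL for one page of an existing result set."""
    params = {
        # Spelling taken from the example in this thread (an assumption).
        "query": f"resultSetId={result_set_id}",
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # The inner '=' is percent-encoded as %3D so it stays part of the value.
    return endpoint + "?" + urlencode(params, quote_via=quote)


def page_starts(total_records, page_size=100):
    """Return the 1-based startRecord value of every page."""
    return list(range(1, total_records + 1, page_size))


endpoint = "http://example.com/rays-resources/resource1/annotations/sru"
starts = page_starts(250)
second_page = result_set_page_url(endpoint, "resultsXYZ", starts[1])
print(starts)
print(second_page)
```

With 250 matching annotations and pages of 100, the second page starts at record 101, matching the URL shown above.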
>>> Ok, there’s handwaving here and it needs elaboration, but it is nearly as
>>> simple as this.  Don’t be scared by the complexity of the specification; it
>>> can be profiled into a specification nearly as simple as I have described.
>>> 
>>> 
>>> Ray
>
Received on Thursday, 16 April 2015 00:18:00 UTC