Re: Paging, filtering, and sorting

Thanks Ray, the concept of sorted result sets seems very relevant.

How hard would it be for me to make the following query:

Search/filter the annotations stored on my web site (example.com) for those that target the domain boston.com (or *.boston.com) and were posted on 1 April 2015, sorted most recent first and limited to the first 200?

My naive approach might be to simply store annotations with ids I create and perhaps index them by target domain only (think of a table with an id, the domain as a text string, and a text column holding the arbitrary JSON of the annotation). This means I would have a server that could return an annotation by id or by domain, or iterate over all of them, but other queries (by date, for example) might be more difficult because they would require parsing the JSON.
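
As a rough illustration of that table, here is a minimal Python sketch using sqlite3. The table and column names, the helper functions, and the `created` field inside the JSON are all assumptions made for illustration; it also assumes `created` holds ISO 8601 timestamps in a single time zone, so that string comparison orders them correctly.

```python
import json
import sqlite3

# Hypothetical schema: id, target domain as text, and the raw annotation JSON.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotations (id INTEGER PRIMARY KEY, domain TEXT, body TEXT)"
)
conn.execute("CREATE INDEX idx_domain ON annotations (domain)")


def store(domain, annotation):
    """Store an annotation as opaque JSON text and return its new id."""
    cur = conn.execute(
        "INSERT INTO annotations (domain, body) VALUES (?, ?)",
        (domain, json.dumps(annotation)),
    )
    return cur.lastrowid


def by_id(ann_id):
    """Return the annotation with this id, or None if there is none."""
    row = conn.execute(
        "SELECT body FROM annotations WHERE id = ?", (ann_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def by_domain(domain):
    """Return all annotations whose indexed domain matches exactly."""
    rows = conn.execute(
        "SELECT body FROM annotations WHERE domain = ? ORDER BY id", (domain,)
    )
    return [json.loads(r[0]) for r in rows]


def by_domain_on_date(domain, day, limit=200):
    """Filter and sort by date in Python: the table has no date column,
    so every candidate's JSON has to be parsed first."""
    hits = [a for a in by_domain(domain) if a.get("created", "").startswith(day)]
    hits.sort(key=lambda a: a["created"], reverse=True)
    return hits[:limit]
```

The lookups by id and by domain use the index directly, while the date query has to load and parse every annotation for the domain before it can filter, sort, and truncate. That is the extra cost hinted at above.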

I might think I have the following URLs:

http://example.com/annotations/ ; (container)

http://example.com/annotations/ids/ ; e.g. GET http://example.com/annotations/ids/3 to get annotation #3

http://example.com/annotations/targets/ ; e.g. GET http://example.com/annotations/targets/boston.com to get all annotations for the boston.com domain (exact match)

I think you are suggesting that all logic is in the query string, so to get all matches containing boston.com, it might be

GET http://example.com/annotations?target=boston.com&match=contains

where 'contains' is a string that would have to be well defined.
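
To show why the match value needs a precise definition, here is a small Python sketch of how a server might interpret such a query string. The parameter names `target` and `match` come from the URL above; the function names, the default of `exact`, and the `subdomain` mode (meant to cover the *.boston.com case) are assumptions of mine, not anything agreed.

```python
from urllib.parse import parse_qs, urlsplit


def parse_filter(url):
    """Extract (target, match) from the query string; match defaults to 'exact'."""
    params = parse_qs(urlsplit(url).query)
    target = params["target"][0]
    match = params.get("match", ["exact"])[0]
    return target, match


def domain_matches(annotation_domain, target, match="exact"):
    """Compare an annotation's target domain against the requested one."""
    a = annotation_domain.lower()
    t = target.lower()
    if match == "exact":
        return a == t
    if match == "contains":
        # Plain substring test: also matches unrelated hosts like notboston.com.
        return t in a
    if match == "subdomain":
        # Hypothetical mode: the domain itself or any host beneath it.
        return a == t or a.endswith("." + t)
    raise ValueError(f"unknown match mode: {match}")
```

With a plain substring test, 'contains' would match www.boston.com but also notboston.com, so the definition would have to say whether that is intended.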

I'm probably missing something related to the resources but am thinking I might be interested in all targets as well...

regards, Frederick

Frederick Hirsch

www.fjhirsch.com
@fjhirsch

> On Apr 15, 2015, at 2:02 PM, Denenberg, Ray <rden@loc.gov> wrote:
> 
> On this morning’s call we discussed paging, filtering, and sorting of annotations.
>  
> A container may have a large number of annotations, and a client may want to specify that it wants only 100, then another 100 on the next request, and so on. That would be straight paging, as the annotations are going to be supplied in arbitrary order.
>  
> But the client may be interested only in annotations with (for example) a specific Motivation, or meeting some other criteria. Then that’s going to require pre-filtering, and it may still require paging in addition, because the set of annotations meeting the criteria might still be large. So this brings into the conversation the concept of a result set (where for “straight” paging, the result set is the entire set of annotations).
>  
> Further, the client may want the results supplied in some specified order, for example, most recent first.  That brings into play sorting the result set.
>  
> If we are going to come up with a querying mechanism, it would make sense to build into it support for result sets and sorting. Alternatively, we could use an existing search protocol that already supports all of this.
>  
> So I’d like to offer for consideration developing a profile of the SRU protocol, http://www.loc.gov/standards/sru/. I suggest that you NOT bother reading the spec and instead let me try to describe how simple it really can be if profiled for our purposes. (As to the status of this protocol, it is an OASIS standard, and is being fast-tracked in ISO.)
>  
> Here is a rough outline of the suggested approach:
> _________________________________________________
>  
> I have a resource:
> http://example.com/rays-resources/resource1
>  
> I create an annotation container for it:
> http://example.com/rays-resources/resource1/annotations
>  
> I create an SRU endpoint for it:
> http://example.com/rays-resources/resource1/annotations/sru
>  
> this URL ….. 
>  
> http://example.com/rays-resources/resource1/annotations/sru?
> query="oa.motivation=reviewing sortBy=oa.date/descending"&startRecord=1&maximumRecords=100
>  
> (might have to percent encode “/” and space)
>  
> …….  Says:
> Search http://example.com/rays-resources/resource1/annotations/
> · For annotations whose Motivation is “reviewing”
> · Sort the results by date, most recent first
> · Return 100 annotations, beginning with the first
>  
> Within the response, there will be a resultSetId. Let’s say it’s “resultsXYZ”.
>  
> The following URL gets the next 100 annotations:
>  
> http://example.com/rays-resources/resource1/annotations/sru?
> query=resultSetId=resultsXYZ&startRecord=101&maximumRecords=100
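
For illustration only, the two requests above could be assembled in Python as sketched below, which also shows the percent-encoding of "/" and space mentioned earlier. The parameter names (query, startRecord, maximumRecords) and the query strings are taken from the message; the function names are hypothetical, and how a given server actually accepts a result set reference is assumed here to be exactly as written above.

```python
from urllib.parse import quote, urlencode

# Endpoint from the outline above.
SRU_ENDPOINT = "http://example.com/rays-resources/resource1/annotations/sru"


def sru_url(query, start_record=1, maximum_records=100):
    """Build a request URL with the query value percent-encoded."""
    params = {
        "query": query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # quote_via=quote encodes spaces as %20 rather than '+';
    # safe="" means '/' is encoded as well.
    return SRU_ENDPOINT + "?" + urlencode(params, quote_via=quote, safe="")


def next_start(start_record, maximum_records):
    """Position of the first record on the following page."""
    return start_record + maximum_records


first_page = sru_url("oa.motivation=reviewing sortBy=oa.date/descending")
second_page = sru_url(
    "resultSetId=resultsXYZ", start_record=next_start(1, 100)
)
print(first_page)
print(second_page)
```

The first URL carries the search and sort criteria; the second refers only to the result set id returned by the server, so the server does not have to repeat the search for each page.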
>  
>  
>  
> OK, there’s handwaving here and it needs elaboration, but it is nearly as simple as this. Don’t be scared by the complexity of the specification; it can be profiled into a specification nearly as simple as I have described.
>  
>  
> Ray

Received on Wednesday, 15 April 2015 20:18:46 UTC