Re: Paging, filtering, and sorting from Frederick Hirsch on 2015-04-16 (public-annotation@w3.org from April 2015)

From: Frederick Hirsch <w3c@fjhirsch.com>
Date: Thu, 16 Apr 2015 12:48:47 -0400
To: "Denenberg, Ray" <rden@loc.gov>
Cc: Web Annotation <public-annotation@w3.org>
Message-Id: <68883CEA-4B10-4C0A-8672-D9D9EAB5292F@fjhirsch.com>
thanks, was looking for Python and see the name azaroth next to it :)
 
regards, Frederick

Frederick Hirsch
Co-Chair, W3C Web Annotation WG

www.fjhirsch.com
@fjhirsch

> On Apr 16, 2015, at 8:40 AM, Denenberg, Ray <rden@loc.gov> wrote:
> 
> From: Frederick Hirsch [mailto:w3c@fjhirsch.com]
>> is there open source query language processing plugin/code etc?
> 
> Here's a place to start, http://www.loc.gov/standards/sru/resources/products.html 
> (unfortunately it hasn't been updated lately.)
> 
> Ray
> 
> 
> 
> 
> 
>> 
>> regards, Frederick
>> 
>> On Apr 15, 2015, at 5:39 PM, Denenberg, Ray <rden@loc.gov> wrote:
>> 
>>> Thanks, Frederick.     I likely oversimplified so let me elaborate/clarify a few
>> points.
>>> 
>>> The SRU search URL is composed of a base URL, followed by a question
>> mark ('?') followed by a list of parameter name/value pairs separated by
>> ampersands ('&') where the parameter name and value are separated by
>> equal sign ('='). This is all as in the URI standard.
>>> 
>>> But the SRU parameter names are strictly defined in the SRU standard; you
>> can't make them up as you go along. But what you DO get to do is make up
>> your own index names.  That is, you can define a namespace of index names.
>> For SRU/CQL we call such a namespace a "context set".
>>> 
>>> So in my example where I say
>>> 
>>> query=oa.motivation=reviewing
>>> 
>>> 'oa' would refer to the oa context set  (and actually you could omit the
>> prefix and declare oa to be the default, which if you do, causes other
>> complications, but I won't go into that now).    The query string is defined to
>> be a list of search clauses separated by Boolean operators (spaces on each
>> side) where each search clause is an index and value, separated by a relator,
>> the most common of which is '='.
>>> 
>>> I concede there is a bit of awkwardness in the syntax where "=" is
>>> used to mean different things, as in
>>> 
>>> query=oa.motivation=reviewing
>>> 
>>> but you can always quote the query string if it makes you more
>> comfortable:
>>> 
>>> query="oa.motivation=reviewing"
>>> 
>>> and in fact you HAVE to quote it if there are embedded spaces:
>>> 
>>> query="title=cat AND publisher=dog"
>>> 
>>> (Note the AND, not ampersand, because we are not separating URL
>>> parameters but rather CQL search clauses.)
>>> 
>>> 
>>> But back to my point;  when you say:
>>> 
>>> http://example.com/annotations?target=boston.com&match=contains
>>> 
>>> instead you would say:
>>> 
>>> http://example.com/annotations?query="oa.target=boston.com AND
>> oa.match=contains"
>>> 
>>> 
>>> And yes, all (or most) of the logic is in the query string.
>>> 
>>> Ray
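
A minimal sketch of how a client might assemble the search URL described above, assuming Python's standard urllib. The helper names (cql_clause, build_sru_url) are hypothetical and not part of SRU; query, startRecord and maximumRecords are the parameter names used in this thread. The sketch percent-encodes the CQL string (so '=' and spaces inside it cannot be confused with URL syntax) and, as an assumption, omits the literal double quotes.

```python
from urllib.parse import urlencode, quote

def cql_clause(index, value, relation="="):
    """One CQL search clause: index, relation, value (e.g. oa.motivation=reviewing)."""
    return f"{index}{relation}{value}"

def build_sru_url(base_url, cql_query, start_record=1, maximum_records=100):
    """Append SRU parameters to base_url, percent-encoding each value.

    quote_via=quote encodes spaces as %20 and, with the default safe='',
    also encodes '/' and '=' inside parameter values.
    """
    params = {
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params, quote_via=quote)

# Clauses are joined by a Boolean operator with a space on each side.
cql = " AND ".join([
    cql_clause("oa.target", "boston.com"),
    cql_clause("oa.match", "contains"),
])
url = build_sru_url("http://example.com/annotations", cql)
print(url)
```

The '=' signs inside the CQL string come out as %3D, which keeps the one '=' that separates the parameter name from its value unambiguous.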
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Frederick Hirsch [mailto:w3c@fjhirsch.com]
>>>> Sent: Wednesday, April 15, 2015 4:18 PM
>>>> To: Denenberg, Ray
>>>> Cc: Web Annotation
>>>> Subject: Re: Paging, filtering, and sorting
>>>> 
>>>> Thanks Ray, the concept of sorted result sets seems very relevant.
>>>> 
>>>> How hard would it be for me to make the following query:
>>>> 
>>>> Search/Filter the annotations stored on my web site (example.com) for
>>>> the target domain boston.com (or *.boston.com) posted on the date 1
>>>> April 2015 sorted by most recent first and limited to the first 200?
>>>> 
>>>> My naive approach might be to simply store annotations with ids I
>>>> create and perhaps index by target domain without other fields (e.g.
>>>> think of a table with id, domain as text string, and text holding
>>>> arbitrary JSON of the annotation). This means I would have a server
>>>> that could return an annotation by id, or by domain, or iterate, but
>>>> other choices might be more difficult in terms of parsing JSON etc.
>>>> 
>>>> I might think I have the following URLs:
>>>> 
>>>> http://example.com/annotations/ ; (container)
>>>> 
>>>> http://example.com/annotations/ids/ ;  e.g. GET
>>>> http://example.com/annotations/ids/3 to get annotation #3
>>>> 
>>>> http://example.com/annotations/targets/ ;  e.g. GET
>>>> http://example.com/annotations/targets/boston.com to get all
>>>> annotations for the boston.com domain (exact match)
>>>> 
>>>> I think you are suggesting that all logic is in the query string, so
>>>> to get all matches containing boston.com, it might be
>>>> 
>>>> or GET
>>>> http://example.com/annotations?target=boston.com&match=contains
>>>> 
>>>> where 'contains' is a string that would have to be well defined.
>>>> 
>>>> I'm probably missing something related to the resources but am
>>>> thinking I might be interested in all targets as well...
>>>> 
>>>> regards, Frederick
>>>> 
>>>> Frederick Hirsch
>>>> 
>>>> www.fjhirsch.com
>>>> @fjhirsch
>>>> 
>>>>> On Apr 15, 2015, at 2:02 PM, Denenberg, Ray <rden@loc.gov> wrote:
>>>>> 
>>>>> At this morning’s call  we discussed paging, filtering, and sorting
>>>>> of
>>>> annotations.
>>>>> 
>>>>> A container may have a large number of annotations, and a client may
>>>>> want
>>>> to specify that it wants only 100, then another 100 on the next request,
>> and
>>>> so on.   That would be straight paging, as the annotations are going to be
>>>> supplied in random order.
>>>> 
>>>>> 
>>>>> But the client may be  interested only in annotations with (for
>>>>> example) a
>>>> specific Motivation, or meeting some other criteria.  Then that’s
>>>> going to require pre-filtering, and it still may require paging in addition
>> because the
>>>> set of annotations meeting the criteria might still be large.   So this brings
>> into
>>>> the conversation the concept of a result set (where for “straight”
>>>> paging, the result set is the entire set of annotations).
>>>>> 
>>>>> Further, the client may want the results supplied in some specified
>>>>> order,
>>>> for example, most recent first.  That brings into play sorting the result set.
>>>>> 
>>>>> If we are going to come up with a querying mechanism  it would make
>>>> sense to build into it  support for result sets and sorting.
>>>> Alternatively we could use an existing search protocol that already
>> supports all of this.
>>>>> 
>>>>> So I’d like to offer for consideration developing a profile of the
>>>>> SRU
>>>> protocol  http://www.loc.gov/standards/sru/. I suggest that you NOT
>>>> bother reading the spec and instead let me try to describe how simple it
>> really can
>>>> be if profiled for our  purposes.   (As to the status of this protocol, it is an
>>>> OASIS standard, and is being fast-tracked in ISO.)
>>>>> 
>>>>> Here is a rough outline of the suggested approach:
>>>>> _________________________________________________
>>>>> 
>>>>> I have a resource:
>>>>> http://example.com/rays-resources/resource1
>>>>> 
>>>>> I create an annotation container for it:
>>>>> http://example.com/rays-resources/resource1/annotations
>>>>> 
>>>>> I create an SRU endpoint for it:
>>>>> http://example.com/rays-resources/resource1/annotations/sru
>>>>> 
>>>>> this URL …..
>>>>> 
>>>>> http://example.com/rays-resources/resource1/annotations/sru?
>>>>> query="oa.motivation=reviewing sortBy=oa.date/descending"
>>>> &startRecord=1&maximumRecords=100
>>>>> 
>>>>> (might have to percent encode “/” and space)
>>>>> 
>>>>> …….  Says:
>>>>> Search  http://example.com/rays-resources/resource1/annotations/
>>>>> ·         For annotations whose Motivation is “reviewing”
>>>>> ·         Sort the results by date, most recent first
>>>>> ·         Return 100 annotations, beginning with the first
>>>>> 
>>>>> Within the response, there will be a resultSetId.  Let’s say it’s
>> “resultsXYZ”
>>>>> 
>>>>>  The following URL gets the next 100 annotations:
>>>>> 
>>>>> http://example.com/rays-resources/resource1/annotations/sru?
>>>>> query=resultSetId=resultsXYZ&startRecord=101&maximumRecords=100
>>>>> 
>>>>> 
>>>>> 
>>>>> Ok there’s handwaving here,  it needs elaboration, but it is nearly
>>>>> as simple
>>>> as this.  Don’t be scared by the complexity of the specification, it
>>>> can be profiled into a specification nearly as simple as I have described.
>>>>> 
>>>>> 
>>>>> Ray
>>> 
>
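
A rough sketch of the paging pattern in the outline above, again assuming Python's standard urllib. The function name page_urls is hypothetical. It also assumes the client already knows the resultSetId (which would be read from the first response) and the total number of matching records; neither step is shown here, and no request is actually sent.

```python
from urllib.parse import urlencode, quote

def page_urls(endpoint, first_query, result_set_id, total, page_size=100):
    """Return the request URLs needed to page through `total` records.

    The first page sends the original query; later pages refer to the
    stored result set by id, as in the resultsXYZ example.
    startRecord is 1-based.
    """
    urls = []
    for start in range(1, total + 1, page_size):
        query = first_query if start == 1 else f"resultSetId={result_set_id}"
        params = {
            "query": query,
            "startRecord": start,
            "maximumRecords": page_size,
        }
        urls.append(endpoint + "?" + urlencode(params, quote_via=quote))
    return urls

urls = page_urls(
    "http://example.com/rays-resources/resource1/annotations/sru",
    "oa.motivation=reviewing",
    "resultsXYZ",   # hypothetical id, taken from the example in the thread
    total=250,      # assumed count, for illustration only
)
for u in urls:
    print(u)
```

With 250 records and a page size of 100 this yields three requests, starting at records 1, 101 and 201.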
Received on Thursday, 16 April 2015 16:49:15 UTC