- From: Robert Sanderson <azaroth42@gmail.com>
- Date: Thu, 16 Apr 2015 09:51:16 -0700
- To: Frederick Hirsch <w3c@fjhirsch.com>
- Cc: "Denenberg, Ray" <rden@loc.gov>, Web Annotation <public-annotation@w3.org>
- Message-ID: <CABevsUHynboQs+=+YrkLqFMPwFNRZ4xvQxAC59ZBVNc2HUoaAA@mail.gmail.com>
The most recent version of my Python implementation is:
https://github.com/cheshire3/cheshire3/blob/develop/cheshire3/cqlParser.py

It should be usable in a standalone way, outside of the Cheshire3 system.

Rob

On Thu, Apr 16, 2015 at 9:48 AM, Frederick Hirsch <w3c@fjhirsch.com> wrote:
> thanks, was looking for Python and see the name azaroth next to it :)
>
> regards, Frederick
>
> Frederick Hirsch
> Co-Chair, W3C Web Annotation WG
>
> www.fjhirsch.com
> @fjhirsch
>
>
> On Apr 16, 2015, at 8:40 AM, Denenberg, Ray <rden@loc.gov> wrote:
> >
> > From: Frederick Hirsch [mailto:w3c@fjhirsch.com]
> >> is there open source query language processing plugin/code etc?
> >
> > Here's a place to start,
> > http://www.loc.gov/standards/sru/resources/products.html
> > (unfortunately it hasn't been updated lately.)
> >
> > Ray
> >
> >>
> >> regards, Frederick
> >>
> >> On Apr 15, 2015, at 5:39 PM, Denenberg, Ray <rden@loc.gov> wrote:
> >>
> >>> Thanks, Frederick. I likely oversimplified, so let me elaborate/clarify a few points.
> >>>
> >>> The SRU search URL is composed of a base URL, followed by a question mark ('?'), followed by a list of parameter name/value pairs separated by ampersands ('&'), where the parameter name and value are separated by an equal sign ('='). This is all as in the URI standard.
> >>>
> >>> But the SRU parameter names are strictly defined in the SRU standard; you can't make them up as you go along. What you DO get to do is make up your own index names. That is, you can define a namespace of index names. For SRU/CQL we call such a namespace a "context set".
> >>>
> >>> So in my example where I say
> >>>
> >>> query=oa.motivation=reviewing
> >>>
> >>> 'oa' would refer to the oa context set (and actually you could omit the prefix and declare oa to be the default, which, if you do, causes other complications, but I won't go into that now).
> >>> The query string is defined to be a list of search clauses separated by Boolean operators (with spaces on each side), where each search clause is an index and a value separated by a relation, the most common of which is '='.
> >>>
> >>> I concede there is a bit of awkwardness in the syntax where "=" is used to mean different things, as in
> >>>
> >>> query=oa.motivation=reviewing
> >>>
> >>> but you can always quote the query string if it makes you more comfortable:
> >>>
> >>> query="oa.motivation=reviewing"
> >>>
> >>> and in fact you HAVE to quote it if there are embedded spaces:
> >>>
> >>> query="title=cat AND publisher=dog"
> >>>
> >>> (Note the AND, not ampersand, because we are not separating URL parameters but rather CQL search clauses.)
> >>>
> >>>
> >>> But back to my point. When you say:
> >>>
> >>> http://example.com/annotations?target=boston.com&match=contains
> >>>
> >>> instead you would say:
> >>>
> >>> http://example.com/annotations?query="oa.target=boston.com AND oa.match=contains"
> >>>
> >>>
> >>> And yes, all (or most) of the logic is in the query string.
> >>>
> >>> Ray
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Frederick Hirsch [mailto:w3c@fjhirsch.com]
> >>>> Sent: Wednesday, April 15, 2015 4:18 PM
> >>>> To: Denenberg, Ray
> >>>> Cc: Web Annotation
> >>>> Subject: Re: Paging, filtering, and sorting
> >>>>
> >>>> Thanks Ray, the concept of sorted result sets seems very relevant.
> >>>>
> >>>> How hard would it be for me to make the following query:
> >>>>
> >>>> Search/Filter the annotations stored on my web site (example.com) for the target domain boston.com (or *.boston.com) posted on the date 1 April 2015, sorted by most recent first and limited to the first 200?
> >>>>
> >>>> My naive approach might be to simply store annotations with ids I create and perhaps index by target domain without other fields (e.g.
> >>>> think of a table with id, domain as a text string, and a text column holding arbitrary JSON of the annotation). This means I would have a server that could return an annotation by id, or by domain, or iterate, but other choices might be more difficult in terms of parsing JSON etc.
> >>>>
> >>>> I might think I have the following URLs:
> >>>>
> >>>> http://example.com/annotations/ ; (container)
> >>>>
> >>>> http://example.com/annotations/ids/ ; e.g. GET http://example.com/annotations/ids/3 to get annotation #3
> >>>>
> >>>> http://example.com/annotations/targets/ ; e.g. GET http://example.com/annotations/targets/boston.com to get all annotations for the boston.com domain (exact match)
> >>>>
> >>>> I think you are suggesting that all logic is in the query string, so to get all matches containing boston.com, it might be
> >>>>
> >>>> GET http://example.com/annotations?target=boston.com&match=contains
> >>>>
> >>>> where 'contains' is a string that would have to be well defined.
> >>>>
> >>>> I'm probably missing something related to the resources but am thinking I might be interested in all targets as well...
> >>>>
> >>>> regards, Frederick
> >>>>
> >>>> Frederick Hirsch
> >>>>
> >>>> www.fjhirsch.com
> >>>> @fjhirsch
> >>>>
> >>>>> On Apr 15, 2015, at 2:02 PM, Denenberg, Ray <rden@loc.gov> wrote:
> >>>>>
> >>>>> At this morning's call we discussed paging, filtering, and sorting of annotations.
> >>>>>
> >>>>> A container may have a large number of annotations, and a client may want to specify that it wants only 100, then another 100 on the next request, and so on. That would be straight paging, as the annotations are going to be supplied in no particular order.
> >>>>>
> >>>>> But the client may be interested only in annotations with (for example) a specific Motivation, or meeting some other criteria. Then that's
Then that’s > >>>> going to require pre-filtering, and it still may require paging in > addition > >> because the > >>>> set of annotation meeting the criteria might still be large. So > this brings > >> into > >>>> the conversation the concept of a result set (where for “straight” > >>>> paging, the result set is the entire set of annotations). > >>>>> > >>>>> Further, the client may want the results supplied in some specified > >>>>> order, > >>>> for example, most recent first. That brings into play sorting the > result set. > >>>>> > >>>>> If we are going to come up with a querying mechanism it would make > >>>> sense to build into it support for result sets and sorting. > >>>> Alternatively we could use an existing search protocol that already > >> supports all of this. > >>>>> > >>>>> So I’d like to offer for consideration developing a profile of the > >>>>> SRU > >>>> protocol http://www.loc.gov/standards/sru/. I suggest that you NOT > >>>> bother reading the spec and instead let me try to describe how simple > it > >> really can > >>>> be if profiled for our purposes. (As to the status of this > protocol, it is an > >>>> OASIS standard, and is being fast-tracked in ISO.) > >>>>> > >>>>> Here is a rough outline of the suggested approach: > >>>>> _________________________________________________ > >>>>> > >>>>> I have a resource: > >>>>> http://example.com /rays-resources/resource1 > >>>>> > >>>>> I create an annotation container for it: > >>>>> http://example.com /rays-resources/resource1/annotations > >>>>> > >>>>> I create an SRU endpoint for it: > >>>>> http://example.com /rays-resources/resource1/annotations/sru > >>>>> > >>>>> this URL ….. > >>>>> > >>>>> http://example.com /rays-resources/resource1/annotations/sru? > >>>>> query=”oa.motivation=reviewing sortBy=oa.date/descending” > >>>> &startRecord=1&maximumRecords=100 > >>>>> > >>>>> (might have to percent encode “/” and space) > >>>>> > >>>>> ……. 
> >>>>> says:
> >>>>> Search http://example.com/rays-resources/resource1/annotations/
> >>>>> · For annotations whose Motivation is "reviewing"
> >>>>> · Sort the results by date, most recent first
> >>>>> · Return 100 annotations, beginning with the first
> >>>>>
> >>>>> Within the response, there will be a resultSetId. Let's say it's "resultsXYZ".
> >>>>>
> >>>>> The following URL gets the next 100 annotations:
> >>>>>
> >>>>> http://example.com/rays-resources/resource1/annotations/sru?query=resultSetId=resultsXYZ&startRecord=101&maximumRecords=100
> >>>>>
> >>>>> OK, there's handwaving here and it needs elaboration, but it is nearly as simple as this. Don't be scared by the complexity of the specification; it can be profiled into a specification nearly as simple as I have described.
> >>>>>
> >>>>> Ray
> >>>
> >
>

--
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305
Received on Thursday, 16 April 2015 16:51:44 UTC
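Ray describes the SRU request URL as a base URL, then a '?', then name=value pairs joined by '&', with the whole CQL query carried in the `query` parameter. That structure can be sketched with Python's standard `urllib.parse`. This is a minimal illustration under several assumptions, not part of any SRU toolkit:

- The endpoint is the hypothetical example.com URL from the thread.
- `oa.motivation` and `oa.date` are the thread's illustrative indexes, not a published context set.
- The sort clause is written in what is assumed to be CQL 1.2 `sortBy` form (`sortBy oa.date/sort.descending`) rather than the `sortBy=oa.date/descending` shorthand in the message.

Percent-encoding the value is enough to carry spaces, '=' and '/' safely, so the sketch does not wrap the query in quotation marks. In CQL itself, a double-quoted string is read as a single search term.

```python
from urllib.parse import quote, urlencode

# Hypothetical endpoint taken from Ray's example; not a real service.
SRU_ENDPOINT = "http://example.com/rays-resources/resource1/annotations/sru"


def sru_search_url(base_url: str, cql_query: str,
                   start_record: int = 1, maximum_records: int = 100) -> str:
    """Return base_url + '?' + name=value pairs joined by '&'.

    The CQL query travels as the value of the single 'query' parameter,
    so its own '=', spaces and '/' are percent-encoded instead of being
    confused with URL syntax.
    """
    params = {
        "query": cql_query,
        "startRecord": start_record,        # 1-based position of the first record
        "maximumRecords": maximum_records,  # page size
    }
    # quote_via=quote encodes a space as %20 (the default, quote_plus, uses '+');
    # urlencode's default safe='' means '/' is encoded as %2F as well.
    return base_url + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    # Assumed CQL 1.2 spelling of "reviewing annotations, newest first".
    print(sru_search_url(
        SRU_ENDPOINT,
        "oa.motivation=reviewing sortBy oa.date/sort.descending",
    ))
```

Asking for the second page of 100 only changes `start_record` to 101. The query string is built the same way.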
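Ray also describes the shape of a CQL query. It is a list of search clauses (index, relation, term) joined by Boolean operators, and an index may carry a context-set prefix such as `oa.`. A toy parser for exactly that subset makes the structure concrete. It is not Rob's cqlParser.py and not full CQL.

This sketch assumes no spaces around relations, no parentheses, no modifiers and no `sortBy`. It simply lists clauses and operators left to right. `Clause` and `parse_simple_cql` are hypothetical names.

```python
import re
import shlex
from typing import List, NamedTuple, Optional, Union


class Clause(NamedTuple):
    context_set: Optional[str]  # e.g. "oa"; None when the index has no prefix
    index: str                  # e.g. "motivation"
    relation: str               # e.g. "="
    term: str                   # e.g. "reviewing"


BOOLEANS = {"AND", "OR", "NOT"}
# Index (a letter, then letters, digits, '_' or '.'), a relation, then the term.
# Two-character relations are listed first so '>=' is not read as '>'.
CLAUSE_RE = re.compile(r"^([A-Za-z][\w.]*?)(<=|>=|<>|=|<|>)(.+)$")


def parse_simple_cql(query: str) -> List[Union[Clause, str]]:
    """Split 'clause BOOL clause ...' into Clause tuples and operator strings."""
    tokens = shlex.split(query)  # whitespace-separated, honouring "double quotes"
    if len(tokens) % 2 == 0:
        raise ValueError("expected: clause (BOOLEAN clause)*")
    parsed: List[Union[Clause, str]] = []
    for position, token in enumerate(tokens):
        if position % 2 == 1:  # odd positions must be Boolean operators
            if token.upper() not in BOOLEANS:
                raise ValueError(f"expected AND, OR or NOT, got {token!r}")
            parsed.append(token.upper())
            continue
        match = CLAUSE_RE.match(token)
        if match is None:
            raise ValueError(f"not a search clause: {token!r}")
        index, relation, term = match.groups()
        # 'oa.target' -> ('oa', '.', 'target'); 'title' -> ('', '', 'title')
        prefix, _, name = index.rpartition(".")
        parsed.append(Clause(prefix or None, name, relation, term))
    return parsed
```

For example, `parse_simple_cql("oa.target=boston.com AND oa.match=contains")` yields two `Clause` tuples in the `oa` context set with `"AND"` between them. A server could then map that list onto its own storage query. Anything beyond this subset (nesting, proximity, modifiers, sorting) calls for a real CQL parser such as the one Rob links.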
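Frederick's "naive approach" is a table with an id, the target domain as text, and the annotation as opaque JSON. It can be sketched with Python's built-in `sqlite3`, together with his example query: boston.com or *.boston.com, posted on 1 April 2015, newest first, at most 200.

The `created` column is an added assumption. Without it, the date filter and sort would require parsing every JSON body, which is the difficulty he anticipates. The function names are hypothetical.

```python
import json
import sqlite3
from typing import List, Optional, Tuple


def open_store() -> sqlite3.Connection:
    """In-memory version of Frederick's table, plus an assumed 'created' column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE annotations ("
        " id INTEGER PRIMARY KEY,"
        " domain TEXT NOT NULL,"
        " created TEXT NOT NULL,"  # ISO 8601 UTC, e.g. 2015-04-01T09:30:00Z
        " body TEXT NOT NULL)"     # the annotation as arbitrary JSON text
    )
    conn.execute("CREATE INDEX annotations_by_domain ON annotations (domain)")
    return conn


def add_annotation(conn: sqlite3.Connection, domain: str, created: str,
                   annotation: dict) -> int:
    cur = conn.execute(
        "INSERT INTO annotations (domain, created, body) VALUES (?, ?, ?)",
        (domain, created, json.dumps(annotation)),
    )
    return cur.lastrowid


def get_by_id(conn: sqlite3.Connection, ann_id: int) -> Optional[dict]:
    """Rough equivalent of GET /annotations/ids/<id>."""
    row = conn.execute("SELECT body FROM annotations WHERE id = ?",
                       (ann_id,)).fetchone()
    return json.loads(row[0]) if row else None


def find(conn: sqlite3.Connection, domain: str, day: str,
         limit: int = 200) -> List[Tuple[int, dict]]:
    """Annotations on `domain` or any subdomain (*.domain) created on `day`
    (YYYY-MM-DD), most recent first, at most `limit` rows."""
    rows = conn.execute(
        "SELECT id, body FROM annotations"
        " WHERE (domain = ? OR domain LIKE ?)"
        " AND substr(created, 1, 10) = ?"
        " ORDER BY created DESC"  # same-format UTC ISO strings sort chronologically
        " LIMIT ?",
        (domain, "%." + domain, day, limit),
    ).fetchall()
    return [(row_id, json.loads(body)) for row_id, body in rows]
```

The `LIKE '%.boston.com'` test cannot use the domain index because of its leading wildcard. It also treats `_` in a domain as a wildcard. Storing the domain reversed (com.boston.www) would turn the subdomain test into a prefix test, which an index can answer with a range comparison.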
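Ray's outline has four steps: filter by motivation, sort most recent first, return a resultSetId, then page through it with a 1-based `startRecord` and a `maximumRecords` page size. These steps can be sketched as a small in-memory server.

Everything here is hypothetical:

- the function names;
- the annotation shape (dicts with "motivation" and "date" keys);
- keeping result sets in a process-local dict with no expiry.

The sketch illustrates the result-set idea, not SRU's actual response format.

```python
import uuid
from typing import Dict, List, Tuple

# Hypothetical server state: result set id -> filtered, sorted annotations.
RESULT_SETS: Dict[str, List[dict]] = {}


def _page(hits: List[dict], start_record: int, maximum_records: int) -> List[dict]:
    """Slice one page; startRecord is 1-based, so record N is hits[N - 1]."""
    if start_record < 1 or maximum_records < 0:
        raise ValueError("startRecord must be >= 1 and maximumRecords >= 0")
    first = start_record - 1
    return hits[first:first + maximum_records]


def search_retrieve(annotations: List[dict], motivation: str,
                    start_record: int = 1,
                    maximum_records: int = 100) -> Tuple[str, int, List[dict]]:
    """Filter, sort most recent first, keep the result set, return one page.

    Returns (result_set_id, number_of_records, records)."""
    hits = [a for a in annotations if a.get("motivation") == motivation]
    hits.sort(key=lambda a: a["date"], reverse=True)  # ISO dates sort as text
    result_set_id = "results" + uuid.uuid4().hex[:8]
    RESULT_SETS[result_set_id] = hits
    return result_set_id, len(hits), _page(hits, start_record, maximum_records)


def fetch_page(result_set_id: str, start_record: int,
               maximum_records: int) -> List[dict]:
    """A later page from a stored result set; KeyError if the id is unknown."""
    return _page(RESULT_SETS[result_set_id], start_record, maximum_records)
```

A client would call `search_retrieve(..., 1, 100)` once, then `fetch_page(result_set_id, 101, 100)` for the next 100. This matches the two URLs in Ray's outline. Because clients may never ask for the next page, a real server also has to decide how long a stored result set stays valid.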