- From: Frederick Hirsch <w3c@fjhirsch.com>
- Date: Thu, 16 Apr 2015 12:48:47 -0400
- To: "Denenberg, Ray" <rden@loc.gov>
- Cc: Web Annotation <public-annotation@w3.org>
thanks, was looking for Python and see the name azaroth next to it :)

regards, Frederick

Frederick Hirsch
Co-Chair, W3C Web Annotation WG

www.fjhirsch.com
@fjhirsch

> On Apr 16, 2015, at 8:40 AM, Denenberg, Ray <rden@loc.gov> wrote:
>
> From: Frederick Hirsch [mailto:w3c@fjhirsch.com]
>> is there open source query language processing plugin/code etc?
>
> Here's a place to start, http://www.loc.gov/standards/sru/resources/products.html
> (unfortunately it hasn't been updated lately.)
>
> Ray
>
>>
>> regards, Frederick
>>
>> On Apr 15, 2015, at 5:39 PM, Denenberg, Ray <rden@loc.gov> wrote:
>>
>>> Thanks, Frederick. I likely oversimplified so let me elaborate/clarify
>>> a few points.
>>>
>>> The SRU search URL is composed of a base URL, followed by a question
>>> mark ('?'), followed by a list of parameter name/value pairs separated
>>> by ampersands ('&'), where the parameter name and value are separated
>>> by an equal sign ('='). This is all as in the URI standard.
>>>
>>> But the SRU parameter names are strictly defined in the SRU standard;
>>> you can't make them up as you go along. What you DO get to do is make
>>> up your own index names. That is, you can define a namespace of index
>>> names. For SRU/CQL we call such a namespace a "context set".
>>>
>>> So in my example where I say
>>>
>>> query=oa.motivation=reviewing
>>>
>>> 'oa' would refer to the oa context set (and actually you could omit the
>>> prefix and declare oa to be the default, which if you do, causes other
>>> complications, but I won't go into that now). The query string is
>>> defined to be a list of search clauses separated by Boolean operators
>>> (spaces on each side) where each search clause is an index and value,
>>> separated by a relator, the most common of which is '='.
>>>
>>> I concede there is a bit of awkwardness in the syntax where "=" is
>>> used to mean different things, as in
>>>
>>> query=oa.motivation=reviewing
>>>
>>> but you can always quote the query string if it makes you more
>>> comfortable:
>>>
>>> query="oa.motivation=reviewing"
>>>
>>> and in fact you HAVE to quote it if there are embedded spaces:
>>>
>>> query="title=cat AND publisher=dog"
>>>
>>> (Note the AND, not ampersand, because we are not separating URL
>>> parameters but rather CQL search clauses.)
>>>
>>> But back to my point; when you say:
>>>
>>> http://example.com/annotations?target=boston.com&match=contains
>>>
>>> instead you would say:
>>>
>>> http://example.com/annotations?query="oa.target=boston.com AND oa.match=contains"
>>>
>>> And yes, all (or most) of the logic is in the query string.
>>>
>>> Ray
>>>
>>>> -----Original Message-----
>>>> From: Frederick Hirsch [mailto:w3c@fjhirsch.com]
>>>> Sent: Wednesday, April 15, 2015 4:18 PM
>>>> To: Denenberg, Ray
>>>> Cc: Web Annotation
>>>> Subject: Re: Paging, filtering, and sorting
>>>>
>>>> Thanks Ray, the concept of sorted result sets seems very relevant.
>>>>
>>>> How hard would it be for me to make the following query:
>>>>
>>>> Search/Filter the annotations stored on my web site (example.com) for
>>>> the target domain boston.com (or *.boston.com) posted on the date 1
>>>> April 2015 sorted by most recent first and limited to the first 200?
>>>>
>>>> My naive approach might be to simply store annotations with ids I
>>>> create and perhaps index by target domain without other fields (e.g.
>>>> think of a table with id, domain as text string, and text holding
>>>> arbitrary JSON of the annotation). This means I would have a server
>>>> that could return an annotation by id, or by domain, or iterate, but
>>>> other choices might be more difficult in terms of parsing JSON etc.
>>>>
>>>> I might think I have the following URLs:
>>>>
>>>> http://example.com/annotations/ ; (container)
>>>>
>>>> http://example.com/annotations/ids/ ; e.g. GET
>>>> http://example.com/annotations/ids/3 to get annotation #3
>>>>
>>>> http://example.com/annotations/targets/ ; e.g. GET
>>>> http://example.com/annotations/targets/boston.com to get all
>>>> annotations for the boston.com domain (exact match)
>>>>
>>>> I think you are suggesting that all logic is in the query string, so
>>>> to get all matches containing boston.com, it might be
>>>>
>>>> GET http://example.com/annotations?target=boston.com&match=contains
>>>>
>>>> where 'contains' is a string that would have to be well defined.
>>>>
>>>> I'm probably missing something related to the resources but am
>>>> thinking I might be interested in all targets as well...
>>>>
>>>> regards, Frederick
>>>>
>>>> Frederick Hirsch
>>>>
>>>> www.fjhirsch.com
>>>> @fjhirsch
>>>>
>>>>> On Apr 15, 2015, at 2:02 PM, Denenberg, Ray <rden@loc.gov> wrote:
>>>>>
>>>>> At this morning’s call we discussed paging, filtering, and sorting
>>>>> of annotations.
>>>>>
>>>>> A container may have a large number of annotations, and a client may
>>>>> want to specify that it wants only 100, then another 100 on the next
>>>>> request, and so on. That would be straight paging, as the annotations
>>>>> are going to be supplied in random order.
>>>>>
>>>>> But the client may be interested only in annotations with (for
>>>>> example) a specific Motivation, or meeting some other criteria. Then
>>>>> that’s going to require pre-filtering, and it still may require
>>>>> paging in addition because the set of annotations meeting the
>>>>> criteria might still be large. So this brings into the conversation
>>>>> the concept of a result set (where for “straight” paging, the result
>>>>> set is the entire set of annotations).
>>>>>
>>>>> Further, the client may want the results supplied in some specified
>>>>> order, for example, most recent first. That brings into play sorting
>>>>> the result set.
>>>>>
>>>>> If we are going to come up with a querying mechanism it would make
>>>>> sense to build into it support for result sets and sorting.
>>>>> Alternatively we could use an existing search protocol that already
>>>>> supports all of this.
>>>>>
>>>>> So I’d like to offer for consideration developing a profile of the
>>>>> SRU protocol http://www.loc.gov/standards/sru/. I suggest that you
>>>>> NOT bother reading the spec and instead let me try to describe how
>>>>> simple it really can be if profiled for our purposes. (As to the
>>>>> status of this protocol, it is an OASIS standard, and is being
>>>>> fast-tracked in ISO.)
>>>>>
>>>>> Here is a rough outline of the suggested approach:
>>>>> _________________________________________________
>>>>>
>>>>> I have a resource:
>>>>> http://example.com/rays-resources/resource1
>>>>>
>>>>> I create an annotation container for it:
>>>>> http://example.com/rays-resources/resource1/annotations
>>>>>
>>>>> I create an SRU endpoint for it:
>>>>> http://example.com/rays-resources/resource1/annotations/sru
>>>>>
>>>>> This URL ...
>>>>>
>>>>> http://example.com/rays-resources/resource1/annotations/sru?
>>>>> query="oa.motivation=reviewing sortBy=oa.date/descending"&startRecord=1&maximumRecords=100
>>>>>
>>>>> (might have to percent encode “/” and space)
>>>>>
>>>>> ... says:
>>>>> Search http://example.com/rays-resources/resource1/annotations/
>>>>> · For annotations whose Motivation is “reviewing”
>>>>> · Sort the results by date, most recent first
>>>>> · Return 100 annotations, beginning with the first
>>>>>
>>>>> Within the response, there will be a resultSetId. Let’s say it’s
>>>>> “resultsXYZ”.
>>>>>
>>>>> The following URL gets the next 100 annotations:
>>>>>
>>>>> http://example.com/rays-resources/resource1/annotations/sru?
>>>>> query=resultSetId=resultsXYZ&startRecord=101&maximumRecords=100
>>>>>
>>>>> Ok there’s handwaving here, it needs elaboration, but it is nearly
>>>>> as simple as this. Don’t be scared by the complexity of the
>>>>> specification, it can be profiled into a specification nearly as
>>>>> simple as I have described.
>>>>>
>>>>> Ray
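
Since the thread asks about Python, here is a minimal sketch of how a client might compose the SRU search URLs discussed above, using only the standard library. It is an illustration under stated assumptions, not code from any SRU toolkit: the helper names `cql_clause` and `sru_url` are hypothetical, the endpoint is the example.com URL from the thread, and the index names (`oa.motivation`, `oa.target`, `oa.match`, `resultSetId`) are the illustrative ones used in the messages. Rather than wrapping the whole query in quotes, the sketch percent-encodes the parameter value, which also covers the "/" and space mentioned in the thread.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Example endpoint taken from the thread (not a real service).
SRU_ENDPOINT = "http://example.com/rays-resources/resource1/annotations/sru"


def cql_clause(index, value, relation="="):
    """Build one CQL search clause: index, relation, value (hypothetical helper)."""
    # Quote the term if it contains whitespace so it stays a single term.
    if any(ch.isspace() for ch in value):
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{index}{relation}{value}"


def sru_url(endpoint, cql_query, start_record=1, maximum_records=100):
    """Compose a search URL from the SRU parameters named in the thread."""
    params = {
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # quote_via=quote percent-encodes spaces as %20, and "=" and "/" as well,
    # so the CQL "=" cannot be confused with the URL parameter separator.
    return endpoint + "?" + urlencode(params, quote_via=quote)


# First page: annotations whose motivation is "reviewing".
first_page = sru_url(SRU_ENDPOINT, cql_clause("oa.motivation", "reviewing"))

# Next page: reuse the result set id returned by the server (assumed value).
next_page = sru_url(
    SRU_ENDPOINT, cql_clause("resultSetId", "resultsXYZ"), start_record=101
)

# Two clauses joined by the CQL Boolean operator AND (not an ampersand).
combined = " AND ".join(
    [cql_clause("oa.target", "boston.com"), cql_clause("oa.match", "contains")]
)
filtered = sru_url(SRU_ENDPOINT, combined)

if __name__ == "__main__":
    for url in (first_page, next_page, filtered):
        print(url)
        # Decoding the URL recovers the original CQL string.
        print("  query =", parse_qs(urlsplit(url).query)["query"][0])
```

Decoding with `parse_qs` on the server side returns the CQL string unchanged, so the remaining work is parsing the CQL itself; the sort clause from the thread is left out here because its exact syntax would need to be checked against the CQL specification.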
Received on Thursday, 16 April 2015 16:49:15 UTC