- From: Denenberg, Ray <rden@loc.gov>
- Date: Thu, 16 Apr 2015 08:40:40 -0400
- To: Frederick Hirsch <w3c@fjhirsch.com>
- CC: Web Annotation <public-annotation@w3.org>
From: Frederick Hirsch [mailto:w3c@fjhirsch.com]
> is there open source query language processing plugin/code etc?

Here's a place to start:
http://www.loc.gov/standards/sru/resources/products.html
(unfortunately, it hasn't been updated lately).

Ray

>
> regards, Frederick
>
> On Apr 15, 2015, at 5:39 PM, Denenberg, Ray <rden@loc.gov> wrote:
>
> > Thanks, Frederick. I likely oversimplified, so let me elaborate/clarify
> > a few points.
> >
> > The SRU search URL is composed of a base URL, followed by a question
> > mark ('?'), followed by a list of parameter name/value pairs separated
> > by ampersands ('&'), where each parameter name and value are separated
> > by an equals sign ('='). This is all as in the URI standard.
> >
> > But the SRU parameter names are strictly defined in the SRU standard;
> > you can't make them up as you go along. What you DO get to do is make
> > up your own index names. That is, you can define a namespace of index
> > names. For SRU/CQL we call such a namespace a "context set".
> >
> > So in my example where I say
> >
> > query=oa.motivation=reviewing
> >
> > 'oa' would refer to the oa context set (and actually you could omit the
> > prefix and declare oa to be the default, which, if you do, causes other
> > complications, but I won't go into that now). The query string is
> > defined to be a list of search clauses separated by Boolean operators
> > (with spaces on each side), where each search clause is an index and a
> > value separated by a relator, the most common of which is '='.
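The two layers Ray describes (fixed URL parameter names, user-defined index names inside the query) can be illustrated with a minimal Python sketch using only the standard library. `build_sru_url` joins a base URL, `?`, and `&`-separated `name=value` pairs and refuses any name outside a fixed set; `parse_simple_cql` splits a query into `(context set, index, value)` clauses joined by Boolean operators. The parameter set is a hypothetical subset of the SRU request parameters, not the complete list, and the parser is a toy that assumes `=` is the only relator and that there are no parentheses or quoted terms.

```python
from urllib.parse import urlencode

# Hypothetical subset of SRU request parameter names. The real list is
# fixed by the SRU standard; clients cannot invent new parameter names.
SRU_PARAMS = {"query", "queryType", "startRecord", "maximumRecords",
              "recordSchema", "resultSetTTL"}

BOOLEANS = {"AND", "OR", "NOT"}


def build_sru_url(base_url: str, **params: object) -> str:
    """Base URL + '?' + name=value pairs separated by '&'."""
    unknown = sorted(set(params) - SRU_PARAMS)
    if unknown:
        raise ValueError(f"not SRU parameter names: {unknown}")
    # urlencode percent-encodes each value, so an '=' inside the CQL
    # query becomes %3D and cannot be mistaken for the parameter '='.
    return base_url + "?" + urlencode(params)


def parse_simple_cql(query: str) -> list:
    """Split 'prefix.index=value OP prefix.index=value ...' into parts.

    Toy parser (assumption): '=' is the only relator, Boolean operators
    are surrounded by spaces, and there are no parentheses or quotes.
    Clauses come back as (context_set, index, value) tuples.
    """
    parts = []
    for token in query.split():
        if token.upper() in BOOLEANS:
            parts.append(token.upper())
            continue
        index, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"clause without '=' relator: {token!r}")
        # 'oa.motivation' -> context set 'oa', index 'motivation';
        # an unprefixed index gets None (the server's default set).
        prefix, _, name = index.rpartition(".")
        parts.append((prefix or None, name, value))
    return parts


print(build_sru_url("http://example.com/annotations/sru",
                    query="oa.motivation=reviewing", startRecord=1))
# -> http://example.com/annotations/sru?query=oa.motivation%3Dreviewing&startRecord=1
print(parse_simple_cql("oa.target=boston.com AND oa.match=contains"))
# -> [('oa', 'target', 'boston.com'), 'AND', ('oa', 'match', 'contains')]
```

A real CQL parser also has to handle parentheses, quoted terms, other relators, and prefix declarations, so the toy splitter should not be read as the CQL grammar.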
> >
> > I concede there is a bit of awkwardness in the syntax, where "=" is
> > used to mean different things, as in
> >
> > query=oa.motivation=reviewing
> >
> > but you can always quote the query string if it makes you more
> > comfortable:
> >
> > query="oa.motivation=reviewing"
> >
> > and in fact you HAVE to quote it if there are embedded spaces:
> >
> > query="title=cat AND publisher=dog"
> >
> > (Note the AND, not an ampersand, because we are not separating URL
> > parameters but rather CQL search clauses.)
> >
> >
> > But back to my point: when you say
> >
> > http://example.com/annotations?target=boston.com&match=contains
> >
> > instead you would say:
> >
> > http://example.com/annotations?query="oa.target=boston.com AND oa.match=contains"
> >
> >
> > And yes, all (or most) of the logic is in the query string.
> >
> > Ray
> >
> >
> >> -----Original Message-----
> >> From: Frederick Hirsch [mailto:w3c@fjhirsch.com]
> >> Sent: Wednesday, April 15, 2015 4:18 PM
> >> To: Denenberg, Ray
> >> Cc: Web Annotation
> >> Subject: Re: Paging, filtering, and sorting
> >>
> >> Thanks Ray, the concept of sorted result sets seems very relevant.
> >>
> >> How hard would it be for me to make the following query:
> >>
> >> Search/Filter the annotations stored on my web site (example.com) for
> >> the target domain boston.com (or *.boston.com) posted on 1 April 2015,
> >> sorted most recent first and limited to the first 200?
> >>
> >> My naive approach might be to simply store annotations with ids I
> >> create and perhaps index them by target domain without other fields
> >> (e.g. think of a table with an id, the domain as a text string, and a
> >> text column holding the arbitrary JSON of the annotation). This means I
> >> would have a server that could return an annotation by id or by domain,
> >> or iterate over them, but other choices might be more difficult in
> >> terms of parsing JSON, etc.
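Frederick's naive table can be sketched with Python's built-in `sqlite3` to show where the difficulty he anticipates comes from. The target domain is a real column, so SQL can filter on it (exact `boston.com` or any `*.boston.com` subdomain), but the posting date exists only inside the stored JSON, so it has to be parsed in application code before the results can be sorted newest first and cut to 200. The table layout and the `created` key (assumed to hold an ISO 8601 UTC timestamp string) are hypothetical choices for illustration, not part of any annotation specification quoted here.

```python
import json
import sqlite3


def make_store(annotations: list) -> sqlite3.Connection:
    """Naive store: an id, the target domain as text, and the raw JSON."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE annotations "
               "(id INTEGER PRIMARY KEY, domain TEXT, body TEXT)")
    db.execute("CREATE INDEX by_domain ON annotations (domain)")
    db.executemany("INSERT INTO annotations VALUES (?, ?, ?)",
                   [(a["id"], a["domain"], json.dumps(a)) for a in annotations])
    return db


def find(db: sqlite3.Connection, domain: str, day: str,
         limit: int = 200) -> list:
    """Annotations on domain or *.domain, posted on `day`, newest first."""
    # The domain is a column, so SQL can filter it directly.
    rows = db.execute(
        "SELECT body FROM annotations WHERE domain = ? OR domain LIKE ?",
        (domain, "%." + domain)).fetchall()
    # The date lives only inside the JSON (hypothetical 'created' key),
    # so every candidate row must be parsed before it can be filtered.
    candidates = [json.loads(body) for (body,) in rows]
    same_day = [a for a in candidates if a["created"].startswith(day)]
    # ISO 8601 UTC strings in one format sort chronologically as text.
    same_day.sort(key=lambda a: a["created"], reverse=True)
    return same_day[:limit]
```

Promoting the date to its own column would let SQL do all three steps (`WHERE`, `ORDER BY ... DESC`, `LIMIT 200`); the sketch keeps it inside the JSON only to mirror the naive design.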
> >>
> >> I might think I have the following URLs:
> >>
> >> http://example.com/annotations/ ; (container)
> >>
> >> http://example.com/annotations/ids/ ; e.g. GET
> >> http://example.com/annotations/ids/3 to get annotation #3
> >>
> >> http://example.com/annotations/targets/ ; e.g. GET
> >> http://example.com/annotations/targets/boston.com to get all
> >> annotations for the boston.com domain (exact match)
> >>
> >> I think you are suggesting that all logic is in the query string, so
> >> to get all matches containing boston.com, it might be
> >>
> >> or GET
> >> http://example.com/annotations?target=boston.com&match=contains
> >>
> >> where 'contains' is a string that would have to be well defined.
> >>
> >> I'm probably missing something related to the resources, but am
> >> thinking I might be interested in all targets as well...
> >>
> >> regards, Frederick
> >>
> >> Frederick Hirsch
> >>
> >> www.fjhirsch.com
> >> @fjhirsch
> >>
> >>> On Apr 15, 2015, at 2:02 PM, Denenberg, Ray <rden@loc.gov> wrote:
> >>>
> >>> At this morning’s call we discussed paging, filtering, and sorting
> >>> of annotations.
> >>>
> >>> A container may have a large number of annotations, and a client may
> >>> want to specify that it wants only 100, then another 100 on the next
> >>> request, and so on. That would be straight paging, as the annotations
> >>> are going to be supplied in random order.
> >>>
> >>> But the client may be interested only in annotations with (for
> >>> example) a specific Motivation, or meeting some other criteria. Then
> >>> that’s going to require pre-filtering, and it may still require paging
> >>> in addition, because the set of annotations meeting the criteria might
> >>> still be large. So this brings into the conversation the concept of a
> >>> result set (where, for “straight” paging, the result set is the entire
> >>> set of annotations).
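The distinction between straight paging and filtering can be sketched as two steps: first build a result set (the whole container when there is no filter), then cut pages out of it with a 1-based start position and a page size, in the spirit of SRU's `startRecord` and `maximumRecords`. The `exact`/`contains` modes are one hypothetical way to give Frederick's `match` parameter a well-defined meaning (substring containment for `contains`); the annotation dictionaries and their `target` key are likewise assumptions.

```python
from typing import Callable, Optional

Annotation = dict
Predicate = Callable[[Annotation], bool]


def target_filter(value: str, match: str) -> Predicate:
    """Hypothetical, well-defined meanings for a 'match' parameter."""
    if match == "exact":
        return lambda a: a["target"] == value
    if match == "contains":
        # Plain substring containment on the target string.
        return lambda a: value in a["target"]
    raise ValueError(f"unknown match mode: {match!r}")


def make_result_set(annotations: list,
                    predicate: Optional[Predicate] = None) -> list:
    """Pre-filter; with no predicate the result set is every annotation."""
    if predicate is None:
        return list(annotations)
    return [a for a in annotations if predicate(a)]


def get_page(result_set: list, start_record: int,
             maximum_records: int) -> list:
    """One page of the result set; start_record is 1-based, as in SRU."""
    if start_record < 1 or maximum_records < 0:
        raise ValueError("need start_record >= 1 and maximum_records >= 0")
    begin = start_record - 1
    return result_set[begin:begin + maximum_records]
```

Keeping filtering and paging separate means the same `get_page` serves both cases; only the way the result set is built changes.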
> >>>
> >>> Further, the client may want the results supplied in some specified
> >>> order, for example, most recent first. That brings into play sorting
> >>> the result set.
> >>>
> >>> If we are going to come up with a querying mechanism, it would make
> >>> sense to build support for result sets and sorting into it.
> >>> Alternatively, we could use an existing search protocol that already
> >>> supports all of this.
> >>>
> >>> So I’d like to offer for consideration developing a profile of the
> >>> SRU protocol http://www.loc.gov/standards/sru/. I suggest that you
> >>> NOT bother reading the spec and instead let me try to describe how
> >>> simple it really can be if profiled for our purposes. (As to the
> >>> status of this protocol, it is an OASIS standard, and is being
> >>> fast-tracked in ISO.)
> >>>
> >>> Here is a rough outline of the suggested approach:
> >>> _________________________________________________
> >>>
> >>> I have a resource:
> >>> http://example.com/rays-resources/resource1
> >>>
> >>> I create an annotation container for it:
> >>> http://example.com/rays-resources/resource1/annotations
> >>>
> >>> I create an SRU endpoint for it:
> >>> http://example.com/rays-resources/resource1/annotations/sru
> >>>
> >>> this URL ...
> >>>
> >>> http://example.com/rays-resources/resource1/annotations/sru?
> >>> query="oa.motivation=reviewing sortBy oa.date/sort.descending"
> >>> &startRecord=1&maximumRecords=100
> >>>
> >>> (the "/" and the spaces might have to be percent-encoded)
> >>>
> >>> ... says:
> >>> Search http://example.com/rays-resources/resource1/annotations/
> >>> · For annotations whose Motivation is "reviewing"
> >>> · Sort the results by date, most recent first
> >>> · Return 100 annotations, beginning with the first
> >>>
> >>> Within the response, there will be a resultSetId. Let’s say it’s
> >>> "resultsXYZ".
> >>>
> >>> The following URL gets the next 100 annotations:
> >>>
> >>> http://example.com/rays-resources/resource1/annotations/sru?
> >>> query=resultSetId=resultsXYZ&startRecord=101&maximumRecords=100
> >>>
> >>>
> >>>
> >>> OK, there’s handwaving here and it needs elaboration, but it is
> >>> nearly as simple as this. Don’t be scared by the complexity of the
> >>> specification; it can be profiled into a specification nearly as
> >>> simple as I have described.
> >>>
> >>>
> >>> Ray
> >
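The outline implies that the server keeps the filtered, sorted result set and hands back an identifier for it, so a follow-up request can page through the same set without re-running the search. A minimal in-memory sketch of that idea follows, assuming annotations are dictionaries with hypothetical `motivation` and `date` keys (ISO 8601 date strings). It also shows percent-encoding a query that contains `/` and spaces, written here in CQL's `sortBy` keyword form. This is not an SRU server: there is no CQL parsing, no result-set expiry (`resultSetTTL`), and no response envelope.

```python
import uuid
from typing import Dict, List, Tuple
from urllib.parse import quote


class ResultSetStore:
    """Keeps sorted result sets on the server, keyed by resultSetId."""

    def __init__(self) -> None:
        self._sets: Dict[str, List[dict]] = {}

    def search(self, annotations: List[dict], motivation: str,
               start_record: int,
               maximum_records: int) -> Tuple[str, List[dict]]:
        # Filter on motivation, then sort by date, most recent first.
        hits = [a for a in annotations if a["motivation"] == motivation]
        hits.sort(key=lambda a: a["date"], reverse=True)
        # Hypothetical id format, echoing "resultsXYZ" in the outline.
        result_set_id = "results" + uuid.uuid4().hex[:8]
        self._sets[result_set_id] = hits
        first_page = self.fetch(result_set_id, start_record, maximum_records)
        return result_set_id, first_page

    def fetch(self, result_set_id: str, start_record: int,
              maximum_records: int) -> List[dict]:
        # Later pages reuse the stored, already sorted set (1-based start).
        hits = self._sets[result_set_id]
        begin = start_record - 1
        return hits[begin:begin + maximum_records]


def encode_query(cql: str) -> str:
    # safe="" so that "/" is escaped too; spaces become %20.
    return quote(cql, safe="")


print(encode_query("oa.motivation=reviewing sortBy oa.date/sort.descending"))
# -> oa.motivation%3Dreviewing%20sortBy%20oa.date%2Fsort.descending
```

Storing the sorted list once is what makes the second request cheap; a real server would also need to decide how long a result set lives before its id stops working.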
Received on Thursday, 16 April 2015 12:41:12 UTC