- From: Jeremy Jay <jeremy@pbnjay.com>
- Date: Thu, 11 Jun 2020 14:31:44 -0400
- To: Thad Guidry <thadguidry@gmail.com>
- Cc: Tom Morris <tfmorris@gmail.com>, "Ford, Kevin" <kevinford@loc.gov>, "public-reconciliation@w3.org" <public-reconciliation@w3.org>
- Message-ID: <CAOT=ff-ZOZYEkL37V66T8qB5MFaTXVEMQL5P444N3NRkS01GPA@mail.gmail.com>
Forgive me if I missed this, but I don't believe there is a contract requiring the service provider to respond to all queries. E.g., if a request contains 100 queries, the response may return only the first 10. The client would have to implement retry and completion logic to handle that; I think OpenRefine assumes it's an empty match at the moment.

Jeremy

On Thu, Jun 11, 2020 at 2:01 PM Thad Guidry <thadguidry@gmail.com> wrote:

> Tom,
>
> Curious, do you yourself have a particular preference for seeing rate
> limiting from a service? What methods do you see services use most often
> for that? HTTP client error codes (4xx)? 206 Partial Content returned
> after a Range header field is sent?
> (I see Amazon, Google, etc. mostly use HTTP error responses, specifically
> 403 and 429.)
>
> Thad
> https://www.linkedin.com/in/thadguidry/
>
>
> On Thu, Jun 11, 2020 at 12:45 PM Tom Morris <tfmorris@gmail.com> wrote:
>
>> All of the currently defined limits are for controlling the number of
>> responses sent by the server, rather than the requests sent by the client.
>>
>> While we could add "recommended batch size" and/or "maximum batch size"
>> to the manifest, I'm not sure it would add a lot of value. As a practical
>> matter, clients are going to choose a batch size which balances
>> amortizing request overhead/latency against responsiveness for progress
>> reporting. They aren't motivated to use giant request sizes. In a DoS
>> situation, the attacker isn't going to respect any advertised limits.
>> Note that the server is always free to respond with 413 Request Entity
>> Too Large, and all modern service frameworks have a configurable limit
>> for this.
>>
>> The spec is also silent on whether you can send simultaneous requests in
>> parallel, rate limits, etc. I think this would be a more valuable area to
>> improve from the point of view of protecting services. The 429 Too Many
>> Requests code and the Retry-After: header provide a starting point, but
>> it may be useful to make use of some extended headers in the
>> X-RateLimit-* space.
>>
>> There are two resource buckets associated with large requests: space and
>> time. Once you've accepted the request, the space is used up, but there
>> are no requirements or guarantees on how quickly the request will be
>> processed. If you want to meter work on a per-request basis and take
>> longer to respond to bigger batches, that's completely within the
>> service implementer's right to do.
>>
>> OpenRefine currently uses a fixed batch size of 10 and processes batches
>> serially in a single-threaded fashion, which is inherently rate limiting,
>> but it would be nice to improve the latency hiding and be able to have
>> multiple requests in flight, while still being polite to reconciliation
>> services.
>>
>> Tom
>>
>> On Thu, Jun 11, 2020 at 12:00 PM Ford, Kevin <kevinford@loc.gov> wrote:
>>
>>> Hello all:
>>>
>>> I presume this is the best place to ask this question, which I’ve
>>> harbored for years but which for a variety of reasons I’ve never had
>>> real occasion to ask until now.
>>>
>>> Do I understand correctly that there is no limit to, and no way to
>>> enforce a limit of, the reconciliation query batch size?
>>>
>>> This sentence from the documentation on the GitHub site – “OpenRefine
>>> queries the reconciliation service in batch mode
>>> <https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation-Service-API#multiple-query-mode>
>>> on the first ten items of the column to be reconciled.” [1] – /might/
>>> suggest the size of batches is 10, but I believe we’re to understand
>>> that this particular call basically represents a test before the real,
>>> full reconciliation kicks off. Yes?
>>>
>>> The “Note” under this section of the W3C specification work [2] seems
>>> to make it abundantly clear that there is no restriction on the length
>>> of query batches.
>>>
>>> I didn’t see a clear way to do this via the service manifest.
>>>
>>> If there is no limit on the size, is there a way for a service provider
>>> to impose a limit? If so, how? If not, why not?
>>>
>>> Assuming it is not possible to impose a limit, how does one protect a
>>> service from becoming overwhelmed by one extremely large reconciliation
>>> request or a number of big ones? It seems that this opens up the
>>> service to a DoS attack, but perhaps I am mistaken. Even if that risk
>>> is perhaps marginal, it still seems that a provider could nevertheless
>>> experience a considerable performance penalty having to field requests
>>> with huge query batch sizes.
>>>
>>> I’m familiar in an academic sense with OpenRefine, but not with whether
>>> it might control the size of query batches to ensure a provider is not
>>> overwhelmed. That said, if this work is to become a more generic way to
>>> provide reconciliation or suggest services to be used by software other
>>> than OpenRefine, then it still seems this should be an
>>> advertisable/controllable value, since one cannot always count on the
>>> client being responsible.
>>>
>>> Yours,
>>>
>>> Kevin
>>>
>>> [1] https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation-Service-API#workflow-overview
>>> [2] https://reconciliation-api.github.io/specs/0.1/#sending-reconciliation-queries-to-a-service
>>>
>>> --
>>> Kevin Ford
>>> Library of Congress
>>> Washington, DC
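[Editorial note, not part of the archived message: as a concrete illustration of Tom's point that a service provider can already protect itself without any manifest support, here is a minimal server-side sketch under assumed details. The Flask framework, the /reconcile route, MAX_BATCH_SIZE, and the placeholder `overloaded` and `match_candidates` helpers are all invented for the example; only the `queries` form parameter and the keyed-result response shape follow the spec's multiple-query mode.]

```python
# Hypothetical sketch: a provider imposing its own limits today, independent
# of anything advertised in the manifest. Oversized batches are refused with
# 413, and load shedding is signalled with 429 + Retry-After.
import json

from flask import Flask, jsonify, request

app = Flask(__name__)
MAX_BATCH_SIZE = 50  # a limit chosen by the operator, not defined by the spec


def overloaded():
    """Placeholder for whatever load signal the service actually uses."""
    return False


def match_candidates(query):
    """Placeholder reconciliation lookup; returns a list of candidates."""
    return []


@app.route("/reconcile", methods=["POST"])
def reconcile():
    queries = json.loads(request.form["queries"])

    if len(queries) > MAX_BATCH_SIZE:
        # Nothing in the spec forbids this: the server is free to refuse
        # requests it considers too large.
        return jsonify({"error": "batch too large"}), 413

    if overloaded():
        # Ask polite clients to back off and come back later.
        return jsonify({"error": "try again later"}), 429, {"Retry-After": "30"}

    return jsonify({key: {"result": match_candidates(q)} for key, q in queries.items()})
```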
Received on Thursday, 11 June 2020 19:25:09 UTC
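[Editorial note, not part of the archived message: the client-side behavior discussed above (Jeremy's partial-response point and Tom's 429/Retry-After suggestion) could look roughly like the sketch below. The endpoint URL, retry cap, and the assumption that Retry-After is given in seconds are invented for the example; the batch size of 10 mirrors OpenRefine's default mentioned in the thread.]

```python
# Hypothetical client sketch: send queries in small batches, re-queue keys the
# service leaves unanswered instead of treating them as empty matches, and
# back off on 429 Too Many Requests.
import json
import time

import requests

SERVICE_URL = "https://example.org/reconcile"  # hypothetical endpoint
BATCH_SIZE = 10                                # mirrors OpenRefine's default


def reconcile_all(queries, max_attempts=3):
    """`queries` is a dict like {"q0": {"query": "..."}, ...} per the
    multiple-query mode; returns a dict of results keyed the same way."""
    results = {}
    pending = dict(queries)
    attempts = {key: 0 for key in queries}

    while pending:
        batch_keys = list(pending)[:BATCH_SIZE]
        batch = {key: pending[key] for key in batch_keys}
        resp = requests.post(SERVICE_URL, data={"queries": json.dumps(batch)})

        if resp.status_code == 429:
            # Too Many Requests: wait as advised, then retry the same batch.
            # (Assumes Retry-After is expressed in seconds, not as a date.)
            time.sleep(int(resp.headers.get("Retry-After", "5")))
            continue
        resp.raise_for_status()

        answered = resp.json()
        for key in batch_keys:
            if key in answered:
                results[key] = answered[key]
                pending.pop(key)
            else:
                # The service answered only part of the batch; re-queue the
                # key rather than assuming an empty match, up to max_attempts.
                attempts[key] += 1
                if attempts[key] >= max_attempts:
                    results[key] = {"result": []}
                    pending.pop(key)
    return results
```

Because the batches are sent serially, this sketch is also inherently rate limited in the same way the thread describes OpenRefine's current behavior; running batches in parallel would need the extra politeness measures Tom mentions.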