- From: Thad Guidry <thadguidry@gmail.com>
- Date: Thu, 11 Jun 2020 13:01:08 -0500
- To: Tom Morris <tfmorris@gmail.com>
- Cc: "Ford, Kevin" <kevinford@loc.gov>, "public-reconciliation@w3.org" <public-reconciliation@w3.org>
- Message-ID: <CAChbWaNvaWMKuZLDZprmQYENVezxEO6V-bGrg2KznKTyU_GJ8A@mail.gmail.com>
Tom,

Curious, do you yourself have a particular preference for seeing rate
limiting from a service? What methods do you see services use most often
for that? HTTP client error codes 4xx? 206 Partial Content returned after
a Range header field is sent? (I see Amazon, Google, etc. mostly use HTTP
error responses, specifically 403 and 429.)

Thad
https://www.linkedin.com/in/thadguidry/

On Thu, Jun 11, 2020 at 12:45 PM Tom Morris <tfmorris@gmail.com> wrote:

> All of the currently defined limits are for controlling the number of
> responses sent by the server, rather than the requests sent by the client.
>
> While we could add "recommended batch size" and/or "maximum batch size" to
> the manifest, I'm not sure it would add a lot of value. As a practical
> matter, clients are going to choose a batch size which balances between
> amortizing request overhead/latency and responsiveness for progress
> reporting. They aren't motivated to use giant request sizes. In a DoS
> situation, the attacker isn't going to be respecting any advertised limits.
> Note that the server is always free to respond 413 Request Entity Too
> Large, and all modern service frameworks have a configurable limit for
> this.
>
> The spec is also silent on whether you can send simultaneous requests in
> parallel, rate limits, etc. I think this would be a more valuable area to
> improve from the point of view of protecting services. The 429 Too Many
> Requests code and Retry-After: header provide a starting point, but it
> may be useful to make use of some extended headers in the X-RateLimit-*
> space.
>
> There are two resource buckets associated with large requests: space and
> time. Once you've accepted the request, the space is used up, but there are
> no requirements or guarantees on how quickly the request will be processed.
> If you want to meter work on a per-request basis and take longer to respond
> to bigger batches, that's completely within the service implementer's right
> to do.
>
> OpenRefine currently uses a fixed batch size of 10 and processes batches
> serially in a single-threaded fashion, which is inherently rate limiting,
> but it would be nice to improve the latency hiding and be able to have
> multiple requests in flight, while still being polite to reconciliation
> services.
>
> Tom
>
> On Thu, Jun 11, 2020 at 12:00 PM Ford, Kevin <kevinford@loc.gov> wrote:
>
>> Hello all:
>>
>> I presume this is the best place to ask this question, which I've
>> harbored for years but which for a variety of reasons I've never had real
>> occasion to ask until now.
>>
>> Do I understand correctly that there is no limit to, and no way to
>> enforce a limit of, the reconciliation query batch size?
>>
>> This sentence from the documentation on the GitHub site – "OpenRefine
>> queries the reconciliation service in batch mode
>> <https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation-Service-API#multiple-query-mode>
>> on the first ten items of the column to be reconciled." [1] – /might/
>> suggest the size of batches is 10, but I believe we're to understand that
>> this particular call basically represents a test before the real, full
>> reconciliation kicks off. Yes?
>>
>> The "Note" under this section of the W3C specification work [2] seems to
>> make it abundantly clear that there is no restriction on the length of
>> query batches.
>>
>> I didn't see a clear way to do this via the service manifest.
>>
>> If there is no limit on the size, is there a way for a service provider
>> to impose a limit? If so, how? If not, why not?
>>
>> Assuming it is not possible to impose a limit, how does one protect a
>> service from becoming overwhelmed by one extremely large reconciliation
>> request or a number of big ones? It seems that this opens up the service
>> to a DoS attack, but perhaps I am mistaken.
>> Even if that risk is perhaps
>> marginal, it still seems that a provider could nevertheless experience a
>> considerable performance penalty having to field requests with huge query
>> batch sizes.
>>
>> I'm familiar in an academic sense with OpenRefine, but not whether it
>> might control the size of query batches to ensure a provider is not
>> overwhelmed. That said, if this work is to become a more generic way to
>> provide reconciliation or suggest services to be used by software other
>> than OpenRefine, then it still seems this should be an
>> advertisable/controllable value, since one cannot always count on the
>> client being responsible.
>>
>> Yours,
>>
>> Kevin
>>
>> [1] https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation-Service-API#workflow-overview
>> [2] https://reconciliation-api.github.io/specs/0.1/#sending-reconciliation-queries-to-a-service
>>
>> --
>> Kevin Ford
>> Library of Congress
>> Washington, DC
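The client-side behaviour discussed in the thread – fixed batches of 10 sent serially, backing off on 429 Too Many Requests with a Retry-After header – can be sketched roughly as below. This is an illustration only: the endpoint URL, helper names, and retry policy are assumptions, not anything the Reconciliation Service API or OpenRefine defines.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

BATCH_SIZE = 10  # OpenRefine's current fixed batch size, per the thread

def batches(values, size=BATCH_SIZE):
    """Split column values into fixed-size query batches."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def retry_delay(headers, default=1.0):
    """Honour a numeric Retry-After header on a 429 response."""
    try:
        return float(headers.get("Retry-After", default))
    except ValueError:
        return default  # HTTP-date form (or junk): fall back to the default

def reconcile(endpoint, values):
    """Send batches serially, sleeping and retrying on 429."""
    results = {}
    for batch in batches(values):
        queries = {f"q{i}": {"query": v} for i, v in enumerate(batch)}
        data = urllib.parse.urlencode({"queries": json.dumps(queries)}).encode()
        while True:
            try:
                with urllib.request.urlopen(endpoint, data) as resp:
                    results.update(json.loads(resp.read()))
                break
            except urllib.error.HTTPError as err:
                if err.code == 429:  # rate limited: wait, then resend the batch
                    time.sleep(retry_delay(err.headers))
                else:
                    raise
    return results
```

Serial batches are inherently rate limiting, as Tom notes; a client with multiple requests in flight would need the same 429 handling per request, plus a cap on concurrency to stay polite.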
Received on Thursday, 11 June 2020 18:01:32 UTC
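On the service side, the 413 response Tom mentions amounts to a simple guard in the request handler. A minimal sketch, assuming a hypothetical MAX_QUERIES limit and handler shape – the specification defines no batch-size limit, so all of this is a service implementer's choice:

```python
import json

MAX_QUERIES = 50  # hypothetical per-request batch limit; not in the spec

def handle_reconcile(raw_queries_json):
    """Return (status, body) for the JSON payload of a reconciliation POST."""
    try:
        queries = json.loads(raw_queries_json)
    except json.JSONDecodeError:
        return 400, {"error": "queries must be valid JSON"}
    if len(queries) > MAX_QUERIES:
        # 413 Payload Too Large (formerly Request Entity Too Large)
        return 413, {"error": f"at most {MAX_QUERIES} queries per batch"}
    # Placeholder: a real service would match each query against its data.
    return 200, {qid: {"result": []} for qid in queries}
```

Most web frameworks also enforce a configurable total request-body limit before the handler is even reached, which covers the space half of the space/time concern.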