- From: Thad Guidry <thadguidry@gmail.com>
- Date: Thu, 11 Jun 2020 11:47:10 -0500
- To: "Ford, Kevin" <kevinford@loc.gov>
- Cc: "public-reconciliation@w3.org" <public-reconciliation@w3.org>
- Message-ID: <CAChbWaN_GCSfn4TE=-Znihb7vOyPw4F8hi=bA_H=S0x9XBFg0w@mail.gmail.com>
Hi Kevin!

1. The Reconciliation API is for ANY client, not just for OpenRefine. That is also why we decided to begin a W3C community to collaborate with the community on creating a standard (currently in working draft).

2. As for where OpenRefine uses limits within Reconciliation processes...the limits are set or used in a few areas:

OpenRefine's StandardReconConfig - API documented here: https://reconciliation-api.github.io/specs/0.1/#structure-of-a-reconciliation-query

OpenRefine's GuessTypesOfColumn - hardcoded sample size in the OpenRefine client <https://github.com/OpenRefine/OpenRefine/blob/master/main/src/com/google/refine/commands/recon/GuessTypesOfColumnCommand.java#L119>: a limit of the first 10 rows to inspect and send to the service to guess the column type.

OpenRefine's Suggest Service flyout pane - limits provided by the service provider - API documented here: https://reconciliation-api.github.io/specs/0.1/#suggest-services

Data Extension Property - API proposal documented here: https://reconciliation-api.github.io/specs/0.1/#data-extension-service

Let me know if that helps or if you have further questions.

Thad
https://www.linkedin.com/in/thadguidry/

On Thu, Jun 11, 2020 at 11:00 AM Ford, Kevin <kevinford@loc.gov> wrote:

> Hello all:
>
> I presume this is the best place to ask this question, which I’ve harbored
> for years but which for a variety of reasons I’ve never had real occasion
> to ask until now.
>
> Do I understand correctly that there is no limit to, and no way to enforce
> a limit of, the reconciliation query batch size?
>
> This sentence from the documentation on the Github site – “OpenRefine
> queries the reconciliation service in batch mode
> <https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation-Service-API#multiple-query-mode>
> on the first ten items of the column to be reconciled.” [1] - /might/
> suggest the size of batches is 10, but I believe we’re to understand that
> this particular call basically represents a test before the real, full
> reconciliation kicks off. Yes?
>
> The “Note” under this section of the W3C specification work [2] seems to
> make it abundantly clear that there is no restriction on the length of
> query batches.
>
> I didn’t see a clear way to do this via the service manifest.
>
> If there is no limit on the size, is there a way for a service provider to
> impose a limit? If so, how? If not, why not?
>
> Assuming it is not possible to impose a limit, how does one protect a
> service from becoming overwhelmed by one extremely large reconciliation
> request or a number of big ones? It seems that this opens up the service
> to a DoS attack, but perhaps I am mistaken. Even if that risk is perhaps
> marginal, it still seems that a provider could nevertheless experience a
> considerable performance penalty having to field requests with huge query
> batch sizes.
>
> I’m familiar in an academic sense with OpenRefine, but not whether it
> might control the size of query batches to ensure a provider is not
> overwhelmed. That said, if this work is to become a more generic way to
> provide reconciliation or suggest services to be used by software other
> than OpenRefine, then it still seems this should be an
> advertiseable/controllable value since one cannot always count on the
> client being responsible.
>
> Yours,
> Kevin
>
> [1] https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation-Service-API#workflow-overview
> [2] https://reconciliation-api.github.io/specs/0.1/#sending-reconciliation-queries-to-a-service
>
> --
> Kevin Ford
> Library of Congress
> Washington, DC
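
To make the two kinds of limit in this thread concrete, here is a minimal sketch in Python. The per-query `limit` field comes from the 0.1 draft's query structure; everything else is an assumption for illustration only: the `q0`, `q1` key naming, the `MAX_BATCH_SIZE` cap, the choice of a 413 status, and the function names `build_batch`, `check_batch`, and `chunked` are hypothetical and are not defined by the draft or by OpenRefine.

```python
import json

# Hypothetical provider-chosen cap; the 0.1 draft defines no such value.
MAX_BATCH_SIZE = 50


def build_batch(names, limit=3):
    """Build a query batch keyed q0, q1, ... with a per-query result limit."""
    return {f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)}


def check_batch(queries_json, max_size=MAX_BATCH_SIZE):
    """Provider-side guard (an assumption, not part of the draft).

    Returns (413, error payload) when the batch holds more queries than
    the cap, otherwise (200, parsed batch).
    """
    batch = json.loads(queries_json)
    if len(batch) > max_size:
        return 413, {"error": f"batch of {len(batch)} queries exceeds cap of {max_size}"}
    return 200, batch


def chunked(names, size):
    """Client-side counterpart: split names into batches of at most `size`."""
    return [names[i:i + size] for i in range(0, len(names), size)]
```

A well-behaved client could call `chunked` before `build_batch` so that no single request exceeds whatever size the provider tolerates, while the provider-side guard covers clients that do not.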
Received on Thursday, 11 June 2020 16:47:34 UTC