- From: Ford, Kevin <kevinford@loc.gov>
- Date: Thu, 11 Jun 2020 15:58:50 +0000
- To: "public-reconciliation@w3.org" <public-reconciliation@w3.org>
- Message-ID: <ff45f9980cd5426da1e728ed4e5adf47@LCXEX03.LCDS.LOC.GOV>
Hello all:

I presume this is the best place to ask this question, which I've harbored for years but which for a variety of reasons I've never had real occasion to ask until now.

Do I understand correctly that there is no limit to, and no way to enforce a limit of, the reconciliation query batch size? This sentence from the documentation on the GitHub site - "OpenRefine queries the reconciliation service in batch mode <https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation-Service-API#multiple-query-mode> on the first ten items of the column to be reconciled." [1] - /might/ suggest the size of batches is 10, but I believe we're to understand that this particular call basically represents a test before the real, full reconciliation kicks off. Yes?

The "Note" under this section of the W3C specification work [2] seems to make it abundantly clear that there is no restriction on the length of query batches. I didn't see a clear way to impose such a restriction via the service manifest.

If there is no limit on the size, is there a way for a service provider to impose a limit? If so, how? If not, why not?

Assuming it is not possible to impose a limit, how does one protect a service from becoming overwhelmed by one extremely large reconciliation request or a number of big ones? It seems that this opens up the service to a DoS attack, but perhaps I am mistaken. Even if that risk is marginal, it still seems that a provider could experience a considerable performance penalty from having to field requests with huge query batch sizes.

I'm familiar in an academic sense with OpenRefine, but I don't know whether it controls the size of query batches to ensure a provider is not overwhelmed. That said, if this work is to become a more generic way to provide reconciliation or suggest services to be used by software other than OpenRefine, then it still seems this should be an advertisable/controllable value, since one cannot always count on the client being responsible.
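
To make the concern concrete, here is a minimal sketch of the kind of provider-side guard I have in mind, assuming the batch arrives as the JSON object in the `queries` form parameter (keys such as "q0", "q1"). The names `MAX_BATCH_SIZE` and `check_batch` and the choice of HTTP 413 are hypothetical; nothing like this is defined by the specification or the service manifest as far as I can tell.

```python
import json

# Hypothetical provider-chosen cap; the specification does not define one.
MAX_BATCH_SIZE = 50


def check_batch(queries_param, max_batch=MAX_BATCH_SIZE):
    """Validate the raw `queries` form value of a reconciliation request.

    Returns a (status_code, payload) pair: 200 with the parsed batch when it
    is acceptable, otherwise an error status with a short message.
    """
    try:
        batch = json.loads(queries_param)
    except json.JSONDecodeError:
        return 400, {"error": "queries is not valid JSON"}

    # A batch is a JSON object mapping query ids to query objects.
    if not isinstance(batch, dict):
        return 400, {"error": "queries must be a JSON object"}

    # Refuse oversized batches instead of processing them (413 is an assumption).
    if len(batch) > max_batch:
        return 413, {
            "error": f"batch of {len(batch)} queries exceeds limit of {max_batch}"
        }

    return 200, batch
```

Note that this check only runs after the whole body has been parsed, so it would presumably need to be paired with a request-size limit at the web server, and a client has no way to learn the limit in advance, which is the gap I am asking about.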

Yours,
Kevin

[1] https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation-Service-API#workflow-overview
[2] https://reconciliation-api.github.io/specs/0.1/#sending-reconciliation-queries-to-a-service

--
Kevin Ford
Library of Congress
Washington, DC

Received on Thursday, 11 June 2020 16:00:48 UTC