
Re: Attempt to solve the job runner situation

From: Ryan Lane <rlane32@gmail.com>
Date: Wed, 6 Nov 2013 01:18:46 -0600
Message-ID: <CALKgCA3WwYX+chnib1ZWS9WGN=nfebGatN1ujn+_BpWL=5pTDA@mail.gmail.com>
To: Renoir Boulanger <renoir@w3.org>
Cc: List WebPlatform public <public-webplatform@w3.org>
Also notice the Ganglia graph for db1:


The problems started with the database issues we had earlier and stopped
when I fixed the SMW data.

On Wed, Nov 6, 2013 at 1:13 AM, Ryan Lane <rlane32@gmail.com> wrote:

> On Wed, Nov 6, 2013 at 12:30 AM, Renoir Boulanger <renoir@w3.org> wrote:
>> Hi all,
>> Tonight, between 23:00 and 24:30 EST, Ryan and I had the chance to work on
>> the job runner situation [0].
>> The problem was two-fold. Part of it was a high volume of database
>> writes that was filling the database server's storage partitions.
>> The main cause was an ever-increasing number of jobs being added to
>> MediaWiki's job runner system. Details of the issue are described
>> in [0].
>> Symptoms of the problem are described in these e-mails: [1][2][3]
>> Ryan decided to truncate the SemanticMediaWiki data and delete all
>> SemanticMediaWiki-related jobs.
>> We hope this resolves the situation.
> The problem is indeed solved. I believe the SMW data was corrupted,
> causing it to continuously add jobs into the queue to fix its data. I
> truncated the job table, then refreshed the SMW data using a maintenance
> script. The current job queue is at 0.
> - Ryan
Received on Wednesday, 6 November 2013 07:19:33 UTC
