
I am writing a custom framework and in it I'm trying to train a simple network to predict the addition function.

**The network:**

- 1 hidden layer of 3 Neurons
- 1 output layer
- the cost function is squared error (not MSE, to avoid precision problems)
- Identity transfer function to make things simple at first
- no special updaters, just a fixed step size
- no learning rate decay
- no regularization
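With an identity transfer function, the whole network is just a linear map. A minimal sketch of that forward pass in NumPy (the weight shapes follow the description; the init scale and layout are assumptions, since the framework's internals aren't shown):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer shapes matching the description:
# 2 inputs -> 3 hidden neurons -> 1 output, identity transfer function.
W1 = rng.normal(scale=0.1, size=(3, 2))  # hidden-layer weights
b1 = np.zeros(3)                         # hidden-layer biases
W2 = rng.normal(scale=0.1, size=(1, 3))  # output-layer weights
b2 = np.zeros(1)                         # output-layer bias

def forward(x):
    # Identity activation: no nonlinearity is applied, so the network
    # computes a purely linear function of the input.
    h = W1 @ x + b1
    y = W2 @ h + b2
    return h, y
```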

**The training set:**

- ~500 samples
- inputs: `[n1][n2]`; labels: `[n1 + n2]`
- every element is between 0 and 1, e.g. `[0.5][0.3] => [0.8]`
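Such a set can be generated in a few lines. This sketch assumes the inputs are drawn from [0, 0.5) so that the label also stays below 1, matching the "every element is between 0 and 1" constraint; if only the inputs are bounded, drop the division by 2:

```python
import numpy as np

rng = np.random.default_rng(42)

# ~500 samples: inputs [n1, n2], labels [n1 + n2].
# Drawing inputs from [0, 0.5) keeps the label below 1 (an assumption).
X = rng.random((500, 2)) / 2
Y = X.sum(axis=1, keepdims=True)
```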

**The algorithm I'm using to optimize:**

- samples 64 elements for an epoch
- for each sample it evaluates the error,
- then propagates the error back,
- and then calculates the gradients from the error values
- the gradients for each element are added up into one vector, then normalized by dividing by the number of samples evaluated
- after the gradients are calculated, a step size of 1e-2 is used to modify the weights
- the training stops when the sum of the errors over the 500 data elements is below 1e-2
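The loop described above can be sketched as follows. This is not the actual framework, just a NumPy rendering of the description; the init scale, data layout, and epoch cap are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training set (assumed layout): inputs in [0, 0.5) so labels stay below 1
X = rng.random((500, 2)) / 2
T = X.sum(axis=1, keepdims=True)

# 2 -> 3 -> 1 linear network (identity transfer function)
W1 = rng.normal(scale=0.1, size=(3, 2)); b1 = np.zeros(3)
W2 = rng.normal(scale=0.1, size=(1, 3)); b2 = np.zeros(1)
step = 1e-2

def total_error():
    # Sum of squared errors over all 500 elements (the stopping criterion)
    pred = (X @ W1.T + b1) @ W2.T + b2
    return np.sum((pred - T) ** 2)

err0 = total_error()           # error before training, for reference
for epoch in range(5_000):     # epoch cap is an assumption
    if total_error() < 1e-2:   # stop when the summed error is small enough
        break
    idx = rng.choice(len(X), size=64, replace=False)
    gW1 = np.zeros_like(W1); gb1 = np.zeros_like(b1)
    gW2 = np.zeros_like(W2); gb2 = np.zeros_like(b2)
    for i in idx:
        x, t = X[i], T[i]
        h = W1 @ x + b1        # forward pass
        y = W2 @ h + b2
        dy = 2.0 * (y - t)     # d(squared error)/dy
        gW2 += np.outer(dy, h); gb2 += dy
        dh = W2.T @ dy         # backpropagate through the output layer
        gW1 += np.outer(dh, x); gb1 += dh
    # Average the accumulated gradients over the batch, then take one step
    W1 -= step * gW1 / len(idx); b1 -= step * gb1 / len(idx)
    W2 -= step * gW2 / len(idx); b2 -= step * gb2 / len(idx)
```

Note that the biases are averaged and stepped exactly like the weights here; if a correct implementation of this scheme converges while the framework oscillates, the bias gradient path is the place to look.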

I don't have a test dataset yet, as first I'd like to overfit a training set, to see if it can even do that. Without bias the training converges to an optimum in about ~4k epochs.

When I include the tuning of the bias in the training, performance gets much worse: the network does not converge to the optimum; instead the biases and the weights oscillate around one another.

Is this a normal effect of introducing a bias?

Here is a chart about the weight values throughout the training:

Thank you for your answer! How would you say I can improve upon this? – David Tóth – 2020-03-10T12:18:24.867

Depends on your goal. You wanna generalise? Introduce bias. You wanna overfit the training set? Don't. In general, overfitting the training set tells you nothing. We already know that NNs can approximate arbitrary functions... – Noah Weber – 2020-03-10T12:32:21.210

So a good next step would be to introduce an evaluation set, then. But what I don't understand is how adding a different evaluation metric could affect the training itself, i.e., how should the gradients be affected? To the best of my knowledge, introducing a validation set does not change the input for the training set. – David Tóth – 2020-03-10T12:39:50.143

Maybe I'm not understanding "bias" correctly. It is a learnable parameter, so I update it along with the weights during training. What I mean by bias is a value stored inside the neuron which offsets its output activation. It is learned just like the weights are, so why does calculating its gradient and updating it the same way worsen the performance? – David Tóth – 2020-03-10T14:51:19.083
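That matches the usual formulation: for a linear neuron y = w·x + b with squared error E = (y − t)², the bias gradient is just the backpropagated delta, ∂E/∂b = 2(y − t), updated exactly like a weight. A quick finite-difference check (the numeric values here are hypothetical) is a cheap way to verify that a framework computes it correctly:

```python
import numpy as np

w = np.array([0.5, 0.5]); b = 0.2
x = np.array([0.5, 0.3]); t = 0.8

def error(b_val):
    # Squared error as a function of the bias alone
    return (w @ x + b_val - t) ** 2

y = w @ x + b            # = 0.5*0.5 + 0.5*0.3 + 0.2 ≈ 0.6
grad_b = 2 * (y - t)     # analytic bias gradient: 2*(y - t) ≈ -0.4

# Central finite difference: (E(b+eps) - E(b-eps)) / (2*eps)
eps = 1e-6
numeric = (error(b + eps) - error(b - eps)) / (2 * eps)
assert abs(grad_b - numeric) < 1e-6
```

If the analytic and numeric gradients for the bias disagree in the framework, that would explain the oscillation far better than any inherent property of biases.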