CNN from scratch – Backpropagation not working

I’m trying to write a CNN in Python using only basic math operations (sums, convolutions, …).
The problem is that backpropagation doesn’t seem to work: the loss keeps fluctuating in a small interval and the error rate stays at roughly 90%.

The dataset is MNIST, taken from https://www.kaggle.com/c/digit-recognizer. Training uses SGD with batch_size=1.

Each layer is represented by its own class, with forward and backward methods.

The structure is:

self.layers['H1'] = H1Conv() # Conv.layer, 4 5x5 filters, stride=1
self.layers['A2'] = TanhAct() # Tanh applied element-wise
self.layers['H2'] = MaxPooling((1,2,2)) # 2x2 max pooling, stride=2
self.layers['H3'] = H3Conv() # Conv. layer, 12 4x5x5 filters, stride=1
self.layers['A3'] = TanhAct() # Tanh applied element-wise
self.layers['H4'] = MaxPooling((1,2,2)) # 2x2 max pooling, stride=2
self.layers['H5'] = H5Conv() # Conv. layer, 10 12x4x4 filters, stride=1 (treated as a fully-connected layer)
self.layers['A5'] = SoftmaxAct() # Softmax function
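For reference, assuming 28x28 MNIST inputs and 'valid' convolutions, the spatial sizes through the network should come out as follows (standalone sketch, not my actual code):

```python
def conv_out(size, k, stride=1):
    # output spatial size of a 'valid' convolution
    return (size - k) // stride + 1

def pool_out(size, k=2, stride=2):
    # output spatial size of max pooling
    return (size - k) // stride + 1

h1 = conv_out(28, 5)  # H1: 4 feature maps of 24x24
h2 = pool_out(h1)     # H2: 4 feature maps of 12x12
h3 = conv_out(h2, 5)  # H3: 12 feature maps of 8x8
h4 = pool_out(h3)     # H4: 12 feature maps of 4x4
# H5: each 12x4x4 kernel covers the whole (12, 4, 4) volume -> 10 scalar outputs
print(h1, h2, h3, h4)  # 24 12 8 4
```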

Since I use the cross-entropy loss, the gradient of loss(softmax(..)) with respect to the logits is

dout = Ypred - Y

where Y is the one-hot encoded correct label and Ypred is the result of the forward pass through the network.
Each layer then backpropagates the gradient it receives from the layer above:

for key in reversed(self.layers):
    dout = self.layers[key].backward(dout)
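To rule out the loss gradient, `dout = Ypred - Y` can be checked against finite differences (minimal standalone sketch, names are mine, not from my code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss(z, y):
    # cross-entropy of softmax(z) against one-hot y
    return -np.sum(y * np.log(softmax(z)))

rng = np.random.default_rng(0)
z = rng.normal(size=10)
y = np.zeros(10)
y[3] = 1.0

analytic = softmax(z) - y  # dout = Ypred - Y

# central finite differences on the logits
numeric = np.zeros_like(z)
eps = 1e-6
for i in range(10):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    numeric[i] = (ce_loss(zp, y) - ce_loss(zm, y)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # should be tiny (~1e-9)
```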

I think I’ve made an error while writing the backpropagation for the convolutional layers. The code (np is numpy, scipy.signal is imported, and LEARNING_RATE is a global constant) is:

class H5Conv:

    def forward(self, inputs):
        self.cache = inputs
        ...

    def backward(self, dout):
        self.biases = self.biases - (LEARNING_RATE * dout)

        # dW
        for i in range(self.kernels.shape[0]):
            dW = np.zeros_like(self.kernels[i])
            for z in range(dW.shape[0]):
                for y in range(dW.shape[1]):
                    for x in range(dW.shape[2]):
                        dW[z, y, x] = dout[i] * self.cache[z, y, x]
            self.kernels[i] = self.kernels[i] - (dW * LEARNING_RATE)

        # dX
        dX = np.zeros_like(self.cache)
        for z in range(dX.shape[0]):
            for y in range(dX.shape[1]):
                for x in range(dX.shape[2]):
                    dX[z, y, x] = np.sum(np.multiply(self.kernels[:, z, y, x], dout))
        return dX
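Side note: since each 12x4x4 kernel covers the whole input volume, H5 is effectively fully connected, and the loops above should agree with a broadcast/tensordot form. A standalone check with random arrays of the same shapes (which at least rules out indexing mistakes in these loops):

```python
import numpy as np

rng = np.random.default_rng(1)
kernels = rng.normal(size=(10, 12, 4, 4))  # 10 kernels of 12x4x4
cache = rng.normal(size=(12, 4, 4))        # layer input
dout = rng.normal(size=10)                 # upstream gradient

# loop version, same indexing as H5Conv.backward
dW_loop = np.zeros_like(kernels)
for i in range(10):
    dW_loop[i] = dout[i] * cache
dX_loop = np.zeros_like(cache)
for z in range(12):
    for y in range(4):
        for x in range(4):
            dX_loop[z, y, x] = np.sum(kernels[:, z, y, x] * dout)

# vectorized equivalents
dW_vec = dout[:, None, None, None] * cache       # outer product via broadcasting
dX_vec = np.tensordot(dout, kernels, axes=1)     # contract over the kernel axis

print(np.allclose(dW_loop, dW_vec), np.allclose(dX_loop, dX_vec))  # True True
```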

class H3Conv:

    def forward(self, inputs):
        self.cache = inputs
        ...

    def backward(self, dout):
        # db
        for i in range(self.biases.shape[0]):
            db = np.sum(dout[i])
            self.biases[i] = self.biases[i] - (LEARNING_RATE * db)

        # dW
        dW = np.zeros_like(self.kernels)
        for kernel_index in range(self.kernels.shape[0]):
            single_dout = dout[kernel_index]
            for depth in range(4):
                dW[kernel_index, depth] = scipy.signal.correlate(self.cache[depth], single_dout, mode='valid')
        self.kernels = self.kernels - (LEARNING_RATE * dW)

        # dX
        dX = np.zeros_like(self.cache)
        for depth in range(dout.shape[0]):
            single_dout = np.pad(dout[depth], ((4, 4), (4, 4)), 'constant')
            for i in range(4):
                dX[i] = dX[i] + scipy.signal.correlate(single_dout, self.kernels[depth, i], mode='valid')
        return dX
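About the padding in dX: padding dout by 4 on each side and then doing a 'valid' correlation is equivalent to scipy's mode='full', which is the identity I'm relying on. A quick standalone check with random arrays of the shapes I have (8x8 dout maps, 5x5 kernels); note I'm unsure whether dX here should use correlate or convolve (flipped kernel), which may be part of my problem:

```python
import numpy as np
import scipy.signal

rng = np.random.default_rng(2)
a = rng.normal(size=(8, 8))  # one dout map from H3
k = rng.normal(size=(5, 5))  # one 5x5 kernel slice

# pad by kernel_size - 1 on each side, then 'valid' correlation
padded = np.pad(a, ((4, 4), (4, 4)), 'constant')
via_pad = scipy.signal.correlate(padded, k, mode='valid')

# should match 'full' mode directly
via_full = scipy.signal.correlate(a, k, mode='full')

print(np.allclose(via_pad, via_full))  # True
```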

class H1Conv:

    def forward(self, inputs):
        self.cache = inputs
        ...

    def backward(self, dout):
        # db
        for i in range(self.biases.shape[0]):
            db = np.sum(dout[i])
            self.biases[i] = self.biases[i] - (LEARNING_RATE * db)

        # dW
        dW = np.zeros_like(self.kernels)
        for kernel_index in range(self.kernels.shape[0]):
            single_dout = dout[kernel_index]
            dW[kernel_index] = scipy.signal.correlate(self.cache, single_dout, mode='valid')
        self.kernels = self.kernels - (LEARNING_RATE * dW)
        # first layer, so no dX is returned
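Finally, the dW formula itself (dW = correlate(input, dout, 'valid')) can be gradient-checked in isolation on a toy single-channel case (standalone names and shapes, not my classes):

```python
import numpy as np
import scipy.signal

rng = np.random.default_rng(3)
x = rng.normal(size=(9, 9))     # toy single-channel input
w = rng.normal(size=(3, 3))     # one kernel
dout = rng.normal(size=(7, 7))  # upstream gradient, 'valid' output shape

# analytic gradient, same formula as dW above
dW = scipy.signal.correlate(x, dout, mode='valid')

# finite-difference check of d(sum(out * dout))/dw
def loss(w):
    return np.sum(scipy.signal.correlate(x, w, mode='valid') * dout)

eps = 1e-6
num = np.zeros_like(w)
for i in range(3):
    for j in range(3):
        wp, wm = w.copy(), w.copy()
        wp[i, j] += eps
        wm[i, j] -= eps
        num[i, j] = (loss(wp) - loss(wm)) / (2 * eps)

print(np.max(np.abs(dW - num)))  # tiny (~1e-8)
```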
