I didn’t have time to do a deep dive into your notebook, but from my quick look, there are definitely a few things that may be causing the issue.
Regarding the data labels, I see that you simply labelled the first half of the data as 1 and the second half as 0. You’ll definitely want to check that the data itself is organized the same way. If the actual classes of the audio spectrograms do not match the labels you have assigned to them, your network will be unable to learn the association between the audio data and the label, which will result in an accuracy of about 50%. As a note, since your network has only 2 classes (label 0 or label 1), an accuracy of about 50% means that your network is essentially guessing.
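As a sketch of what I mean (the random array here is just a stand-in for your spectrogram data), you’d want to spot-check a few samples from each half, and also shuffle the data and labels together before splitting into train/test sets, so the split doesn’t end up with lopsided class ratios:

```python
import numpy as np

# Stand-in "spectrograms" -- replace with your actual data array.
data = np.random.rand(100, 64, 64, 3)

# The labelling scheme from your notebook: first half -> 1, second half -> 0.
n = len(data)
labels = np.concatenate([np.ones(n // 2), np.zeros(n - n // 2)])

# Spot-check: inspect (or plot) samples from each half to confirm they
# really belong to the class you assigned them.
print(data[0].shape, labels[0])    # should be a class-1 spectrogram
print(data[-1].shape, labels[-1])  # should be a class-0 spectrogram

# Shuffle data and labels *together* before the train/test split.
rng = np.random.default_rng(0)
perm = rng.permutation(n)
data, labels = data[perm], labels[perm]
```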
To check what your network is doing, you’ll also want to probe the output of a test run of your network. If your network always guesses the same class (e.g., always outputs a 0), then for your specific problem the test accuracy will be about 50%, because the network will get all of the 0’s correct and all of the 1’s wrong.
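A quick way to check for this is to look at the distribution of predicted classes over the test set. Here, `predictions` is a stand-in for the output of your network (e.g., `model.predict(test_data)`):

```python
import numpy as np

# Stand-in for the network's output on the test set: one row per test
# sample, one column per class. Replace with your model's actual output.
predictions = np.random.rand(200, 2)

# Take the argmax over the class dimension and count each predicted class.
predicted_classes = np.argmax(predictions, axis=-1)
counts = np.bincount(predicted_classes, minlength=2)
print("predicted class 0:", counts[0], "| predicted class 1:", counts[1])

# If one count is ~100% of the test set, the network is stuck predicting a
# single class, which on a balanced 2-class problem gives ~50% accuracy.
```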
As for the network itself, I would recommend that you construct, train and test your network solely in TensorFlow first (to make sure it runs to your desired performance in TensorFlow) before adding in the NengoDL parts to do the spiking network conversion.
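For that TensorFlow-only first pass, something along these lines is what I have in mind. The layer sizes, input shape, and random stand-in data are all placeholder assumptions; substitute your own architecture and spectrograms:

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 32 "spectrograms" with random values and random labels.
x = np.random.rand(32, 64, 64, 3).astype(np.float32)
y = np.random.randint(0, 2, size=32)

# A small example CNN for 2-class classification (sizes are assumptions).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2),  # 2 output units, one per class
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
history = model.fit(x, y, epochs=1, verbose=0)
print(history.history["accuracy"])
```

Once this trains to the performance you want in plain TensorFlow, you can then pass the model through the NengoDL converter for the spiking version, knowing that any remaining issues come from the conversion rather than the network itself.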
It looks like your images have colour to them. I haven’t worked much with colour images, so I can’t give many recommendations here. You’ll definitely want to make sure that the convolutional layers are set up properly to work with the 3 channels of the colour image. Also, it may help to simplify things by converting the coloured image to a grayscale image, although I’m not sure how important the colour components are to the spectrogram. As far as I understand, though, in a spectrogram the colour just encodes the magnitude of each frequency component, so ideally you would pre-process the data to recover those values and train on them instead of the raw colour values.
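If you do go the grayscale route, a standard luminance conversion is a simple approximation (the random array below is a stand-in for one of your spectrogram images). Note that if you still have access to the raw audio, computing the spectrogram magnitudes directly avoids the colormap round-trip entirely:

```python
import numpy as np

# Stand-in for one colour spectrogram image, values in [0, 1).
rgb = np.random.rand(64, 64, 3)

# Collapse the 3 colour channels to one using ITU-R BT.601 luminance weights.
gray = rgb @ np.array([0.299, 0.587, 0.114])
print(gray.shape)  # single-channel image
```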
For the output of your network, typically, the last layer of the network (from the Nengo example, this is the dense layer) maps the activities of the previous layers onto the output classes. In the Nengo example, the MNIST dataset has 10 output classes (one for each digit from 0 to 9), so the output layer has 10 neurons / units. The activity of each neuron in the output layer determines how much the network “thinks” the input represents that specific output class. If we apply the same logic to your problem, the number of neurons in the dense layer should be 2. It seems like you are using a layer with a sigmoid function (in conjunction with the dense layer) to do this, but I’m not sure if this is achieving the desired output you want. It may be that removing the dense layer and connecting the dropout directly with the op layer will resolve your training issue.
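To illustrate the output head I’m describing, here is a minimal sketch of a 2-unit dense output (the 128-dimensional input shape is a placeholder for whatever your dropout layer produces):

```python
import tensorflow as tf

# A 2-unit output head: one unit per class, linear (logit) output.
# The input shape is a placeholder for the dropout layer's output size.
head = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128,)),
    tf.keras.layers.Dense(2),
])

# The predicted class is simply the unit with the highest activity.
logits = head(tf.random.normal((4, 128)))
predicted = tf.argmax(logits, axis=-1)
print(predicted.shape)
```

With this setup you would train against a softmax cross-entropy loss (e.g., `SparseCategoricalCrossentropy(from_logits=True)`) rather than a per-unit sigmoid.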