The dataset cannot be imported into the Nengo Loihi network

lry · June 26, 2022, 5:23pm

I am trying to import a preprocessed dataset into a neural network for simulation, but after I convert the dataset to a Tensor, I can't proceed any further:

Can someone help me with this problem?

xchoo · June 29, 2022, 2:46am

Hi @lry, and welcome back to the Nengo forums.

The code you have provided isn't enough to say for sure what's causing the issue, but the error message suggests that the data object needs additional processing before you can use it. The error says that the data object is of type bytes, and the bytes class has no TensorDataset attribute. Without knowing how the data object was created, my best guess is that a step is missing that converts it into something PyTorch can build a TensorDataset from.
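As a minimal illustration of one way this error can arise, assume (the post does not show it) that `data` was meant to be a module such as PyTorch's `torch.utils.data`. Reassigning the same name to a `bytes` value replaces the module, so a later `data.TensorDataset(...)` looks the attribute up on `bytes` instead. The code posted later in this thread does assign `data = response.read()`. The sketch uses a `types.SimpleNamespace` stand-in (hypothetical, not PyTorch) so it runs without PyTorch installed.

```python
import types

# Hypothetical stand-in for a module like `torch.utils.data`
data = types.SimpleNamespace(TensorDataset=lambda *tensors: tensors)

ok = data.TensorDataset([1, 2], [0, 1])  # works: `data` still refers to the "module"

# Reusing the name for downloaded bytes shadows the module
data = b"raw bytes from response.read()"

message = ""
try:
    data.TensorDataset([1, 2], [0, 1])
except AttributeError as err:
    message = str(err)

print(message)  # 'bytes' object has no attribute 'TensorDataset'
```

If this is the cause, giving the downloaded bytes a different variable name (or importing the module under another alias) avoids the clash.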

lry · June 29, 2022, 10:15am

Hi, thank you for your reply. I think I have solved that problem, but now there is a new one. My preprocessed SHD training set is an ndarray of shape (8156, 25, 700): 8156 training samples, 25 time bins, and 700 input channels from the cochlear model. The network input is therefore 25 × 700 = 17,500-dimensional, and building the model exceeds the available RAM. The build progress stays at zero, and I don't know whether this is due to insufficient computing power or a problem in my code.
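A back-of-the-envelope calculation using only the shape quoted above shows where the memory goes. It assumes NumPy's default float64 (8 bytes per value, which is what `np.zeros(700)` produces) and counts only the dataset array and the 700 × 17,500 encoder matrix of `nengo.Ensemble(700, dimensions=17500)`. The Nengo builder allocates more than this (for example evaluation points and decoders for the 17,500-D ensemble), so treat these numbers as a lower bound.

```python
# Shape of the preprocessed SHD training set quoted in the post
n_samples, n_bins, n_channels = 8156, 25, 700

input_dims = n_bins * n_channels   # flattened size of one sample
n_values = n_samples * input_dims  # total number of array entries

float64_gb = n_values * 8 / 1e9    # np.zeros(...) defaults to float64
uint8_gb = n_values * 1 / 1e9      # 0/1 spikes fit in uint8 (or bool)

# Encoders of nengo.Ensemble(700, dimensions=17500): one 17,500-D row per neuron
encoder_mb = 700 * input_dims * 8 / 1e6

print(input_dims, round(float64_gb, 2), round(uint8_gb, 3), round(encoder_mb, 1))
# prints: 17500 1.14 0.143 98.0
```

Storing the 0/1 spikes as uint8 cuts the dataset roughly eightfold, but a 17,500-dimensional ensemble remains expensive to build regardless of the input dtype.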

I've tried several things to solve this problem, but none of them worked.

Below is my code.

```python
import gzip
import os
import shutil
import urllib.request

import nengo
import numpy as np
import tables
# Assumed: `get_file` is Keras' downloader (its import was not shown in the post)
from tensorflow.keras.utils import get_file

cache_dir = os.path.expanduser("~/data")
cache_subdir = "hdspikes"
print("Using cache dir: %s" % cache_dir)

# The remote directory with the data files
base_url = "https://zenkelab.org/datasets"

# Retrieve MD5 hashes from remote
response = urllib.request.urlopen("%s/md5sums.txt" % base_url)
data = response.read()
lines = data.decode("utf-8").split("\n")
file_hashes = {line.split()[1]: line.split()[0] for line in lines if len(line.split()) == 2}


def get_and_gunzip(origin, filename, md5hash=None):
    """Download a .gz file (cached) and decompress it next to the archive."""
    gz_file_path = get_file(filename, origin, md5_hash=md5hash, cache_dir=cache_dir, cache_subdir=cache_subdir)
    hdf5_file_path = gz_file_path[:-3]  # strip the ".gz" suffix
    # Re-extract only if the .h5 file is missing or older than the archive
    if not os.path.isfile(hdf5_file_path) or os.path.getctime(gz_file_path) > os.path.getctime(hdf5_file_path):
        print("Decompressing %s" % gz_file_path)
        with gzip.open(gz_file_path, "r") as f_in, open(hdf5_file_path, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
    return hdf5_file_path


# Download the Spiking Heidelberg Digits (SHD) dataset
files1 = ["shd_train.h5.gz"]
files2 = ["shd_test.h5.gz"]

for fn in files1:
    origin = "%s/%s" % (base_url, fn)
    hdf5_file_path1 = get_and_gunzip(origin, fn, md5hash=file_hashes[fn])
    print(hdf5_file_path1)

for fn in files2:
    origin = "%s/%s" % (base_url, fn)
    hdf5_file_path2 = get_and_gunzip(origin, fn, md5hash=file_hashes[fn])
    print(hdf5_file_path2)

fileh_train = tables.open_file(hdf5_file_path1, mode="r")
units_train = fileh_train.root.spikes.units
times_train = fileh_train.root.spikes.times
labels_train = fileh_train.root.labels

fileh_test = tables.open_file(hdf5_file_path2, mode="r")
units_test = fileh_test.root.spikes.units
times_test = fileh_test.root.spikes.times
labels_test = fileh_test.root.labels


def binary_image_readout(times, units, dt):
    """Bin one sample's spikes into an (int(1/dt), 700) 0/1 array (assumes 1 s samples)."""
    img = []
    N = int(1 / dt)
    for i in range(N):
        # Remaining spikes whose time is <= the end of bin i
        idxs = np.argwhere(times <= i * dt).flatten()
        vals = units[idxs]
        vals = vals[vals > 0]  # unit 0 is dropped
        vector = np.zeros(700)  # float64 by default
        vector[700 - vals] = 1  # channels are flipped: unit u -> column 700 - u
        # Remove the spikes just assigned so they are not counted again
        times = np.delete(times, idxs)
        units = np.delete(units, idxs)
        img.append(vector)
    return np.array(img)


def generate_dataset(file_name, dt):
    """Return (X, y): binned spike arrays and labels for every sample in the file."""
    with tables.open_file(file_name, mode="r") as fileh:
        # This is how we access spikes and labels
        units = fileh.root.spikes.units
        times = fileh.root.spikes.times
        labels = fileh.root.labels

        print("Number of samples: ", len(times))
        X = []
        y = []
        for i in range(len(times)):
            X.append(binary_image_readout(times[i], units[i], dt=dt))
            y.append(labels[i])
    return np.array(X), np.array(y)


train_X, train_y = generate_dataset(hdf5_file_path1, dt=4e-2)

presentation_time = 0.2

with nengo.Network(label="PES learning") as model:
    # Present each flattened (25 x 700) sample for presentation_time seconds
    stim = nengo.Node(
        nengo.processes.PresentInput(train_X, presentation_time), size_out=25 * 700
    )

    # Connect pre to the input signal (a 17,500-dimensional ensemble)
    pre = nengo.Ensemble(700, dimensions=17500)
    nengo.Connection(stim, pre)
    post = nengo.Ensemble(20, dimensions=1)

    # Connecting pre to post through a PES-learned connection
    conn = nengo.Connection(
        pre,
        post,
        function=lambda x: [0],
        learning_rule_type=nengo.PES(learning_rate=2e-2),
    )

    # `error` was not defined in the original code. Hypothetical placeholder:
    # error = post - 0; replace with post - target once a target signal exists.
    error = nengo.Node(size_in=1)
    nengo.Connection(post, error)
    nengo.Connection(error, conn.learning_rule)

    stim_p = nengo.Probe(stim)
    # Probing pre's decoded value requires solving 17,500-D decoders
    pre_p = nengo.Probe(pre, synapse=0.01)
    post_p = nengo.Probe(post, synapse=0.01)

# `dt` is not defined at module level (4e-2 was the binning width, not a
# simulator step), so use Nengo's default simulator dt of 0.001 s
sim = nengo.Simulator(model)
sim.run(1)
t = sim.trange()
```
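Since RAM is the bottleneck, one option is to store the binned spikes compactly. Below is a minimal vectorized sketch of the binning step, assuming one SHD sample is given as 1-D arrays of spike times (in seconds) and unit indices, as returned by `times[i]` and `units[i]`. `bin_spikes` is a hypothetical helper, not part of any library. It uses left-closed bins [k·dt, (k+1)·dt), whereas `binary_image_readout` in the post effectively uses right-closed bins ((k-1)·dt, k·dt], so bin indices are shifted by one relative to that loop. It keeps the post's channel flip (unit u goes to column 700 - u, unit 0 dropped).

```python
import numpy as np


def bin_spikes(times, units, dt=4e-2, duration=1.0, n_channels=700):
    """Bin one sample into a (n_bins, n_channels) uint8 spike raster (sketch)."""
    n_bins = int(round(duration / dt))
    bins = np.floor(np.asarray(times) / dt).astype(int)  # left-closed bins
    units = np.asarray(units).astype(int)                # avoid unsigned arithmetic
    # Keep spikes inside [0, duration) and drop unit 0, as the post's code does
    keep = (bins >= 0) & (bins < n_bins) & (units > 0)
    raster = np.zeros((n_bins, n_channels), dtype=np.uint8)
    raster[bins[keep], n_channels - units[keep]] = 1    # same channel flip as the post
    return raster


raster = bin_spikes([0.0, 0.05, 0.99, 1.5], [3, 10, 699, 5])
print(raster.shape, raster.dtype, int(raster.sum()))  # (25, 700) uint8 3
```

Stacking all samples, for example `np.stack([bin_spikes(times[i], units[i]) for i in range(len(times))])`, gives a uint8 array of about 143 MB for 8156 samples instead of about 1.14 GB in float64.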

xchoo · June 29, 2022, 6:59pm

You seem to be working on the same problem as this post, so I’ll forward you to my response there. Unfortunately, given my limited experience with neural networks that do audio processing, my help will be limited here.