I am trying to import a preprocessed dataset into a neural network for simulation. But when I convert the dataset to Tensor, I can’t proceed further:
Can someone help me with this problem?
Hi @Iry, and welcome back to the Nengo forums.
The code you have provided is not really sufficient to say for sure what's causing the issue, but from the error message, it seems like you need to do additional processing on the data object to extract data from it. The error indicates that the data object is of type bytes, and that specific class does not have a TensorDataset attribute. Without knowing how the data object has been created, all I can surmise is that there is some step missing to convert the data object into a class that PyTorch can use to extract the TensorDataset from.
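To illustrate the missing step, here is a minimal sketch (with hypothetical stand-in data, and assuming the bytes encode float32 values) of decoding a raw bytes object into an array before PyTorch can wrap it:

```python
import numpy as np

# Hypothetical stand-in for the `bytes` object from the error message.
raw = np.arange(12, dtype=np.float32).tobytes()

# A bytes object has no TensorDataset attribute; it must first be parsed
# into an array (here assuming float32 encoding, purely for illustration):
arr = np.frombuffer(raw, dtype=np.float32).reshape(4, 3)

# Only then can PyTorch wrap the data, e.g.:
#   import torch
#   from torch.utils.data import TensorDataset
#   dataset = TensorDataset(torch.from_numpy(arr.copy()))
print(arr.shape)  # (4, 3)
```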
Hi, thank you very much for your reply. I think I have solved that problem, but a new one has appeared. The ndarray of my preprocessed SHD training set has shape (8156, 25, 700): the number of training samples, the time dimension of the audio channels, and the number of channels in the cochlear implant model. During training, the input dimension reaches 25*700 = 17500, which exceeds the processing power of my RAM. The build process makes zero progress, and I don't know whether this is due to insufficient computing power or a programming problem.
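For scale, a back-of-the-envelope estimate (assuming NumPy's default float64 dtype) suggests the dense training array itself is only about 1.1 GB, so the stall is more likely in the model build step than in loading the data:

```python
# Rough memory estimate for the dense (8156, 25, 700) SHD training array.
n_samples, n_steps, n_channels = 8156, 25, 700

# float64 (NumPy's default) needs 8 bytes per value:
dense_bytes = n_samples * n_steps * n_channels * 8
print("%.2f GB" % (dense_bytes / 1e9))  # ~1.14 GB

# Because the frames are binary, storing them as uint8 (1 byte) cuts this 8x:
uint8_bytes = n_samples * n_steps * n_channels
print("%.2f GB" % (uint8_bytes / 1e9))  # ~0.14 GB
```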
I’ve tried to solve this problem, but none of it works.
Below is my code.
import gzip
import os
import shutil
import urllib.request
# get_file is assumed here to come from Keras (its md5_hash/cache_dir
# arguments below match that API):
from tensorflow.keras.utils import get_file

cache_dir = os.path.expanduser("~/data")
cache_subdir = "hdspikes"
print("Using cache dir: %s" % cache_dir)

# The remote directory with the data files
base_url = "https://zenkelab.org/datasets"

# Retrieve MD5 hashes from remote
response = urllib.request.urlopen("%s/md5sums.txt" % base_url)
data = response.read()
lines = data.decode('utf-8').split("\n")
file_hashes = {line.split()[1]: line.split()[0] for line in lines if len(line.split()) == 2}

def get_and_gunzip(origin, filename, md5hash=None):
    gz_file_path = get_file(filename, origin, md5_hash=md5hash, cache_dir=cache_dir, cache_subdir=cache_subdir)
    hdf5_file_path = gz_file_path[:-3]
    if not os.path.isfile(hdf5_file_path) or os.path.getctime(gz_file_path) > os.path.getctime(hdf5_file_path):
        print("Decompressing %s" % gz_file_path)
        with gzip.open(gz_file_path, 'r') as f_in, open(hdf5_file_path, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
    return hdf5_file_path

# Download the Spiking Heidelberg Digits (SHD) dataset
files1 = ["shd_train.h5.gz"]
files2 = ["shd_test.h5.gz"]
for fn in files1:
    origin = "%s/%s" % (base_url, fn)
    hdf5_file_path1 = get_and_gunzip(origin, fn, md5hash=file_hashes[fn])
    print(hdf5_file_path1)
for fn in files2:
    origin = "%s/%s" % (base_url, fn)
    hdf5_file_path2 = get_and_gunzip(origin, fn, md5hash=file_hashes[fn])
    print(hdf5_file_path2)
import tables
import numpy as np
fileh_train = tables.open_file(hdf5_file_path1, mode='r')
units_train = fileh_train.root.spikes.units
times_train = fileh_train.root.spikes.times
labels_train = fileh_train.root.labels
fileh_test = tables.open_file(hdf5_file_path2, mode='r')
units_test = fileh_test.root.spikes.units
times_test = fileh_test.root.spikes.times
labels_test = fileh_test.root.labels
def binary_image_readout(times, units, dt):
    img = []
    N = int(1 / dt)
    for i in range(N):
        idxs = np.argwhere(times <= i * dt).flatten()
        vals = units[idxs]
        vals = vals[vals > 0]
        vector = np.zeros(700)
        vector[700 - vals] = 1
        times = np.delete(times, idxs)
        units = np.delete(units, idxs)
        img.append(vector)
    return np.array(img)
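To check the readout on a toy input, a minimal run (repeating the function so the snippet is self-contained, with hypothetical spike data) might look like:

```python
import numpy as np

def binary_image_readout(times, units, dt):
    # Same logic as above: one binary 700-channel frame per dt window.
    img = []
    N = int(1 / dt)
    for i in range(N):
        idxs = np.argwhere(times <= i * dt).flatten()
        vals = units[idxs]
        vals = vals[vals > 0]
        vector = np.zeros(700)
        vector[700 - vals] = 1
        times = np.delete(times, idxs)
        units = np.delete(units, idxs)
        img.append(vector)
    return np.array(img)

# Hypothetical toy input: spikes on channels 10, 699, and 10 again.
times = np.array([0.05, 0.30, 0.80])
units = np.array([10, 699, 10])

img = binary_image_readout(times, units, dt=0.25)
print(img.shape)    # (4, 700)
print(img[1, 690])  # 1.0 -> the t=0.05 spike on channel 10
# Note: the t=0.80 spike never appears, because the last window only
# checks times <= (N-1)*dt = 0.75.
```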
def generate_dataset(file_name, dt):
    fileh = tables.open_file(file_name, mode='r')
    units = fileh.root.spikes.units
    times = fileh.root.spikes.times
    labels = fileh.root.labels
    # This is how we access spikes and labels
    print("Number of samples: ", len(times))
    X = []
    y = []
    for i in range(len(times)):
        tmp = binary_image_readout(times[i], units[i], dt=dt)
        X.append(tmp)
        y.append(labels[i])
    return np.array(X), np.array(y)
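Since regenerating the frames is slow and the dense float64 array is large, one option (a sketch with hypothetical helper names, using only NumPy calls) is to cache the binary frames on disk as uint8 and reload them on demand:

```python
import numpy as np

# Hypothetical helpers: store the binary frames as uint8 (1 byte per value
# instead of 8) in a compressed .npz, and reload without recomputing.
def cache_dataset(path, X, y):
    np.savez_compressed(path, X=X.astype(np.uint8), y=y)

def load_cached(path):
    data = np.load(path)
    return data["X"].astype(np.float64), data["y"]

# Usage sketch with toy stand-ins for train_X / train_y:
X = np.random.randint(0, 2, size=(4, 25, 700))
y = np.arange(4)
cache_dataset("shd_train_cache.npz", X, y)
X2, y2 = load_cached("shd_train_cache.npz")
print(np.array_equal(X, X2))  # True
```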
dt = 4e-2
train_X, train_y = generate_dataset(hdf5_file_path1, dt=dt)
import nengo

presentation_time = 0.2
with nengo.Network(label="PES learning") as model:
    # Present each (25, 700) sample as a flattened 17500-d input signal
    stim = nengo.Node(
        nengo.processes.PresentInput(train_X, presentation_time), size_out=25 * 700
    )
    # Connect pre to the input signal
    pre = nengo.Ensemble(700, dimensions=17500)
    nengo.Connection(stim, pre)
    post = nengo.Ensemble(20, dimensions=1)
    # Connecting pre to post
    conn = nengo.Connection(
        pre,
        post,
        function=lambda x: [0],
        learning_rule_type=nengo.PES(learning_rate=2e-2),
    )
    # NOTE: `error` was never defined in the original code, but PES needs
    # an error signal; a placeholder ensemble is added here so the model
    # can build at all:
    error = nengo.Ensemble(60, dimensions=1)
    nengo.Connection(error, conn.learning_rule)
    stim_p = nengo.Probe(stim)
    pre_p = nengo.Probe(pre, synapse=0.01)
    post_p = nengo.Probe(post, synapse=0.01)

sim = nengo.Simulator(model, dt=dt)
sim.run(1)
t = sim.trange()
You seem to be working on the same problem as this post, so I’ll forward you to my response there. Unfortunately, given my limited experience with neural networks that do audio processing, my help will be limited here.