So I am working with NengoDL on an image classification task. I have a DataGenerator that yields batches of data. I saw an online blog post in which two probes are connected to an output layer, with the following explanation:
# we'll create two different output probes, one with a filter
# (for when we're simulating the network over time and
# accumulating spikes), and one without (for when we're
# training the network using a rate-based approximation)
# probes essentially collect data over time
out_p = nengo.Probe(out, label="out_p")
out_p_filt = nengo.Probe(out, synapse=0.1, label="out_p_filt")
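(As a side note on what that synapse=0.1 does: it applies a lowpass filter to the probed signal, which is what smooths discrete spikes into something resembling a rate. The sketch below is a plain-numpy illustration of that filtering effect, using a simple discrete exponential lowpass; it is an approximation for intuition, not Nengo's exact implementation.)

```python
import numpy as np

# A spike train: mostly zeros, with occasional impulses of height 1/dt
dt = 0.001
tau = 0.1  # same time constant as synapse=0.1 on the probe
steps = 1000
rng = np.random.default_rng(0)
spikes = (rng.random(steps) < 0.02).astype(float) / dt  # ~20 Hz spiking

# Discrete exponential lowpass: y[t] = a*y[t-1] + (1-a)*x[t]
a = np.exp(-dt / tau)
filtered = np.zeros(steps)
for t in range(1, steps):
    filtered[t] = a * filtered[t - 1] + (1 - a) * spikes[t]

# The raw train is a sequence of tall, narrow impulses; the filtered
# output accumulates them into a smooth estimate of the firing rate.
print(spikes.max(), filtered.max())
```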
Now, this person was using two matrices, one for the data and one for the labels, so he could call the evaluate function on them directly.
I haven’t used data generators in TF myself, but it’s all Python code, so I should be able to help you out. From the examples of TF data generators I could find, it looks like it should be possible (depending on how they have been coded) to manually extract the data and labels from the generator, possibly through the __getitem__ function.
If you could post the code of your data generator class (or something similar), I can propose a potential solution for you.
import os

import numpy as np
from keras.utils import Sequence
from keras.utils import np_utils

class DataGenerator(Sequence):
    """Data generator inherited from keras.utils.Sequence.

    Args:
        directory: the path of the data set; each sub-folder is assigned to one class
        batch_size: the number of data points in each batch
        shuffle: whether to shuffle the data each epoch
        n_steps: number of simulation timesteps to tile the data over

    Note:
        To load files with another data format, adapt the "load_data" method as needed.
    """

    def __init__(self, directory, batch_size=1, shuffle=True, n_steps=1):
        # initialize the parameters
        self.batch_size = batch_size
        self.directory = directory
        self.shuffle = shuffle
        self.n_steps = n_steps
        # load all the file paths, and build a dictionary mapping "path: label"
        self.X_path, self.Y_dict = self.search_data()
        # print basic statistics
        self.print_stats()

    def search_data(self):
        X_path = []
        Y_dict = {}
        # list all class sub-folders
        self.dirs = sorted(os.listdir(self.directory))
        one_hots = np_utils.to_categorical(range(len(self.dirs)))
        for i, folder in enumerate(self.dirs):
            folder_path = os.path.join(self.directory, folder)
            for file in os.listdir(folder_path):
                file_path = os.path.join(folder_path, file)
                # append each file path, and keep its one-hot label
                X_path.append(file_path)
                Y_dict[file_path] = one_hots[i]
        return X_path, Y_dict

    def print_stats(self):
        # calculate basic information
        self.n_files = len(self.X_path)
        self.n_classes = len(self.dirs)
        self.indexes = np.arange(len(self.X_path))
        np.random.shuffle(self.indexes)
        # output stats
        print("Found {} files belonging to {} classes.".format(self.n_files, self.n_classes))
        for i, label in enumerate(self.dirs):
            print('%10s : ' % (label), i)

    def __len__(self):
        # number of batches per epoch
        steps_per_epoch = np.ceil(len(self.X_path) / float(self.batch_size))
        return int(steps_per_epoch)

    def __getitem__(self, index):
        """Get the data of one batch."""
        # get the indices of the current batch
        batch_indexes = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]
        # use them to look up the file paths of the current batch
        batch_path = [self.X_path[k] for k in batch_indexes]
        # load the batch data
        batch_x, batch_y = self.data_generation(batch_path)
        return batch_x, batch_y

    def on_epoch_end(self):
        # shuffle the data at the end of each epoch
        if self.shuffle:
            np.random.shuffle(self.indexes)

    def data_generation(self, batch_path):
        # load data into memory (the labels are already one-hot encoded);
        # you can change np.load in load_data to any method you want
        batch_x = [self.load_data(x) for x in batch_path]
        batch_y = [self.Y_dict[x] for x in batch_path]
        batch_x = np.array(batch_x)
        batch_y = np.array(batch_y)
        # flatten the data
        batch_x = batch_x.reshape(batch_x.shape[0], -1)
        batch_y = batch_y.reshape(batch_y.shape[0], -1)
        # tile the data along a new time axis
        batch_x = np.tile(batch_x[:, None, :], (1, self.n_steps, 1))
        batch_y = np.tile(batch_y[:, None, :], (1, self.n_steps, 1))
        # attach n_steps to batch_x (use len(batch_path) rather than
        # self.batch_size, since the last batch of an epoch can be smaller)
        batch_x = (batch_x, np.ones((len(batch_path), 1)) * self.n_steps)
        return batch_x, batch_y

    def normalize(self, data):
        mean = np.mean(data)
        std = np.std(data)
        return (data - mean) / std

    def load_data(self, path):
        # load the preprocessed .npy files, which have 5 channels
        # (1-3 for RGB, 4-5 for optical flow)
        data = np.load(path)
        data = np.float32(data)
        # normalize the data (all channels)
        data.setflags(write=1)
        data[..., :] = self.normalize(data[..., :])
        return data
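To make the reshaping in data_generation concrete, here is a standalone numpy sketch (with dummy shapes, not your real image data) of the flatten-and-tile step that turns a batch of images into the (batch, n_steps, flattened) layout the simulation expects:

```python
import numpy as np

batch_size, n_steps = 4, 10
images = np.random.rand(batch_size, 28, 28, 5).astype(np.float32)  # dummy 5-channel data
labels = np.eye(3)[np.random.randint(0, 3, batch_size)]            # dummy one-hot labels

# flatten each image to a single vector
x = images.reshape(batch_size, -1)  # (4, 3920)
y = labels.reshape(batch_size, -1)  # (4, 3)

# tile along a new time axis, so every timestep sees the same input
x = np.tile(x[:, None, :], (1, n_steps, 1))  # (4, 10, 3920)
y = np.tile(y[:, None, :], (1, n_steps, 1))  # (4, 10, 3)

print(x.shape, y.shape)
```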
The data_generation() function is what fetches the data. I am working with tuples of data. Should I switch to dictionaries?
Looking at the NengoDL code, the sim.evaluate, sim.predict (and others) are pretty much just wrappers for the underlying TensorFlow model calls. So, to use the data generator with the sim.evaluate function, you’ll have to do the same thing you would need to do to get it to work in TensorFlow.
From the TensorFlow documentation I could find (see the TensorFlow model class documentation), it looks like, for single-output models, you should be able to use generators with the evaluate function by simply providing the generator as the x input and leaving the y input undefined.
For multi-output models, following this example, it looks like you’ll need to modify the generator to output a dictionary instead of just the desired label. I.e., in your __getitem__ function, instead of return batch_x, batch_y, you’ll want to do return batch_x, {out_p_filt: batch_y}.
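Something like the following wrapper might do it (untested sketch; the string "out_p_filt" here is just a placeholder standing in for the actual probe object, and FakeGen is a dummy stand-in for your DataGenerator so the snippet is self-contained):

```python
import numpy as np

class DictLabelGenerator:
    """Wrap an existing generator so labels come back as a dict keyed
    by probe (a placeholder key stands in for the out_p_filt object)."""

    def __init__(self, base, probe_key="out_p_filt"):
        self.base = base
        self.probe_key = probe_key

    def __len__(self):
        return len(self.base)

    def __getitem__(self, index):
        batch_x, batch_y = self.base[index]
        return batch_x, {self.probe_key: batch_y}

# minimal stand-in with the same interface as the real DataGenerator
class FakeGen:
    def __len__(self):
        return 2
    def __getitem__(self, i):
        return np.zeros((1, 4)), np.ones((1, 3))

wrapped = DictLabelGenerator(FakeGen())
x, y = wrapped[0]
print(sorted(y))
```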
Alternatively, you could also manually call the data generator to extract out the x and y data, and then assemble the test_images and test_labels arrays from that data.
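The manual extraction could look something like this (again a sketch, using a dummy StubGenerator in place of yours since I can't run your code here): iterate over the generator's batches via __len__ and __getitem__, then concatenate them into single arrays.

```python
import numpy as np

class StubGenerator:
    """Stand-in with the same __len__/__getitem__ interface as DataGenerator."""
    def __len__(self):
        return 3  # three batches of two items each
    def __getitem__(self, i):
        return (np.full((2, 4), i, dtype=np.float32),
                np.full((2, 3), i, dtype=np.float32))

gen = StubGenerator()
# pull out every batch and stack them along the batch axis
xs, ys = zip(*(gen[i] for i in range(len(gen))))
test_images = np.concatenate(xs, axis=0)  # (6, 4)
test_labels = np.concatenate(ys, axis=0)  # (6, 3)
print(test_images.shape, test_labels.shape)
```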
Caveat: I haven’t tested this code… My suggestions are based purely on what I gathered from the documentation. Some tweaks might be needed to get it to work.