I have read the paper “Hardware Aware Training for Efficient Keyword Spotting on General Purpose and Specialized Hardware”. In this paper, the LMU is modified as follows:
"we have removed the connection from the nonlinear to the linear layer, the connection from the linear layer to the intermediate input 𝑢𝑡, and the recurrent connection from the nonlinear layer to itself. As well, we have included multiple linear memory layers in the architecture; the outputs of each of these layers are concatenated before being mapped to the nonlinear layer via the matrix 𝑊𝑚.
I have also read some of the original LMU code, shown below:
```python
def call(self, inputs, states, training=None):
    # state for the hidden cell
    h = states[:-1]
    # state for the LMU memory
    m = states[-1]

    # compute memory input
    u_in = tf.concat((inputs, h[0]), axis=1) if self.hidden_to_memory else inputs
    if self.dropout > 0:
        u_in *= self.get_dropout_mask_for_cell(u_in, training)
    u = tf.matmul(u_in, self.kernel)

    if self.memory_to_memory:
        if self.recurrent_dropout > 0:
            # note: we don't apply dropout to the memory input, only
            # the recurrent kernel
            rec_m = m * self.get_recurrent_dropout_mask_for_cell(m, training)
        else:
            rec_m = m
        u += tf.matmul(rec_m, self.recurrent_kernel)

    # separate memory/order dimensions
    m = tf.reshape(m, (-1, self.memory_d, self.order))
    u = tf.expand_dims(u, -1)

    # update memory
    m = tf.matmul(m, self.A) + tf.matmul(u, self.B)

    # re-combine memory/order dimensions
    m = tf.reshape(m, (-1, self.memory_d * self.order))

    # apply hidden cell
    h_in = tf.concat((m, inputs), axis=1) if self.input_to_hidden else m

    if self.hidden_cell is None:
        o = h_in
        h = []
    elif hasattr(self.hidden_cell, "state_size"):
        o, h = self.hidden_cell(h_in, h, training=training)
        # print('removed h to h')
    else:
        o = self.hidden_cell(h_in, training=training)
        h = [o]

    return o, h + [m]
```
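My current (untested) understanding is that with `hidden_to_memory=False`, `memory_to_memory=False`, `input_to_hidden=False`, and a feedforward `hidden_cell` such as `Dense`, the `call` method above would reduce to something like the sketch below (dropout stripped for brevity):

```python
# Untested sketch of a stripped-down call(), assuming the same LMUCell
# attributes as above. With the hidden_to_memory and memory_to_memory
# branches removed, the memory state m is the only recurrent state, and
# a feedforward hidden_cell provides the nonlinear layer without any
# self-recurrence.
def call(self, inputs, states, training=None):
    m = states[-1]  # LMU memory state (the only state left)

    # u_t is computed from the inputs alone (no h -> u, no m -> u)
    u = tf.matmul(inputs, self.kernel)

    # separate memory/order dimensions and update each linear memory
    m = tf.reshape(m, (-1, self.memory_d, self.order))
    u = tf.expand_dims(u, -1)
    m = tf.matmul(m, self.A) + tf.matmul(u, self.B)

    # concatenate the memory_d memories back into one vector; the hidden
    # cell's input kernel then plays the role of W_m
    m = tf.reshape(m, (-1, self.memory_d * self.order))

    # feedforward nonlinear layer (no recurrent state)
    o = self.hidden_cell(m, training=training)
    return o, [m]
```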
Could anyone show me how to modify the code above to achieve the required LMU architecture (or confirm whether my sketches are on the right track)?
Thanks a lot.