# How to build a multiple linear memory layer LMU?

Dear all,

I have read the paper titled “Hardware Aware Training for Efficient Keyword Spotting on General Purpose and Specialized Hardware”. In this paper, the LMU was modified as follows:

"we have removed the connection from the nonlinear to the linear layer, the connection from the linear layer to the intermediate input 𝑢𝑡, and the recurrent connection from the nonlinear layer to itself. As well, we have included multiple linear memory layers in the architecture; the outputs of each of these layers are concatenated before being mapped to the nonlinear layer via the matrix 𝑊𝑚.
"
I read some of the original LMU code, shown below:

```python
# state for the hidden cell
h = states[:-1]
# state for the LMU memory
m = states[-1]

# compute memory input
u_in = tf.concat((inputs, h[0]), axis=1) if self.hidden_to_memory else inputs
if self.dropout > 0:
    u_in = u_in * self.get_dropout_mask_for_cell(u_in, training)
u = tf.matmul(u_in, self.kernel)

if self.memory_to_memory:
    if self.recurrent_dropout > 0:
        # note: we don't apply dropout to the memory input,
        # only the recurrent kernel
        rec_m = m * self.get_recurrent_dropout_mask_for_cell(m, training)
    else:
        rec_m = m

    u += tf.matmul(rec_m, self.recurrent_kernel)

# separate memory/order dimensions
m = tf.reshape(m, (-1, self.memory_d, self.order))
u = tf.expand_dims(u, -1)

# update memory
m = tf.matmul(m, self.A) + tf.matmul(u, self.B)

# re-combine memory/order dimensions
m = tf.reshape(m, (-1, self.memory_d * self.order))

# apply hidden cell
h_in = tf.concat((m, inputs), axis=1) if self.input_to_hidden else m

if self.hidden_cell is None:
    o = h_in
    h = []
elif hasattr(self.hidden_cell, "state_size"):
    o, h = self.hidden_cell(h_in, h, training=training)
else:
    o = self.hidden_cell(h_in, training=training)
    h = [o]

return o, h + [m]
```
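For reference, here is a rough NumPy sketch of how the reshape in the code above lets one update step drive `memory_d` independent linear memories. The sizes are toy values and `A`/`B` are placeholders, not the real discretized LMU state-space matrices:

```python
import numpy as np

batch, memory_d, order = 2, 3, 4  # assumed toy sizes

# every one of the memory_d memories shares the same (order x order) A
# and (1 x order) B, just as in the TensorFlow code above
A = np.eye(order) * 0.9        # placeholder for the discretized A
B = np.ones((1, order)) * 0.1  # placeholder for the discretized B

m = np.zeros((batch, memory_d * order))  # flat memory state
u = np.random.rand(batch, memory_d)      # one scalar input per memory

# separate memory/order dimensions
m = m.reshape(batch, memory_d, order)
u_col = u[..., None]  # (batch, memory_d, 1)

# update all memory_d memories in one broadcast matmul
m = np.matmul(m, A) + np.matmul(u_col, B)  # (batch, memory_d, order)

# re-combine memory/order dimensions
m = m.reshape(batch, memory_d * order)
```

Because the matmuls broadcast over the `memory_d` axis, memory `i` depends only on `u[:, i]`, so the memories are independent linear layers whose outputs are concatenated in the flattened state.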


Could anyone show me how to modify the code above to achieve the required LMU architecture?

Thanks a lot.

Best,
Ryan

Hi @Ryan. Fortunately no modifications are needed. The memory_d parameter is the number of memory layers. You can set this to be any positive integer when you create any LMU cell or layer. The additional connections are toggled via hidden_to_memory, memory_to_memory, and input_to_hidden, which all default to False.


Dear Arvoelke,

Thanks a lot for your help! I will give your suggestions a try. However, I have some questions:

1. If I set `hidden_to_memory` to `False`, the corresponding code will do this:

   ```python
   # compute memory input
   u_in = tf.concat((inputs, h[0]), axis=1) if self.hidden_to_memory else inputs
   ```

   This indicates:

   ```python
   u_in = inputs
   ```

2. If I set `input_to_hidden` to `False`, the corresponding code will do this:

   ```python
   # apply hidden cell
   h_in = tf.concat((m, inputs), axis=1) if self.input_to_hidden else m
   ```

   This indicates:

   ```python
   h_in = m
   ```

Based on the above, the LMU cell architecture seems not to match the figure in the paper, shown below:

Could you help me to clarify this? Thanks a lot.

Best,
Ryan

You’ll want hidden_to_memory = True in order to get the h_{t-1} -> u_t connection that you see in the architecture diagram. You’ll also want input_to_hidden = True in order to get the x_t -> Nonlinear connection in the architecture.

Also note that the LMU allows passing in an arbitrary hidden_cell, and so you can make this a dense layer or similar to avoid the connection from hidden to hidden.

I took another look at the paper and this part of the description:

we have removed the connection from the nonlinear to the linear layer

is actually incorrect, as we have hidden_to_memory = True. The equations from the paper (below) and the architecture diagram are correct. Thanks for bringing this to our attention.


Hi Arvoelke,

Thanks a lot for your help. I will have a try!

PS: I still have a small question:

" Also note that the LMU allows passing in an arbitrary hidden_cell , and so you can make this a dense layer or similar to avoid the connection from hidden to hidden."
The hidden_cell is the nonlinear f part that produces the h_t in the equation above, labelled “Nonlinear” in the architecture diagram.
If you use a Dense layer with a bias and activation function $f$ for the hidden_cell, and input_to_hidden = True, then you will get exactly $\mathbf{h}_t = f(\mathbf{W}_x \mathbf{x}_t + \mathbf{W}_m \mathbf{m}_t + \mathbf{b})$.
If you want $h_{t-1}$ to be an input to $f$ as well then that calls for a different hidden_cell, such as an RNN cell.
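To make that equivalence concrete, here is a small NumPy sketch (toy sizes and random weights, all hypothetical) showing that a single Dense kernel applied to the concatenated `h_in = [m_t, x_t]` computes exactly $f(\mathbf{W}_x \mathbf{x}_t + \mathbf{W}_m \mathbf{m}_t + \mathbf{b})$:

```python
import numpy as np

# toy sizes, chosen only for illustration
n_x, n_m, n_h = 3, 5, 4
rng = np.random.default_rng(0)
x_t = rng.standard_normal(n_x)
m_t = rng.standard_normal(n_m)

W_m = rng.standard_normal((n_h, n_m))
W_x = rng.standard_normal((n_h, n_x))
b = rng.standard_normal(n_h)
f = np.tanh

# what the Dense hidden_cell computes when input_to_hidden=True:
# its kernel acts on h_in = concat((m_t, x_t))
W_dense = np.concatenate((W_m, W_x), axis=1)
h_in = np.concatenate((m_t, x_t))
h_dense = f(W_dense @ h_in + b)

# the equation from the paper: h_t = f(W_x x_t + W_m m_t + b)
h_eq = f(W_x @ x_t + W_m @ m_t + b)
```

The two results match, since the Dense kernel is just `W_m` and `W_x` stacked side by side.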