First, I want to be clear about the distinction between the LMU as a mathematical construct, and the LMU as implemented in e.g. KerasLMU. The mathematical LMU does not specify what format the input data needs to be in; that is an implementation detail, and the mathematics can handle any input format (with the proper implementation). Our LMU implementation in KerasLMU is designed for densely-sampled (i.e. non-sparse) time series inputs, where you have a sample at each timestep.
In your data, it sounds like you have something more sparsely sampled, where you have information for some timesteps but not for others. Furthermore, for the timesteps where you do not have values, you want to assume that the signal value is the same as the last reported value.
In this case, you’ll need a custom LMU implementation that supports this. A basic LMU implementation as a dynamical system uses `A` and `B` matrices to update the LMU state `x` based on the previous state and current input. One way to deal with missing inputs is to store them as `np.nan` in your time series, and then reuse the last valid value whenever you encounter an input that is `np.nan`:
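
    import numpy as np

    # A, B, inputs and timesteps are assumed to be defined already.
    # Missing samples in inputs are stored as np.nan, and B is assumed
    # to be 1-D (one entry per state element) so the shapes line up.
    x = np.zeros(A.shape[0])  # LMU state
    lastinput = 0  # value used until the first valid sample arrives
    for i in range(timesteps):
        if np.isnan(inputs[i]):
            u = lastinput  # no new sample: hold the last valid value
        else:
            u = inputs[i]
            lastinput = inputs[i]
        x = np.dot(A, x) + np.dot(B, u)  # discrete-time state update
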
Of course, there are other ways to implement this (e.g. storing only the reported values together with the timesteps at which they occurred) that might work better for your data (if it’s a hassle to put it all into one array).
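One way to read the timestep-based alternative is sketched below: rather than one array padded with `np.nan`, each reading is kept as a (timestep, value) pair, and the loop holds the most recent value until the next reading arrives. The function name `run_lmu_events`, the `events` format, and the small `A` and `B` matrices are all made up for illustration; they are not real LMU matrices and not part of KerasLMU.

```python
import numpy as np


def run_lmu_events(A, B, events, timesteps, initial_input=0.0):
    """Run x <- A x + B u for `timesteps` steps, holding the last reported input.

    events: iterable of (timestep, value) pairs (hypothetical format).
    Returns an array of shape (timesteps, state_size) with the state after each step.
    """
    reported = dict(events)  # timestep -> value
    x = np.zeros(A.shape[0])
    u = initial_input  # used until the first reading arrives
    states = np.zeros((timesteps, A.shape[0]))
    for t in range(timesteps):
        u = reported.get(t, u)  # new reading if there is one, else hold
        x = A @ x + B * u
        states[t] = x
    return states


# Placeholder 2-state system, NOT real LMU matrices.
A = np.array([[0.9, -0.1], [0.1, 0.9]])
B = np.array([0.1, 0.05])

events = [(0, 1.0), (3, -2.0)]  # readings at timesteps 0 and 3 only
states = run_lmu_events(A, B, events, timesteps=6)

# The same signal written as a dense array with the held values filled in.
dense = np.array([1.0, 1.0, 1.0, -2.0, -2.0, -2.0])
```

Running the plain dense update on `dense` should give the same states, so this is only a different way of storing the data, not a different model.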
To go back to your first question, it really depends on how the learning portion of your algorithm is set up. Typically, we’ll have what we call multi-dimensional LMUs, where you have a separate LMU dynamical system encoding each dimension of the input (this is the `memory_d` parameter in KerasLMU). Remember that the LMU dynamical system itself (i.e. the `A` and `B` matrices) is not trained; it remains fixed for all of training. The part that is trained is the layer on top of this dynamical system, whether that’s a dense layer, a SimpleRNNCell, or something else. @xchoo is correct that it is more powerful if these learning layers are able to combine information across all the dimensions of a multi-dimensional LMU, rather than operating on each dimension separately.
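To make the "one dynamical system per dimension" idea concrete, here is a rough NumPy sketch (this is not the KerasLMU code itself). The state is stored as a matrix with one row per input dimension, every row is updated with the same fixed `A` and `B`, and only the readout weights `W` would be trained. The matrices, the sizes, and the random `W` are placeholders chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

memory_d = 3    # number of input dimensions, one dynamical system each
order = 4       # state size of each system (placeholder value)
timesteps = 20

# Placeholder fixed dynamics shared by every dimension; NOT real LMU matrices.
A = 0.9 * np.eye(order) + 0.05 * np.eye(order, k=1)
B = np.linspace(0.1, 0.4, order)

inputs = rng.standard_normal((timesteps, memory_d))

# Row k of X is the state of the system driven by input dimension k.
X = np.zeros((memory_d, order))
for t in range(timesteps):
    # Row-wise form of x_k <- A x_k + B u_k; the rows never interact.
    X = X @ A.T + np.outer(inputs[t], B)

# Trainable part (random stand-in here): a dense readout over ALL dimensions.
n_outputs = 2
W = rng.standard_normal((memory_d * order, n_outputs))
y = X.reshape(-1) @ W
```

The rows of `X` only meet in the last line, which is where a learning layer gets the chance to combine information across dimensions.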
That said, you don’t necessarily need to combine all your inputs into one array to accomplish this (though this is the approach that we’ve always taken because it’s the most straightforward for basic use cases; your use case is not basic). For example, you could have two separate LMU dynamical systems operating on your two different sensors independently (even in a multidimensional LMU, the dynamical systems themselves are still independent), but then have a learning layer (e.g. a Dense layer) that pools across those dimensions, so that the learning is still able to make use of both sensors together.
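A minimal sketch of that arrangement, with several assumptions: the matrices are small placeholders rather than real LMU matrices, `step_hold_last` is a made-up helper, and the pooling layer uses random weights where a trained Dense layer would go. Each sensor keeps its own state and its own missing-value handling, and only the final layer sees both.

```python
import numpy as np


def step_hold_last(A, B, x, sample, last):
    """One update of a single system; a NaN sample means 'no new reading'."""
    u = last if np.isnan(sample) else sample
    return A @ x + B * u, u


rng = np.random.default_rng(0)

# Placeholder dynamics for each sensor (they may differ); NOT real LMU matrices.
A1, B1 = 0.9 * np.eye(3), np.array([0.1, 0.2, 0.3])
A2, B2 = 0.8 * np.eye(2), np.array([0.5, 0.25])

nan = np.nan
sensor1 = np.array([1.0, nan, nan, 2.0, nan, nan])  # sparsely sampled
sensor2 = np.array([0.5, 0.4, nan, 0.3, 0.2, nan])  # more densely sampled

# Random stand-in for a trained Dense layer that pools across both systems.
W = rng.standard_normal((3 + 2, 1))
b = np.zeros(1)

x1, x2 = np.zeros(3), np.zeros(2)
last1 = last2 = 0.0
outputs = []
for s1, s2 in zip(sensor1, sensor2):
    x1, last1 = step_hold_last(A1, B1, x1, s1, last1)
    x2, last2 = step_hold_last(A2, B2, x2, s2, last2)
    features = np.concatenate([x1, x2])  # both sensors, side by side
    outputs.append(features @ W + b)
outputs = np.array(outputs)  # shape (timesteps, 1)
```

Because the two systems never share state, the sensors do not have to be merged into one input array; they only need to be stepped on the same clock so the pooling layer sees matching timesteps.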