Multiple time series data sources per object on LMU?

I have two sets of time series data from two sensors on the same object. Both data sources use nanosecond timestamps.
Question #1: If I combine these data sources into one file so the LMU “knows” they come from the same object, will the end result be any different than if I trained on them separately? I have 800 or so objects, and combining the data sources into one dataset per object takes an extreme amount of time (over 30 hours each), because the data is at nanosecond resolution and I have to match up timestamps and fill blank areas with the previous value so there are no blank cells. Before doing that, I wanted to ask whether the LMU looks at things this way, i.e. a single time series is one object.
Question #2: If it does matter that the two data sources per object are combined (which I assume it does), do I need to fill in blank cells for non-matching timestamps (again, I assume I do)? The only reason there would be a blank cell is that the value has not changed. It doesn’t mean there is no value; the datasets only record changes, and since the nanosecond timestamps don’t always match up exactly, there will be blank cells unless I programmatically fill in the value for that time when combining the datasets.

Thanks in advance!


Hi @fladventurerob,

I’m not an expert on using the LMU on custom datasets, but I’ll do my best to provide some insights.

I can’t say with absolute certainty, but if the data from your sensors is not independent of each other, then I think that combining the sensor sources together would improve the performance of the LMU network. The LMU network works with both scalar (single dimensional) and vector inputs. If you combine your sensor data together, it essentially forms a vector input, so what the LMU will be training on is the combined vector input.

If you have non-matching timestamps, what you’ll probably want to do is to discretize the data somehow. If you are combining the sensor data together, you’ll want to create a new dataset with a constant timestep, and fill in that dataset with data from your sensors (based on their timestamps). For timesteps where a sensor has no data, just keep the value for that sensor constant. The LMU network should not have an issue working with that kind of data.
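As a rough illustration of that resampling step, here is a minimal sketch using NumPy. The function name `resample_hold`, the sample readings, and the 200 ns grid step are all made up for the example; the default of 0 before a sensor's first reading is also an assumption. Each grid point takes the most recent reading at or before it, so the value is held constant between changes.

```python
import numpy as np

def resample_hold(timestamps_ns, values, grid_ns, initial=0.0):
    """Sample-and-hold: each grid point gets the latest reading at or before it.

    `initial` (an assumption) is used for grid points before the first reading.
    """
    timestamps_ns = np.asarray(timestamps_ns)
    values = np.asarray(values, dtype=float)
    # Index of the last reading with timestamp <= grid point (-1 if none yet)
    idx = np.searchsorted(timestamps_ns, grid_ns, side="right") - 1
    return np.where(idx >= 0, values[np.clip(idx, 0, None)], initial)

# Hypothetical change-only readings from two sensors (timestamps in ns)
t_a, v_a = np.array([0, 250, 900]), np.array([0.1, 0.2, 0.4])
t_b, v_b = np.array([100, 600]), np.array([-0.1, -0.3])

dt_ns = 200                       # constant timestep chosen for the new dataset
grid = np.arange(0, 1000, dt_ns)  # 0, 200, 400, 600, 800

# One row per timestep, one column per sensor: a 2-D vector input
combined = np.stack(
    [resample_hold(t_a, v_a, grid), resample_hold(t_b, v_b, grid)], axis=1
)
```

Note that a grid step coarser than the gap between two changes will drop the intermediate change, so the step size is a trade-off between dataset length and fidelity.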

First, I want to be clear about the distinction between the LMU as a mathematical construct, and the LMU as implemented in e.g. KerasLMU. The mathematical LMU does not specify what format the input data needs to be in; that is an implementation detail, and the mathematics can handle any input format (with the proper implementation). Our LMU implementation in KerasLMU is designed for densely-sampled (i.e. non-sparse) time series inputs, where you have a sample at each timestep.

In your data, it sounds like you have something more sparsely sampled, where you have information for some timesteps but not for others. Furthermore, for the timesteps where you do not have values, you want to assume that the signal value is the same as was last reported.

In this case, you’ll need a custom LMU implementation that supports this. A basic LMU implementation as a dynamical system uses A and B matrices to update the LMU state x based on the previous state and current input. One way to deal with missing inputs is to have these values as np.nan in your timeseries, and then use the last valid value whenever you encounter an input that is np.nan:

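import numpy as np

# A (d x d) and B (length d) are the discretized LMU matrices;
# inputs is a 1-D array with np.nan wherever no value was reported
x = np.zeros(A.shape[0])  # LMU state
lastinput = 0  # used until the first valid input arrives

for i in range(timesteps):
    if np.isnan(inputs[i]):
        u = lastinput  # hold the last valid value
    else:
        u = inputs[i]
        lastinput = inputs[i]

    x = np.dot(A, x) + np.dot(B, u)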

Of course, there are other ways to implement this (e.g. using timesteps) that might work better for your data (if it’s a hassle to put it all into one array).

To go back to your first question, it really depends on how the learning portion of your algorithm is set up. Typically, we’ll have what we call multi-dimensional LMUs, where you have a separate LMU dynamical system encoding each dimension of the input (this is the memory_d parameter in KerasLMU). Remember that the LMU dynamical system itself (i.e. the A and B matrices) is not trained; it remains fixed for all of training. The part that is trained is the layer on top of this dynamical system, whether that’s a dense layer, a SimpleRNNCell, or something else. @xchoo is correct that it is more powerful if these learning layers are able to combine information across all the dimensions of a multi-dimensional LMU, rather than operating on each dimension separately.

That said, you don’t necessarily need to combine all your inputs into one array to accomplish this (though this is the approach that we’ve always taken because it’s the most straightforward for basic use cases; your use case is not basic). For example, you could have two separate LMU dynamical systems operating on your two different sensors independently (even in a multidimensional LMU, the dynamical systems themselves are still independent), but then have a learning layer (e.g. a Dense layer) that pools across those dimensions, so that the learning is still able to make use of both sensors together.
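To make that arrangement concrete, here is a minimal NumPy sketch, not the KerasLMU implementation. It builds the standard LMU A and B matrices, runs one fixed (untrained) dynamical system per sensor, and concatenates the two memory states before a single linear readout with a sigmoid. The helper names, the random sensor signals, the random readout weights (a stand-in for a trained Dense layer), and the use of forward Euler with a timestep of 1 are all assumptions made for illustration; Euler is only reasonable here because theta is large relative to the timestep.

```python
import numpy as np

def ldn_matrices(order, theta):
    """Continuous-time LMU A and B matrices for a window of length theta."""
    q = np.arange(order)
    r = (2 * q + 1)[:, None] / theta
    j, i = np.meshgrid(q, q)  # j: column index, i: row index
    A = np.where(i < j, -1.0, (-1.0) ** (i - j + 1)) * r
    B = ((-1.0) ** q)[:, None] * r
    return A, B[:, 0]

def run_ldn(u, order, theta):
    """Run one fixed LMU system over a 1-D signal; returns the state at every step."""
    A, B = ldn_matrices(order, theta)
    x = np.zeros(order)
    states = np.zeros((len(u), order))
    for t, u_t in enumerate(u):
        x = x + A @ x + B * u_t  # forward Euler step with a timestep of 1
        states[t] = x
    return states

rng = np.random.default_rng(0)
sensor_a = rng.standard_normal(200)  # hypothetical, already evenly sampled
sensor_b = rng.standard_normal(200)

# Two independent memories, one per sensor
mem_a = run_ldn(sensor_a, order=4, theta=50)
mem_b = run_ldn(sensor_b, order=4, theta=50)

# The learning layer sees both memories at once, so it can combine the sensors
features = np.concatenate([mem_a, mem_b], axis=1)  # shape (200, 8)
W = rng.standard_normal((8, 1)) * 0.1              # stand-in for trained weights
prob = 1.0 / (1.0 + np.exp(-(features @ W)))       # sigmoid output per timestep
```

Only `W` would be learned in this picture; the two memories never interact until the readout.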



I am not an expert either.

  1. We are using the most recent data available when there is missing data for a certain timestep. But we do this right after we download the data from an online source and right before feeding it into the LMU.

  2. Your second point is also great Eric. Currently, we are feeding into the LMU all those timeseries processed as pointed out in 1. and afterwards combining them using another layer. The final result of the whole NN has to be two classes: 0 or 1, so there is a sigmoid in there at the end. More specifically, we are doing the following:

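import torch.nn as nn

# LMUFFT is assumed to come from the PyTorch LMU implementation in use (import not shown)

class Model(nn.Module):
    def __init__(self, input_size, output_size, hidden_size, memory_size, seq_len, theta):
        super().__init__()  # required before registering submodules
        self.lmu_fft = LMUFFT(input_size, hidden_size, memory_size, seq_len, theta)
        self.dropout = nn.Dropout()  # assumption: rate not given, so PyTorch's default is used
        self.linear = nn.Linear(hidden_size, output_size)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        _, h_n = self.lmu_fft(x)  # [batch_size, hidden_size]
        h_n = self.dropout(h_n)
        output = self.linear(h_n)
        output = self.sigmoid(output)  # [batch_size, output_size]
        output = output.view(1)  # note: only works when batch_size * output_size == 1
        return output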
  3. I might be misunderstanding, but is your other idea to use one LMU for each time series and combine the results afterwards? If so, would the results be much different? Would it be slower or faster?

Thank you for your time and insight!

@Eric Thank you very much for this information. It’s extremely helpful.

In regards to your comment, “designed for densely-sampled (i.e. non-sparse) time series inputs, where you have a sample at each timestep.” I have three related follow-up questions:

  1. Does this function similarly to an LSTM, in that it expects evenly spaced timesteps?
  2. When data and timesteps do not occur evenly (for each nanosecond, or each second, etc.), but rather the frequency of the signal from a sensor increases significantly during events of interest, does the LMU “know” that signals are arriving more rapidly and that the time between them is shorter, given that the nanosecond timestamps are now numerically much closer together?
  3. Following on from question #2, would a separate variable fed into the LMU that gives the time between each signal help, or even be needed? This variable would be numerically much larger during a non-event, and much smaller during events of interest.

Thank you again for your help, and please let me know if you need any clarification on any of the points presented.

Mathematically, the LMU is an LTI dynamical system defined in the continuous-time domain. (Note that originally, the term Legendre Delay Network (LDN) was used to distinguish between the LTI system, and the term LMU was reserved for the neural network implementation that also includes an encoding layer, optional hidden layer, etc. I will not bother with that distinction here.)

To run an LMU on any sort of digital hardware, it’s required to take those continuous ODEs and solve them numerically/discretely. There are many approaches to solving ODEs numerically; some allow for variable timesteps. The approach that we take in the keras-lmu package is one that assumes fixed-length (a.k.a. evenly spaced) timesteps. Further to that point, the keras-lmu implementation takes in no timestamps, so the LMU assumes that all samples are one unit of time apart. Note that keras-lmu has no option to set the length of one timestep, i.e. no dt parameter that you might see on e.g. scipy.signal.cont2discrete. This is because we assume a timestep of 1, and require the user to rescale any parameters accordingly. There is only one LMU parameter that has units of time, which is theta. So for example, if you want an LMU with theta = 0.5 seconds and dt = 0.01 seconds, then you’d set the theta parameter in keras-lmu to 0.5 / 0.01 = 50.
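The rescaling is just a unit conversion from time to timesteps. A small sketch, where the helper name `theta_in_steps` and the nanosecond figures in the second example are made up for illustration:

```python
def theta_in_steps(theta, dt):
    """Convert a window length to timesteps; theta and dt must share the same unit."""
    return theta / dt

# The example from the text: theta = 0.5 s, dt = 0.01 s
theta_steps = theta_in_steps(0.5, 0.01)

# Hypothetical nanosecond case: a 2 ms window on a 10 microsecond grid
theta_steps_ns = theta_in_steps(2_000_000, 10_000)
```

The first call gives the 50 from the text and the second gives 200; in both cases the result is what would be passed as theta.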

It would certainly be possible to do an LMU implementation that supports variable timesteps, since it is possible to solve ODEs with variable timesteps. However, this is not something that we’ve ever looked into.
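Purely as an illustration of what a variable-timestep update could look like, here is a minimal sketch that integrates the LMU ODE with forward Euler, using the actual gap between consecutive timestamps as the step size. This is not something keras-lmu does; the function names are hypothetical, the input is assumed to hold its previous value across each gap, and forward Euler is only stable when every gap is small relative to theta.

```python
import numpy as np

def ldn_matrices(order):
    """LMU A and B matrices with theta normalized to 1."""
    q = np.arange(order)
    r = (2 * q + 1)[:, None]
    j, i = np.meshgrid(q, q)  # j: column index, i: row index
    A = np.where(i < j, -1.0, (-1.0) ** (i - j + 1)) * r
    B = ((-1.0) ** q) * (2 * q + 1)
    return A, B

def run_ldn_variable_dt(timestamps, inputs, order, theta):
    """Forward Euler on theta * dx/dt = A x + B u, stepping by the real time gaps."""
    A, B = ldn_matrices(order)
    x = np.zeros(order)
    for k in range(1, len(timestamps)):
        dt = timestamps[k] - timestamps[k - 1]
        # The input is held at its previous value over the interval
        x = x + (dt / theta) * (A @ x + B * inputs[k - 1])
    return x

# Unevenly spaced samples: gaps alternate between 0.005 and 0.02 time units
timestamps = np.cumsum(np.tile([0.005, 0.02], 500))
inputs = np.ones(len(timestamps))
final_state = run_ldn_variable_dt(timestamps, inputs, order=4, theta=1.0)
```

With a constant input, the state should settle with the first coefficient near the input value and the rest near zero, regardless of how the samples are spaced. A more robust version would use an exact discretization per gap rather than Euler.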

The easiest way to get rolling is probably to fill in missing values yourself and create an input that has values for every timestep. For example, if you want to “hold” the previous value when no new information is available, you could preprocess like this:

import numpy as np

# your raw data, with NaNs at any timesteps you don't have values for
x = np.array([
    [0.1, -0.1],
    [0.2, np.nan],
    [np.nan, -0.3],
    [np.nan, -0.5],
    [0.4, np.nan],
    [np.nan, np.nan],
])

# fill each NaN with the previous timestep's value (or 0 on the first timestep)
for i in range(len(x)):
    m = np.isnan(x[i])
    if m.any():
        x[i][m] = 0 if i == 0 else x[i - 1][m]