Issue with scaling

I have never run into the issue below before, so I am wondering whether it is specific to the LMU, and if so, what options (if any) are available.

I created a model (which took many months of trial and error to arrive at its current state) from time-series data with 3 columns, using an RNN in the style of an LSTM (sliding window, etc.). When creating the model, the data was MinMax-scaled using the min and max of each file. Forward tests show great results.

I then emulated live streaming data by processing historical data row by row, rather than pre-scaling the data for the forward test. Again, the results are positive and identical to the previous forward test. In real life, however, the true MinMax cannot be known in advance for one of the columns, so over the course of about a week I have experimented with averages of historical data for the MinMax, and numerous other variations, to set a MinMax manually that would work. I also tried MinMaxScaler with partial_fit. So far, I have been unable to reproduce the excellent results obtained with the true MinMax, and the model is as yet unusable.

My question is: is this something unique to the LMU, and either way, are there any ideas on how to resolve this scaling issue? I have tried recreating the model using StandardScaler and other scalers without success, which was expected; given the hyperparameter tuning involved, it would be like starting over from scratch.

I would be deeply grateful for any assistance, ideas, or even speculation on how I can resolve this issue. I am hoping I haven't wasted months of time developing a fantastic model that ends up being unusable.

Thanks in advance!

Hi @fladventurerob, and welcome back to the Nengo forums. :smiley:

I don’t have much experience with training LMU (or LSTM) networks, but I did forward your question to the NengoDL devs. Here is some of their feedback:

This is not unique to the LMU, but applies to other RNNs (or ANNs) as well. If you train the network using a specific MinMax scaling value, you’ll need to apply this scaling value to the streaming data as well (regardless of what the actual MinMax of the streaming data is). If you modify the MinMax for the streaming data, you’ll need to re-train your network for that new MinMax value.
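To make this concrete, here is a minimal sketch (assuming a scikit-learn `MinMaxScaler`; adapt to however you scale your data, and note the arrays are just placeholders):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit the scaler ONCE on the training data (the same data the network was trained with).
train_data = np.random.rand(10_000, 3)  # placeholder for the 3-column training set
scaler = MinMaxScaler().fit(train_data)

# At inference time, reuse that same scaler on every incoming row.
# Do not refit it on the stream, even if the stream drifts outside the training range.
new_row = np.array([[0.51, 0.07, 1.23]])  # placeholder streaming sample
scaled_row = scaler.transform(new_row)    # values may fall outside [0, 1]; that is expected
```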

There are other approaches (i.e., approaches that don’t use an input scaling) to deal with this type of data. Here is a quote from the devs discussing two such approaches:

I hope this helps! :smiley:

Thanks so much for this feedback; it is very helpful. As a follow-up, @xchoo, could you please inquire about the following regarding the recommendation to apply a specific MinMax scaling, as I may not have provided enough information in my first post:

The model was not trained on a fixed MinMax, but rather on historical numerical time-series datasets. The MinMax was taken from each dataset: if a dataset had 10,000 datapoints, with a min of 0.0083234 at some point and a max of 1.2313 at some other point, those would be the min and max used when training on that dataset. Moving on to the next dataset, if its MinMax was 0.102 and 0.948, those values would be used instead. Many different datasets with different MinMax values went into training the model, as sketched below.
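In other words, the training data preparation did something along these lines (a rough sketch for illustration, not my actual code; the dataset sizes are placeholders):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

datasets = [np.random.rand(10_000, 3) for _ in range(5)]  # placeholders for the historical files

for data in datasets:
    # A fresh scaler per dataset: each file is scaled by its OWN min and max.
    scaled = MinMaxScaler().fit_transform(data)
    # ... build sliding windows from `scaled` and train on them ...
```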

Moving on to forward testing on previously unseen data: excellent results are seen, but only when the data is scaled to the MinMax of that dataset. Attempts to set the MinMax manually, or to use statistical analysis to set the MinMax on the streaming data, have proved less than fruitful; whether it works is a flip of the coin, even when the MinMax is set to a statistical best guess of the stream's future MinMax based on the current streaming values and on historical MinMax values from when the data was in a similar range. However, if I record the streaming data for 6 hours and then run a forward test on that newly recorded dataset, setting the scaling MinMax from that dataset's own min and max, presto, it works beautifully.

A big concern I have is that the model is looking for data anomalies: a rise from where the dataset is relatively constant with minimal fluctuation, with the anomaly ending near the max of that dataset's MinMax. So is the model really working, or is it just looking for rises in the scaled data, estimating that this is the start of an event, and only identifying the event if it happens to continue and approach the max (1) of the scaled range, rather than recognizing the actual event itself? It's not a problem I have had before, and I am not quite sure what to do or try at this point.

Any further comments would be met with gratitude, because I am definitely stuck right now.

@fladventurerob,

I’m afraid this is more of a general machine learning issue than a Nengo issue, so help here will be limited. I would suggest trying the batch normalization approach mentioned in my previous post to see if that works. Apart from that, I don’t have many recommendations. :frowning:
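For concreteness, something along these lines is what I mean (a rough sketch assuming a KerasLMU-style model; the layer sizes and window shape are placeholders, not your actual architecture). The idea is to let the network normalize its own inputs, so raw, unscaled windows can be fed at inference time:

```python
import tensorflow as tf
import keras_lmu

window_size, n_features = 50, 3  # placeholder sliding-window shape

inputs = tf.keras.Input(shape=(window_size, n_features))
# Normalization happens inside the model, with statistics learned during training,
# so no external MinMax needs to be chosen for the live stream.
x = tf.keras.layers.BatchNormalization()(inputs)
x = keras_lmu.LMU(
    memory_d=n_features,
    order=32,
    theta=window_size,
    hidden_cell=tf.keras.layers.SimpleRNNCell(64),
)(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```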
