Input image normalization


Most deep learning papers mention normalizing the input images. Broadly speaking, images are mapped either to the range [0.0, 1.0] or to [-0.5, 0.5]. It would be great if someone could explain the motivation behind choosing one mapping over the other.



Where does it say to do this? One reason I can imagine is that tanh and sigmoid neurons only respond usefully to inputs in a limited range around zero (roughly [-1, 1]) before they saturate, while relu neurons are only active for inputs $x > 0$. Without the specific papers you're referencing, though, I can't be sure. This Q&A seems to imply that reducing the covariance by standardizing the inputs is the general motivation.



Thank you for your answer. When I mentioned the [-0.5, 0.5] mapping combined with ReLU activations, I was referring to this paper, which discusses deblurring text using a CNN.


Input normalization, or input preprocessing in general, is a big topic. To summarize at a very high level, the idea is to standardize the inputs to your network as much as possible, so that a) learning is more stable (by reducing variability across the training data), b) your network generalizes better to novel data (because normalization reduces the variability between your training and test data), and c) the inputs fall within the useful range of your nonlinearities (as Sean pointed out with respect to tanh or sigmoid).
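To make the two mappings from the question concrete, here is a minimal sketch in plain Python. It assumes 8-bit pixel values in [0, 255]; the function names and the sample values are made up for illustration. It also includes per-image standardization (zero mean, unit variance) as a third common option. The same arithmetic applies element-wise to arrays or tensors.

```python
def to_unit_range(pixels, max_value=255.0):
    """Map raw intensities in [0, max_value] to [0.0, 1.0]."""
    return [p / max_value for p in pixels]


def to_centered_range(pixels, max_value=255.0):
    """Map raw intensities in [0, max_value] to [-0.5, 0.5]."""
    return [p / max_value - 0.5 for p in pixels]


def standardize(pixels, eps=1e-8):
    """Shift to zero mean and scale to (approximately) unit variance."""
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = var ** 0.5
    # eps guards against division by zero for a constant image.
    return [(p - mean) / (std + eps) for p in pixels]


# Hypothetical 8-bit grayscale pixel values, flattened to a list.
raw = [0, 64, 128, 255]

unit = to_unit_range(raw)          # values in [0.0, 1.0]
centered = to_centered_range(raw)  # values in [-0.5, 0.5]
standardized = standardize(raw)    # zero mean, roughly unit variance
```

The [-0.5, 0.5] mapping is just the [0.0, 1.0] mapping shifted so that the values are centered around zero, which is one plausible reason to prefer it when the first layers are sensitive to the sign of their inputs.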

Nowadays, though, input normalization is less critical than it used to be, due to the prevalence of ReLUs (which are less sensitive to the range of their inputs) and batch normalization (which accomplishes something similar to input normalization, but on a layer-by-layer basis).
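As a rough illustration of the batch normalization point, the sketch below applies a simplified batch normalization (no learned scale or shift, no running statistics) to the output of a single bias-free linear unit. The weight and the batch values are hypothetical. Under these assumptions, feeding raw [0, 255] inputs or inputs rescaled to [0, 1] gives nearly the same normalized activations, which is why the input range matters less once such a layer is present.

```python
def batch_normalize(values, eps=1e-5):
    """Simplified batch norm: normalize one feature across the batch."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


weight = 0.5  # hypothetical weight of a single linear unit, no bias

# The same batch of four inputs, once raw and once rescaled to [0, 1].
raw_batch = [0.0, 64.0, 128.0, 255.0]
scaled_batch = [x / 255.0 for x in raw_batch]

# Pre-activations of the linear unit for both versions of the batch.
pre_raw = [weight * x for x in raw_batch]
pre_scaled = [weight * x for x in scaled_batch]

out_raw = batch_normalize(pre_raw)
out_scaled = batch_normalize(pre_scaled)

# The two outputs differ only through the small eps term.
max_diff = max(abs(a - b) for a, b in zip(out_raw, out_scaled))
```

This only shows invariance to a uniform rescaling of the inputs. It does not mean preprocessing is unnecessary in general, and real batch normalization layers also learn a scale and shift per feature.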