In this paper, we aim to improve the mathematical interpretability of
convolutional neural networks for image classification. When trained on natural
image datasets, such networks tend to learn parameters in the first layer that
closely resemble oriented Gabor filters. By leveraging the properties of
discrete Gabor-like convolutions, we prove that, under specific conditions,
feature maps computed by the subsequent max pooling operator tend to
approximate the modulus of complex Gabor-like coefficients and are therefore
stable with respect to certain input shifts. We then compute a probabilistic
measure of shift invariance for these layers. More precisely, we show that some
filters, depending on their frequency and orientation, are more likely than
others to produce stable image representations. We experimentally validate our
theory by considering a deterministic feature extractor based on the dual-tree
wavelet packet transform, a particular case of discrete Gabor-like
decomposition. We demonstrate a strong correlation between shift invariance on
the one hand and similarity with complex modulus on the other hand.
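To make the central claim concrete, the following is a minimal NumPy sketch, not the paper's experimental setup: it convolves a 1D signal with a hypothetical complex Gabor filter, applies max pooling to the real part of the output (the CNN-like path), and compares the result with the subsampled modulus of the complex coefficients, whose stability under a small input shift is also checked. The filter parameters sigma and omega and the pooling size pool are illustrative choices, with the pooling window roughly matching one period of the filter's carrier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Complex Gabor filter: Gaussian envelope modulated by a complex exponential.
# sigma (scale) and omega (frequency) are illustrative, not values from the paper.
sigma = 4.0
omega = 2 * np.pi / 8
t = np.arange(-16, 17)
gabor = np.exp(-t**2 / (2 * sigma**2)) * np.exp(1j * omega * t)

# Smooth random 1D test signal standing in for an image row.
x = np.convolve(rng.standard_normal(512), np.ones(9) / 9, mode="same")

# CNN-like path: convolution with the real part of the filter, then max pooling.
real_coeffs = np.convolve(x, gabor.real, mode="same")
pool = 8  # pooling window spanning roughly one carrier period (2*pi/omega)
pooled = real_coeffs[: len(x) // pool * pool].reshape(-1, pool).max(axis=1)

# Reference path: modulus of the complex coefficients, subsampled at block centers.
complex_coeffs = np.convolve(x, gabor, mode="same")
modulus = np.abs(complex_coeffs)[pool // 2 :: pool][: len(pooled)]

# Max pooling approximates the modulus when the window covers about one period.
gap = np.linalg.norm(pooled - modulus) / np.linalg.norm(modulus)
print(f"relative gap between max pooling and modulus: {gap:.3f}")

# Shift stability: the subsampled modulus changes little under a small input shift.
shift = 2
shifted = np.abs(np.convolve(np.roll(x, shift), gabor, mode="same"))
shifted = shifted[pool // 2 :: pool][: len(pooled)]
change = np.linalg.norm(shifted - modulus) / np.linalg.norm(modulus)
print(f"relative change of the modulus under a {shift}-sample shift: {change:.3f}")
```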