We consider the problem of training a deep neural network on a given
classification task, e.g., ImageNet-1K (IN1K), so that it excels both at the
training task and at other (future) transfer tasks. These two seemingly
contradictory properties impose a trade-off between improving the model's
generalization and maintaining its performance on the original task. Models
trained with self-supervised learning tend to generalize better than their
supervised counterparts for transfer learning; yet they still lag behind
supervised models on IN1K. In this paper, we propose a supervised learning
setup that leverages the best of both worlds. We extensively analyze supervised
training using multi-scale crops for data augmentation and an expendable
projector head, and reveal that the design of the projector allows us to
control the trade-off between performance on the training task and
transferability. We further replace the last layer of class weights with class
prototypes computed on the fly using a memory bank, and derive two models:
t-ReX, which achieves a new state of the art for transfer learning and
outperforms top methods such as DINO and PAWS on IN1K, and t-ReX*, which
matches the highly optimized RSB-A1 model on IN1K while performing better on
transfer tasks.

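As a rough illustration of the final ingredient, swapping the last layer of
class weights for class prototypes computed on the fly from a memory bank, here
is a sketch of one way such prototypes could be formed. The FIFO bank, the
per-class mean prototypes, the temperature-scaled cosine logits, and the names
OnlineClassPrototypes, bank_size and tau are assumptions for illustration, not
the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    class OnlineClassPrototypes:
        # Sketch: logits against per-class prototypes averaged from a memory bank.
        def __init__(self, num_classes, dim, bank_size=8192, tau=0.1):
            self.num_classes, self.tau = num_classes, tau
            self.bank_feats = torch.zeros(bank_size, dim)
            self.bank_labels = torch.full((bank_size,), -1, dtype=torch.long)
            self.ptr = 0

        @torch.no_grad()
        def update(self, feats, labels):
            # FIFO update of the bank with the latest batch of (feature, label) pairs.
            n = feats.size(0)
            idx = (self.ptr + torch.arange(n)) % self.bank_feats.size(0)
            self.bank_feats[idx] = F.normalize(feats, dim=1)
            self.bank_labels[idx] = labels
            self.ptr = int((self.ptr + n) % self.bank_feats.size(0))

        def logits(self, feats):
            # Prototypes recomputed on the fly as the mean bank feature per class.
            protos = torch.zeros(self.num_classes, self.bank_feats.size(1))
            for c in range(self.num_classes):
                mask = self.bank_labels == c
                if mask.any():
                    protos[c] = self.bank_feats[mask].mean(dim=0)
            protos = F.normalize(protos, dim=1)
            return F.normalize(feats, dim=1) @ protos.t() / self.tau

In such a setup, these prototype-based logits would replace those of the linear
classifier in the previous sketch and feed the usual cross-entropy loss.
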
Code and pretrained models: https://europe.naverlabs.com/t-rex