Self-supervised models have been shown to produce visual representations
comparable to or better than those of their supervised counterparts when trained offline on
unlabeled data at scale. However, their efficacy is catastrophically reduced in
a Continual Learning (CL) scenario where data is presented to the model
sequentially. In this paper, we show that self-supervised loss functions can be
seamlessly converted into distillation mechanisms for CL by adding a predictor
network that maps the current state of the representations to their past state.
This enables us to devise a framework for Continual self-supervised visual
representation Learning that (i) significantly improves the quality of the
learned representations, (ii) is compatible with several state-of-the-art
self-supervised objectives, and (iii) needs little to no hyperparameter tuning.
We demonstrate the effectiveness of our approach empirically by training six
popular self-supervised models in various CL settings.
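
The distillation mechanism described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the text: a negative cosine similarity stands in for the self-supervised loss, the predictor is a single linear layer, and the past representation is assumed to come from a frozen copy of the encoder saved before the current task. The names `linear`, `negative_cosine`, `distillation_loss`, and `continual_loss` are hypothetical.

```python
import math


def linear(weights, bias, x):
    """Apply a linear layer (assumed predictor form) to vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def negative_cosine(p, z):
    """Negative cosine similarity; an assumed stand-in for the SSL loss."""
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z + 1e-12)


def distillation_loss(predictor, z_current, z_past):
    """Reuse the SSL loss as a distillation term: the predictor maps the
    current representation toward the past (frozen) representation."""
    weights, bias = predictor
    return negative_cosine(linear(weights, bias, z_current), z_past)


def continual_loss(predictor, z_a, z_b, z_a_past):
    """Assumed total objective: SSL loss between two augmented views plus
    the distillation term between current and past representations."""
    return negative_cosine(z_a, z_b) + distillation_loss(predictor, z_a, z_a_past)
```

In this sketch the same loss function serves both terms, so the distillation term adds no loss-specific hyperparameters beyond the predictor itself. In practice, gradients would flow only through the current encoder and the predictor, not through the frozen past representation.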