Improving Generalization And Robustness With Noisy Collaboration In Knowledge Distillation
Inspired by trial-to-trial variability in the brain, which can result from multiple noise sources, we introduce variability through noise at different levels of a knowledge distillation framework. We introduce the "Fickle Teacher", which provides variable supervision signals to the student for the same input. We observe that this response variability in the teacher results in a significant generalization improvement in the student. We further propose "Soft-Randomization", a novel technique for improving the student's robustness to input variability. It minimizes the dissimilarity between the student's distribution on noisy data and the teacher's distribution on clean data. We show that soft-randomization, even with low noise intensity, improves robustness significantly with a minimal drop in generalization. Lastly, we propose a new technique, "Messy-collaboration", which introduces target variability by training the student and/or the teacher with randomly corrupted labels. We find that supervision from a corrupted teacher significantly improves the adversarial robustness of the student while preserving its generalization and natural robustness. Our extensive empirical results verify the effectiveness of adding constructive noise to the knowledge distillation framework for improving the generalization and robustness of the model.
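To make the soft-randomization objective concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that the dissimilarity is a KL divergence between temperature-softened distributions, that the input noise is additive Gaussian, and that label corruption replaces a label with a uniformly random class; none of these choices is fixed by the description above. The linear "models" and all function names (`linear_logits`, `soft_randomization_loss`, `corrupt_labels`) are hypothetical stand-ins for illustration.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def linear_logits(weights, x):
    """Hypothetical stand-in model: one weight row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def soft_randomization_loss(teacher_w, student_w, x, sigma, temperature, rng):
    """Assumed form: KL(teacher(x) || student(x + Gaussian noise))."""
    # The teacher sees the clean input.
    teacher_probs = softmax(linear_logits(teacher_w, x), temperature)
    # The student sees a noisy copy of the same input.
    noisy_x = [xi + rng.gauss(0.0, sigma) for xi in x]
    student_probs = softmax(linear_logits(student_w, noisy_x), temperature)
    return kl_divergence(teacher_probs, student_probs)


def corrupt_labels(labels, num_classes, rate, rng):
    """Assumed corruption: with probability `rate`, draw a uniformly random class."""
    return [rng.randrange(num_classes) if rng.random() < rate else y for y in labels]


# Illustrative usage with made-up weights and inputs.
rng = random.Random(0)
teacher_w = [[1.0, -1.0], [0.5, 2.0], [-1.5, 0.3]]
student_w = [[0.8, -0.9], [0.4, 1.7], [-1.2, 0.1]]
x = [0.2, -0.7]

print(soft_randomization_loss(teacher_w, student_w, x, sigma=0.1, temperature=4.0, rng=rng))
print(corrupt_labels([0, 1, 2, 1, 0], num_classes=3, rate=0.3, rng=rng))
```

In a training loop, a loss of this kind would typically be averaged over a batch and minimized with respect to the student's parameters only, with the teacher held fixed.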