Compressing Large-Scale Transformer-Based Models: A Case Study on BERT

Publication
arXiv