On the Effect of Dropping Layers of Pre-trained Transformer Models

Publication
Computer Speech & Language