AdvGLUE

The Adversarial GLUE Benchmark

Performance of CreAT (single model) on AdvGLUE

Overall Statistics

Performance of CreAT (single model) on each task

The Stanford Sentiment Treebank (SST-2)

Quora Question Pairs (QQP)

MultiNLI (MNLI) matched

MultiNLI (MNLI) mismatched

Question NLI (QNLI)

Recognizing Textual Entailment (RTE)