We need to talk about standard splits

Kyle Gorman, Steven Bedrick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

70 Scopus citations

Abstract

It is standard practice in speech & language technology to rank systems according to performance on a test set held out for evaluation. However, few researchers apply statistical tests to determine whether differences in performance are likely to arise by chance, and few examine the stability of system ranking across multiple training-testing splits. We conduct replication and reproduction experiments with nine part-of-speech taggers published between 2000 and 2018, each of which reports state-of-the-art performance on a widely-used “standard split”. We fail to reliably reproduce some rankings using randomly generated splits. We suggest that randomly generated splits should be used in system comparison.
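The abstract's recommendation, comparing systems over many randomly generated training-testing splits rather than a single standard split, can be illustrated with a short sketch. The snippet below is not the authors' code: the corpus format, the `tagger.train`/`tagger.tag` interface, and the choice of a paired Wilcoxon signed-rank test over per-split accuracies are all assumptions made purely for illustration; the paper's exact statistical procedure may differ.

```python
# Minimal sketch (not the authors' implementation) of comparing two taggers
# across multiple randomly generated training-testing splits.
import random
from statistics import mean

from scipy.stats import wilcoxon  # paired test over per-split accuracies


def accuracy(tagger, test_sentences):
    """Token-level tagging accuracy; `tagger.tag` is a hypothetical API."""
    correct = total = 0
    for tokens, gold_tags in test_sentences:
        predicted = tagger.tag(tokens)
        correct += sum(p == g for p, g in zip(predicted, gold_tags))
        total += len(gold_tags)
    return correct / total


def compare_on_random_splits(corpus, tagger_a, tagger_b,
                             n_splits=20, train_fraction=0.9, seed=0):
    """Train and evaluate both taggers on `n_splits` random splits of
    `corpus` (a list of (tokens, tags) sentence pairs), then run a paired
    Wilcoxon signed-rank test on the per-split accuracies."""
    rng = random.Random(seed)
    scores_a, scores_b = [], []
    for _ in range(n_splits):
        sentences = list(corpus)
        rng.shuffle(sentences)
        cut = int(train_fraction * len(sentences))
        train, test = sentences[:cut], sentences[cut:]
        tagger_a.train(train)  # hypothetical training API
        tagger_b.train(train)
        scores_a.append(accuracy(tagger_a, test))
        scores_b.append(accuracy(tagger_b, test))
    _, p_value = wilcoxon(scores_a, scores_b)
    return mean(scores_a), mean(scores_b), p_value
```

Averaging accuracies over the splits and reporting the p-value makes explicit whether an observed ranking is stable or could plausibly have arisen by chance on a single held-out test set.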

Original language: English (US)
Title of host publication: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 2786-2791
Number of pages: 6
ISBN (Electronic): 9781950737482
State: Published - 2020
Event: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Florence, Italy
Duration: Jul 28 2019 - Aug 2 2019

Publication series

Name: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019
Country/Territory: Italy
City: Florence
Period: 7/28/19 - 8/2/19

ASJC Scopus subject areas

  • Language and Linguistics
  • General Computer Science
  • Linguistics and Language
