TY - JOUR
T1 - Refining Semantic Similarity of Paraphasias Using a Contextual Language Model
AU - Salem, Alexandra C.
AU - Bedrick, Steven
AU - Gale, Robert
AU - Casilio, Marianne
AU - Fleegle, Mikala
AU - Fergadiotis, Gerasimos
N1 - Funding Information:
This work was supported by National Institute on Deafness and Other Communication Disorders Grant R01DC015999 (principal investigators: Steven Bedrick and Gerasimos Fergadiotis). The authors thank the study participants who donated their time; Adelyn Brecher for her assistance with the interpretation of the Philadelphia Naming Test guidelines; Brooke Cowan for her work in code development; Alex Swiderski for his preliminary extraction of the Moss Aphasia Psycholinguistics Project Database data set; Katy McKinney-Bock and Linying Li for their initial development using BERT (Bidirectional Encoder Representations from Transformers); and Hattie Olson, Khanh Nguyen, Emily Tudorache, and Mia Cywinski for their efforts in reviewing paraphasia discrepancies.
Publisher Copyright:
© 2022 American Speech-Language-Hearing Association.
PY - 2023/1
Y1 - 2023/1
N2 - Purpose: ParAlg (Paraphasia Algorithms) is software that automatically categorizes a person with aphasia’s naming error (paraphasia) in relation to its intended target on a picture-naming test. These classifications (based on lexicality as well as semantic, phonological, and morphological similarity to the target) are important for characterizing an individual’s word-finding deficits or anomia. In this study, we applied a modern language model called BERT (Bidirectional Encoder Representations from Transformers) as a semantic classifier and evaluated its performance against ParAlg’s original word2vec model. Method: We used a set of 11,999 paraphasias produced during the Philadelphia Naming Test. We trained ParAlg with word2vec or BERT and compared their performance to humans. Finally, we evaluated BERT’s performance in terms of word-sense selection and conducted an item-level discrepancy analysis to identify which aspects of semantic similarity are most challenging to classify. Results: Compared with word2vec, BERT qualitatively reduced word-sense issues and quantitatively reduced semantic classification errors by almost half. A large percentage of errors were attributable to semantic ambiguity. Of the possible semantic similarity subtypes, responses that were associated with or category coordinates of the intended target were most likely to be misclassified by both models and humans alike. Conclusions: BERT outperforms word2vec as a semantic classifier, partially due to its superior handling of polysemy. This work is an important step for further establishing ParAlg as an accurate assessment tool.
AB - Purpose: ParAlg (Paraphasia Algorithms) is software that automatically categorizes a person with aphasia’s naming error (paraphasia) in relation to its intended target on a picture-naming test. These classifications (based on lexicality as well as semantic, phonological, and morphological similarity to the target) are important for characterizing an individual’s word-finding deficits or anomia. In this study, we applied a modern language model called BERT (Bidirectional Encoder Representations from Transformers) as a semantic classifier and evaluated its performance against ParAlg’s original word2vec model. Method: We used a set of 11,999 paraphasias produced during the Philadelphia Naming Test. We trained ParAlg with word2vec or BERT and compared their performance to humans. Finally, we evaluated BERT’s performance in terms of word-sense selection and conducted an item-level discrepancy analysis to identify which aspects of semantic similarity are most challenging to classify. Results: Compared with word2vec, BERT qualitatively reduced word-sense issues and quantitatively reduced semantic classification errors by almost half. A large percentage of errors were attributable to semantic ambiguity. Of the possible semantic similarity subtypes, responses that were associated with or category coordinates of the intended target were most likely to be misclassified by both models and humans alike. Conclusions: BERT outperforms word2vec as a semantic classifier, partially due to its superior handling of polysemy. This work is an important step for further establishing ParAlg as an accurate assessment tool.
UR - http://www.scopus.com/inward/record.url?scp=85146322515&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146322515&partnerID=8YFLogxK
U2 - 10.1044/2022_JSLHR-22-00277
DO - 10.1044/2022_JSLHR-22-00277
M3 - Article
C2 - 36492294
AN - SCOPUS:85146322515
SN - 1092-4388
VL - 66
SP - 206
EP - 220
JO - Journal of Speech, Language, and Hearing Research
JF - Journal of Speech, Language, and Hearing Research
IS - 1
ER -