Article

An NLP-Based Exploration of Variance in Student Writing and Syntax: Implications for Automated Writing Evaluation

by Maria Goldshtein 1,*, Amin G. Alhashim 2 and Rod D. Roscoe 1,*

1 Human Systems Engineering, Arizona State University, Mesa, AZ 85212, USA
2 Mathematics, Statistics, and Computer Science, Macalester College, Saint Paul, MN 55105, USA
* Authors to whom correspondence should be addressed.
Computers 2024, 13(7), 160; https://doi.org/10.3390/computers13070160
Submission received: 24 May 2024 / Revised: 12 June 2024 / Accepted: 19 June 2024 / Published: 25 June 2024
(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling)

Abstract

In writing assessment, expert human evaluators ideally judge individual essays with attention to variance among writers’ syntactic patterns. There are many ways to compose text successfully or less successfully. For automated writing evaluation (AWE) systems to provide accurate assessment and relevant feedback, they must be able to consider similar kinds of variance. The current study employed natural language processing (NLP) to explore variance in syntactic complexity and sophistication across clusters characterized in a large corpus (n = 36,207) of middle school and high school argumentative essays. Using NLP tools, k-means clustering, and discriminant function analysis (DFA), we observed that student writers employed four distinct syntactic patterns: (1) familiar and descriptive language, (2) consistently simple noun phrases, (3) variably complex noun phrases, and (4) moderate complexity with less familiar language. Importantly, each pattern spanned the full range of writing quality; there were no syntactic patterns consistently evaluated as “good” or “bad”. These findings support the need for nuanced approaches in automated writing assessment while informing ways that AWE can participate in that process. Future AWE research can and should explore similar variability across other detectable elements of writing (e.g., vocabulary, cohesion, discursive cues, and sentiment) via diverse modeling methods.
Keywords: automated writing evaluation; natural language processing; student writing variability; syntax; writing styles

Share and Cite

MDPI and ACS Style

Goldshtein, M.; Alhashim, A.G.; Roscoe, R.D. An NLP-Based Exploration of Variance in Student Writing and Syntax: Implications for Automated Writing Evaluation. Computers 2024, 13, 160. https://doi.org/10.3390/computers13070160


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
