Date of Degree Completion

Summer 2022

Degree Name

Master of Arts (MA)



Committee Chair

Charles Li

Second Committee Member

Loretta Gray

Third Committee Member

Penglin Wang


The Korean College Scholastic Ability Test (CSAT) is a highly competitive standardized assessment that graduating high-school seniors take in the hope of earning a score that will improve their chances of admission to their university of choice. The CSAT contains an English Section that scholars and educators alike have described as far too difficult for the official English language curriculum to serve as sufficient preparation. The test’s lack of construct validity has prompted calls to revise it so that it better reflects the school curriculum and can serve the evaluative purpose for which it is intended. In recent years, automated text evaluation with the software Coh-Metrix 3.0 has allowed scholars to quantify different dimensions of the text of the CSAT English Section, such as cohesion and syntactic complexity, that contribute to its reading difficulty. Older research, conducted before this software was introduced into the field, used word frequency counts in large corpora such as the British National Corpus (BNC) as a measure of word familiarity or unfamiliarity. Word frequency was thought to contribute directly to difficulty: as the share of low-frequency words in a text rises relative to the share of high-frequency words, the word knowledge burden of the text rises proportionally. Since the introduction of automated software-based tools like Coh-Metrix 3.0 and Lexical Complexity Analyzer (LCA), these corpus-based research methods have largely fallen by the wayside.
In this paper, I maintain that, despite being less sophisticated, corpus-based lexical analysis can still produce uniquely meaningful findings because of the degree of manual control it affords the researcher in calibrating the parameters of the text base and, most importantly, in selecting the ranges of word family frequency best tailored to a text, rather than having the ranges or functions of frequency assigned automatically by software. This study reports correlations between the outputs of these two methodologies that both inform us about the validity of Coh-Metrix 3.0’s use in CSAT studies and quantify how strongly word frequency contributes to the excessive difficulty of the CSAT English Section.