dc.contributor.author: Yavuz, Fatih
dc.contributor.author: Çelik, Özgür
dc.contributor.author: Çelik, Gamze Yavaş
dc.date.accessioned: 2025-01-14T11:41:51Z
dc.date.available: 2025-01-14T11:41:51Z
dc.date.issued: 2024 [en_US]
dc.identifier.issn: 0007-1013 / 1467-8535
dc.identifier.uri: https://doi.org/10.1111/bjet.13494
dc.identifier.uri: https://hdl.handle.net/20.500.12462/15755
dc.description: Çelik, Özgür (Balikesir Author) [en_US]
dc.description.abstract: This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student essays of varying quality. The grading scale comprised five domains: grammar, content, organization, style & expression, and mechanics. The results revealed that the fine-tuned ChatGPT model demonstrated a very high level of reliability with an intraclass correlation (ICC) score of 0.972, the default ChatGPT model exhibited an ICC score of 0.947, and Bard showed a substantial level of reliability with an ICC score of 0.919. Additionally, a significant overlap was observed in certain domains when comparing the grades assigned by LLMs and human raters. In conclusion, the findings suggest that while LLMs demonstrated notable consistency and potential for grading competency, further fine-tuning and adjustment are needed for a more nuanced understanding of non-objective essay criteria. The study not only offers insights into the potential use of LLMs in grading student essays but also highlights the need for continued development and research.
    Practitioner notes.
    What is already known about this topic: Large language models (LLMs), such as OpenAI's ChatGPT and Google's Bard, are known for their ability to generate text that mimics human-like conversation and writing. LLMs can perform various tasks, including essay grading. Intraclass correlation (ICC) is a statistical measure used to assess the reliability of ratings given by different raters (in this case, EFL instructors and LLMs).
    What this paper adds: The study makes a unique contribution by directly comparing the grading performance of expert EFL instructors with two LLMs, ChatGPT and Bard, using an analytical grading scale. It provides robust empirical evidence showing high reliability of LLMs in grading essays, supported by high ICC scores. It specifically highlights that the overall efficacy of LLMs extends to certain domains of essay grading.
    Implications for practice and/or policy: The findings open up potential new avenues for utilizing LLMs in academic settings, particularly for grading student essays, thereby possibly alleviating the workload of educators. The paper's insistence on the need for further fine-tuning of LLMs underlines the continual interplay between technological advancement and its practical applications. The results lay down a footprint for future research in advancing the use of AI in essay grading. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: John Wiley and Sons [en_US]
dc.relation.isversionof: 10.1111/bjet.13494 [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/ [*]
dc.subject: AI-Based Grading [en_US]
dc.subject: Automated Essay Scoring [en_US]
dc.subject: Generative AI [en_US]
dc.subject: Large Language Models [en_US]
dc.subject: Reliability [en_US]
dc.subject: Rubric-Based Grading [en_US]
dc.subject: Validity [en_US]
dc.title: Utilizing large language models for EFL essay grading: An examination of reliability and validity in rubric-based assessments [en_US]
dc.type: article [en_US]
dc.relation.journal: British Journal of Educational Technology [en_US]
dc.contributor.department: Yabancı Diller Yüksekokulu [en_US]
dc.contributor.authorID: 0000-0003-2645-2710 [en_US]
dc.contributor.authorID: 0000-0002-0300-9073 [en_US]
dc.contributor.authorID: 0000-0003-1571-9686 [en_US]
dc.identifier.volume: 2024 [en_US]
dc.identifier.issue: june [en_US]
dc.identifier.startpage: 1 [en_US]
dc.identifier.endpage: 17 [en_US]
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]


Unless otherwise stated, this item is licensed under: info:eu-repo/semantics/openAccess