How does ChatGPT perform on the European Board of Pediatric Surgery examination? A randomized comparative study

Azizoğlu, Mustafa; Aydoğdu, Bahattin

dc.contributor.author	Azizoğlu, Mustafa
dc.contributor.author	Aydoğdu, Bahattin
dc.date.accessioned	2025-01-20T06:35:53Z
dc.date.available	2025-01-20T06:35:53Z
dc.date.issued	2024	en_US
dc.identifier.issn	1579-5853 / 2255-0569
dc.identifier.uri	https://doi.org/10.3306/AJHS.2024.39.01.23
dc.identifier.uri	https://hdl.handle.net/20.500.12462/15830
dc.description	Aydoğdu, Bahattin (Balikesir Author)	en_US
dc.description.abstract	Purpose: The purpose of this study was to compare in detail the accuracy and responsiveness of GPT-3.5 and GPT-4 in pediatric surgery, specifically their ability to correctly answer sample questions from the European Board of Pediatric Surgery (EBPS) exam. Methods: This comparative analysis of two AI language models, GPT-3.5 and GPT-4, was conducted between 20 May 2023 and 30 May 2023 using EBPS exam sample questions. Two sets of 105 sample questions each (210 in total), derived from the EBPS sample questions, were collated. Results: In General Pediatric Surgery, GPT-3.5 answered 7 questions correctly (46.7%), whereas GPT-4 answered 13 correctly (86.7%) (p=0.020). In Newborn Surgery and Pediatric Urology, GPT-3.5 answered 6 questions correctly (40.0%), whereas GPT-4 answered 12 correctly (80.0%) (p=0.025). Overall, GPT-3.5 answered 46 of 105 questions correctly (43.8%), and GPT-4 performed significantly better, answering 80 correctly (76.2%) (p<0.001). Across all responses, the odds ratio for a correct answer with GPT-4 compared with GPT-3.5 was 4.1; that is, the odds of GPT-4 answering a pediatric surgery question correctly were 4.1 times those of GPT-3.5. Conclusion: This comparative study concludes that GPT-4 significantly outperforms GPT-3.5 in answering EBPS exam questions.	en_US
dc.language.iso	eng	en_US
dc.publisher	Reial Acad Medicina Illes Balears	en_US
dc.relation.isversionof	10.3306/AJHS.2024.39.01.23	en_US
dc.rights	info:eu-repo/semantics/embargoedAccess	en_US
dc.subject	ChatGPT	en_US
dc.subject	Pediatric Surgery	en_US
dc.subject	Exam	en_US
dc.subject	Questions	en_US
dc.subject	Artificial Intelligence	en_US
dc.title	How does ChatGPT perform on the European Board of Pediatric Surgery examination? A randomized comparative study	en_US
dc.title.alternative	What is the performance of ChatGPT on the European Board of Pediatric Surgery examination? A randomized comparative study	en_US
dc.type	article	en_US
dc.relation.journal	Medicina Balear	en_US
dc.contributor.department	Faculty of Medicine	en_US
dc.contributor.authorID	0009-0000-3563-1230	en_US
dc.contributor.authorID	0000-0003-2858-3984	en_US
dc.identifier.volume	39	en_US
dc.identifier.issue	1	en_US
dc.identifier.startpage	23	en_US
dc.identifier.endpage	26	en_US
dc.relation.publicationcategory	Article - International Peer-Reviewed Journal - Institutional Faculty Member	en_US
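
The summary statistics in the abstract can be re-derived from the reported counts. The sketch below is illustrative only and rests on assumptions not stated in this record: both models answered the same 105 questions overall, each subgroup figure refers to 15 questions (inferred from 7 = 46.7% and 13 = 86.7%), and the comparison is a Pearson chi-square test on a 2×2 table without continuity correction. Under these assumptions the sketch reproduces the reported p-values of 0.020 and 0.025, but the test the authors actually used is not given here. The function name `compare_accuracy` is hypothetical.

```python
import math


def compare_accuracy(correct_a, total_a, correct_b, total_b):
    """Compare model A and model B on a 2x2 (model x correct/wrong) table.

    Returns accuracies, the odds ratio and risk ratio of B versus A, and a
    Pearson chi-square statistic (no continuity correction, df = 1) with its
    p-value. Assumption: this mirrors, but is not confirmed to be, the
    authors' test.
    """
    wrong_a = total_a - correct_a
    wrong_b = total_b - correct_b
    acc_a = correct_a / total_a
    acc_b = correct_b / total_b

    # Odds ratio: (odds of a correct answer for B) / (odds for A).
    odds_ratio = (correct_b / wrong_b) / (correct_a / wrong_a)
    # Risk ratio: ratio of the two accuracies ("times as likely").
    risk_ratio = acc_b / acc_a

    # Expected cell count = row total * column total / grand total.
    n = total_a + total_b
    col_correct = correct_a + correct_b
    col_wrong = wrong_a + wrong_b
    cells = [
        (correct_a, total_a, col_correct),
        (wrong_a, total_a, col_wrong),
        (correct_b, total_b, col_correct),
        (wrong_b, total_b, col_wrong),
    ]
    chi2 = sum((obs - row * col / n) ** 2 / (row * col / n) for obs, row, col in cells)

    # For df = 1, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))

    return {
        "acc_a": acc_a,
        "acc_b": acc_b,
        "odds_ratio": odds_ratio,
        "risk_ratio": risk_ratio,
        "chi2": chi2,
        "p_value": p_value,
    }


# Counts as reported in the abstract (GPT-3.5 = model A, GPT-4 = model B).
results = {
    "Overall": compare_accuracy(46, 105, 80, 105),
    "General Pediatric Surgery": compare_accuracy(7, 15, 13, 15),
    "Newborn Surgery and Pediatric Urology": compare_accuracy(6, 15, 12, 15),
}

for name, r in results.items():
    print(
        f"{name}: GPT-3.5 {r['acc_a']:.1%}, GPT-4 {r['acc_b']:.1%}, "
        f"OR {r['odds_ratio']:.2f}, RR {r['risk_ratio']:.2f}, p {r['p_value']:.2g}"
    )
```

With the overall counts the odds ratio comes out at about 4.1, matching the abstract, while the ratio of accuracies is only about 1.74 (76.2% / 43.8%). This gap is why an odds ratio of 4.1 means "4.1 times the odds" rather than "4.1 times as likely".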

Files in this item:

Name: bahattin-aydoğdu.pdf
Size: 294.9 KB
Format: PDF
Description: Full Text


This item appears in the following collection(s).

Surgical Sciences-Article Collection [636]
WOS Indexed Publications-Article Collection [4943]
