Classification of the death ratio of covid-19 pandemic using machine learning techniques

Ulaş, Efehan; Filiz, Enes

Access

info:eu-repo/semantics/openAccess

Date

2022

Authors

Ulaş, Efehan
Filiz, Enes

Abstract

Since the COVID-19 pandemic emerged, many epidemiological models have been developed around the world to estimate the number of infected individuals and the death ratio of the outbreak. Several of these models rely on machine learning techniques. However, very few studies have considered feature selection in detail. The aim of this study is therefore to (i) investigate the independent and interactive effects of a diverse set of features and (ii) identify the algorithms that are significant for classifying the death ratio of the COVID-19 outbreak. Logistic regression and the decision tree methods (C4.5, Random Forests, and REPTree) were found to be the best-performing algorithms. The variables selected by the feature selection approaches are the number of new tests per thousand, new cases per million, hospital patients per million, and weekly hospital admissions per million. The importance of this study is that a high classification rate was obtained with only a few features. The study shows that only the most relevant features should be considered in classification and that using all variables is not necessary.

Source

Erzincan Üniversitesi Fen Bilimleri Enstitüsü Dergisi

Volume

Issue

Link

https://doi.org/10.18185/erzifbed.1090984
https://hdl.handle.net/20.500.12462/14524

Collections

Department of Business Administration - Article Collection [213]
TR Dizin - Article Collection [3598]