Comparison among feature encoding techniques for HIV-1 protease cleavage specificity
Abstract
HIV-1 protease, which generates infectious viral particles by cleaving the viral polypeptides, plays an indispensable role in the life cycle of HIV-1. Knowledge of the substrate specificity of HIV-1 protease will pave the way for the development of efficacious HIV-1 protease inhibitors. Considerable effort has been devoted to techniques for predicting HIV-1 protease cleavage sites. Over the last decade, several works have approached the cleavage site prediction problem by applying a number of machine learning methods. However, it is still difficult for researchers to choose the best method because an effective and up-to-date comparison is lacking. Here, we present an extensive study of feature encoding techniques for the HIV-1 protease specificity problem across diverse machine learning algorithms. In addition, for the first time, we apply the OEDICHO technique, which combines orthonormal encoding with a binary representation of the 10 best physicochemical properties of amino acids selected from the Amino Acid index database, to predict HIV-1 protease cleavage sites.
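As a rough illustration of the kind of encoding described above, the following Python sketch concatenates an orthonormal (one-hot) encoding of a peptide with binarized physicochemical properties. It rests on several assumptions not stated here: the function names are hypothetical, the peptide is taken to be an octamer, each property is dichotomized at its mean over the 20 amino acids, and the Kyte-Doolittle hydropathy scale stands in for the 10 selected properties, which are not listed in this abstract. It is not the authors' implementation.

```python
# Illustrative sketch only: names, threshold rule, and the example
# property scale are assumptions, not the published OEDICHO procedure.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy, used purely as a stand-in property scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def orthonormal_encode(peptide):
    """Map each residue to a 20-bit indicator vector and concatenate them."""
    vector = []
    for residue in peptide:
        one_hot = [0] * len(AMINO_ACIDS)
        one_hot[AMINO_ACIDS.index(residue)] = 1  # ValueError on unknown residue
        vector.extend(one_hot)
    return vector


def dichotomize(scale):
    """Binarize a property scale: 1 if above the mean value, else 0 (assumed rule)."""
    mean = sum(scale.values()) / len(scale)
    return {aa: 1 if value > mean else 0 for aa, value in scale.items()}


def combined_encode(peptide, scales):
    """Concatenate the orthonormal encoding with one bit per residue per scale."""
    binary_scales = [dichotomize(scale) for scale in scales]
    vector = orthonormal_encode(peptide)
    for residue in peptide:
        vector.extend(binary[residue] for binary in binary_scales)
    return vector


if __name__ == "__main__":
    features = combined_encode("SQNYPIVQ", [KYTE_DOOLITTLE])
    print(len(features))
```

With an octamer and a single stand-in scale, the feature vector has 8 × 20 orthonormal bits plus 8 property bits; with 10 scales, the property part would grow to 80 bits under the same assumptions.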