Research Article

Prediction of Cancer in DNA Sequences Using Unsupervised Learning Methods

Volume: 7 Number: 1 June 21, 2023

Abstract

As technology has advanced, the decision-making capabilities of machines have increased. With their strong analytical capabilities, computers can readily detect patterns and relationships that may escape the human eye, and they are therefore widely used in healthcare. For example, many machine-learning techniques developed for cancer prediction have been applied successfully. Early detection of cancer is crucial to survival. When cancer is diagnosed early, the amount of drug treatment, chemotherapy, or radiotherapy the patient is exposed to is significantly reduced, and the patient gets through treatment with the least physical strain. The Gene Expression Cancer RNA-Seq dataset was used in this study. This dataset includes gene expression values for 5 cancer types (BRCA, KIRC, LUAD, LUSC, UCEC). DNA sequences in the dataset were analyzed using k-means and hierarchical clustering, two unsupervised machine learning methods. The aim of the study is to develop a usable machine-learning model for the early detection of cancer at the gene level. The Adjusted Rand Index (ARI), the Silhouette Score, and accuracy were used to evaluate the results. The Rand Index measures the similarity between two clusterings by counting the pairs of samples that are assigned to the same or to different clusters in both. The Adjusted Rand Index is a chance-corrected version of the Rand Index. The Silhouette Score indicates how well a data point fits within its own cluster compared with the other clusters. Accuracy is the number of correctly clustered data points divided by the total number of data points. Four linkage methods were used in the hierarchical clustering algorithm: 'complete', 'ward', 'average', and 'single'. For the k-means algorithm, the accuracy was 0.990, the Adjusted Rand Index was 0.79, and the Silhouette Score was 0.14. For hierarchical clustering, ward performed best of the four linkage methods, with an ARI of 0.76 and a Silhouette Score of 0.13. The accuracy of the hierarchical clustering algorithm was 0.999.
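
The evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn and SciPy, uses synthetic blobs as a stand-in for the RNA-Seq data, and assumes that accuracy is computed after matching each cluster to a class with the Hungarian algorithm, since cluster labels are arbitrary and the abstract does not state how the mapping was done. The helper names `clustering_accuracy` and `evaluate` are hypothetical.

```python
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, confusion_matrix, silhouette_score


def clustering_accuracy(y_true, y_pred):
    """Accuracy after a one-to-one matching of clusters to classes.

    Assumption: both label arrays hold integers drawn from the same range,
    so the confusion matrix is square.
    """
    cm = confusion_matrix(y_true, y_pred)
    # Hungarian algorithm; negate the counts to maximize the matched total.
    rows, cols = linear_sum_assignment(-cm)
    return cm[rows, cols].sum() / cm.sum()


def evaluate(X, y_true, n_clusters=5, seed=0):
    """Fit k-means and four hierarchical linkages, then score each one."""
    models = {
        "k-means": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    }
    for linkage in ("ward", "complete", "average", "single"):
        models[f"hierarchical ({linkage})"] = AgglomerativeClustering(
            n_clusters=n_clusters, linkage=linkage
        )

    results = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        results[name] = {
            "ari": adjusted_rand_score(y_true, labels),
            "silhouette": silhouette_score(X, labels),
            "accuracy": clustering_accuracy(y_true, labels),
        }
    return results


# Synthetic stand-in for the expression matrix: 300 samples, 20 features,
# 5 groups. Real data would be loaded here instead.
X, y = make_blobs(n_samples=300, centers=5, n_features=20, random_state=0)
results = evaluate(X, y)
for name, scores in results.items():
    print(name, {metric: round(float(value), 3) for metric, value in scores.items()})
```

ARI and accuracy need the true labels, whereas the Silhouette Score uses only the data and the predicted clusters. This is one reason the three metrics can differ widely on the same clustering.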

Keywords

Supporting Institution

Bursa Teknik Üniversitesi

Acknowledgments

The authors are grateful to the TUBITAK ULAKBIM High Performance and Grid Computing Center (TRUBA) and the Bursa Technical University High-Performance Computing Laboratory.

Details

Primary Language

English

Subjects

Engineering

Journal Section

Research Article

Early Pub Date

June 21, 2023

Publication Date

June 21, 2023

Submission Date

July 5, 2022

Acceptance Date

October 17, 2022

Published in Issue

Year 2023 Volume: 7 Number: 1

APA
Doğru, Ş., & Altuntaş, V. (2023). Prediction of Cancer in DNA Sequences Using Unsupervised Learning Methods. Journal of Innovative Science and Engineering, 7(1), 40-47. https://doi.org/10.38088/jise.1134816


Creative Commons License

The works published in Journal of Innovative Science and Engineering (JISE) are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.