Volume 25, Issue 4 (July 2017) | JSSU 2017, 25(4): 300-310


Seyedmir F, Mirzaie K, Bitaraf Sani M. The Studies of Decision Tree in Estimation of Breast Cancer Risk by Using Polymorphism Nucleotide. JSSU. 2017; 25(4): 300-310
URL: http://jssu.ssu.ac.ir/article-1-3547-en.html
Abstract:



Decision trees are data mining tools used to collect, sift, and accurately predict information from massive amounts of data, and they are widely used in computational biology and bioinformatics. In bioinformatics they can be used to predict diseases, including breast cancer. Genomic data, including single nucleotide polymorphisms (SNPs), are a very important factor in predicting disease risk. Among hundreds of thousands of genetic markers, seven important SNPs have been identified as factors associated with breast cancer. The objective of this study is to evaluate how the amount of training data affects the error of a decision tree that predicts breast cancer risk from SNP genotypes.
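To make the idea of a decision tree predictor concrete, the sketch below shows how an already-built tree classifies one genotype record by walking from the root to a leaf. It assumes each genotype is coded as 0, 1 or 2 copies of a risk allele; the SNP names (`snp_a`, `snp_b`), the tree shape and the leaf labels are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # predicted class, e.g. "high" or "low" risk (hypothetical labels)


@dataclass
class Node:
    snp: str  # name of the SNP tested at this node (hypothetical)
    children: Dict[int, "Tree"] = field(default_factory=dict)  # genotype code -> subtree


Tree = Union[Node, Leaf]


def predict(tree: Tree, record: Dict[str, int]) -> str:
    """Follow the branch matching the record's genotype at each node until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.children[record[tree.snp]]
    return tree.label


# Hypothetical two-level tree over two SNPs, assuming genotype codes 0, 1, 2.
example_tree = Node("snp_a", {
    0: Leaf("low"),
    1: Node("snp_b", {0: Leaf("low"), 1: Leaf("high"), 2: Leaf("high")}),
    2: Leaf("high"),
})

print(predict(example_tree, {"snp_a": 1, "snp_b": 2}))  # high
```

Each internal node tests a single SNP, so a prediction costs at most one comparison per level of the tree.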


The breast cancer risk associated with the SNPs was calculated using the formula $x_j = f_0 \times \cdots$. In humans, a decision tree can be used to predict the probability of disease from single nucleotide polymorphisms. Seven SNPs associated with breast cancer, each with a different odds ratio, were considered, and a C4.5 decision tree model was designed and coded in C# (2013). The decision tree built with this code used the four most important associated SNPs. The decision tree error was assessed in two settings, the custom code and WEKA, and the percentage accuracy of the decision tree in predicting breast cancer was calculated. Training samples were selected by systematic sampling. Two scenarios were evaluated with the custom code and three with the WEKA software, each using different data sets and different numbers of training and test records.
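C4.5 chooses the attribute to split on by its gain ratio: the information gain of the split divided by the split information (the entropy of the partition sizes themselves). The sketch below computes that criterion for one categorical SNP attribute. It is a minimal illustration of the C4.5 splitting rule, assuming genotype codes 0/1/2 and two classes; it is not the authors' C# code, and the toy records are invented.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: Sequence[int], labels: Sequence[str]) -> float:
    """C4.5 gain ratio of splitting `labels` by the categorical attribute `values`."""
    n = len(labels)
    # Group class labels by attribute value: one branch per genotype code.
    groups: Dict[int, List[str]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain = entropy before the split - weighted entropy after it.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information = entropy of the branch sizes; penalises many-valued attributes.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


# Invented toy data: genotype codes of one SNP and the case/control status of each record.
genotypes = [0, 0, 1, 1, 2, 2]
status = ["control", "control", "control", "case", "case", "case"]
print(round(gain_ratio(genotypes, status), 3))  # 0.421
```

At each node, C4.5 evaluates this ratio for every remaining SNP and splits on the highest-scoring one, then recurses on each branch.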


In both coding scenarios, increasing the training percentage from 66.66% to 86.42% reduced the error from 55.56% to 9.09%. When WEKA was run on three scenarios with different data sets and different numbers of training and test records, increasing the number of records from 81 to 2187 reduced the error rate from 48.15% to 13.46%. In most scenarios, disease prevalence had no effect on the error in either WEKA or the custom code.
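The smallest and largest data sets above have 81 and 2187 records. These equal 3^4 and 3^7, the number of distinct genotype combinations for 4 and 7 SNPs if each SNP has three genotypes. The sketch below enumerates such combinations, splits them into training and test sets by systematic sampling, and computes an error percentage. The 0/1/2 coding, taking every k-th record as a test record, and the helper names are assumptions for illustration. The abstract does not state the exact sampling interval.

```python
from itertools import product
from typing import List, Sequence, Tuple


def genotype_combinations(n_snps: int) -> List[Tuple[int, ...]]:
    """All genotype combinations for n_snps SNPs, each coded 0, 1 or 2 (assumed coding)."""
    return list(product((0, 1, 2), repeat=n_snps))


def systematic_split(records: Sequence, step: int, start: int = 0):
    """Systematic sampling: every `step`-th record (offset `start` < step) goes to the test set."""
    test = [r for i, r in enumerate(records) if i % step == start]
    train = [r for i, r in enumerate(records) if i % step != start]
    return train, test


def error_percent(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Percentage of test records whose predicted class differs from the true class."""
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return 100.0 * wrong / len(actual)


records = genotype_combinations(4)
print(len(records), len(genotype_combinations(7)))  # 81 2187

train, test = systematic_split(records, step=3)
print(len(train), len(test), round(100 * len(train) / len(records), 2))  # 54 27 66.67

print(round(error_percent(["case", "control", "case"], ["case", "case", "case"]), 2))  # 33.33
```

With 81 records, a test set of every third record leaves 54 training records, about 66.67% training, close to the lowest training percentage reported above. Because systematic sampling spreads the test records evenly across the ordered data, the class balance of each split follows the data order rather than chance.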


The results suggest that as training increases, the prediction error of the decision tree decreases, so its accuracy in predicting breast cancer risk increases. With biological data, decision tree error is high even with 66.66% training. On the other hand, increasing the number of SNPs in the decision tree from 4 to 7 reduced the error dramatically at 70.1% training. In general, increasing the amount of training and the number of SNPs in the decision tree increased prediction accuracy and reduced error. In both the custom code and WEKA, disease prevalence had no significant effect on error, because the training and test sets were selected by systematic sampling.

Type of Study: Original article | Subject: Oncology
Received: 2015/12/29 | Accepted: 2017/07/23 | Published: 2017/09/25

