Please use this identifier to cite or link to this item: http://krishikosh.egranth.ac.in/handle/1/5810103563
Authors: MABU, AUDU MUSA
Advisor: Yadav, Dr. Raghav
Title: A DATA MINING FRAMEWORK FOR CLASSIFICATION AND PREDICTION OF GENE DATASETS
Publisher: FACULTY OF ENGINEERING & TECHNOLOGY SAM HIGGINBOTTOM UNIVERSITY OF AGRICULTURE, TECHNOLOGY AND SCIENCES (FORMERLY ALLAHABAD AGRICULTURAL INSTITUTE) NAINI, ALLAHABAD-211007
Language: en
Type: Thesis
Pages: 156p.
Abstract: In recent years, the emergence of data mining techniques in research has made it possible to extract valuable information from huge volumes of data. A good understanding of data mining techniques is necessary for research scholars and industries to use this opportunity efficiently and improve the quality of their findings. Gene expression data classification offers a powerful approach to detecting cancers from a given dataset, and the application of gene expression profiles to cancer diagnosis and classification has become an interesting subject in bioinformatics. This research work focuses on the classification and prediction of gene expression data for the diagnosis and prognosis of cancer, which is one of the most terrifying diseases. Gene expression profiles have been extensively studied to gain insight into the threat of cancer and to discover hidden information that provides biological knowledge for cancer classification. Precise cancer classification directly from original gene expression profiles remains challenging because of the intrinsically high dimensionality of the features and the small number of data samples. Therefore, choosing highly discriminative genes from gene expression data has become increasingly interesting in bioinformatics. In this research work, two data mining frameworks for gene expression data classification are proposed. The proposed techniques are termed Gene Expression Data Classification Based on Entropy_Graph Classifier and Gene Expression Data Classification and Clustering Based Feature Selection. The Gene Expression (GE) Data Classification Based on Entropy_Graph Classifier technique is aimed at the classification of gene expression data using an entropy-based graph classifier. Initially, the Signal to Noise Ratio (SNR) values of the gene expression data are evaluated, and the relevant features are selected using the Krill Herd (KH) optimization process.
Usually, not all the features are helpful for classification. The data contain some redundant and irrelevant features, which can be characterized as outliers. To discard the outliers, feature reduction is performed using Euclidean distance. Finally, classification is done using the entropy-based graph classifier. The experimental outcomes for GE data classification demonstrate the superiority of the graph classifier over recent methodologies in terms of precision (81% for gastric cancer and 72% for colon cancer), recall (100% for gastric cancer and 100% for colon cancer), F-measure (89.5 for gastric cancer and 83.72 for colon cancer) and computational time. The experimental outcomes indicate the effectiveness of the proposed process compared with the existing method for classification. The technique of GE Data Classification and Clustering Based Feature Selection first preprocesses the dataset by normalization and then performs feature selection with the help of Feature Clustering Support Vector Machine (FCSVM). FCSVM has two phases, Gene Clustering and Gene Representation. To make the chosen top-ranked features suitable for classification, feature reduction is performed using the SVM-Recursive Feature Elimination (SVM-RFE) algorithm. The reduced features are finally classified by means of an Artificial Neural Network (ANN) classifier. The results are promising: the precision, recall and F-measure of the proposed FCSVM_ANN are 78%, 90% and 85% for CNS_dataset and 80%, 95% and 87% for Colon_dataset, respectively. As for accuracy and specificity, FCSVM_ANN exhibits 94% and 96% for CNS_dataset and 96% and 98% for Colon_dataset.
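Note on the reported metrics: the F-measure values quoted in the abstract for the Entropy_Graph classifier appear consistent with the standard definition of F-measure as the harmonic mean of precision and recall. The short Python sketch below checks this under that assumption; the function name `f_measure` is illustrative and does not come from the thesis.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract (Entropy_Graph classifier).
gastric = f_measure(0.81, 1.00)
colon = f_measure(0.72, 1.00)

print(f"gastric cancer F-measure: {gastric * 100:.2f}")  # 89.50
print(f"colon cancer F-measure:   {colon * 100:.2f}")    # 83.72
```

Both values match the figures in the abstract (89.5 and 83.72), which suggests the thesis uses the equally weighted form of the metric.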
Description: Ph. D. Thesis
Subject: Computer Applications
Theme: A DATA MINING FRAMEWORK FOR CLASSIFICATION AND PREDICTION OF GENE DATASETS
Research Problem: A DATA MINING FRAMEWORK FOR CLASSIFICATION AND PREDICTION OF GENE DATASETS
Thesis Type: Ph.D
Issue Date: 2018
Appears in Collections: Thesis

Files in This Item:
File: 15PHCOMP102_Thesis.pdf (2.1 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.