Computational Intelligence in the estimation of CRISPR-Cas9 cleavage sites

Date
2019
Publisher
ICAR-INDIAN AGRICULTURAL STATISTICS RESEARCH INSTITUTE, ICAR-INDIAN AGRICULTURAL RESEARCH INSTITUTE, NEW DELHI
Abstract
The CRISPR-Cas9 system is one of the most widely used genome editing techniques of recent years. Despite its high potential to modify specific target genes and genomic regions complementary to the designed guide RNA (sgRNA), it still suffers from off-target effects. In this study, models based on three machine learning techniques (Artificial Neural Network, Support Vector Machine and Random Forest) were developed to estimate the CRISPR-Cas9 cleavage sites that a given sgRNA will cleave. All models were developed exclusively on plant data. They were trained on 70 percent of the collected on-target and off-target dataset from different plant species, and their performance was evaluated on the remaining 30 percent using the following statistics: specificity, sensitivity, accuracy, precision and AUC. Altogether, eleven models were trained: six ANN models (ANN1-Logistic, ANN1-Tanh, ANN1-ReLU, ANN2-Logistic, ANN2-Tanh and ANN2-ReLU), four SVM models (SVM-Linear, SVM-Polynomial, SVM-Gaussian and SVM-Sigmoid) and one Random Forest model. Comparative evaluation of these models shows that the Random Forest model performed best, with an area under the ROC curve (AUC) of 99.0%. Among the ANN and SVM models, ANN1-ReLU and SVM-Linear performed best, respectively. The best-performing models were compared with an existing off-target prediction tool (CRISTA) exclusively on the plant dataset and were found to outperform it. Keywords: CRISPR, Cas9, sgRNA, genome editing, off-target, Artificial Neural Network, Support Vector Machine, Random Forest, CRISTA.
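The five evaluation statistics named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, with label 1 standing for a cleaved site and 0 for a non-cleaved site.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def _ratio(numerator: int, denominator: int) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and precision from hard predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": _ratio(tp, tp + fp),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Hypothetical held-out labels and model scores (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise AUC used here is quadratic in the number of samples, which is fine for a small illustration; a rank-based computation would be preferable on a large test set.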
Description
T-10260