Computational Intelligence in the estimation of CRISPR-Cas9 cleavage sites
Date
2019
Publisher
ICAR-INDIAN AGRICULTURAL STATISTICS RESEARCH INSTITUTE ICAR-INDIAN AGRICULTURAL RESEARCH INSTITUTE NEW DELHI
Abstract
The CRISPR-Cas9 system is one of the most widely used genome editing techniques of
recent years. Despite its high potential to modify specific target genes and regions of
the genome that are complementary to the designed guide RNA (sgRNA), it still
suffers from off-target effects. In this study, models were developed using three
machine learning techniques (Artificial Neural Network, Support Vector Machine and
Random Forest) to estimate the CRISPR-Cas9 sites cleaved by a given sgRNA. All of
these models were developed exclusively on plant data. The models were trained on
70 percent of the collected on-target and off-target data from different plant
species, and their performance was evaluated on the remaining 30 percent using the
following statistics: specificity, sensitivity, accuracy, precision and AUC.
Altogether, eleven models were trained using these techniques. A comparative
evaluation shows that the Random Forest model performed best, with an area under
the ROC curve (AUC) of 99.0%. Six ANN models (ANN1-Logistic, ANN1-Tanh, ANN1-ReLU,
ANN2-Logistic, ANN2-Tanh and ANN2-ReLU) and four SVM models (SVM-Linear,
SVM-Polynomial, SVM-Gaussian and SVM-Sigmoid) were trained. ANN1-ReLU and
SVM-Linear performed best among the ANN and SVM models respectively. The
best-performing models were compared with an existing off-target prediction tool
(CRISTA) exclusively on the plant dataset, and our models outperformed it.
Keywords: CRISPR, Cas9, sgRNA, genome editing, off-target, Artificial Neural
Network, Support Vector Machine, Random Forest, CRISTA.
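
The evaluation protocol described in the abstract (a 70/30 split followed by specificity, sensitivity, accuracy, precision and AUC) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `split_70_30`, `confusion_metrics` and `roc_auc` are hypothetical, and the label convention (1 for a cleaved site, 0 for a non-cleaved site) is an assumption made for the example.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle a copy of the records and return (train, test) at a 70/30 ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(y_true, y_pred):
    """Compute the four threshold-based statistics from binary labels.

    Assumed convention: 1 = cleaved site (positive), 0 = non-cleaved (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positive rate
        "specificity": _ratio(tn, tn + fp),  # true negative rate
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": _ratio(tp, tp + fp),
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (the rank-based definition)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return _ratio(wins, len(pos) * len(neg))
```

In practice the predicted labels and scores would come from a trained classifier applied to the held-out 30 percent; the pairwise AUC above is quadratic in the number of samples and is meant only to make the definition explicit.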
Description
T-10260