EVALUATION OF VALIDATION TECHNIQUES IN LINEAR & NON-LINEAR STATISTICAL MODELS

Date
2019-12-04
Publisher
Division of Statistics and Computer Science, Faculty of Basic Sciences, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Main Campus, Chatha, Jammu
Abstract
The present study, entitled “Evaluation of Validation Techniques in Linear & Non-Linear Statistical Models”, was carried out to evaluate different validation techniques commonly used with statistical models and to assess their suitability for assessing predictive performance. The basic premise of a validation technique is to assess how the results of a statistical model will generalize to an independent data set; it is mainly used in settings where the goal is prediction and one wants to estimate how accurately a prediction model will perform in practice. In the present investigation, linear and non-linear statistical models were fitted to simulated data. Both symmetric and asymmetric data were generated through simulation, and the models were fitted with the help of various libraries such as minpack.lm, matrices and nlme in R Studio (version 3.5.1, 2018); various functions were also developed and run on the simulated data generated in the study. In addition, selection criteria such as RMSE, MAE, AIC and BIC were used while fitting the statistical models. The coefficient summaries revealed that all statistical models were statistically significant across both symmetric and asymmetric distributions. In the preliminary analysis, TFEM (Type First Exponential Model) was found to be the best linear model across both symmetric and asymmetric distributions, with lower values of RMSE, MAE, BIAS, AIC and BIC. Among the non-linear models, the Haung model was found to be the best across both distributions, as it had lower values of RMSE, MAE, etc. Different validation techniques, such as half splitting, LOOCV (leave-one-out cross-validation) and 5-fold cross-validation, among others, were used in the present study. To evaluate these validation techniques, the simulated data were divided into training and testing data sets, and various functions were developed in R for the purpose of validation.
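The selection criteria named above (RMSE, MAE, BIAS, AIC, BIC) can be illustrated with a minimal sketch. The study's own functions were written in R and are not reproduced in this record, so the Python below is only an assumed, generic formulation: the function names are hypothetical, BIAS is taken as the mean of predicted minus observed, and AIC/BIC assume Gaussian errors with `n_params` counting every estimated parameter, including the error variance.

```python
import math

def rmse(observed, predicted):
    # Root mean squared error between observed and predicted values.
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    # Mean absolute error.
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def bias(observed, predicted):
    # Mean signed error; the sign convention (predicted - observed) is an assumption.
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)

def aic_bic_gaussian(observed, predicted, n_params):
    # AIC and BIC under an assumed Gaussian error model, with the error
    # variance estimated by maximum likelihood as RSS / n.
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = 2 * n_params - 2 * loglik
    bic = n_params * math.log(n) - 2 * loglik
    return aic, bic

# Hypothetical toy values, used only to exercise the functions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.5]
aic, bic = aic_bic_gaussian(observed, predicted, n_params=3)
```

Lower values of each criterion indicate a better fit, which is the sense in which the abstract ranks the models.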
Based on the results of the evaluation, 5-fold cross-validation performed better than the other techniques across all the statistical models. The validation techniques were also compared graphically for the symmetric and asymmetric distributions. Based on the prediction error rate, 5-fold cross-validation was found to be the best validation technique for both distributions across all statistical models. It is therefore concluded that 5-fold cross-validation should be preferred over its counterparts because it evaluates the model on different subsets of training data and then calculates the average prediction error rate. In leave-one-out cross-validation and jackknifing, model performance is tested at each iteration, which results in higher prediction error rates in the former and higher values of BIAS in the latter, especially when data points are outliers; 5-fold cross-validation provides a solution under such circumstances by holding out a reasonable proportion of the data points for testing. Prediction error rates are also lower in 5-fold cross-validation than in half splitting because every subset of the data is used for training as well as testing.
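The three resampling schemes compared above can be sketched as follows. This is not the study's R code: it is a minimal Python illustration that assumes a simple straight-line model fitted by least squares, mean squared prediction error as the error measure, and hypothetical helper names (`fit_line`, `make_folds`, `k_fold_error`, `half_split_error`, `loocv_error`). The simulated data are likewise invented for the example.

```python
import random

def fit_line(xs, ys):
    # Ordinary least squares for y = intercept + slope * x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def mse_on(test_idx, train_idx, xs, ys):
    # Fit on the training indices, return mean squared error on the test indices.
    a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    return sum((ys[i] - (a + b * xs[i])) ** 2 for i in test_idx) / len(test_idx)

def make_folds(n, k, seed=0):
    # Shuffle the indices 0..n-1 and deal them into k disjoint folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def k_fold_error(xs, ys, k, seed=0):
    # Each fold serves once as the test set; the k fold errors are averaged.
    folds = make_folds(len(xs), k, seed)
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        errors.append(mse_on(fold, train, xs, ys))
    return sum(errors) / k

def half_split_error(xs, ys, seed=0):
    # Half splitting: one random half trains the model, the other half tests it.
    train, test = make_folds(len(xs), 2, seed)
    return mse_on(test, train, xs, ys)

def loocv_error(xs, ys):
    # Leave-one-out cross-validation is k-fold with k equal to the sample size.
    return k_fold_error(xs, ys, k=len(xs))

# Hypothetical simulated data: a straight line plus Gaussian noise.
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]

cv5 = k_fold_error(xs, ys, k=5)
half = half_split_error(xs, ys)
loo = loocv_error(xs, ys)
```

In this sketch every observation is used for testing exactly once and for training in the remaining folds under 5-fold cross-validation, whereas half splitting tests each model on only one half. Which scheme gives the lowest error on a given data set depends on the data and the model, so the sketch does not by itself reproduce the ranking reported above.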