Theses

  • Thesis (Open Access)
    Small Area Estimation of Wheat Yield Using Remote Sensing Data in Hisar and Sirsa Districts of Haryana
    (CCSHAU, Hisar, 2021-09) Muhammed Jaslam, P K; Manoj Kumar
    This study estimates wheat yield at block level in the Hisar and Sirsa districts of Haryana using direct and indirect small area estimation (SAE) techniques. It also develops suitable wheat yield models from satellite spectral data. The usual estimator of block wheat yield, the simple average of the yields recorded in the villages within a block, is unstable, and most block-level estimates have large coefficients of variation (CVs). Direct SAE techniques, namely post-stratified and generalized regression (GREG) estimation, were used to obtain precise estimates of wheat yield. The indirect techniques comprised implicit models (synthetic and composite estimation) and explicit models (unit-level and area-level SAE). An area-level SAE model was also developed for 42 blocks in 6 districts of the western zone of Haryana. All SAE methods gave block-level estimates with a lower percent CV than the usual estimator. Among the post-stratified direct, synthetic, and composite estimators, the composite estimators had the lowest CVs. In agreement with theory, the unit-level SAE model gave good estimates, and a robust version of the unit-level model, which reduces the effect of outliers, improved precision further. The study also showed that a closely related auxiliary variable at the area level (area-level SAE, Classes 3 and 4) can give precision comparable to that of a unit-level model.
    Because multicollinearity was detected among the predictor variables for crop yield modelling, the study investigated improving the simple linear model by replacing ordinary least squares with alternative fitting procedures: stepwise regression, ridge regression, LASSO, principal component regression, and partial least squares (PLS) regression. PLS regression was the best method (in terms of R² and RMSE) for predicting block-level yields from remote sensing data in the western zone of Haryana.
  • Thesis (Open Access)
    Support Vector Machine and Artificial Neural Network Models for Classification of Wheat and Mustard Genotypes
    (CCSHAU, Hisar, 2021-10) Mujahid Khan; Hooda, B. K.
    This study aimed to classify wheat and mustard genotypes using discriminant analysis, artificial neural networks (ANNs), and support vector machine (SVM) models. Secondary data on 14 morphological variables for 302 wheat and 870 mustard genotypes were used. In the wheat dataset, the class variable, grain yield, was divided into three classes, making it a multiclass problem. Mustard genotypes were divided into two classes on the basis of grain yield, oil content, and the combined variables. Among the discriminant analyses, regularized discriminant analysis outperformed linear and quadratic discriminant analysis on both datasets. Of the three ANN models applied to the wheat dataset, resilient propagation gave the highest training accuracy. The radial basis function (RBF) network gave less satisfactory results than the multilayer perceptron (MLP) networks. On the mustard dataset, however, RBF networks had notably higher training accuracy than MLP networks and comparable testing accuracy. Of the six kernels used for SVM classification, the RBF kernel outperformed all others on both datasets. The outputs of the six SVM kernels were then combined in an Ensemble with Weighted Accuracy (EWA) model. This model gave higher prediction accuracy on both datasets than the individual kernel classifiers. Particle swarm optimization (PSO) selected more suitable parameters and gave higher classification accuracy on both datasets. For wheat genotypes, the ensemble model achieved the highest training accuracy (95.1%), followed by resilient propagation neural networks (94.7%) and PSO-optimized SVM (94.2%). On the testing dataset, the EWA model and PSO-optimized SVM both reached 94.9% accuracy.
    Mustard genotypes were classified best with grain yield as the class variable, followed by oil content. The ensemble model again had the highest training accuracy (93.5%), followed by PSO-optimized SVM (92.6%) and RBF neural networks (91.9%). On the testing dataset, PSO-optimized SVM achieved the highest accuracy (92.6%), followed by all the neural network models (90.7%). Linear discriminant analysis gave the lowest accuracies on both datasets.
  • Thesis (Open Access)
    Fitting linear mixed effects models for unbalanced longitudinal data
    (CCSHAU, Hisar, 2020-08) Ravita; Verma, Urmil
    The classical linear regression model is an important statistical tool, but its standard assumptions limit its use. Regression models are frequently fitted to time series data, for which the assumption of uncorrelated or independent errors is often inappropriate. Moreover, many time series have a complex structure that calls for fixed and random effects reflecting the observational design. Such effects are straightforward to add in a mixed model framework, which also accommodates unbalanced data. The fixed-effects parameters can be either qualitative (as in the traditional analysis of variance) or quantitative (as in standard linear regression). Best linear unbiased prediction (BLUP) is a standard method for estimating the random effects of a mixed model. The MIXED procedure uses REML (restricted, or residual, maximum likelihood) estimation, and it is here that the Gaussian assumptions are exploited. One useful class of models is the varying coefficient model. In it, the response depends linearly on some regressors whose coefficients are smooth functions of other predictor variables, called effect modifiers. When the effect modifier is calendar time, these become time-varying coefficient models. Multiple linear regression and linear mixed effects models were applied to develop mustard yield forecast models for the agro-climatic zones of Haryana. Mustard yield data were used for 1980-81 to 2016-17 (Hisar, Bhiwani, Sirsa, Mahendragarh, and Gurugram districts), 1989-90 to 2016-17 (Rewari), and 1997-98 to 2016-17 (Jhajjar and Fatehabad), along with fortnightly weather data. The zonal forecast models were developed from time-trend and weather data for 1980-81 to 2009-10. Data for 2010-11 to 2016-17 were used to validate the developed models.
    A trend yield/time variable was included to account for variation between districts within a zone. This was needed because weather data were not available for all districts; the zonal model used the same weather information for adjoining districts in the zone. Linear mixed effects models (LMMs) were fitted with time as both a fixed and a random effect and weather as random effects. Three covariance structures were used: variance components (VC), first-order autoregressive (AR(1)), and Toeplitz. Post-sample predictive performance was measured by percent relative deviation from real-time yields and by root mean square error (RMSE), and it differed markedly among the alternative LMMs and regression-based weather-yield models. LMMs with weather as random effects consistently gave lower percent relative deviations than the regression-based weather-yield models. They also had lower error metrics than the alternative mixed effects and regression models in most post-sample periods. The seven-step-ahead predictions (2010-11 to 2016-17) favour the use of LMMs. A critical, in-depth examination of the results favours varying coefficient models over the conventional constant (fixed) coefficient models developed in this empirical study. LMMs with a Toeplitz covariance structure substantially improved predictive accuracy and produced what can be considered satisfactory district-level mustard yield predictions for Haryana.
  • Thesis (Open Access)
    Cluster analysis of mixed Data: A genetic algorithm approach
    (CCSHAU, Hisar, 2021-09) Sumbherwal, Nisha; HOODA, B.K
    This study addresses the clustering of pearl millet, wheat, and cotton genotypes described by mixed variables. Mixed data, which combine continuous and categorical variables, occur frequently in fields such as medicine, agriculture, remote sensing, biology, marketing, and ecology, but little work has been done on handling such data. The study used secondary data on 60 pearl millet, 120 wheat, and 218 cotton genotypes. The pearl millet (kharif 2018-19) and wheat (rabi 2018-19) data were obtained from the Department of Genetics and Plant Breeding, CCS Haryana Agricultural University, Hisar, Haryana. The cotton data were taken from a Ph.D. thesis (Mohan, 2005) available on Krishikosh. Clustering methods for numeric, categorical, and mixed data were studied and are explained in detail. The genetic algorithm (GA) based clustering method was compared with conventional clustering methods and performed better than them on all three types of data (numeric, categorical, and mixed). The optimal numbers of clusters for the pearl millet, wheat, and cotton data were 2, 2, and 3, respectively. Agglomerative hierarchical cluster analysis of the mixed datasets was carried out using the Gower, Podani, Huang, Ahmad, and Harikumar distances. For the pearl millet and wheat data, agglomerative hierarchical clustering with the Podani distance performed better than with the other distance measures. For the cotton data, the Ahmad distance performed better.
    Cluster analysis of the numeric and mixed data was also carried out using Ward's method. On most cluster validation measures, Ward's method performed better with the mixed variables than with the numeric variables alone.
  • Thesis (Open Access)
    Time series intervention modeling and simulation for mustard yield forecasting in Haryana
    (CCSHAU, Hisar, 2020-10) Ajay Kumar; Verma, Urmil
    Modeling and simulation is a discipline for understanding the interaction of the parts of a system and the system as a whole. A model is a simplified representation of a system at some particular point in time or space, intended to promote understanding of the real system. Simulation allows a system's operating performance to be evaluated before the system is implemented. This study compares the efficacy of time series intervention models and simulation in quantifying pre-harvest mustard yield in the Hisar, Bhiwani, Sirsa, Fatehabad, Mahendragarh, Rewari, Jhajjar, and Gurugram districts of Haryana. The objective was to assess the accuracy of the contending models for district-level mustard yield forecasts in Haryana. The models were built from fortnightly data on rainfall and minimum and maximum temperature over the crop growth period (September-October to February-March) for 1980-81 to 2010-11. Weather-yield data for 2011-12 to 2015-16 were used to check the post-sample validity of the fitted models' forecasts against the State Department of Agriculture's crop yield estimates. The modeling approaches applied were multiple linear regression, ARIMA, regression with ARIMA errors (RegARIMA), and ARIMA-intervention models. First, multiple linear regression weather-yield models were developed relating mustard yield to fortnightly weather inputs. These models also included a linear time-trend yield/crop condition term as an indicator variable. ARIMA, RegARIMA, and ARIMA-intervention models were then fitted as alternatives in line with the study objectives. Additionally, a Student's t-copula in SAS was applied as a simulation tool. Its output was compared with the time series forecasts to determine whether there was a consistent or significant difference between the two.
    Forecast performance was measured by the percent relative deviation of the mustard yield forecasts from observed yields and by root mean square error (RMSE). RegARIMA models had lower error metrics than the alternative models in most periods. The five-step-ahead forecasts (2011-12 to 2015-16) favour RegARIMA models for obtaining pre-harvest mustard yield forecasts in the districts studied. The RegARIMA forecasts were remarkably close to those obtained through simulation. Empirical evidence from this study confirms that the RegARIMA model can produce reliable forecasts. It would therefore provide a more robust forecasting approach with limited datasets. Using the developed forecast models, district-level mustard yield estimates could be computed well in advance of the actual harvest. The State Department of Agriculture's crop yield estimates, by contrast, become available only well after harvest.
  • Thesis (Open Access)
    Structural equation modeling with latent variables to establish relationship between yield and its components for major crops of Haryana
    (CCSHAU, Hisar, 2021-01) Ram Niwas; Sheoran, O.P.
    Structural equation modeling (SEM) is a powerful multivariate technique for analyzing structural relationships among measured variables and latent constructs, with a wide range of applications in the plant sciences. It is preferred over other methods because it estimates multiple, interrelated dependencies in a single analysis. It also provides robust estimates of the path coefficients that characterize complex phenomena and biological processes. As suggested by a preliminary exploratory factor analysis, the SEM of bread wheat was hypothesized with four latent variables. Physiological (ξ₁), morphological (ξ₂), and fertility and quality (ξ₃) parameters were the exogenous latent constructs, and grain parameters (η₁) were the endogenous latent construct. The final model was assessed with the fit indices chi-square (16.25, P = 0.298), goodness-of-fit index (GFI, 0.98), root mean square error of approximation (RMSEA, 0.031), and chi-square ratio (1.17). The SEM of basmati rice was likewise hypothesized, following the exploratory factor analysis, with four latent variables. Physiological (ξ₁), morphological (ξ₂), and fertility (ξ₃) parameters were the exogenous latent constructs, and grain parameters (η₁) were the endogenous latent construct. Its final model had chi-square 14.31 (P = 0.426), GFI 0.99, RMSEA 0.007, and chi-square ratio 1.02. For cotton, the latent constructs were horizontal growth (ξ₁) and morphological (ξ₂) parameters (exogenous) and biochemical (η₁) and yield (η₂) parameters (endogenous). The final cotton model had chi-square 36.69 (P = 0.484), GFI 0.99, RMSEA 0.000, and chi-square ratio 0.99.
    For barley, the latent constructs were phenological (ξ₁) and grain (ξ₂) parameters (exogenous) and yield parameters (η₁) (endogenous). The final barley model had chi-square 29.18 (P = 0.213), GFI 0.98, RMSEA 0.089, and chi-square ratio 1.22. As suggested by the exploratory factor analysis, the SEM of pearl millet was hypothesized with four latent variables. Physiological (ξ₁), morphological (ξ₂), and fertility (ξ₃) parameters were the exogenous latent constructs, and yield parameters (η₁) were the endogenous latent construct. Its final model had chi-square 34.54 (P = 0.302), GFI 0.96, RMSEA 0.047, and chi-square ratio 1.15. The well-fitting SEM indicated that physiological and morphological parameters had a positive influence on the endogenous latent variable yield. The fertility parameter had a positive and highly significant influence on it.
  • Thesis (Open Access)
    Regional Frequency Analysis of Extreme Rainfall Using L-moments and Partial L-moments in Haryana
    (CCSHAU, Hisar, 2021-06) Nain, Mohit; HOODA, B.K.
    Regional frequency analysis (RFA) is of great importance to policymakers and structural engineers in planning and designing hydraulic structures. This study applies RFA to daily and monthly extreme rainfall for 1970–2017 at 27 rain gauge stations in Haryana (India) using L-moments and partial L-moments (PL-moments). Based on mean monthly rainfall, the 27 stations were grouped into three homogeneous regions (Region-I, Region-II, and Region-III) using Ward's method of cluster analysis. The homogeneity of each region was tested with the heterogeneity measure H. The best-fitting regional distribution for each region was selected from five candidates using the Z^DIST goodness-of-fit statistic and L- and PL-moment ratio diagrams. The candidates were the generalized extreme value (GEV), generalized normal (GNO), generalized logistic (GLO), generalized Pareto (GPA), and Pearson type III (PE3) distributions. With the L-moments method, GNO fitted maximum monthly rainfall best for Region-I and Region-II, and PE3 fitted best for Region-III. For maximum daily rainfall, PE3, GEV, and GLO were the best-fitting distributions for Region-I, Region-II, and Region-III, respectively. With the PL-moments method, GNO fitted maximum monthly rainfall best for Region-I, GEV was best fitted for Region-II, and PE3 for Region-III. Quantiles for various return periods were then estimated from the best-fitting distribution for each region. The performance of the L-moments and PL-moments methods in quantile estimation was compared by Monte Carlo simulation. The accuracy measures computed from these simulations, relative RMSE and absolute relative bias, were smaller for PL-moments than for L-moments. Quantiles were also estimated with both the regional and the at-site approaches. In terms of relative RMSE, the regional analysis gave better quantile estimates than at-site estimation.
  • Thesis (Open Access)
    State Space Modelling with Weather as Exogenous Input for Sugarcane Yield Prediction in Haryana
    (CCSHAU, Hisar, 2020-05) Hooda, Ekta; Verma, Urmil
    Parameter constancy is fundamental if empirical models are to be useful for forecasting, analysis, or testing any theory. This work addresses the concept of parameter constancy and the implications of predictive failure, which is uniquely a post-sample problem. Unlike classical regression models, state space models are time-varying parameter models. They allow for known changes in the structure of the system over time and provide a flexible class of dynamic and structural time series models. The study had two parts, both for sugarcane yield prediction in the Ambala, Karnal, Kurukshetra, Panipat, and Yamunanagar districts of Haryana. The first part developed state space models in two forms (the state space and unobserved component approaches). The second part developed state space models with weather as an exogenous input. Sugarcane yield time series were used for 1966-67 to 2009-10 (Ambala and Karnal), 1972-73 to 2009-10 (Kurukshetra), and 1970-71 to 2009-10 (Panipat and Yamunanagar). The fitted models were validated on the subsequent years, 2010-11 to 2016-17, which were not used in model development. Autoregressive orders of five, three, two, four, and five appeared reasonable for Ambala, Karnal, Kurukshetra, Panipat, and Yamunanagar, respectively. These orders determined how much past information was used in the canonical correlation analysis, which in turn led to the selection of the state vector. Information from the canonical correlation and preliminary autoregression analyses was used to form preliminary estimates of the state space model parameters. Sugarcane yield estimates were then obtained with the Kalman filter. Unobserved component models (UCMs) with level, trend, and irregular components were fitted to study the trend in sugarcane yield.
    For all five districts, the irregular component was highly significant, while the level and trend component variances were non-significant. Finally, state space models with weather as exogenous input were developed using different growth trends, namely the polynomial splines PS(1), PS(2), and PS(3). The weather variables for each district were selected by stepwise regression, and PS(2) with weather input was the best-suited model for all districts. Post-sample sugarcane yield estimates were obtained from the fitted state space (SS) models, UCMs, and state space models (SSMs) with exogenous input. Predictive performance was measured by the percent relative deviations and RMSEs of the sugarcane yield forecasts relative to observed yields. SSMs with weather input consistently gave lower percent relative deviations than the SS and UCM models. It is therefore inferred that state space models can be used effectively with Indian agricultural data. Because they account for the time dependence of the underlying parameters, they may improve on the predictive accuracy of time series models that assume parameter constancy.
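The composite estimator in the wheat small area estimation thesis blends a direct and a synthetic block estimate. The sketch below is a minimal illustration, not the thesis's method, which the abstract does not give. It assumes the commonly used, approximately optimal weight phi = MSE(synthetic) / (MSE(synthetic) + Var(direct)). This weight ignores the covariance between the two estimators. The block figures in the example are hypothetical.

```python
import math


def cv_percent(estimate: float, variance: float) -> float:
    """Coefficient of variation in percent: 100 * standard error / estimate."""
    return 100.0 * math.sqrt(variance) / estimate


def composite_estimate(direct: float, var_direct: float,
                       synthetic: float, mse_synthetic: float) -> tuple[float, float]:
    """Blend a direct and a synthetic small-area estimate.

    Assumption: phi = MSE(S) / (MSE(S) + Var(D)), with the direct estimator
    unbiased and zero covariance between the two estimators.
    Returns (composite estimate, approximate MSE).
    """
    phi = mse_synthetic / (mse_synthetic + var_direct)
    estimate = phi * direct + (1.0 - phi) * synthetic
    # With zero covariance, MSE = phi^2 Var(D) + (1 - phi)^2 MSE(S),
    # which at this phi equals phi * Var(D).
    mse = phi ** 2 * var_direct + (1.0 - phi) ** 2 * mse_synthetic
    return estimate, mse


# Hypothetical block: direct 45.0 (variance 16.0), synthetic 42.0 (MSE 4.0).
est, mse = composite_estimate(45.0, 16.0, 42.0, 4.0)
print(round(est, 2), round(mse, 2))  # 42.6 3.2
print(round(cv_percent(45.0, 16.0), 2), round(cv_percent(est, mse), 2))
```

In this example, the composite CV is lower than the direct CV, which is the same direction as the result the abstract reports.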
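The wheat and mustard classification thesis combines its six SVM kernels in an Ensemble with Weighted Accuracy (EWA) model. The abstract does not state the combination rule, so the sketch below assumes one plausible reading. Each base classifier votes for its predicted class with a weight equal to its accuracy, and the class with the largest total weight wins. The kernel names, class labels, and accuracies in the example are hypothetical.

```python
from collections import defaultdict


def weighted_accuracy_vote(predictions: dict[str, list[str]],
                           accuracies: dict[str, float]) -> list[str]:
    """Combine base-classifier predictions by accuracy-weighted voting.

    predictions maps a classifier name to its predicted labels (one per sample).
    accuracies maps the same name to its accuracy, used as the vote weight.
    """
    names = list(predictions)
    n_samples = len(predictions[names[0]])
    combined = []
    for i in range(n_samples):
        scores: dict[str, float] = defaultdict(float)
        for name in names:
            scores[predictions[name][i]] += accuracies[name]
        # Highest total weight wins; on a tie, max() keeps the label seen first.
        combined.append(max(scores, key=scores.get))
    return combined


preds = {
    "linear": ["high", "low", "low"],
    "polynomial": ["high", "mid", "low"],
    "rbf": ["high", "mid", "mid"],
}
acc = {"linear": 0.40, "polynomial": 0.45, "rbf": 0.95}
print(weighted_accuracy_vote(preds, acc))  # ['high', 'mid', 'mid']
```

For the third sample, a plain majority vote would choose "low". The weighted vote follows the single accurate classifier instead, which is the point of weighting by accuracy.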
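The mustard mixed-model thesis compares VC, AR(1), and Toeplitz covariance structures for time-indexed errors. As an illustration only, the sketch below builds each structure for n equally spaced time points. VC is shown in its simplest diagonal form with one common variance. All parameter values are hypothetical. In practice, mixed-model software estimates these matrices by REML rather than having them written by hand.

```python
def vc_cov(n: int, sigma2: float) -> list[list[float]]:
    """Variance components (VC), simplest form: diagonal, no correlation over time."""
    return [[sigma2 if i == j else 0.0 for j in range(n)] for i in range(n)]


def ar1_cov(n: int, sigma2: float, rho: float) -> list[list[float]]:
    """First-order autoregressive, AR(1): cov(i, j) = sigma2 * rho ** |i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def toeplitz_cov(lag_covs: list[float]) -> list[list[float]]:
    """Toeplitz: cov(i, j) = lag_covs[|i - j|], a free covariance for each lag."""
    n = len(lag_covs)
    return [[lag_covs[abs(i - j)] for j in range(n)] for i in range(n)]


print(vc_cov(3, 2.0))             # [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
print(ar1_cov(3, 2.0, 0.5))       # [[2.0, 1.0, 0.5], [1.0, 2.0, 1.0], [0.5, 1.0, 2.0]]
print(toeplitz_cov([4.0, 2.0, 1.0]))
```

AR(1) is the special Toeplitz case in which the lag covariances decay geometrically. The Toeplitz structure uses one parameter per lag, so it is more flexible but costs more parameters.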
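The mixed-data clustering thesis uses the Gower distance, among others, for agglomerative hierarchical clustering. Below is a minimal sketch of the basic Gower dissimilarity, assuming equal variable weights and no missing values. Numeric variables contribute their absolute difference scaled by the variable's range. Categorical variables contribute 0 on a match and 1 otherwise. The three genotype records are hypothetical. The Podani, Huang, Ahmad, and Harikumar distances and the genetic algorithm method are not shown.

```python
def gower_distance(a: list, b: list, is_numeric: list[bool],
                   ranges: list[float]) -> float:
    """Gower dissimilarity between two records with mixed variables.

    Numeric: |a_k - b_k| / range_k (0 if the range is 0).
    Categorical: 0 on a match, 1 otherwise.
    Returns the unweighted mean over variables.
    """
    total = 0.0
    for x, y, numeric, r in zip(a, b, is_numeric, ranges):
        if numeric:
            total += abs(x - y) / r if r > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


def gower_matrix(records: list[list], is_numeric: list[bool]) -> list[list[float]]:
    """Pairwise Gower dissimilarities; numeric ranges are taken from the data."""
    ranges = []
    for k, numeric in enumerate(is_numeric):
        if numeric:
            column = [row[k] for row in records]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(0.0)  # unused for categorical variables
    n = len(records)
    return [[gower_distance(records[i], records[j], is_numeric, ranges)
             for j in range(n)] for i in range(n)]


# Hypothetical genotypes: plant height (cm), days to maturity, seed colour.
genotypes = [[100.0, 120.0, "red"], [110.0, 130.0, "white"], [90.0, 125.0, "red"]]
m = gower_matrix(genotypes, [True, True, False])
print(round(m[0][1], 3), round(m[0][2], 3))  # 0.833 0.333
```

A matrix like this could be passed to any agglomerative clustering routine that accepts precomputed dissimilarities.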
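Three of the forecasting theses judge post-sample forecasts by percent relative deviation and root mean square error: the mustard mixed-model, mustard RegARIMA, and sugarcane state space theses. A minimal sketch of both measures follows. It assumes relative deviation is defined as (observed - forecast) / observed times 100; the sign convention is an assumption. The yield figures are hypothetical.

```python
import math


def percent_relative_deviation(observed: list[float], forecast: list[float]) -> list[float]:
    """100 * (observed - forecast) / observed for each period (sign convention assumed)."""
    return [100.0 * (y - f) / y for y, f in zip(observed, forecast)]


def rmse(observed: list[float], forecast: list[float]) -> float:
    """Root mean square error of the forecasts."""
    squared = [(y - f) ** 2 for y, f in zip(observed, forecast)]
    return math.sqrt(sum(squared) / len(squared))


observed = [20.0, 25.0]   # hypothetical yields
forecast = [18.0, 26.0]   # hypothetical forecasts
print(percent_relative_deviation(observed, forecast))  # [10.0, -4.0]
print(round(rmse(observed, forecast), 4))              # 1.5811
```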
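The SEM thesis reports the chi-square ratio and RMSEA for each crop model. Both can be computed from the model chi-square, its degrees of freedom, and the sample size. The sketch below uses the common point-estimate formula RMSEA = sqrt(max(chi2 - df, 0) / (df (N - 1))). Some software uses N instead of N - 1. The inputs in the example are hypothetical, because the abstract does not report degrees of freedom or sample sizes.

```python
import math


def chi_square_ratio(chi2: float, df: int) -> float:
    """Model chi-square divided by its degrees of freedom (lower means closer fit)."""
    return chi2 / df


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of the root mean square error of approximation.

    Uses N - 1 in the denominator; a chi-square at or below its df gives 0.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


print(chi_square_ratio(30.0, 20))      # 1.5
print(round(rmsea(30.0, 20, 101), 4))  # 0.0707
print(rmsea(18.0, 20, 101))            # 0.0
```

The last call shows why a model whose chi-square is below its degrees of freedom reports an RMSEA of 0.000.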
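The regional frequency analysis thesis fits its candidate distributions with L-moments. The function below sketches the first step. It computes sample L-moments and the L-moment ratios (L-CV t, L-skewness t3, L-kurtosis t4) from the standard unbiased probability-weighted moments b0 to b3. The rainfall values are hypothetical. Partial L-moments, which are computed from a censored part of the sample, are not shown.

```python
def sample_l_moments(data: list[float]) -> dict[str, float]:
    """Sample L-moments and ratios via unbiased probability-weighted moments b0..b3."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b0 = sum(x) / n
    b1 = b2 = b3 = 0.0
    for i, value in enumerate(x):  # i = rank - 1 in ascending order
        b1 += i / (n - 1) * value
        b2 += i * (i - 1) / ((n - 1) * (n - 2)) * value
        b3 += i * (i - 1) * (i - 2) / ((n - 1) * (n - 2) * (n - 3)) * value
    b1, b2, b3 = b1 / n, b2 / n, b3 / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return {"l1": l1, "l2": l2, "t": l2 / l1, "t3": l3 / l2, "t4": l4 / l2}


monthly_max = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical, right-skewed sample
print(sample_l_moments(monthly_max))
```

In RFA, these ratios are averaged across the sites of a region and plotted on an L-moment ratio diagram to compare the candidate distributions.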
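The sugarcane thesis obtains yield estimates with the Kalman filter. The sketch below is a deliberately simplified illustration using the local level model, a level plus an irregular component. It leaves out the trend, spline, and weather terms the thesis uses. It shows the predict-update recursion that produces one-step-ahead predictions and filtered levels. The variances and yields are hypothetical; in practice the variances would be estimated, for example by maximum likelihood.

```python
def local_level_filter(y: list[float], var_eps: float, var_eta: float,
                       a0: float = 0.0, p0: float = 1e7) -> tuple[list[float], list[float]]:
    """Kalman filter for the local level model.

    Observation: y_t = mu_t + eps_t,      eps_t ~ N(0, var_eps)
    State:       mu_{t+1} = mu_t + eta_t, eta_t ~ N(0, var_eta)
    A large p0 approximates a diffuse prior on the initial level.
    Returns (one-step-ahead predictions of y_t, filtered levels mu_t|t).
    """
    a, p = a0, p0
    predictions, filtered = [], []
    for obs in y:
        predictions.append(a)   # prediction of y_t from y_1..y_{t-1}
        f = p + var_eps         # prediction-error variance
        k = p / f               # Kalman gain
        a = a + k * (obs - a)   # update the level with the innovation
        p = p * (1.0 - k)       # filtered state variance
        filtered.append(a)
        p = p + var_eta         # propagate to t + 1; the level mean carries over
    return predictions, filtered


# Hypothetical yields; with var_eta = 0 the level is constant,
# so the final filtered level is close to the sample mean, 12.0.
preds, levels = local_level_filter([10.0, 12.0, 14.0], var_eps=1.0, var_eta=0.0)
print(round(levels[-1], 3))  # 12.0
```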