Thesis

Search Results

Now showing 1 - 9 of 9
  • Thesis Item, Open Access
    Application of Multivariate Techniques in Data Analysis utilizing SAS/R Softwares
    (SKUAST Kashmir, 2016) Shah, Immad Ahmad; Imran Khan
    Multivariate analysis is the analysis of observations on several correlated random variables for a number of individuals. Such analysis becomes necessary when several variables must be handled simultaneously. Principal Component Analysis (PCA), Factor Analysis, Discriminant Analysis and Cluster Analysis lay the foundation of multivariate analysis. Principal components are linear combinations of random or statistical variables that have special properties in terms of variance. PCA is concerned with explaining the variance-covariance structure through a few linear combinations of the original variables. Factor Analysis has similar aims: its essential purpose is to describe, if possible, the covariance relationships among many variables in terms of a few underlying but unobservable random quantities called factors. Discriminant Analysis is a multivariate technique concerned with separating distinct sets of objects (or observations) and with allocating new objects to previously defined groups. Cluster Analysis groups objects based on information in the data that describes the objects and their relationships. The goal is that the objects within a group be similar (or related) to one another and different from (or unrelated to) the objects in other groups. In the present study, the data were obtained from DARS (Dryland Agriculture Research Station), Budgam, SKUAST-K, and comprised 55 genotypes of maize, with 12 characters evaluated for each genotype. The data were analysed with PCA, Factor Analysis, Cluster Analysis and Discriminant Analysis using SAS and R software.
The multivariate analyses showed that both Principal Component Analysis and Factor Analysis gave high loadings on PC1 and Factor 1, respectively, for the characters plant height, ear height, rows per cob, cobs per plant, grains per row, 100-seed weight, cob length, cob diameter and yield per plant. The characters 50% tasseling, 50% silking and 75% husk browning (HB) had high loadings on PC2 and Factor 2. Thus the characters grouped under PC1 and Factor 1 are the major characters responsible for divergence among the genotypes. Cluster analysis classified the genotypes into two distinct clusters, Cluster 1 containing 10 genotypes and Cluster 2 containing 45 genotypes. Fisher’s Linear Discriminant Analysis gave a high coefficient on Linear Discriminant 1 for the character cobs per plant, which means that this variable contributes strongly to discrimination. On comparing the results from SAS and R, it was observed that for PCA the signs of the principal component coefficients (eigenvectors) obtained from SAS differed from those obtained from R; since eigenvectors are determined only up to sign, the end results remain the same. MANOVA in R revealed significant differences between the two types of genotypes. Similarly, univariate ANOVA showed significant differences between the two types for all characters except 50% tasseling, 50% silking and 75% husk browning.
  • Thesis Item, Open Access
    On Quantitative Aspects of Prediction Models in Commercial Forestry Using R/S-Plus Software
    (SKUAST Kashmir, 2012) Shah, Ajaz Ahmad; Mir, S. A.
    The present investigation, entitled “On quantitative aspects of prediction models in commercial forestry using R/S-plus software”, was carried out at RRS and FOA Wadura. The objective of the study was to develop and validate the best-fitting model for volume calculation of Populus deltoides. The data were collected from the Populus deltoides plantation of the area. Standing tree height (H) and diameter at breast height (DBH) were measured with a clinometer and measuring tape. The trees were then harvested, and the volume of the logs was computed using Newton’s formula. Forty-two linear, non-linear, log-linear and single-variable models were fitted separately, using DBH, H, Ln DBH²H, etc. as explanatory variables. The functions fitted using the variables DH and Ln DBH²H were found better than those fitted using D, D²+H², D+H+D²H and Ln√DH on the basis of the R², adjusted R² and χ² criteria. The linear model V = 2.3296 + 0.0446DH was adjudged the best fit (R² = 0.9782, χ² = 0.0147 and error index = 0.0382). Cross-validation and the jackknife were used to validate the models. The cross-validation and jackknife estimates for the linear model with explanatory variable DH were 0.0457 and 0.0479, respectively, which are negligible. Based on residual plots, validation techniques and fit statistics, the model V = 2.3296 + 0.0446DH can be used for predicting volume and stem biomass of Populus deltoides under Kashmir conditions.
  • Thesis Item, Open Access
    Use of Cluster sampling Technique in Agricultural data Analysis using R Software
    (SKUAST Kashmir, 2011) Jeelani, M Iqbal; Mir, A. H.
    The present study was carried out on apple data collected at block level from district Ganderbal, in order to estimate the average yield of apple along with its standard error, the intra-class correlation coefficient between orchards within clusters, and the efficiency of cluster sampling relative to simple random sampling. Cluster sampling was adopted in preference to simple random sampling because it is operationally more convenient, less time consuming and, importantly, more cost efficient: collecting data from neighboring elements is easier than observing units spread across the population. In terms of statistical efficiency, however, cluster sampling is generally less efficient than simple random sampling owing to the usual tendency of elements within a cluster to be similar. But if clusters are formed so that they are as heterogeneous as possible internally (i.e., each has the full range of variability within it) and as homogeneous as possible externally (i.e., the clusters are similar to each other), cluster sampling becomes more efficient than simple random sampling. The basic aim of the present study is to show that cluster sampling is more efficient than simple random sampling provided that the mean square within clusters is maximum and the intra-class correlation coefficient between elements within clusters is negative, because the relative efficiency of cluster sampling increases with the mean square within clusters (i.e., clusters should be formed so that the variation within clusters is maximum while the variation between clusters is minimum). Different cluster sampling estimators are applied and their results are compared with simple random sampling at the same sample size. Computer programmes are prepared in R and the analysis is carried out as per the objectives.
In the preliminary study, correlation and regression analyses are carried out, and normal Q-Q plots, box plots and density plots are presented. New R functions cluster1(x,N), cluster2(x), cluster3(x), DEF1(x) and SRSWOR(Y,N) were developed. All these functions were run on a real data set generated on the apple crop in 2010-11 from block Ganderbal.
  • Thesis Item, Open Access
    Bayesian Analysis of Non-linear Models with S-PLUS and R Softwares
    (SKUAST Kashmir, 2008) Raja, Wasim; Mir, A H
    Bayesian statistics is an approach to statistics that uses all the information surrounding the likelihood of an event rather than only that collected experimentally. In the Bayesian approach the use of a prior is more logical, as it is part of Bayes’ rule, which provides the basis for using this information in a formal manner. Bayesian inference is the process of fitting a probability model to a set of data and summarizing the result by a probability distribution on the parameters of the model. Thus, the essential characteristic of Bayesian methods is their explicit use of probability for quantifying uncertainty in inferences based on statistical data analysis (Gelman et al., 2003, p. 3). The study of different features of the posterior density of the parameter of interest is therefore the main requirement. This thesis deals with the Bayesian analysis of nonlinear models, namely the single-parameter nonlinear model, the simple growth model, the Cobb-Douglas production function, the logistic model and the hierarchical Bayes model, each fitted using different prior information. The analysis shows that the posterior density of the parameters does not change significantly with a change in prior information. For constructing posterior densities, asymptotic approximations such as the Normal and Laplace approximations were used, and it is shown that the Laplace approximation of Tierney and Kadane (1986) provides a good approximation even in the tails of the distribution, compared with the Normal approximation. The function MCMCmetrop1R() from the MCMCpack library of R is used in this thesis to sample from the posterior density with a random walk Metropolis algorithm. The purpose of the simulation was to study the different features of the posterior densities. Thus, Metropolis simulation was implemented to bypass the computation of integrals of the posterior distribution.
These posterior densities, which contain all the information required for Bayesian analysis of nonlinear models, are constructed throughout the research work. It was concluded that the Laplace approximation and Metropolis simulation provide an exact picture of the posterior densities, whereas the Normal approximation for the regression coefficients does not depict a clear image in the tails of the posterior density. This is illustrated practically with S-PLUS and R through the newly developed functions BayesOne.Summary(), Bayes.Laplace(), BayesTwo.Summary(), logpostNL.Norm(), logpostNL.Cauchy() and Bayes.logistics(). Hierarchical Bayes methodology is also discussed with reference to multilocational trial data on oats. This model was fitted with the nlme() function of the nlme library, and it was observed on the basis of the BIC (Bayesian Information Criterion) that location should be treated as random, not fixed. In Bayesian terminology this is equivalent to using an informative prior for the location effect. The approximation to the posterior function proposed by Lindstrom and Bates (1990) is the basis of the algorithm currently implemented in the nlme function. Another method is the Laplacian approximation to the posterior distribution; this algorithm is currently implemented in the nlmer() function of the lme4 library, which was used to elucidate the results for the rabbit pharmacokinetics data.
  • Thesis Item, Open Access
    Bayesian Analysis of Generalized Linear Models with S-PLUS and R Softwares
    (SKUAST Kashmir, 2008) Malik, Masood Hassan; Khan, Athar Ali
    The term Bayesian refers to the Reverend Thomas Bayes. The foundation of Bayesian logic is Bayes’ theorem, which provides a vehicle for changing, or updating, the degree of belief about a parameter in light of more recent information. It is a formal procedure for merging knowledge obtained from experience, termed the prior, with the information obtained from data, termed the likelihood. These two sources of information are combined to form the posterior density. In this methodology, investigators are mainly concerned with constructing the posterior density; once it is constructed, every important aspect of the Bayesian analysis is considered complete. Obtaining posterior inference by more than one method is an excellent way to debug computer programs and ensure that the results are accurate. Therefore, in this thesis analytic approximations, namely the Normal and Laplace approximations, have been implemented alongside simulation tools to investigate the posterior densities. Markov Chain Monte Carlo (MCMC) techniques have been used throughout the thesis to bypass the computation of integrals of the posterior distribution and to compare the resulting posterior densities with those obtained from the analytical tools. These posterior densities, which contain all the information required for Bayesian modelling, are constructed throughout the thesis. The above techniques are illustrated through generalized linear models, especially the probit, logit and complementary log-log models. Practical illustrations have been made with S-PLUS and R through the newly developed functions logitPostNI, logitPostcau, logitPostgamma, ProbitPostNI, ProbitPostcau, ProbitPostgamma and bayes.summary.
Several inbuilt functions of R and S-PLUS, such as MCMCprobit from the MCMCpack library and mcmcsamp from the lme4 library, were also used to obtain the posterior densities for the fish breeding data and for the hierarchical generalized linear model. The hierarchical generalized linear models were also fitted with the lmer function of the lme4 library of Douglas Bates (2007) and the glmmPQL function of the MASS library. All the programmed functions as well as the existing functions were run on the dose-response data of Bliss and on the Venturia inaequalis data. The Venturia inaequalis data are a real data set generated on Venturia inaequalis (the causal organism of apple scab) in 2007-08 from trials at three different locations, with four different chemicals and six doses.
  • Thesis Item, Open Access
    On Some Aspects of Plot Techniques in Field Experiments on Tomato (Lycopersicon esculentum mill.) in Soils of Kashmir
    (SKUAST Kashmir, 2007) Sameera Shafi; Mir, S. A.
    A uniformity trial on the S-II variety of tomato (Lycopersicon esculentum Mill.) was conducted at the RRS & FOA, SKUAST-K, Wadura Campus, Sopore to work out the optimum size and shape of the experimental plot using the maximum curvature method and Fairfield Smith’s law. The trial indicated that the coefficient of variation decreased with increasing plot size in either direction, but the decrease was greater in the north-south direction than in the east-west direction. After mathematical verification, the optimum plot size was found to be 8 m² with plot shape 4 m × 2 m. The relative efficiency of this plot size was 35 percent. The fertility contour map of the experimental trial, based on six clusters, showed that the fertility gradient does not follow a systematic pattern but appears in patches, with the overall fertility gradient in the north-south direction. The study of the effect of the shape and size of plots and blocks revealed that the optimum plot size of 8 m² with shape 4 m × 2 m, in blocks of size 10, can achieve 19.7% to 56.16% relative information within 2 to 5 replications.
  • Thesis Item, Open Access
    Bayesian Modelling for Small Area with S-PLUS and R Software
    (SKUAST Kashmir, 2007) Nageena Nazir; Khan, Athar Ali
    The Bayesian approach is an approach to statistics that formally seeks to use prior information, and Bayes’ theorem provides the basis for using this information in a formal manner. Consequently, the study of different features of the posterior density of the parameter of interest is the main requirement. Bayesian data analysis is the process of fitting a probability model to a set of data by setting up a joint probability distribution for all observable and unobservable quantities in the analysis, conditioning on the observed data, calculating and interpreting the appropriate posterior distribution and, finally, evaluating the fit of the model so that the conclusions drawn are reasonable. This thesis deals with Bayesian modelling for small area with S-PLUS and R software. The intercept model, two-sample test, correlation analysis, regression model, analysis of variance and hierarchical Bayes model were fitted, and it has been shown that the posterior distribution of the mean of the intercept model is normal when the variance is known and Student’s t when the variance is unknown. The posterior distribution of the difference of means in the two-sample test follows a t-distribution. The posterior distribution of the regression coefficient vector is multivariate normal when σ² is assumed known, and each of its components follows a univariate normal distribution. In contrast, when σ² is assumed unknown, the normal density is replaced by a multivariate t-distribution for the regression coefficient vector, and the marginal posterior of each component is a univariate Student’s t-distribution. These posterior densities, which contain all the information required for Bayesian modelling for small area, are constructed throughout the thesis. This is illustrated practically with S-PLUS and R through the newly developed functions postNorm(), predictDist(), postMean(), postVar(), posteriorMean(), postMeanDiff(), postDelta(), postcorr2() and postRegb().
In the analysis of variance, several inbuilt functions of R and S-PLUS were used to obtain multiple comparisons of means, 95% highest posterior density regions and contrast comparisons. The relationship between hierarchical Bayes and Henderson’s mixed model methodology is discussed, assuming multivariate normal distributions for the fixed and random effects. This model was fitted with the lme() function of the nlme library due to Pinheiro and Bates (2000). All these functions were run on a real data set generated on the potato crop (Solanum tuberosum) in 2005-06 at five different locations with 12 different genotypes.
  • Thesis Item, Open Access
    On Some Aspects of Auxiliary Information as Utilized in Estimating the Extent of Cultivation and Yield of a Fruit Crop in a Sampling Design
    (SKUAST Kashmir, 2007) Shah, Mohd Abdul Rouf; Mir, A.H.
    The study, entitled “On Some Aspects of Auxiliary Information as Utilized in Estimating the Extent of Cultivation and Yield of a Fruit Crop in a Sampling Design”, covers some optimum estimators and the use of auxiliary information in different sample survey designs at the planning, selection and estimation stages. A number of sampling strategies are considered to provide insight into the different sampling procedures adopted at the planning and design stage of sampling. Various sampling schemes are studied for the utilization of auxiliary information at the selection stage, and a number of estimators are compared for minimum variance of the sampling estimates at the estimation stage. The developed sampling methodology is used to define a general sampling design for estimating the extent of cultivation and production of a fruit crop in a particular region, district or zone. The design is used for a pilot sample survey in the zone Achabal of tehsil Sopore during the year 2005 to estimate, in particular, the yield of the apple crop, the number of trees and the area under apple orchards. The estimates arrived at are more precise and accurate than the estimates obtained by various agencies through traditional methods. The estimation procedures are supplemented by computer software developed for this purpose in R, which makes the tedious computation process very easy and adaptable for analysis.
  • Thesis Item, Open Access
    Bayesian Regression Analysis of Agronomical Data
    (SKUAST Kashmir, 2006) Shah, Roof Ahmad; Lone, Abdul Hamid
    The Bayesian approach is an approach to statistics that formally seeks to use prior information, and Bayes’ theorem provides the basis for making use of this information. Bayesian data analysis is the process of fitting a probability model to a set of data by setting up a joint probability distribution for all observable and unobservable quantities in the analysis, conditioning on the observed data, calculating and interpreting the appropriate posterior distribution and, finally, evaluating the fit of the model so that the conclusions drawn are reasonable. This dissertation deals with the Bayesian regression analysis of agronomical data. The intercept model, simple regression model and multiple regression model were fitted, and it has been shown that the posterior distribution of the regression coefficient vector is multivariate normal when the error variance is assumed known, and each of its components follows a univariate normal density. In contrast, when the error variance is assumed unknown, the normal density is replaced by a multivariate t-distribution for the regression coefficient vector, and the marginal posterior density of each component is a univariate Student’s t-distribution. These posterior densities, which contain all the information required for Bayesian regression analysis, are constructed throughout the dissertation. An overview of the Bayesian approach is given in detail. A Bayesian model for regression analysis is illustrated practically with the newly developed S-PLUS and R functions BayesReg and BayesMultreg. All illustrations are made on real data generated on the rice crop (Oryza sativa). Where the response did not follow a Gaussian distribution, a power transformation was used. A function post Lambda was developed for calculating the posterior density of the power parameter.
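
The normal versus Student's t result described in the last abstract above can be illustrated with a minimal sketch for simple linear regression. This is not the dissertation's BayesReg function, which was written for S-PLUS and R and is not reproduced in this listing. The Python function name, the flat prior on the coefficients, the prior proportional to 1/σ² for the unknown-variance case and the numbers in the usage lines are all assumptions made for illustration.

```python
import math


def posterior_simple_regression(x, y, sigma2=None):
    """Marginal posteriors for y = b0 + b1 * x + error (hypothetical helper).

    Assumes a flat prior on (b0, b1).
    sigma2 given   -> each coefficient has a normal marginal posterior.
    sigma2 is None -> prior proportional to 1/sigma2 is assumed, and each
                      coefficient has a Student's t marginal posterior
                      with n - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Posterior location: the least-squares estimates.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    if sigma2 is None:
        # Unknown variance: plug in the residual mean square, use t with n - 2 df.
        rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
        variance = rss / (n - 2)
        family, df = "student-t", n - 2
    else:
        variance = float(sigma2)
        family, df = "normal", None

    # Diagonal of (X'X)^-1 for the design matrix with columns [1, x].
    v_intercept = 1.0 / n + x_bar ** 2 / sxx
    v_slope = 1.0 / sxx
    return {
        "family": family,
        "df": df,
        "mean": (intercept, slope),
        "scale": (math.sqrt(variance * v_intercept), math.sqrt(variance * v_slope)),
    }


if __name__ == "__main__":
    # Made-up illustrative numbers, not data from the dissertation.
    x = [0, 1, 2, 3, 4]
    y = [1.1, 2.9, 5.2, 6.8, 9.0]
    print(posterior_simple_regression(x, y, sigma2=0.25))  # known variance
    print(posterior_simple_regression(x, y))               # unknown variance
```

The posterior location is the same in both cases; only the family and the scale change, so the Student's t result carries heavier tails when the variance has to be estimated from few observations.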