  • Thesis Item, Open Access
    Multivariate clustering techniques - a comparison based on rose (Rosa spp.)
    (Department of Agricultural Statistics, College of Agriculture, Vellayani, 2018) Arya, V Chandran; KAU; Vijayaraghava Kumar
    The study entitled “Multivariate clustering techniques – a comparison based on rose (Rosa spp.)” was undertaken to compare different clustering techniques, to identify the suitable technique for different types of qualitative and quantitative data, and to illustrate the procedures using data from a field experiment on rose (Rosa spp.). Data on quantitative and qualitative traits were taken from the field experiment “Characterization and genetic improvement in Rose (Rosa spp.) through mutagenesis”, conducted during 2014-2017 at the College of Agriculture, Vellayani and the Regional Agriculture Research Station (RARS), Ambalavayal, Wayanad. Twenty-five cultivars from each of the Hybrid Tea and Floribunda groups of rose were evaluated. There were nine quantitative characters and three qualitative characters. The analyses were carried out using the statistical packages SPSS, STATA, SAS, R and NTSYS. Preliminary analysis of variance (ANOVA) for all quantitative characters revealed significant differences among genotypes with respect to each character. Multivariate analysis of variance (MANOVA) was carried out to test the significance of differences among varietal means within each group. The results indicated differences among the cultivar means for both groups with respect to all quantitative characters. A linear discriminant function developed from the nine quantitative characters of each group was used to elucidate the differences between them. The average score obtained was 11.01 for the Hybrid Tea type and -2.34 for the Floribunda type, with an overall average of 4.38. Discriminant function analysis thus confirmed the difference between the two groups under study. Cluster analysis of the Hybrid Tea and Floribunda types was performed for quantitative, qualitative and mixed data.
The association measures used were Euclidean distance, Squared Euclidean distance, Chebychev distance, City Block distance and Mahalanobis D² for quantitative data; the Jaccard, Dice, Simple matching and Hamann’s coefficients for qualitative data; and Gower’s measure for mixed data. Single linkage, complete linkage, the Unweighted Pair Group Average Method (UPGMA), the Weighted Pair Group Average Method (WPGMA), the Unweighted Pair Group Centroid Method (UPGMC), Ward’s method, the modified Tocher method, k means clustering and Principal Component Analysis (PCA) were adopted for clustering the cultivars. The optimum number of clusters was determined by the pseudo t² statistic for hierarchical clustering and by the pseudo F statistic for k means clustering. The SD (Scatterness-Distance) index was used to test the validity of clustering based on quantitative data. Clustering based on qualitative data was carried out using seven characters, three of which were qualitative traits and the remaining four quantitative characters converted to qualitative traits. The Jaccard and Dice coefficients were used for binary data, while the Simple matching and Hamann’s coefficients were used for multi-state data. Clustering based on Squared Euclidean distance gave approximately the same results as clustering based on Euclidean distance. The Jaccard and Dice coefficients were found to be very similar: the dendrograms did not differ in topology, only in branch length. The clustering patterns under the Simple matching and Hamann’s coefficients were also similar. For both groups, single linkage clustering under the different distance measures tended to create one or two clusters containing the majority of the genotypes, with the remaining genotypes falling into single-member or two-member clusters. That is, single linkage tends to produce long, chain-like clusters rather than compact, bunched ones, a consequence of the chaining effect.
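The close agreement between the Jaccard and Dice coefficients noted above can be illustrated with a minimal sketch. The binary profiles and the labels H1 to H4 below are hypothetical and are not taken from the thesis data. The sketch checks that Dice equals 2J / (1 + J), a strictly increasing function of Jaccard J, so both coefficients rank cultivar pairs identically. Linkage rules that depend only on this rank order, such as single and complete linkage, would then be expected to merge in the same order, with only the merge heights changing.

```python
from itertools import combinations

def matches(x, y):
    """Return (a, b): joint presences and mismatches of two binary vectors."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i != j)
    return a, b

def jaccard(x, y):
    a, b = matches(x, y)
    # Convention assumed here: two all-zero profiles count as identical.
    return a / (a + b) if (a + b) else 1.0

def dice(x, y):
    a, b = matches(x, y)
    return 2 * a / (2 * a + b) if (a + b) else 1.0

# Hypothetical binary trait profiles (illustration only, not thesis data).
profiles = {
    "H1": [1, 1, 0, 1, 0, 0],
    "H2": [1, 1, 0, 0, 0, 0],
    "H3": [0, 1, 1, 1, 0, 1],
    "H4": [0, 0, 1, 0, 1, 1],
}

pairs = list(combinations(profiles, 2))

# Rank pairs from most to least similar under each coefficient.
rank_j = sorted(pairs, key=lambda p: -jaccard(profiles[p[0]], profiles[p[1]]))
rank_d = sorted(pairs, key=lambda p: -dice(profiles[p[0]], profiles[p[1]]))

for p in rank_j:
    x, y = profiles[p[0]], profiles[p[1]]
    print(p, round(jaccard(x, y), 3), round(dice(x, y), 3))

print("same ranking:", rank_j == rank_d)
```

Average-based linkages such as UPGMA use the similarity values themselves, not just their order, so identical topology under those methods is an empirical finding of the study and does not follow from this argument.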
Among the other clustering algorithms, the complete linkage method and Ward’s method showed similar results under Squared Euclidean distance. The UPGMA, WPGMA and UPGMC methods under Squared Euclidean distance gave comparable results. UPGMA and WPGMA gave almost the same clustering pattern under the different association measures for qualitative and quantitative data. Results from k means clustering were comparable with those from hierarchical clustering, except for single linkage clustering. A certain degree of similarity was observed between k means and D² analysis, but not up to the level observed between the other clustering methods. Among the Hybrid Tea genotypes, H16 (Mary Jean) formed a cluster on its own under the single linkage method using different distance measures for quantitative, qualitative and mixed data. Under the complete linkage method, H7 (Alaine Souchen) and H25 (Josepha) fell in the same cluster in clustering based on both quantitative and qualitative characters. H22 (Mom’s Rose) and H23 (Lois Wilson) fell in the same cluster under complete linkage, UPGMA and WPGMA, except under Hamann’s coefficient; they were also grouped together under D² analysis. Among the Floribunda genotypes, F2 (Tickled Pink) and F5 (Princess de Monaco) were included in the same cluster under the UPGMA method for both quantitative and qualitative data. F1 (Versailles) and F24 (Golden Fairy) also fell in the same cluster under UPGMA, except for multi-state distances. Clustering based on mixed data gave approximately the same results as clustering based on quantitative data under the different clustering algorithms, except for single linkage clustering. Comparison using the SD index indicated a high index value for clustering based on Gower’s measure. Single linkage, complete linkage and average linkage under the different association measures were also compared using the SD index.
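For readers unfamiliar with Gower’s measure for mixed data, a minimal sketch follows. It assumes the usual unweighted form with no missing values: a quantitative character contributes 1 - |x - y| / range, a qualitative character contributes 1 for a match and 0 otherwise, and the contributions are averaged. The cultivar labels, characters and values are hypothetical, and the thesis may have used a weighted or otherwise different variant.

```python
def gower_similarity(a, b, ranges, is_numeric):
    """Unweighted Gower similarity between two mixed-type records."""
    scores = []
    for x, y, r, numeric in zip(a, b, ranges, is_numeric):
        if numeric:
            # Range-scaled agreement; a zero range means no variation at all.
            scores.append((1.0 - abs(x - y) / r) if r else 1.0)
        else:
            # Qualitative character: simple match / mismatch.
            scores.append(1.0 if x == y else 0.0)
    return sum(scores) / len(scores)

# Hypothetical records: (flower diameter in cm, petal count, colour, fragrance).
cultivars = {
    "A": (8.0, 30, "red", "strong"),
    "B": (6.0, 20, "red", "mild"),
    "C": (4.0, 10, "pink", "mild"),
}
is_numeric = (True, True, False, False)

# Range of each quantitative character over the sample (None for qualitative).
ranges = tuple(
    max(rec[k] for rec in cultivars.values()) - min(rec[k] for rec in cultivars.values())
    if is_numeric[k] else None
    for k in range(len(is_numeric))
)

s_ab = gower_similarity(cultivars["A"], cultivars["B"], ranges, is_numeric)
s_ac = gower_similarity(cultivars["A"], cultivars["C"], ranges, is_numeric)
s_bc = gower_similarity(cultivars["B"], cultivars["C"], ranges, is_numeric)
print(s_ab, s_ac, s_bc)
```

A dissimilarity suitable for hierarchical clustering is commonly taken as 1 minus this similarity.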
The average linkage method under Squared Euclidean distance was found to be the best for both types, with an SD index of 0.651 for the Hybrid Tea type and 0.659 for the Floribunda type. The clustering pattern observed in the PCA score plot was comparable with the pattern obtained from quantitative data, especially with D² analysis. The contributions of characters towards variance obtained from D² analysis and PCA showed similar results. The study makes it possible to compare different methods and exclude inappropriate ones. The groups formed by the modified Tocher method and PCA differed from those formed by the other methods. The SD index indicated that UPGMA under Squared Euclidean distance is the best for quantitative data.
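The average linkage (UPGMA) rule under Squared Euclidean distance can be sketched in a few lines. This is a naive illustration on hypothetical two-dimensional points, not the procedure or software used in the study: at each step it merges the two clusters whose members have the smallest mean pairwise squared Euclidean distance. The function name `upgma` and the sample points are assumptions made for this example.

```python
def sq_euclidean(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def average_distance(points, c1, c2):
    """Mean pairwise squared Euclidean distance between two clusters."""
    total = sum(sq_euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def upgma(points, n_clusters):
    """Naive agglomerative average linkage; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        # Find the closest pair of clusters under average linkage.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_distance(points, clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters

# Hypothetical points: a tight group of three, a pair, and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
groups = upgma(points, 3)
print(groups)  # [[0, 1, 2], [3, 4], [5]]
```

In practice a library routine that records merge heights would be used so that a dendrogram can be drawn and a validity index compared across methods.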