Cluster analysis of mixed Data: A genetic algorithm approach

HOODA, B.K
Sumbherwal, Nisha
2022-07-16
2022-07-16
2021-09
https://krishikosh.egranth.ac.in/handle/1/5810185503

The present study deals with the clustering of pearl millet, wheat and cotton genotypes described by mixed variables. Mixed-variable data, a combination of continuous and categorical variables, occur frequently in fields such as medicine, agriculture, remote sensing, biology, marketing and ecology, but little work has been done on handling this type of data. The study used secondary data on pearl millet, wheat and cotton crops comprising 60, 120 and 218 genotypes respectively. The data on pearl millet and wheat genotypes, for the kharif and rabi seasons of 2018-19 respectively, were obtained from the Department of Genetics and Plant Breeding at CCS Haryana Agricultural University, Hisar, Haryana, and the data on the cotton crop were taken from a Ph.D. thesis (Mohan, 2005) available on Krishikosh. Various clustering methods for numeric, categorical and mixed-variable data were studied and are explained in detail. The performance of a genetic algorithm based clustering method on numeric, categorical and mixed-variable data was compared with that of conventional clustering methods. The genetic algorithm based clustering method performed better than the other clustering methods on each type of dataset (numeric, categorical and mixed). The optimal number of clusters for the pearl millet, wheat and cotton data was found to be 2, 2 and 3 respectively. Agglomerative hierarchical cluster analysis of the mixed-variable datasets was carried out using the Gower, Podani, Huang, Ahmad and Harikumar distances. For the pearl millet and wheat data, agglomerative hierarchical clustering using the Podani distance performed better than agglomerative hierarchical clustering using the other distance measures. For the cotton data, agglomerative hierarchical clustering using the Ahmad distance performed better.
Cluster analysis of numeric and mixed-variable data was carried out using Ward's method, and Ward's method applied to mixed-variable data performed better under most of the cluster validation measures than when applied to numeric variables only.

English
Thesis