Cluster analysis of mixed Data: A genetic algorithm approach

Sumbherwal, Nisha

dc.contributor.advisor	HOODA, B.K
dc.contributor.author	Sumbherwal, Nisha
dc.date.accessioned	2022-07-16T02:45:03Z
dc.date.available	2022-07-16T02:45:03Z
dc.date.issued	2021-09
dc.description.abstract	The present study deals with the problem of clustering pearl millet, wheat and cotton genotypes described by mixed variables. Mixed data, which combine continuous and categorical variables, occur frequently in fields such as medicine, agriculture, remote sensing, biology, marketing and ecology, but little work has been done on handling such data. The study used secondary data on pearl millet, wheat and cotton comprising 60, 120 and 218 genotypes, respectively. The data on pearl millet and wheat genotypes, from the kharif and rabi seasons of 2018-19 respectively, were obtained from the Department of Genetics and Plant Breeding at CCS Haryana Agricultural University, Hisar, Haryana, and the data on cotton were taken from a Ph.D. thesis (Mohan, 2005) available on Krishikosh. Various clustering methods for numeric, categorical and mixed data were studied and are explained in detail. The performance of a genetic algorithm based clustering method on numeric, categorical and mixed data was compared with that of conventional clustering methods, and the genetic algorithm based method performed better than the other clustering methods for all three types of dataset (numeric, categorical and mixed). The optimal number of clusters for the pearl millet, wheat and cotton data was 2, 2 and 3, respectively. Agglomerative hierarchical cluster analysis of the mixed datasets was carried out using the Gower, Podani, Huang, Ahmad and Harikumar distances. For the pearl millet and wheat data, agglomerative hierarchical clustering using the Podani distance performed better than clustering with the other distance measures, whereas for the cotton data the Ahmad distance performed best.
The cluster analysis of numeric and mixed data was carried out using Ward's method, and Ward's method applied to the mixed data performed better under most of the cluster validation measures than when applied to the numeric variables alone.	en_US
dc.identifier.uri	https://krishikosh.egranth.ac.in/handle/1/5810185503
dc.keywords	Mixed data, Unsupervised learning, Cluster validity measures, Distance measures, Cluster analysis	en_US
dc.language.iso	English	en_US
dc.pages	100 + ix	en_US
dc.publisher	CCSHAU, Hisar	en_US
dc.sub	Statistics	en_US
dc.theme	Cluster analysis of mixed Data: A genetic algorithm approach	en_US
dc.these.type	Ph.D	en_US
dc.title	Cluster analysis of mixed Data: A genetic algorithm approach	en_US
dc.type	Thesis	en_US
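The abstract compares several dissimilarity measures for mixed data. As a minimal sketch of what such a measure does, assuming equal variable weights and no missing values, Gower's dissimilarity averages a range-scaled absolute difference for each numeric variable and a 0/1 mismatch for each categorical variable. The function name `gower_matrix` and the genotype values below are hypothetical illustrations, not the thesis's implementation or data; the other measures named in the abstract (Podani, Huang, Ahmad and Harikumar) score or weight variables differently.

```python
from typing import Union

Value = Union[float, str]


def gower_matrix(rows: list[list[Value]], is_numeric: list[bool]) -> list[list[float]]:
    """Pairwise Gower dissimilarities for records that mix numeric and
    categorical columns (sketch: equal weights, no missing values assumed)."""
    n_rows, n_cols = len(rows), len(is_numeric)

    # Range R_k of each numeric column; categorical columns get a placeholder.
    ranges = []
    for k in range(n_cols):
        if is_numeric[k]:
            col = [float(r[k]) for r in rows]
            ranges.append(max(col) - min(col))
        else:
            ranges.append(0.0)

    dist = [[0.0] * n_rows for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(i + 1, n_rows):
            total = 0.0
            for k in range(n_cols):
                if is_numeric[k]:
                    # Range-scaled absolute difference in [0, 1]; a constant
                    # column (range 0) contributes 0.
                    if ranges[k] > 0:
                        total += abs(float(rows[i][k]) - float(rows[j][k])) / ranges[k]
                else:
                    # Simple matching: 0 if the categories agree, 1 otherwise.
                    total += 0.0 if rows[i][k] == rows[j][k] else 1.0
            # Average over all columns and store symmetrically.
            dist[i][j] = dist[j][i] = total / n_cols
    return dist


# Hypothetical genotypes: plant height (cm), grain yield, seed colour.
rows = [
    [180.0, 32.5, "grey"],
    [150.0, 28.0, "yellow"],
    [175.0, 31.0, "grey"],
]
is_numeric = [True, True, False]

for line in gower_matrix(rows, is_numeric):
    print([round(d, 3) for d in line])
# [0.0, 1.0, 0.167]
# [1.0, 0.0, 0.833]
# [0.167, 0.833, 0.0]
```

Because the result is symmetric with a zero diagonal, it can be passed to a standard agglomerative routine; for example, SciPy's `scipy.spatial.distance.squareform` converts it to the condensed form accepted by `scipy.cluster.hierarchy.linkage`. Ward linkage formally assumes Euclidean distances, so applying it to Gower dissimilarities is a heuristic.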

Files

Original bundle

Name:: CCSHAU-376207-Sumbherwal, Nisha.pdf
Size:: 2.7 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 1.71 KB
Format:: Item-specific license agreed upon to submission

Collections

Theses