Cluster analysis of mixed Data: A genetic algorithm approach

dc.contributor.advisor: HOODA, B.K
dc.contributor.author: Sumbherwal, Nisha
dc.date.accessioned: 2022-07-16T02:45:03Z
dc.date.available: 2022-07-16T02:45:03Z
dc.date.issued: 2021-09
dc.description.abstract: The present study deals with the problem of clustering pearl millet, wheat and cotton genotypes described by mixed variables. Mixed-variable data, a combination of continuous and categorical variables, occur frequently in fields such as medicine, agriculture, remote sensing, biology, marketing and ecology, but little work has been done on handling this type of data. The study used secondary data on pearl millet, wheat and cotton crops comprising 60, 120 and 218 genotypes respectively. The data on pearl millet and wheat genotypes, for the kharif and rabi seasons of 2018-19 respectively, were obtained from the Department of Genetics and Plant Breeding at CCS Haryana Agricultural University, Hisar, Haryana, and the data on the cotton crop were taken from a Ph.D. thesis (Mohan, 2005) available on Krishikosh. Various clustering methods for numeric, categorical and mixed-variable data were studied and are explained in detail. The performance of a genetic algorithm based clustering method on numeric, categorical and mixed-variable data was compared with that of conventional clustering methods. The genetic algorithm based clustering method performed better than the other clustering methods for each type of dataset (numeric, categorical and mixed). The optimal number of clusters for the pearl millet, wheat and cotton data was found to be 2, 2 and 3 respectively. Agglomerative hierarchical cluster analysis of the mixed-variable datasets was carried out using the Gower, Podani, Huang, Ahmad and Harikumar distances. For the pearl millet and wheat data, agglomerative hierarchical clustering with the Podani distance performed better than with the other distance measures. For the cotton data, agglomerative hierarchical clustering with the Ahmad distance performed better.
Cluster analysis of the numeric and mixed-variable data was also carried out using Ward's method. Ward's method performed better under most of the cluster validation measures when applied to the mixed-variable data than when applied to the numeric variables alone. (en_US)
dc.identifier.uri: https://krishikosh.egranth.ac.in/handle/1/5810185503
dc.keywords: Mixed data, Unsupervised learning, Cluster validity measures, Distance measures, Cluster analysis (en_US)
dc.language.iso: English (en_US)
dc.pages: 100 + ix (en_US)
dc.publisher: CCSHAU, Hisar (en_US)
dc.sub: Statistics (en_US)
dc.theme: Cluster analysis of mixed Data: A genetic algorithm approach (en_US)
dc.these.type: Ph.D (en_US)
dc.title: Cluster analysis of mixed Data: A genetic algorithm approach (en_US)
dc.type: Thesis (en_US)
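The abstract mentions agglomerative hierarchical clustering of mixed-variable data using the Gower distance, among others. The following is a minimal illustrative sketch of that idea, not the thesis's own code. It assumes the usual form of the Gower distance (range-scaled absolute difference for numeric variables, simple mismatch for categorical variables, averaged over variables) and assumes average linkage, which the abstract does not specify. The function names and the four toy genotypes are hypothetical.

```python
from itertools import combinations


def gower_matrix(rows, is_categorical):
    """Pairwise Gower distances for rows of mixed numeric/categorical values."""
    n, p = len(rows), len(is_categorical)
    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = []
    for k in range(p):
        if is_categorical[k]:
            ranges.append(None)
        else:
            col = [r[k] for r in rows]
            ranges.append(max(col) - min(col))
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        total = 0.0
        for k in range(p):
            if is_categorical[k]:
                # Simple matching: 0 if the categories agree, 1 otherwise.
                total += 0.0 if rows[i][k] == rows[j][k] else 1.0
            elif ranges[k] > 0:
                total += abs(rows[i][k] - rows[j][k]) / ranges[k]
        dist[i][j] = dist[j][i] = total / p
    return dist


def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering: merge the closest pair of clusters
    (by mean pairwise distance) until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical toy genotypes: (plant height, grain yield, grain colour).
genotypes = [
    (150.0, 30.0, "grey"),
    (155.0, 32.0, "grey"),
    (200.0, 45.0, "brown"),
    (205.0, 47.0, "brown"),
]
d = gower_matrix(genotypes, is_categorical=[False, False, True])
groups = average_linkage(d, n_clusters=2)
print(groups)  # [[0, 1], [2, 3]]
```

The distance matrix is kept separate from the merging step so that another mixed-data distance could be substituted without changing the clustering code.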
Files
Original bundle
Name:
CCSHAU-376207-Sumbherwal, Nisha.pdf
Size:
2.7 MB
Format:
Adobe Portable Document Format