Variable selection for classification and discrimination of Indian Mustard (Brassica juncea) genotypes for yield and oil content

Date
2019-07-10
Publisher
CCSHAU, Hisar
Abstract
The present study deals with the problem of variable selection for the classification and discrimination of Indian mustard (Brassica juncea) genotypes for seed yield and oil content. The study used secondary data on 310 Indian mustard genotypes obtained from the Oilseeds Section of the Department of Genetics and Plant Breeding, CCS HAU, Hisar. The experiment was conducted during the rabi season of 2015-16. Five variable selection methods for classification and discrimination were compared using Monte Carlo simulation: the univariate two-sample t-test, Rao's F test for additional information, the STEPDISC procedure with the Wilks' lambda criterion (backward and forward), and the random forests algorithm. The performance of each method was assessed by its leave-one-out cross-validation error rate for classification. In scheme I (seed yield), with samples of equal size, Rao's F test, Wilks' lambda (backward) and Wilks' lambda (forward) performed better than the other methods. In scheme II (oil content), the methods with the lowest leave-one-out cross-validation error rate were Wilks' lambda (backward) and Wilks' lambda (forward). Based on the results of schemes I and II, Wilks' lambda (backward and forward) was found to be the most suitable method for classifying genotypes with respect to both seed yield and oil content. In scheme I, the leave-one-out cross-validation error rate identified four important variables for discrimination affecting seed yield per plant: secondary branches, primary branches, days to maturity and siliqua number on main shoot, with the lowest error rate of 21.72 per cent. The important variables for discrimination that significantly affected oil content were siliqua length, secondary branches, primary branches and days to maturity, with the lowest error rate of 33.90 per cent.
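The stepwise Wilks' lambda selection and the leave-one-out error rate described above can be illustrated with a minimal sketch, assuming two groups (e.g. low vs high seed yield), a linear discriminant classifier with pooled covariance and equal priors, and a fixed number of variables `k` as the stopping rule. This is a simplification: SAS PROC STEPDISC enters or removes variables using partial F significance levels rather than a fixed `k`. All function names (`wilks_lambda`, `forward_select`, `backward_select`, `loocv_error`) and the synthetic data are hypothetical and are not the study's data or code.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda = det(W) / det(T) for the columns of X (smaller = better separation)."""
    Xc = X - X.mean(axis=0)
    T = Xc.T @ Xc                      # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(y):
        D = X[y == g] - X[y == g].mean(axis=0)
        W += D.T @ D                   # within-group SSCP matrix
    return np.linalg.det(W) / np.linalg.det(T)

def forward_select(X, y, k):
    """Forward step: add the variable giving the smallest Wilks' lambda."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k:
        best = min(remaining, key=lambda j: wilks_lambda(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

def backward_select(X, y, k):
    """Backward step: drop the variable whose removal increases Wilks' lambda least."""
    selected = list(range(X.shape[1]))
    while len(selected) > k:
        drop = min(selected,
                   key=lambda j: wilks_lambda(X[:, [s for s in selected if s != j]], y))
        selected.remove(drop)
    return selected

def lda_predict(Xtr, ytr, x):
    """Linear discriminant rule with pooled within-group covariance, equal priors."""
    classes = np.unique(ytr)
    means = [Xtr[ytr == c].mean(axis=0) for c in classes]
    S = np.zeros((Xtr.shape[1], Xtr.shape[1]))
    for c, m in zip(classes, means):
        D = Xtr[ytr == c] - m
        S += D.T @ D
    S /= len(ytr) - len(classes)
    Sinv = np.linalg.inv(S)
    scores = [x @ Sinv @ m - 0.5 * m @ Sinv @ m for m in means]
    return classes[int(np.argmax(scores))]

def loocv_error(X, y):
    """Leave-one-out cross-validation error rate, in per cent."""
    n = len(y)
    wrong = sum(lda_predict(X[np.arange(n) != i], y[np.arange(n) != i], X[i]) != y[i]
                for i in range(n))
    return 100.0 * wrong / n

# Hypothetical synthetic data: 4 traits, only the first two differ between groups.
rng = np.random.default_rng(0)
n = 60
shift = np.array([3.0, 1.5, 0.0, 0.0])
X = np.vstack([rng.normal(size=(n, 4)), rng.normal(size=(n, 4)) + shift])
y = np.array([0] * n + [1] * n)

sel = forward_select(X, y, 2)
err = loocv_error(X[:, sel], y)
print(sel, backward_select(X, y, 2), round(err, 2))
```

Repeating such a run over many simulated data sets and averaging the error rate per method is one way to mimic a Monte Carlo comparison; the study's actual simulation design is not described in the abstract.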
Secondary branches, siliqua number on main shoot, seeds per siliqua and 1000-seed weight were found to be the important variables in scheme III, with the lowest error rate of 27.68 per cent. The three characters that discriminated between the low and high seed yield groups were 1000-seed weight, siliqua length and seeds per siliqua, while siliqua length, 1000-seed weight and primary branches were the most discriminating variables for oil content. Using the correlation between the variables and the discriminant score, the most important variables affecting seed yield were secondary branches, primary branches and days to maturity. The three most important variables discriminating between oil content groups were siliqua length, secondary branches and seeds per siliqua. The most important variables discriminating between the low seed yield with low oil content group and the high seed yield with high oil content group were secondary branches, primary branches and siliqua number on main shoot. Number of secondary branches was found to be the most important variable for the classification and discrimination of Indian mustard genotypes for seed yield and oil content.
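Ranking variables by their correlation with the discriminant score, as done above, can be sketched as follows, assuming two groups, Fisher's linear discriminant direction computed from the pooled within-group covariance, and total-sample Pearson correlations (software such as SAS also reports between-group and pooled within-group versions). The function names and the synthetic data are hypothetical illustrations, not the study's data or procedure.

```python
import numpy as np

def fisher_direction(X, y):
    """Fisher discriminant weights w = S_pooled^{-1} (mean_1 - mean_0) for two groups."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(y) - 2)
    return np.linalg.solve(S, m1 - m0)

def structure_correlations(X, y):
    """Total-sample correlation of each variable with the discriminant score."""
    z = X @ fisher_direction(X, y)   # discriminant score for every genotype
    return np.array([np.corrcoef(X[:, j], z)[0, 1] for j in range(X.shape[1])])

# Hypothetical synthetic data: trait 0 separates the groups most, trait 2 not at all.
rng = np.random.default_rng(1)
n = 50
shift = np.array([2.0, 1.0, 0.0])
X = np.vstack([rng.normal(size=(n, 3)), rng.normal(size=(n, 3)) + shift])
y = np.array([0] * n + [1] * n)

r = structure_correlations(X, y)
ranking = np.argsort(-np.abs(r))     # most important variable first
print(np.round(r, 3), ranking)
```

Variables with the largest absolute correlation contribute most to the separation the discriminant score achieves, which is why such correlations are often preferred to raw weights when traits are intercorrelated.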