Marzieh Sadat Neiband
Abstract
Feature selection is crucial in Quantitative Structure-Activity Relationship (QSAR) studies because it improves the performance of learning algorithms and reduces computational cost. This study evaluates the impact of eight variable selection methods on the classification of isoform-selective ligands for the Bcl-2 and Bcl-xL targets using three machine learning techniques: Supervised Kohonen Network (SKN), Support Vector Machine (SVM), and Partial Least Squares Discriminant Analysis (PLS-DA). Classification models were assessed using confusion matrix parameters, 10-fold Venetian blind cross-validation, and test sets.

The results show that PLS-DA and SVM have comparable classification capabilities and both outperform SKN. However, PLS-DA occasionally leaves some ligands unassigned, which makes SVM the more robust and efficient choice. No variable selection method showed a clear advantage over the others: all achieved around 70% classification accuracy in the validation and test series. This suggests that the choice of variable selection method does not consistently affect outcomes across the three techniques.

Ensuring the reliability of the selected variables requires careful data quality assessment, literature review, and robust cross-validation. Eliminating redundant features is essential for accurate classification models, since many physicochemical properties may be irrelevant to the target bioactivity. Although no single method guarantees superior models, selecting the important variables remains vital for extracting relevant features. This study highlights the importance of careful variable selection in QSAR studies, emphasizing its role in reducing dimensionality and improving model interpretability. Ultimately, this improves drug discovery efficiency by helping to identify safer and more effective compounds while reducing time and cost.
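To make the validation scheme named in the abstract more concrete, here is a minimal Python sketch of Venetian blind splitting and binary confusion-matrix parameters. It is not the authors' implementation. It assumes a blind thickness of one sample (fold k holds out samples k, k+10, k+20, ... in data order), two hypothetical class labels "Bcl-2" and "Bcl-xL", and a hypothetical convention in which `None` marks a ligand that a PLS-DA model leaves unassigned. How unassigned samples enter each metric is also an assumption made for illustration.

```python
from collections import Counter
from typing import Iterator, List, Optional, Sequence, Tuple


def venetian_blind_splits(
    n_samples: int, n_folds: int = 10
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for Venetian blind cross-validation.

    Fold k holds out samples k, k + n_folds, k + 2*n_folds, ... in data order,
    i.e. a blind thickness of one sample (an assumption of this sketch).
    """
    for k in range(n_folds):
        test = list(range(k, n_samples, n_folds))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test


def confusion_metrics(
    y_true: Sequence[str],
    y_pred: Sequence[Optional[str]],
    positive: str = "Bcl-2",  # hypothetical label for the positive class
) -> dict:
    """Binary confusion-matrix parameters.

    Assumption: a prediction of None marks an unassigned sample. It is left
    out of sensitivity and specificity but counted as an error in accuracy.
    """
    c = Counter()
    for t, p in zip(y_true, y_pred):
        if p is None:
            c["unassigned"] += 1
        elif t == positive:
            c["tp" if p == positive else "fn"] += 1
        else:
            c["fp" if p == positive else "tn"] += 1
    tp, tn, fp, fn = c["tp"], c["tn"], c["fp"], c["fn"]
    n = len(y_true)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / n if n else 0.0,
        "unassigned": c["unassigned"],
    }


if __name__ == "__main__":
    train, test = next(venetian_blind_splits(25, n_folds=10))
    print(test)  # [0, 10, 20]
    y_true = ["Bcl-2", "Bcl-2", "Bcl-xL", "Bcl-xL", "Bcl-2"]
    y_pred = ["Bcl-2", "Bcl-xL", "Bcl-xL", None, "Bcl-2"]
    print(confusion_metrics(y_true, y_pred))
```

Because Venetian blinds select every tenth sample in the order the data are stored, each test fold spans the whole dataset. Counting unassigned ligands as errors in accuracy is one way to reflect the PLS-DA behaviour described above when comparing it with SVM.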