Feature Selection Techniques in Machine Learning.

Anirudh S
6 min read · Jun 12, 2023

In the field of machine learning, feature selection plays a crucial role in improving model performance, reducing computational complexity, and enhancing interpretability. Feature selection involves identifying the most relevant and informative features from a given dataset.

The best feature subset is chosen for maximum accuracy.

Let’s say you’re a pizza lover and want to create an AI-powered pizza recommendation system. You have a massive database with thousands of attributes describing each pizza, including ingredients, crust type, cheese type, sauce flavors, and more. However, including all these attributes in the recommendation system would result in a slow and resource-intensive process. By selecting only the most influential features (e.g., popular ingredients, customer preferences), you can quickly generate personalized pizza recommendations without overwhelming your AI-powered pizzeria.

The goal of feature selection is to improve the performance of a machine learning model by reducing the dimensionality of the input data and removing irrelevant or redundant features.

There are several common techniques for feature selection:

1. Filter Methods

The filter method uses statistical measures to assess the relevance of features independently of any machine learning algorithm. It involves ranking features based on their individual characteristics. Common statistical measures used in the filter method include correlation, information gain, and chi-square tests. Features are selected based on predefined thresholds or by selecting the top-k highest ranked features.

This is an example output of ranking features by information gain.

Consider a booking cancellation prediction system. The filter method can be used to identify the most relevant features by measuring the information gain of each feature with respect to the target variable (booking cancelled or not). Features with high information gain are likely to have a significant impact on cancellations and should be selected.
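Here is a minimal sketch of this idea using scikit-learn's mutual information score (a common way to compute information gain). The file name `bookings.csv` and the target column `is_canceled` are hypothetical placeholders, and the features are assumed to already be numerically encoded:

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Hypothetical booking dataset; features are assumed to be numeric/encoded.
bookings = pd.read_csv("bookings.csv")
X = bookings.drop(columns=["is_canceled"])   # assumed target column name
y = bookings["is_canceled"]

# Rank every feature by its information gain (mutual information) with the target.
info_gain = pd.Series(mutual_info_classif(X, y), index=X.columns)
print(info_gain.sort_values(ascending=False))

# Keep only the top-10 highest-ranked features (assuming at least 10 columns).
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_selected = selector.fit_transform(X, y)
```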

2. Wrapper Methods

This feature selection process is based on the specific machine learning algorithm we are trying to fit to a given dataset. It follows a greedy search approach, evaluating candidate subsets of features against an evaluation criterion. Wrapper methods usually result in better predictive accuracy than filter methods. They treat feature selection as a search problem, exploring various subsets of features and selecting the one that yields the best model performance. Wrapper methods are computationally expensive but can capture feature interactions and provide accurate predictions.

There are three common wrapper methods:

a. Forward Selection: Forward selection is an iterative method in which we start with no features in the model. In each iteration, we add the feature that best improves the model, until adding a new feature no longer improves performance.

b. Backward Elimination: In backward elimination, we start with all the features and remove the least significant feature at each iteration, as long as doing so improves the model's performance. We repeat this until no further improvement is observed from removing a feature.

c. RFE (Recursive Feature Elimination): RFE repeatedly fits a model and eliminates the weakest feature (for example, the one with the lowest coefficient or importance) at each iteration. It then builds the next model with the remaining features until all features are exhausted, and finally ranks the features by the order in which they were eliminated.

Imagine a spam email classification task. The wrapper method can be employed by training and evaluating the model on different combinations of features such as word frequency, email length, sender’s reputation, and attachment presence. The subset of features that achieves the highest accuracy or lowest error rate is selected as the optimal feature subset.
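Below is a minimal sketch of these three wrapper methods with scikit-learn. The logistic regression estimator, the synthetic dataset, and the choice of five features are placeholders for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a real dataset (e.g., spam email features).
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
estimator = LogisticRegression(max_iter=1000)

# Forward selection: start with no features and greedily add the best one.
forward = SequentialFeatureSelector(
    estimator, n_features_to_select=5, direction="forward").fit(X, y)

# Backward elimination: start with all features and drop the weakest each step.
backward = SequentialFeatureSelector(
    estimator, n_features_to_select=5, direction="backward").fit(X, y)

# RFE: repeatedly refit and eliminate the feature with the smallest coefficient,
# then rank features by the order in which they were eliminated.
rfe = RFE(estimator, n_features_to_select=5).fit(X, y)

print(forward.get_support())   # boolean mask of the selected features
print(rfe.ranking_)            # 1 = kept; larger numbers = eliminated earlier
```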

3. Embedded Methods

These methods combine the benefits of both the filter and wrapper methods by taking interactions between features into account.

Embedded methods perform feature selection as an integral part of the machine learning algorithm. These methods select features while training the model, considering their importance in the learning process. Techniques like regularization, decision trees, and gradient boosting inherently perform feature selection by assigning weights or importance scores to features during training.

Lasso Regularization consists of adding a penalty to the different parameters of the machine learning model to reduce the freedom of the model, i.e., to avoid over-fitting. The penalty term promotes the selection of features by shrinking the coefficients towards zero. By setting some coefficients to zero, the Lasso effectively performs feature selection by excluding those features from the model. This property makes Lasso particularly useful when dealing with high-dimensional datasets, where only a subset of features may be relevant.

The Lasso model’s ability to set coefficients to zero allows for automatic feature selection and can be helpful in situations where there is a need to identify a sparse set of influential features.

Suppose you are working on a housing price prediction problem and have a dataset with multiple features such as the number of rooms, square footage, location, and various amenities. By applying Lasso regression, you can determine which features have the most significant impact on housing prices.

For example, after training a Lasso regression model, you may find that the number of rooms, square footage, and location are the most influential features, while other features like amenities have coefficients close to zero. This indicates that amenities have little impact on housing prices and can be excluded from the model.
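A sketch of this housing example with scikit-learn's Lasso might look like the following. The file name and feature columns are hypothetical, and the penalty strength alpha would normally be tuned (for example, with cross-validation):

```python
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Hypothetical housing dataset and column names.
housing = pd.read_csv("housing.csv")
X = housing[["rooms", "square_footage", "location_score", "amenities_score"]]
y = housing["price"]

# Standardize so the L1 penalty treats all coefficients on the same scale.
X_scaled = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=0.1).fit(X_scaled, y)   # alpha is an illustrative value
coefficients = pd.Series(lasso.coef_, index=X.columns)
print(coefficients)

# Features whose coefficients were shrunk exactly to zero can be dropped.
selected = coefficients[coefficients != 0].index.tolist()
print("Selected features:", selected)
```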

The feature_importances_ attribute of a Random Forest helps identify which features are most important in the model's decision-making process.

In the context of feature selection, feature_importances_ can guide the selection of the most relevant features for a Random Forest model. By examining the feature_importances_, you can identify the features that contribute the most to the model’s predictive power. This information can help you focus on the most informative features and potentially discard less important ones.

You can use feature_importances_ from a Random Forest model to select the top-K most important features. By sorting the features by their importances in descending order, you can keep the K features with the highest importance and discard the rest.
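For instance, a sketch of top-K selection with feature_importances_ could look like this, assuming X is a pandas DataFrame of encoded features and y the target from the same dataset as before:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# X is assumed to be a DataFrame of encoded features, y the target labels.
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

# Sort features by importance and keep the top K (K = 10 here, illustratively).
importances = pd.Series(forest.feature_importances_, index=X.columns)
top_k = importances.sort_values(ascending=False).head(10)
print(top_k)

# Keep only the top-K columns for the downstream model.
X_top = X[top_k.index]
```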

We can see that the feature importances produced by the Random Forest differ from the rankings obtained with the information gain method.

4. Hybrid Methods

The hybrid method combines feature selection techniques with dimensionality reduction methods like Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). These methods transform the original features into a new set of uncorrelated variables (principal components) while retaining most of the information. By selecting the top-k principal components, feature selection and dimensionality reduction are achieved simultaneously.

Let’s say you are working on a facial recognition system. The hybrid method can be employed by applying PCA to extract the most important facial features (e.g., eyes, nose, and mouth) from a large set of raw pixel data. By selecting the top-k principal components, you reduce the dimensionality and retain the most discriminative facial features for accurate recognition.
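As an illustrative sketch, the snippet below projects flattened (randomly generated) "face" images onto the top-k principal components with scikit-learn's PCA; a real system would of course use actual image data:

```python
import numpy as np
from sklearn.decomposition import PCA

# 200 random 64x64 images flattened to 4096-dimensional pixel vectors.
rng = np.random.default_rng(0)
faces = rng.random((200, 64 * 64))

# Keep only the top-50 principal components (k = 50 here).
pca = PCA(n_components=50)
faces_reduced = pca.fit_transform(faces)

print(faces_reduced.shape)                    # (200, 50)
print(pca.explained_variance_ratio_.sum())    # fraction of variance retained
```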

Conclusion

We don’t need all the features for model building. Only the optimal number will do. Just like we only need the right set of people around us to stay happy.

Feature selection is a crucial step in machine learning to improve model performance, reduce complexity, and enhance interpretability. By selecting only the most relevant and easily interpretable features, we can build simpler and more accurate machine learning models. Excelling in life can be compared to feature selection: it involves recognizing and prioritizing your key strengths, managing your time and energy effectively, investing in personal growth and learning, and surrounding yourself with supportive influences.

Machine-Learning/Hotel Cancellation prediction with Feature Selection.ipynb at main · anirudh998/Machine-Learning (github.com)

This is the link to my project, where I built a classification model and applied all of the feature selection techniques discussed above.

