Updated: Apr 12
Machine learning is all about learning the relationship between dependent and independent variables. A significant part of any ML project is spent on exploratory data analysis, feature engineering, and similar tasks. Another essential step in building any model is hyperparameter tuning, which means choosing hyperparameter values that let the model learn a robust relationship that generalizes well to out-of-sample data. It is also important to note that hyperparameters and parameters are different things. Hyperparameters are settings that we choose and pass to the model before training. They control how the model learns the relationship between the features and the target. For example, the maximum depth we pass to a Random Forest model is a hyperparameter. Parameters, by contrast, are estimated from the data during training. For example, the coefficients of a Linear Regression model are parameters.
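To make the distinction concrete, here is a minimal sketch using scikit-learn. The synthetic dataset and the value `max_depth=5` are arbitrary choices for illustration. `max_depth` is set by us before training. `coef_` and `intercept_` are estimated from the data by `fit()`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Small synthetic regression dataset (sizes are illustrative)
X, y = make_regression(n_samples=200, n_features=3, noise=0.1, random_state=0)

# Hyperparameter: chosen by us before training and passed to the constructor
forest = RandomForestRegressor(max_depth=5, random_state=0)
forest.fit(X, y)
print("max_depth (hyperparameter):", forest.get_params()["max_depth"])

# Parameters: estimated from the data when fit() is called
linreg = LinearRegression()
linreg.fit(X, y)
print("coefficients (parameters):", linreg.coef_)
print("intercept (parameter):", linreg.intercept_)
```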
Apart from good feature engineering, tuning the hyperparameters can lead to a significant improvement in the model we build. Advanced ensemble models such as bagging (Random Forests) and boosting (XGBoost, LightGBM, etc.) come with sensible default hyperparameters that often work reasonably well, but tuning them for a specific dataset can help build a better model. In this tutorial, we will look at some of the commonly used methods to tune hyperparameters. The most common ones are:
1. Manual Hyperparameter Tuning.
2. Grid Search.
3. Random Search.
4. Automated Hyperparameter Tuning using Bayesian Optimization.
Manual tuning of the hyperparameters can be time-consuming, and in most cases we need domain knowledge to choose good values to pass to the model. Even then, it does not always lead to a better-performing model. Grid search is useful when we have narrowed our hyperparameters down to a few candidate values each and want to find the best combination among them. Because it tries every combination, it is often time-consuming. Random search is used when the collection of possible hyperparameter values is large: it evaluates a fixed number of randomly sampled combinations and returns the best one. A common approach is to run a random search first and then narrow the hyperparameters down further with grid search. Of the methods described above, Bayesian optimization is often the most efficient, because it builds a probabilistic model from the results of earlier evaluations and uses it to decide which combination of hyperparameters to try next.
We will now go through each of these methods in detail, along with the Python and scikit-learn code to implement them. Throughout the article, we will use basic to advanced machine learning models to explain the tuning methods, and the dataset will be a dummy dataset created with the scikit-learn library.
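The exact settings of the dummy dataset are not given here. The sketch below therefore assumes a binary classification problem generated with scikit-learn's `make_classification`. The sample count, feature counts, test-set size and seed are placeholder choices. A held-out test set is kept aside so the tuned model can be evaluated on data that the tuning never saw.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification dataset; all sizes are illustrative assumptions
X, y = make_classification(
    n_samples=1000,     # number of rows
    n_features=20,      # total number of features
    n_informative=10,   # features that actually carry signal
    n_redundant=5,      # linear combinations of the informative features
    random_state=42,    # fixed seed for reproducibility
)

# Hold out a test set for a final evaluation after tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)  # (800, 20) (200, 20)
```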
Manual Hyperparameter Tuning
Manual hyperparameter tuning is an iterative process: you set a combination of hyperparameters, build the model, evaluate its performance, and repeat with a different combination. This is tedious and time-consuming, and the results are often not that encouraging. Tuning hyperparameters manually is not feasible for complex models like LightGBM or neural networks, which have many hyperparameters.
Before starting the tuning process, it is good to know a few of the hyperparameters of Random Forests and what they mean. The most important ones are briefly explained below.
● n_estimators: It defines the number of trees we build in the forest. More trees generally make the predictions more stable and improve generalization, with diminishing returns. However, a very large number of trees slows down the model building process.
● max_depth: It defines the maximum depth to which an individual tree in the forest can grow. A large value can cause overfitting.
● max_features: It defines the number of features considered when looking for the best split at each node. If all the features are available at every split, the trees become highly correlated and give almost the same result. This reduces the benefit of averaging them.
● min_samples_leaf: It is the minimum number of samples required in a leaf. A split is only made if it leaves at least this many samples in each of the resulting child nodes. Setting it to 1 can cause overfitting, because a tree can then keep splitting until individual leaves contain a single data point.
● min_samples_split: This hyperparameter is often tuned together with min_samples_leaf. It defines the splitting criterion: if the number of samples in a node is less than this value, the node is not split further.
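The sketch below shows where each of these hyperparameters goes in code. It uses scikit-learn's `RandomForestClassifier` on a synthetic dataset. The values are arbitrary illustrations, not recommendations or tuned results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data (illustrative assumption)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Illustrative values only; they are not tuned for this dataset
model = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    max_depth=10,          # maximum depth of each tree (None means no depth limit)
    max_features="sqrt",   # features considered at each split: sqrt(n_features)
    min_samples_leaf=2,    # every leaf must keep at least 2 samples
    min_samples_split=5,   # a node needs at least 5 samples to be split
    random_state=42,
)

# 5-fold cross-validated accuracy for this single configuration
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"Mean CV accuracy: {scores.mean():.3f}")
```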
With the critical hyperparameters of a Random Forest defined, let us get to hyperparameter tuning. It is not feasible to explain the hyperparameters of every machine learning model in one article, so only the important Random Forest hyperparameters are covered here. To demonstrate how manual hyperparameter tuning works, we will vary the n_estimators parameter, keep all other hyperparameters constant, and see how the model performs. Manually tuning n_estimators in this way produces the graph above. The accuracy score increases at first, but then it starts to decrease.
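The manual procedure described above can be sketched as a simple loop. This sketch assumes a synthetic dataset and a single train/validation split. The list of `n_estimators` values is arbitrary. It reproduces the procedure, not the author's graph.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data and a single train/validation split (illustrative assumptions)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

results = {}
# Vary only n_estimators; every other hyperparameter keeps its default value
for n in [10, 50, 100, 200, 300]:
    model = RandomForestClassifier(n_estimators=n, random_state=42)
    model.fit(X_train, y_train)
    results[n] = accuracy_score(y_val, model.predict(X_val))
    print(f"n_estimators={n:>3}: validation accuracy = {results[n]:.3f}")

# Pick the value with the highest validation accuracy on this particular split
best_n = max(results, key=results.get)
print("Best n_estimators on this split:", best_n)
```

Plotting `results` against `n_estimators` gives a curve like the one described above. Its exact shape depends on the data, the split, and the random seed.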
So, manual tuning of hyperparameters is not recommended when you have a lot of them to tune and you are not sure which ones give the best performance. It is difficult, time-consuming, and does not always give the best results. A Python implementation of manual hyperparameter tuning can be found here.
Hyperparameter Tuning with Random Search
Hyperparameters can also be tuned with the random search method, using the RandomizedSearchCV class from scikit-learn. It is easier and faster than manual tuning when you have many hyperparameters. It evaluates a fixed number of randomly sampled hyperparameter combinations, set by its n_iter argument, and reports the best result.
It also has a built-in cross-validation framework. When cv is an integer or left at its default, it uses stratified K-fold cross-validation for classification problems and plain K-fold cross-validation for regression problems. Stratification keeps the class proportions of the target roughly the same in every fold, which is especially helpful when the classes are imbalanced.
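The sketch below compares the share of the minority class in each validation fold under plain `KFold` and under `StratifiedKFold`. It uses a synthetic imbalanced dataset, and the 90/10 class weights are an arbitrary choice. The search classes build the stratified splitter internally, so you normally do not create it yourself. This only makes the effect visible.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, StratifiedKFold

# Imbalanced binary target: about 90% class 0 and 10% class 1 (illustrative)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("Overall share of class 1:", round(y.mean(), 3))

splitters = {
    "KFold": KFold(n_splits=5, shuffle=True, random_state=0),
    "StratifiedKFold": StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
}

fold_ratios = {}
for name, splitter in splitters.items():
    # Share of class 1 in each validation fold
    fold_ratios[name] = [y[val_idx].mean() for _, val_idx in splitter.split(X, y)]
    print(f"{name:>15}:", np.round(fold_ratios[name], 3))
```

With stratification, every fold's share of class 1 stays very close to the overall share. With plain K-fold, it can drift from fold to fold.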
Random search is usually more practical than manual tuning because you can explore a wide range of values in a single run, whereas in manual tuning you must try combinations one by one. When performing a random search, you do not need to set aside a separate validation set yourself, because the built-in cross-validation creates the validation folds. Keeping a held-out test set for a final evaluation is still good practice. Cross-validation also reduces the risk of choosing hyperparameters that overfit the training data and perform poorly on unseen data. In random search, we pass the hyperparameters as a dictionary of key-value pairs. Each key is a hyperparameter name, and each value is either a list of candidate values or a distribution to sample from.
Once the random search has been fitted, we can inspect the best estimator, the best combination of hyperparameters found, and the best cross-validation score through the built-in attributes best_estimator_, best_params_, and best_score_. An implementation of randomized search in Python can be found here.
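A minimal sketch of random search with `RandomizedSearchCV` follows. It assumes a synthetic dataset. The search space, `n_iter=10`, and `cv=5` are illustrative choices, not the author's settings. Integer ranges are given as `scipy.stats.randint` distributions, and categorical options as plain lists.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data (illustrative assumption)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Keys are hyperparameter names; values are lists or scipy.stats distributions
param_distributions = {
    "n_estimators": randint(50, 200),      # integers sampled uniformly from [50, 200)
    "max_depth": [None, 5, 10, 20],        # sampled uniformly from the list
    "max_features": ["sqrt", "log2"],
    "min_samples_leaf": randint(1, 10),    # integers from [1, 10)
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,          # number of random combinations to evaluate
    cv=5,               # classifier + integer cv -> stratified 5-fold CV
    scoring="accuracy",
    random_state=42,    # makes the sampled combinations reproducible
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
best_model = search.best_estimator_  # refit on all of X, y with best_params_
```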
Hyperparameter Tuning with Grid Search
Grid search is another effective way to tune hyperparameters. It is slower than random search, but it often gives better results when the search space is small, that is, when there are only a few hyperparameters to tune or a few values to check for each. In that case it is feasible to evaluate every combination. As with random search, we pass the hyperparameters as a grid: a dictionary whose keys are hyperparameter names and whose values are lists of candidate values.
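The key difference is that grid search evaluates every combination of hyperparameters instead of a few random combinations. The number of model fits is the number of combinations multiplied by the number of cross-validation folds, and every hyperparameter or value added to the grid multiplies the number of combinations. The training time therefore grows quickly.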
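A minimal grid search sketch with `GridSearchCV` follows. It assumes a synthetic dataset and a small, already narrowed-down grid whose values are arbitrary. `ParameterGrid` is used only to count the combinations, which shows how quickly the number of fits grows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid

# Synthetic data (illustrative assumption)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Narrowed-down grid (illustrative values, e.g. around a random-search result)
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [5, 10, None],
    "min_samples_leaf": [1, 2],
}

# Every combination is evaluated: 2 * 3 * 2 = 12 candidates
n_candidates = len(ParameterGrid(param_grid))
print("Candidates:", n_candidates, "-> model fits with 5-fold CV:", n_candidates * 5)
# Candidates: 12 -> model fits with 5-fold CV: 60

grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)
print("Best parameters:", grid.best_params_)
print("Best CV accuracy:", round(grid.best_score_, 3))
```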
Grid search also has a built-in cross-validation framework that performs stratified K-fold cross-validation for classification problems and plain K-fold cross-validation for regression problems. The following figure illustrates the difference between the grid search and random search methods.
A Python implementation of the grid search method can be found here.
Hyperparameter Tuning with Bayesian Optimization
Another important hyperparameter tuning method is Bayesian optimization. It differs from the grid search and random search methods we just saw. It is an advanced, automated technique that builds a probabilistic model of how the hyperparameters affect the score, and it uses the results of previous evaluations to choose the most promising values to try next. Several libraries implement Bayesian-style searches, such as Hyperopt, Optuna, and Scikit-optimize. In this tutorial, we will focus on the Hyperopt library.
Hyperopt is a large-scale hyperparameter optimization library created by James Bergstra that uses Bayesian optimization techniques to find the best parameters. Using Hyperopt involves three essential components:
● Objective Function.
● The Fmin function.
● Search Space.
The objective function is where we define the model, the cross-validation framework, and so on. It returns the loss that we would like to minimize. The search space is where we define the hyperparameters and the range of values to explore. Note that we do not define the search space the same way as in grid search or random search. Instead, we describe each hyperparameter with a distribution from Hyperopt's hp module, for example hp.choice, hp.uniform, or hp.quniform. The fmin function is where we combine everything: the objective, the search space, the search algorithm, the number of evaluations to perform, and so on. The Tree-structured Parzen Estimator (TPE) is the commonly used search algorithm, but we can also use random search and some other methods.
Hyperopt also provides a Trials object that records every evaluation made by fmin, including the hyperparameters tried and the loss obtained for each one. The best trial can be retrieved from it. These values can be used to evaluate our results. In short, the process for running an optimization with the Hyperopt library is as follows:
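1. Define the search space using the hp module from the Hyperopt library.
2. Define the objective function, including the model and the cross-validation scheme.
3. Call the fmin function with the objective, the search space, the search algorithm, and the number of evaluations.
4. Evaluate the results using the Trials object.
5. Repeat, adjusting the search space or the number of evaluations, until you achieve satisfactory results.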
An implementation of Hyperopt optimization in Python can be found here.
Hyperparameter tuning is one of the crucial processes in building any machine learning model. Even with well-engineered features, the results may not be encouraging if the hyperparameters are left untuned. The methods described above are not the only ways to optimize hyperparameters. There are plenty of others, but these are the most commonly used ones, and they generally deliver decent results.