How to create fast and reproducible machine learning models with steppy? – Analytics India Magazine

Posted: May 5, 2022 at 1:43 am



In machine learning, building pipelines and getting the best out of them is crucial. A library that offers every high-performing feature tends to become heavyweight. Steppy is a library that aims to build optimal pipelines while staying lightweight. In this article, we discuss the steppy library and walk through its use on a simple classification problem. We first introduce steppy, then look at why such a library is needed, and finally go through a basic implementation.

Let's start by introducing steppy.

Steppy is an open-source Python library for running data science experiments. It was developed to make experiments fast and reproducible. It is also lightweight and lets us build high-performing machine learning pipelines. Its developers aim to let data science practitioners focus on the data rather than on software development issues.

The section above described what steppy is: a library that provides an environment where experiments are fast, reproducible, and easy. It removes much of the difficulty around reproducibility and offers functions that beginners can use as well. The library has two main abstractions for building machine learning pipelines. The first is the Step, which wraps a transformer and handles the execution of the pipeline, for example saving intermediate results and checkpointing the model during training. The second is the Transformer, the purely computational piece, such as a machine learning algorithm or a pre- or post-processing routine, that takes input data and produces output data.

A simple implementation makes the intentions behind this library clear, but first we need to install it. Steppy requires Python 3.5 or above. With that in place, we can install the library with pip:

pip install steppy

After installation, we are ready to use steppy for data science experiments. Let's take a look at a basic implementation.

In this implementation, we will see how to use steppy to create steps for a classification task.

In this article, we will use the iris dataset provided by scikit-learn, which can be imported using the following line of code:

from sklearn.datasets import load_iris

Let's split the dataset into train and test sets.
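A minimal sketch of loading and splitting the data is given here. The 80/20 split, the stratification, and the `random_state` value are our assumptions for illustration and are not taken from the original experiment.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris features (150 rows, 4 columns) and the integer class labels.
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 20% of the samples for testing. Stratifying on y keeps the class
# balance in both parts, and a fixed random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```

Fixing `random_state` is a small but useful habit when the goal is reproducible experiments, since every run then sees the same train and test samples.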

One thing we need to do when using steppy is to put our data into dictionaries so that the steps we create can communicate with each other. We can do this in the following way:
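The dictionary layout could look like the sketch below. The key names `input`, `X`, and `y` are assumptions that follow the convention used in steppy's README: a step refers to the outer key by name, and the inner keys become the keyword arguments passed to the transformer.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Recreate the train/test split so this block runs on its own.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# One dictionary per data set. The outer key ("input") names the data source;
# the inner keys ("X", "y") name the arrays a transformer will receive.
data_train = {"input": {"X": X_train, "y": y_train}}
data_test = {"input": {"X": X_test, "y": y_test}}
```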

Now we are ready to create steps.

In this article, we fit a random forest to classify the iris data, which means that for steppy we define the random forest as a transformer.
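A transformer of this kind might be sketched as follows. The class name `RandomForestTransformer` and the `y_pred` output key are our own choices, and the fallback to a plain `object` base class is only there so the sketch can be run where steppy is not installed; with steppy available, the class subclasses `steppy.base.BaseTransformer`.

```python
import joblib
from sklearn.ensemble import RandomForestClassifier

try:
    # steppy's base class for transformers.
    from steppy.base import BaseTransformer
except ImportError:
    # Assumption for this sketch only: use a plain base class so the code
    # still runs in environments without a working steppy installation.
    BaseTransformer = object


class RandomForestTransformer(BaseTransformer):
    """Wraps a scikit-learn random forest in the fit/transform interface."""

    def __init__(self, **rf_params):
        # Initialise the underlying estimator with any scikit-learn parameters.
        self.estimator = RandomForestClassifier(**rf_params)

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def transform(self, X, **kwargs):
        # Extra inputs such as y are accepted and ignored at prediction time.
        return {"y_pred": self.estimator.predict(X)}

    def persist(self, filepath):
        # Save the fitted estimator so the experiment can be reproduced later.
        joblib.dump(self.estimator, filepath)

    def load(self, filepath):
        self.estimator = joblib.load(filepath)
        return self
```

Returning a dictionary from `transform` keeps the output in the same shape as the input data, so it can be passed on to another step by key.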

Here we have defined functions that initialize the random forest, fit and transform the data, and save the parameters. Now we can wrap the above transformer in a step in the following way:
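Creating the step could look like the sketch below. The helper `build_classifier_step`, the step name, and the experiment directory are hypothetical. The keyword arguments passed to `Step` follow steppy's README as we recall it and may differ between steppy releases, so they should be checked against the installed version.

```python
def build_classifier_step(transformer, experiment_dir="./steppy_experiment"):
    """Wrap a transformer in a trainable steppy Step (hypothetical helper)."""
    # Imported inside the function so the sketch can be read and loaded
    # without steppy installed; calling the function does require steppy.
    from steppy.base import Step

    return Step(
        name="classifier",                    # label for this step (our choice)
        transformer=transformer,              # the object doing the computation
        input_data=["input"],                 # outer key of the data dictionary
        experiment_directory=experiment_dir,  # where steppy stores its artifacts
        is_trainable=True,                    # the transformer must be fitted
    )
```

The object returned by this helper, when given a transformer instance, is what the article refers to as `step`.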

Output:

Let's visualize the step.

step

Output:

Here we can see the steps we have defined in the pipeline. Let's train the pipeline.

We can train the pipeline we defined using the following lines of code.
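Training comes down to one call on the step. The sketch below assumes that `step` is a steppy `Step` (or any object exposing `fit_transform`) and that `data_train` is the training dictionary described earlier; the helper name `train_step` is hypothetical.

```python
def train_step(step, data_train):
    """Fit the step on the training dictionary and return its outputs.

    steppy's Step.fit_transform fits the wrapped transformer on the data
    found under the step's input keys and returns the transform output.
    """
    return step.fit_transform(data_train)
```

With a random forest transformer that returns its predictions under a `y_pred` key, the result would be a dictionary holding the predictions on the training set.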

Output:

In the output, we can see the steps that were followed to train the pipeline. Let's evaluate the pipeline with the test data.
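Evaluation on held-out data uses `transform` instead of `fit_transform`, so the fitted model is applied without being retrained. As before, this is a sketch: `predict_with_step` is a hypothetical helper, and `step` is assumed to be an already trained steppy `Step`.

```python
def predict_with_step(step, data_test):
    """Apply an already fitted step to the test dictionary.

    Step.transform runs the wrapped transformer in inference mode only,
    so the test data never influences the fitted model.
    """
    return step.transform(data_test)
```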

Output:

Here we can see the testing procedure followed by the library. Let's check the accuracy of the model.
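Accuracy can be computed with scikit-learn by comparing the true labels in the test dictionary with the predictions returned by the step. The helper name `step_accuracy` and the `input`, `y`, and `y_pred` keys are assumptions carried over from the dictionary layout used in this walkthrough.

```python
from sklearn.metrics import accuracy_score


def step_accuracy(data_test, preds_test):
    """Fraction of test samples whose predicted label matches the true one."""
    y_true = data_test["input"]["y"]
    y_pred = preds_test["y_pred"]
    return accuracy_score(y_true, y_pred)


# Tiny hand-made example: three of four labels match.
example_score = step_accuracy(
    {"input": {"y": [0, 1, 2, 2]}},
    {"y_pred": [0, 1, 2, 1]},
)
print(example_score)  # 0.75
```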

Output:

Here we can see that the results are good, and if you use the library yourself you will find out how lightweight it is.

In this article, we discussed the steppy library, an open-source, lightweight, and easy way to implement machine learning pipelines. We also looked at the need for such a library and at an implementation that creates steps in a pipeline using steppy.


Written by admin

Posted in Machine Learning



