Building a complete machine learning model involves a series of distinct phases or steps. The sequence of these phases is known as a Machine Learning Workflow.
A Brief Introduction to Machine Learning
Machine learning is a tool for turning information into knowledge. Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values.
Machine Learning Workflow
- Specifying the Problem
- Data Preparation
- Selection of Algorithm
- Training the Model
- Testing and Evaluating the Model
- Maintenance
- Ask the right question: An ML workflow starts with defining a specific question or problem with clear boundaries. The right question guides everything that follows: which data is needed and how to prepare it, which algorithm to choose, how to test the model, and what outcome to expect from it. Some examples: 1. Suppose you need to predict an individual’s credit risk based on the information they gave on a credit application. Credit risk assessment is a complex problem, but an ML solution can add a new dimension for effective analysis. 2. A solution that predicts which tweets will get retweeted.
- Data preparation: This is the most important phase of a machine learning solution, and it depends entirely on phase 1, the problem. A precisely defined problem or question tells you which data you need and how to prepare it. Almost 60% of the overall time is spent on data preparation. Data preparation, in general, means transforming raw data into a format that can be modeled using machine learning algorithms. This phase includes a number of sub-steps such as data cleaning, filtering, manipulating, scaling and reduction, sampling, and splitting. The actions carried out for data cleaning or manipulation include adding columns or rows, cleaning missing data, editing metadata, joining data, removing duplicate rows, categorization, and many more. Another important point is that we always split the data into at least two parts, a training dataset and a testing dataset, and this split is also considered part of data preparation.
- Selecting the Algorithm: The choice of algorithm depends solely on the problem (phase 1: the question) for which we are designing the ML model. Numerous well-established algorithms are available and ready to apply. Anomaly detection, classification, clustering, and regression are the types of model or algorithm, categorized by the kind of problem they solve, and many mature algorithms are available under each category. Some examples of machine learning algorithms are Linear Regression, Neural Network Regression, Two-Class Decision Forest, Multiclass Decision Jungle, K-Means Clustering, and PCA-Based Anomaly Detection. When building an ML solution, we do not design or create new algorithms; that is not part of the solution. Instead, we run trials with different established algorithms and find the one that suits our problem.
- Training the Model: This stage is also known as the fitting stage. The prepared and formatted data is fed to the selected algorithm to train the model. In other words, the model learns from the prepared training data.
- Testing and Evaluating the Model: In the earlier data preparation stage, the data was divided into two parts, a training dataset and a testing dataset. In this stage, the testing data is used to score the model and find out how well it performs. The test data is fed into the trained model, and the output is compared with the actual values to determine the accuracy level.
- Maintenance: This is also a crucial part of maximizing model performance. New or recent data is fed back into the model and passes through all of the phases again.
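The data preparation, training, and testing phases described above can be sketched in a few lines of code. The following is only a minimal illustration, not a production recipe: it assumes a tiny synthetic dataset and uses nothing but the Python standard library, and every function name in it (`prepare`, `split`, `train`, `evaluate`) is made up for this example. In practice a library such as scikit-learn would usually handle these steps.

```python
import random
import statistics


def prepare(rows):
    """Data preparation: drop rows with missing values and exact duplicates."""
    seen, clean = set(), []
    for row in rows:
        if None in row or row in seen:
            continue
        seen.add(row)
        clean.append(row)
    return clean


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into training and testing datasets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train(rows):
    """Training (fitting): least-squares line y = slope * x + intercept."""
    mean_x = statistics.fmean(x for x, _ in rows)
    mean_y = statistics.fmean(y for _, y in rows)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    return slope, mean_y - slope * mean_x


def evaluate(model, rows):
    """Testing: mean absolute error on rows the model has not seen."""
    slope, intercept = model
    return statistics.fmean(abs(y - (slope * x + intercept)) for x, y in rows)


# Hypothetical raw data: y is roughly 2x + 1, plus one row with a
# missing value and one duplicate row that preparation should remove.
rng = random.Random(42)
raw = [(float(x), 2.0 * x + 1.0 + rng.uniform(-0.5, 0.5)) for x in range(40)]
raw += [(5.0, None), raw[3]]

clean = prepare(raw)
train_rows, test_rows = split(clean)
model = train(train_rows)
test_error = evaluate(model, test_rows)
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} test MAE={test_error:.2f}")
```

Note that the model is fitted only on `train_rows` and scored only on `test_rows`, which is why the split belongs to data preparation and has to happen before training.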
Most of the phases are repeated depending on the result of testing and evaluation. If the evaluation score is below expectation, the process steps back one phase, and another algorithm is selected and trained. This makes machine learning a continuous process. Sometimes the evaluation may require going all the way back to the data preparation phase.
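This trial-and-evaluate loop can be illustrated with a small sketch. It assumes three deliberately simple candidate algorithms written in plain Python, a synthetic curved dataset, and an arbitrary acceptance threshold (`TARGET_ERROR`). All of these names and values are hypothetical and chosen only to show the loop, not to recommend particular algorithms.

```python
import random
import statistics


def fit_mean(rows):
    """Baseline candidate: always predict the mean of the training targets."""
    mean_y = statistics.fmean(y for _, y in rows)
    return lambda x: mean_y


def fit_line(rows):
    """Candidate: least-squares straight line."""
    mean_x = statistics.fmean(x for x, _ in rows)
    mean_y = statistics.fmean(y for _, y in rows)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    return lambda x: mean_y + slope * (x - mean_x)


def fit_nearest(rows):
    """Candidate: predict the target of the closest training point."""
    return lambda x: min(rows, key=lambda row: abs(row[0] - x))[1]


def mean_abs_error(predict, rows):
    """Evaluation score: lower is better."""
    return statistics.fmean(abs(y - predict(x)) for x, y in rows)


# Hypothetical data that follows a curve (y = x^2 plus a little noise).
rng = random.Random(7)
data = [(x / 10, (x / 10) ** 2 + rng.uniform(-0.2, 0.2)) for x in range(100)]
rng.shuffle(data)
train_rows, test_rows = data[:75], data[75:]

# Train every candidate on the same training data, score each on the test data.
candidates = {
    "mean baseline": fit_mean,
    "linear": fit_line,
    "nearest neighbor": fit_nearest,
}
scores = {name: mean_abs_error(fit(train_rows), test_rows)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)

TARGET_ERROR = 0.5  # assumed acceptance threshold, chosen only for illustration
needs_rework = scores[best] > TARGET_ERROR

for name, score in scores.items():
    print(f"{name}: test MAE = {score:.2f}")
print(f"best candidate: {best}")
if needs_rework:
    print("Score below expectation: try other algorithms or revisit data preparation.")
```

Each candidate is scored on the same test rows, so the comparison is fair. When even the best score misses the target, the workflow loops back as described above.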
A Machine Learning workflow is a combination of defined steps carried out in a specific order. It starts with defining the problem and proceeds through data preparation, algorithm selection, model training, and testing and evaluation, in that order. More importantly, the later phases are iterative, depending on the evaluation. Maintenance also has great significance for machine learning performance.