The Technology Headlines

Decoding Data Science: Machine Learning

As we’ve come to know, data science is a vast domain in which multiple processes are involved in arriving at the required answer. We have seen that statistics and math, together one of its crucial components, drive the analytics phase of data science. Once the analysis phase is over, the data takes a clearer form and is ready to head into the modeling phase. As you would expect, this too involves several sub-processes. At the heart of it all, though, the technology that drives the whole modeling phase is machine learning.

What is machine learning? Machine learning is an application of artificial intelligence that enables computers to find patterns in data and generate models based on them. When new data is provided to these models, they recognize the patterns and use them to help the computer make decisions autonomously, without being explicitly programmed for each task.

Machine Learning (ML) Pipeline

Data scientists create a pipeline for data as it flows through their ML solution. This is an iterative, two-way pipeline: each step is fed data from the step before it, and the results of later steps can send us back to refine earlier ones.

The key stages of this pipeline are as follows:

  • We first define the concerned business problem.
  • We then gather relevant data.
  • The data is then prepped to make it more usable, as most of it will be unstructured. At this stage we analyze the data using statistical functions to extract information, which is later used to build models of the data.
  • Next, we split the data into subsets for training and testing the model.
  • The ML algorithm is then used to recognize patterns in the training data.
  • We then iteratively assess the performance of the model to understand how accurate the predictions are.
  • When testing is done, the chosen model is embedded in decision-making frameworks to help make decisions.
  • The model is also continuously monitored to assess its real-world performance, and it is updated as it encounters new scenarios.
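
The split, train, and assess stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the toy data and the helper names (train_test_split, fit_centroids, predict, accuracy) are hypothetical, and a deliberately simple nearest-centroid classifier stands in for whichever ML algorithm a real project would choose.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_centroids(train_rows):
    """'Training': learn the mean feature value for each label."""
    sums, counts = {}, {}
    for x, label in train_rows:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Predict the label whose learned mean is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for x, label in rows if predict(centroids, x) == label)
    return correct / len(rows)

# Hypothetical toy data: (feature value, label) pairs in two well-separated groups.
data = [(float(i), "low") for i in range(0, 10)]
data += [(float(i), "high") for i in range(20, 30)]

train, test = train_test_split(data)     # split the data into subsets
model = fit_centroids(train)             # recognize patterns in the training data
test_accuracy = accuracy(model, test)    # assess predictions on unseen data
print(len(train), len(test), test_accuracy)
```

Because the test rows are never shown to fit_centroids, the accuracy figure reflects how the model behaves on data it has not seen, which is the point of splitting before training.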


ML algorithms are also broadly divided into two categories: supervised and unsupervised. We use supervised learning techniques when the value we want to predict is already present in the dataset as a known answer for past examples, which makes the process of prediction simpler. Unsupervised techniques, on the other hand, are used when the required value is not present in the dataset. This is a lot more challenging, as the algorithm is tasked with finding hidden patterns rather than learning from answers that are already there.


There are various techniques one can use to apply these two kinds of learning. Here are some popular examples used by data scientists around the world.

  • Classification (a subcategory of supervised learning): This technique assigns each input to one of a set of categories and makes predictions on that basis. It can be useful for detecting anomalies in the data; for example, it could be used to differentiate between regular and spam mail, or to answer “yes or no” questions.
  • Regression (a subcategory of supervised learning): This technique is used to predict continuous responses, i.e., it generally answers questions like “how much” or “how many”. Changes in price, fluctuations in temperature, and many other quantities can be predicted using this technique.
  • Clustering (a subcategory of unsupervised learning): This is the process of grouping similar data points to find hidden patterns, often as part of exploratory data analysis. It can be used for scenarios such as customer segmentation, market research, etc.
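
To make regression and clustering a little more concrete, here is a small Python sketch of each. It is only an illustration under assumptions of our own: the numbers are made up, the function names fit_line and k_means_1d are hypothetical, and ordinary least squares and k-means are just two common choices among the many algorithms that fall under these techniques.

```python
def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def k_means_1d(values, centers, iterations=10):
    """Clustering: plain k-means on single numbers, from the given start centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assign every value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (unchanged if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Supervised: the answers (prices) are present, so we learn "how much" from them.
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 5.0 + intercept

# Unsupervised: no labels at all, the algorithm has to find the groups itself.
spend = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, groups = k_means_1d(spend, centers=[0.0, 5.0])
print(predicted_price, centers, groups)
```

The regression half is handed the known prices and learns from them, while the clustering half receives only the raw spend values and separates them into a low group and a high group without being told that such groups exist.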


Following the pattern of our previous article, just like an ML algorithm, we have kept this introduction to another important component of data science short and sweet. Machine learning is arguably one of the most exciting prospects in technology. With a wide variety of applications ranging from medical diagnosis to market analysis, there is little doubt that machine learning will help mankind take the next step in furthering its evolution.