Posts

Showing posts with the label steps in EDA(ML)

Churn Prediction- Commercial use of Data Science

Introduction
Churn prediction is probably one of the most important applications of data science in the commercial sector. It is popular because its effects are tangible and it is a major factor in the overall profits earned by a business. Let’s get started!
What exactly is Churn Prediction?
In business terms, churn is defined as ‘when a client cancels a subscription to a service they have been using.’ A common example is people cancelling Spotify or Netflix subscriptions. Churn prediction is therefore essentially predicting which clients are most likely to cancel a subscription, i.e. ‘leave a company’, based on their usage of the service. From a company’s point of view, this information is necessary because acquiring new customers is often more arduous and costlier than retaining old ones. Hence, the insights gained from churn prediction help the company focus on the customers who are at a high risk of leaving. The output in the case of Churn predi...

Dimensionality Reduction in Machine Learning

Machine learning algorithms are typically able to extract important information from feature-rich datasets, whether they are tables with many columns and rows or images with millions of pixels. Couple this with breakthroughs in cloud computing, and ever larger machine learning models can be run with great computing power. However, each added feature increases the complexity of the model, which in turn makes it harder to locate useful information with these powerful algorithms. The solution is dimensionality reduction: a set of techniques for removing excessive and unneeded features from machine learning models. Dimensionality reduction can also substantially reduce the cost of machine learning and allow complex problems to be solved with simpler models. It is especially useful for predictive models, whose datasets often contain a large number of input features that complicate the model’s function.
What is Dimensionality?...
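
The excerpt is cut off before it names any specific technique, so the following is only an illustrative sketch of one common approach, principal component analysis (PCA), written with NumPy alone. Note that PCA builds new combined features rather than dropping original columns. The helper name `pca_reduce` and the synthetic data are assumptions made for this example and do not come from the post.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components.

    Hypothetical helper for illustration only.
    """
    # Center each feature so the SVD captures variance, not the mean
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Share of total variance carried by each principal direction
    explained_variance_ratio = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained_variance_ratio[:n_components]

rng = np.random.default_rng(0)
# Synthetic data (assumption): 5 observed features driven by 2 hidden factors
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

X_reduced, ratio = pca_reduce(X, n_components=2)
print(X_reduced.shape)   # two columns instead of five
print(ratio.sum())       # fraction of variance kept by the two components
```

Because the five synthetic features are built from only two underlying factors, two components retain almost all of the variance; real datasets rarely compress this cleanly.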

A Comprehensive Guide to Data Exploration

Overview
A complete tutorial on data exploration (EDA). We cover several data exploration aspects, including missing value imputation, outlier removal, and the art of feature engineering.
Introduction
There are no shortcuts for data exploration. If you believe that machine learning can sail you away from every data storm, trust me, it won’t. At some point, you’ll realize that you are struggling to improve the model’s accuracy. In such a situation, data exploration techniques will come to your rescue. I can confidently say this because I’ve been through such situations a lot.
1. Steps of Data Exploration and Preparation
Remember, the quality of your inputs decides the quality of your output. So, once you have your business hypothesis ready, it makes sense to spend a lot of time and effort here. By my personal estimate, data exploration, cleaning and preparation can take up to 70% of your total project time. Below are the step...
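
Two of the aspects named in the overview, missing value imputation and outlier removal, can be sketched in a few lines of Pandas. This is a minimal example under stated assumptions: the toy `age` and `income` columns, median imputation, and the 1.5 × IQR rule are choices made here for illustration, not necessarily the methods the full post uses.

```python
import numpy as np
import pandas as pd

# Toy data (assumption): two missing values and one extreme income
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29, 38],
    "income": [40_000, 52_000, 48_000, 1_000_000, 45_000, np.nan],
})

# Missing value imputation: fill each column's gaps with that column's median
df_imputed = df.fillna(df.median())

# Outlier removal: keep rows whose income lies within 1.5 * IQR of the quartiles
q1 = df_imputed["income"].quantile(0.25)
q3 = df_imputed["income"].quantile(0.75)
iqr = q3 - q1
in_range = df_imputed["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df_clean = df_imputed[in_range]

print(df_imputed)
print(df_clean)
```

The median is used instead of the mean because the extreme income would otherwise pull the imputed value upward.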

Steps in Exploratory Data Analysis (EDA)

Exploratory Data Analysis, or EDA, is an important step in any data analysis or data science project. EDA is the process of investigating a dataset to discover patterns and anomalies (outliers) and to form hypotheses based on our understanding of the dataset. EDA involves generating summary statistics for the numerical data in the dataset and creating various graphical representations to understand the data better. In this article, we will understand EDA with the help of an example dataset. We will use the Python language (Pandas library) for this purpose.
Importing libraries
We will start by importing the libraries we will require for performing EDA. These include NumPy, Pandas, Matplotlib, and Seaborn.
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline
    import seaborn as sns
Reading data
We will now read the data from a CSV file into a Pandas DataFrame. You can download the dataset for your reference.
    df = pd.read_c...
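
Since the excerpt is truncated before the dataset is loaded, the summary-statistics step it describes can be illustrated on a small in-memory DataFrame instead. The column names and values below are made up for this sketch and are not the post's dataset; plotting is left out so the example runs without Matplotlib or Seaborn.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): not the CSV used in the post
df = pd.DataFrame({
    "height_cm": [160.0, 172.5, 181.0, np.nan, 168.0],
    "weight_kg": [55.0, 70.0, 82.5, 64.0, 61.0],
    "group": ["a", "b", "b", "a", "a"],
})

print(df.shape)          # number of rows and columns
print(df.dtypes)         # data type of each column
print(df.isna().sum())   # missing values per column

# Summary statistics; by default only numerical columns are described
summary = df.describe()
print(summary)

# Frequency of each category in a non-numerical column
print(df["group"].value_counts())
```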