Posts

Showing posts with the label deep learning

Decision Tree Algorithm – A Complete Guide

  This article was published as a part of the Data Science Blogathon.

Introduction
So far we have learned about linear regression and logistic regression, which were fairly hard to understand. Let's now move on to decision trees, and I assure you this is probably the easiest algorithm in machine learning. There isn't much mathematics involved here. Because it is very easy to use and interpret, it is one of the most widely used and practical methods in machine learning.

Contents
1. What is a Decision Tree?
2. Example of a Decision Tree
3. Entropy
4. Information Gain
5. When to Stop Splitting?
6. How to Stop Overfitting? (max_depth, min_samples_split, min_samples_leaf, max_features)
7. Pruning (Post-pruning, Pre-pruning)
8. Endnotes

What is a Decision Tree?
It is a tool with applications spanning several different areas. Decision trees can be used for classification as well as regression problems. The name itself suggests that it uses a flowchart-like tree structure to show t...
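The entropy and information-gain quantities listed in the contents above can be sketched in a few lines of plain Python. This is a minimal illustration of the formulas themselves, not the full tree-building algorithm the article goes on to describe:

```python
# Entropy and information gain, the split-quality measures used by
# decision trees. Standard library only; the toy labels are invented.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(S) = -sum(p * log2(p)) over class proportions."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent node minus the weighted entropy of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# A perfectly mixed node has entropy 1 bit; a pure split recovers all of it.
parent = ["yes", "yes", "no", "no"]
split = [["yes", "yes"], ["no", "no"]]
print(entropy(parent))                  # 1.0
print(information_gain(parent, split))  # 1.0
```

A tree learner greedily picks, at each node, the feature whose split yields the highest information gain, then repeats on the children until a stopping rule (such as max_depth) is hit.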

Churn Prediction - Commercial Use of Data Science

Introduction
Churn prediction is probably one of the most important applications of data science in the commercial sector. What makes it popular is that its effects are tangible and it plays a major role in a business's overall profits. Let's get started!

What exactly is Churn Prediction?
Churn is defined in business terms as 'when a client cancels a subscription to a service they have been using.' A common example is people cancelling Spotify or Netflix subscriptions. Churn prediction, then, is essentially predicting which clients are most likely to cancel a subscription, i.e. 'leave the company', based on their usage of the service. From a company's point of view, this information is valuable because acquiring new customers is often more arduous and costlier than retaining old ones. Hence, the insights gained from churn prediction help a business focus on the customers at high risk of leaving. The output in the case of Churn predi...
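The idea of scoring customers by churn risk can be sketched with a toy logistic model. The feature names and weights below are invented for illustration; a real churn model would be fit to historical usage data rather than hand-tuned:

```python
# Toy churn scoring: rank customers by a hypothetical risk score so that
# retention efforts focus on the likeliest leavers. Weights are made up.
from math import exp

def churn_probability(days_since_login, support_tickets):
    """Logistic score: more inactivity and more complaints -> higher risk."""
    z = 0.08 * days_since_login + 0.5 * support_tickets - 3.0
    return 1 / (1 + exp(-z))

customers = {
    "alice": (2, 0),   # recently active, no complaints -> low risk
    "bob": (45, 3),    # inactive and unhappy -> high risk
}
for name, features in customers.items():
    print(name, round(churn_probability(*features), 2))
```

Customers whose score crosses a chosen threshold would then be flagged for a retention campaign.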

Dimensionality Reduction in Machine Learning

  Machine learning algorithms can typically extract important information from feature-rich datasets, whether they are tables with many columns and rows or images with millions of pixels. Combined with breakthroughs in cloud computing, this means that larger and larger machine learning models can be run with great power. However, each feature that is added increases the complexity of the model, which also makes it harder for these powerful algorithms to locate the information that matters. The solution is dimensionality reduction: a set of techniques for removing excessive and unneeded features from machine learning models. Dimensionality reduction also sharply reduces the cost of machine learning and allows complex problems to be solved with simpler models. The technique is especially useful for predictive models, whose datasets often contain a large number of input features that complicate modelling. What is Dimensionality?...
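One of the simplest dimensionality-reduction ideas the excerpt alludes to, removing features that carry no information, can be shown in pure Python by dropping constant (zero-variance) columns. Real pipelines would typically use PCA or a library variance-threshold filter instead; the data below is invented:

```python
# Minimal sketch of variance-based feature removal: a column that never
# changes cannot help a model, so drop it. Toy data for illustration.
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def drop_low_variance(rows, threshold=0.0):
    """Return rows keeping only columns whose variance exceeds threshold."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if variance(col) > threshold]
    return [[row[i] for i in keep] for row in rows], keep

data = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 1.0],
    [3.0, 5.0, 0.0],
]
reduced, kept = drop_low_variance(data)
print(kept)  # column 1 is constant, so only columns 0 and 2 survive
```

Techniques such as PCA go further by combining correlated features into a few new axes rather than merely dropping columns.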

A Comprehensive Guide to Data Exploration

  Overview
A complete tutorial on data exploration (EDA). We cover several aspects of data exploration, including missing value imputation, outlier removal, and the art of feature engineering.

Introduction
There are no shortcuts for data exploration. If you believe that machine learning alone can sail you away from every data storm, trust me, it won't. At some point you will realize that you are struggling to improve your model's accuracy. In such situations, data exploration techniques will come to your rescue. I can say this confidently because I've been through such situations a lot.

1. Steps of Data Exploration and Preparation
Remember: the quality of your inputs decides the quality of your output. So, once you have your business hypothesis ready, it makes sense to spend a lot of time and effort here. By my personal estimate, data exploration, cleaning, and preparation can take up to 70% of your total project time. Below are the step...
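Two of the exploration steps named in the overview, missing value imputation and outlier removal, can be sketched with the standard library. The sample values are invented, and real work would usually lean on pandas instead:

```python
# Mean imputation of missing values and IQR-based outlier removal,
# two common data-cleaning steps. Standard library only; toy data.
from statistics import mean, quantiles

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def remove_outliers_iqr(values, k=1.5):
    """Drop points outside the fence [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]

heights = [160, 165, None, 170]
print(impute_mean(heights))          # the None becomes the mean, 165

ages = [25, 30, 28, 32, 27, 31, 29, 200]
print(remove_outliers_iqr(ages))     # the implausible 200 is dropped
```

Whether mean imputation or outright removal is appropriate depends on why the values are missing or extreme, which is exactly the kind of judgment data exploration is meant to inform.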