Posts

Showing posts with the label machine learning

Step-by-Step Process of Feature Engineering for Machine Learning Algorithms in Data Science

Introduction

Data science is not a field where theoretical understanding alone is enough to start a career. Your chances of success depend largely on the projects you complete and the amount of practice you put in. Feature engineering is a very important aspect of machine learning and data science and should never be ignored. Its main goal is to get the best results from the algorithms.

Table of Contents

Why should we use Feature Engineering in data science?
Feature Selection
Handling missing values
Handling imbalanced data
Handling outliers
Binning
Encoding
Feature Scaling

1. Why should we use Feature Engineering in data science?

In data science, the performance of a model depends on data preprocessing and data handling. Suppose we build a model without any data handling and get an accuracy of around 70%. Applying feature engineering to the same model gives us a chance to raise its performance above 70%. Simply, by using Featur...

Decision Tree Algorithm – A Complete Guide

This article was published as a part of the Data Science Blogathon

Introduction

Till now we have learned about linear regression and logistic regression, and they were pretty hard to understand. Let’s now start with decision trees, and I assure you this is probably the easiest algorithm in machine learning. There’s not much mathematics involved here. Because it is very easy to use and interpret, it is one of the most widely used and practical methods in machine learning.

Contents

1. What is a Decision Tree?
2. Example of a Decision Tree
3. Entropy
4. Information Gain
5. When to stop Splitting?
6. How to stop overfitting?
   max_depth
   min_samples_split
   min_samples_leaf
   max_features
7. Pruning
   Post-pruning
   Pre-pruning
8. Endnotes

What is a Decision Tree?

A decision tree is a tool with applications spanning several different areas. Decision trees can be used for classification as well as regression problems. As the name suggests, it uses a flowchart-like tree structure to show t...