
Lacerdash/ML-for-Churn-predicting


Machine Learning for Churn Prediction

Churn prediction is crucial for businesses to identify customers who are likely to discontinue using their service. This repository contains my project, which aims to tackle this issue.

🚀 Click Here to Access the App! 🌟

Project Overview

Objectives: To identify potential churn customers and understand the associated patterns, thus enabling businesses to take proactive measures to retain them.

Data: The data is available here and the data dictionary here

Structure: The project is divided into 3 parts:

  1. Extract, Transform and Load (ETL) and Exploratory Data Analysis (EDA)
  2. Creating, Selecting and Optimizing models
  3. Creating a Streamlit app to deploy our model

1 - Extract, Transform and Load (ETL) and Exploratory Data Analysis (EDA)

ETL

  • The dataset, in JSON format, is imported into Python and undergoes transformation and cleaning. After processing, the data is saved to a CSV file.
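A minimal sketch of that step. The inline records, nesting, and file name are illustrative assumptions; the real project loads the full JSON export:

```python
import json
import pandas as pd

# Illustrative stand-in for the raw JSON export (the real file would be
# read with json.load from disk); field names are assumptions
raw = json.loads("""
[
  {"customerID": "0001", "Churn": "Yes", "account": {"Charges": {"Monthly": 29.85}}},
  {"customerID": "0002", "Churn": "No",  "account": {"Charges": {"Monthly": 56.95}}}
]
""")

# Flatten the nested records into a tabular DataFrame
df = pd.json_normalize(raw)

# Normalize the dotted column names produced by flattening
# (e.g. "account.Charges.Monthly" -> "account_Charges_Monthly")
df.columns = [c.replace(".", "_") for c in df.columns]

# Persist the cleaned table for the EDA stage
df.to_csv("churn_data.csv", index=False)
```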

EDA

  • Post-ETL, various visualizations are generated to delve deeper into the patterns within the data, identify potential problems, and better understand the overall structure of the dataset.
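A sketch of the kind of plots this stage produces, on a toy slice of the data (column names and values are assumptions, not the project's actual figures):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy slice of the churn table; column names are illustrative assumptions
df = pd.DataFrame({
    "Churn": ["Yes", "No", "No", "Yes", "No"],
    "MonthlyCharges": [70.3, 29.9, 56.1, 89.5, 42.0],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Class balance: how many customers churned vs. stayed
df["Churn"].value_counts().plot.bar(ax=ax1, title="Churn distribution")

# Do churners pay more per month? Compare the two groups
df.boxplot(column="MonthlyCharges", by="Churn", ax=ax2)

fig.savefig("eda_overview.png")
```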

Challenges faced:

  • Handling missing values
  • Data encoding
  • Correcting data types
  • Plotting relevant graphs for analysis
  • Creating functions in a Python file (helper.py) for a cleaner notebook
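The first three challenges above can be sketched on a toy frame (column names and the median-imputation choice are illustrative assumptions):

```python
import pandas as pd

# Toy frame illustrating the issues above; column names are assumptions
df = pd.DataFrame({
    "TotalCharges": ["29.85", " ", "108.15"],   # numeric values stored as strings
    "Partner": ["Yes", "No", "Yes"],
    "Contract": ["Month-to-month", "Two year", "One year"],
})

# Correct data types: coerce the blank entry to NaN, then impute it
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

# Encode binary Yes/No columns as 0/1 and one-hot encode multi-level categoricals
df["Partner"] = df["Partner"].map({"Yes": 1, "No": 0})
df = pd.get_dummies(df, columns=["Contract"])
```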

Detailed documentation of the ETL and EDA processes can be found in this notebook, which covers data cleaning, handling missing values, data exploration, and preliminary analysis.


2 - Creating, Selecting and Optimizing models

The second part of the project is dedicated to:

  • Processing the encoded_churn_data.csv file generated in the first part ("1 - Extract, Transform and Load") to create and compare models
    • Creating new columns
    • Scaling data
    • Balancing data
  • Creating Baseline Models
    • Creating 9 models (Decision Tree Regressor, Random Forest Regressor, Logistic Regression, KNeighborsClassifier, SVC, GradientBoostingClassifier, GaussianNB, AdaBoostClassifier and MLPClassifier)
  • Selecting the best model based on the chosen metric
  • Optimizing the best model's hyperparameters and assessing its results
    • Using nested cross-validation to perform hyperparameter tuning and model assessment (You can check my in-depth notebook on Nested Cross Validation)
  • Saving the model
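The baseline-and-select loop above can be sketched as follows. The synthetic data, the three-model subset, and recall as the chosen metric are all illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the processed churn table
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# A subset of the baseline candidates, compared on one chosen metric
# (recall is an illustrative choice: churn problems often prioritize
# catching likely churners over overall accuracy)
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "GaussianNB": GaussianNB(),
}

scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="recall").mean()
    for name, model in models.items()
}
best = max(scores, key=scores.get)
```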

All activities performed are documented in this notebook.
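For the nested cross-validation step, the shape of the procedure looks like this (model, grid, and synthetic data are illustrative assumptions; the idea is that the inner loop tunes while the outer loop measures):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the processed churn table
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner loop: hyperparameter tuning via grid search (grid is illustrative)
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
inner = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)

# Outer loop: an unbiased estimate of the tuned model's performance,
# since each outer test fold is never seen during tuning
outer_scores = cross_val_score(inner, X, y, cv=5)
```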


3 - Streamlit app and Model deployment

To deploy the model, we created a Streamlit app that lets users interact with it through a simple interface. This includes:

  • Model selection
  • Data insertion
  • Prediction of Churn probability
  • Exploratory Data Analysis tab with visualizations

Streamlit app