Heart Disease Risk Prediction

Overview

This project predicts heart disease risk from real-world health data. The work covered data cleaning, variable transformation, exploratory analysis, and training machine learning models to classify patients.

Logistic Regression: 91.1%
Random Forest: 89.8%
Total Records: 16,859
Features: 17

Dataset Overview

The dataset contains 16,859 records with 17 features including BMI, smoking status, physical and mental health, physical activity, sleep patterns, and chronic diseases.

Class Distribution

15,205 without heart disease
1,654 with heart disease
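
The counts above imply a strong class imbalance, which is worth quantifying before reading any score. A minimal sketch of the arithmetic, using only the reported counts (the variable names are illustrative):

```python
# Class counts as reported in the class distribution.
no_disease = 15_205
disease = 1_654
total = no_disease + disease  # should match the 16,859 total records

positive_rate = disease / total          # share of records with heart disease
imbalance_ratio = no_disease / disease   # negative records per positive record

print(f"Total records: {total:,}")
print(f"With heart disease: {positive_rate:.1%}")
print(f"Negatives per positive: {imbalance_ratio:.1f}")
```

Roughly one record in ten is a positive case.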

Data Split

Training: 13,487
Testing: 3,372
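
These sizes are consistent with an 80/20 split. The source does not say how the split was made, so the following is only a sketch of one common approach, a stratified split that keeps the class proportions equal in both parts. The function name, the seed, and the use of stratification are assumptions, not the project's actual code.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split record indices into train/test, preserving class proportions.

    Hypothetical helper for illustration; the seed is arbitrary.
    """
    rng = random.Random(seed)

    # Group record indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Labels rebuilt from the reported class counts
# (0 = no heart disease, 1 = heart disease).
labels = [0] * 15_205 + [1] * 1_654
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))  # 13487 3372
```

With a 20% test fraction applied per class, the sketch reproduces the reported training and testing sizes.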

Problem Statement

Early prediction of heart disease risk can save lives through preventive interventions. The goal was to build accurate classification models to identify patients at risk based on health indicators and lifestyle factors.

Key Questions:

What health factors are most predictive of heart disease?
Can we achieve high accuracy with different ML algorithms?
How can these insights improve preventive healthcare?

Exploratory Insights

Key Factors Identified:

Age Group: Strongly correlated with heart disease risk
BMI: Higher BMI is associated with higher risk
Sleep Quality: Poor sleep patterns are linked to higher risk
Physical Health: The number of days of poor physical health is associated with risk

Model Performance

Logistic Regression: 91.1%
Random Forest: 89.8%
Decision Tree: 84.4%
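
Because most records are negative, these scores are easier to interpret next to a reference point. The sketch below compares them with a majority-class baseline. It assumes the percentages are accuracies and that the test set has about the same class balance as the full dataset; neither is stated in the source.

```python
# Class counts from the dataset overview.
no_disease, disease = 15_205, 1_654

# Accuracy of a model that always predicts "no heart disease".
baseline = no_disease / (no_disease + disease)

# Reported scores, assumed here to be accuracies.
reported = {
    "Logistic Regression": 0.911,
    "Random Forest": 0.898,
    "Decision Tree": 0.844,
}

print(f"Majority-class baseline: {baseline:.1%}")
for model, score in reported.items():
    diff_points = (score - baseline) * 100
    print(f"{model}: {score:.1%} ({diff_points:+.1f} points vs. baseline)")
```

Under these assumptions only Logistic Regression scores above the baseline, which suggests that recall and precision on the heart disease class would be more informative than accuracy alone.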

Feature Importance (Top Factors):

BMI (Body Mass Index)
Age Group
Sleep Quality
Physical Health Days
Mental Health Days

Business Impact

Early Intervention: Identify at-risk patients before symptoms appear
Preventive Recommendations: Personalized lifestyle suggestions
Healthcare Optimization: Better resource allocation for high-risk patients

Conclusion

The results highlight the importance of health indicators and lifestyle patterns in early heart disease prediction, enabling preventive recommendations and improved healthcare outcomes.