Overview
This project predicts heart disease risk from real health data. The work covered data cleaning, variable transformation, exploratory analysis, and training machine learning models to classify patients.
Logistic Regression: 91.1%
Dataset Overview
The dataset contains 16,859 records with 17 features including BMI, smoking status, physical and mental health, physical activity, sleep patterns, and chronic diseases.
Class Distribution
15,205 without heart disease
1,654 with heart disease
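The class counts above imply a strongly imbalanced target, which gives context for any accuracy figure. The following minimal sketch uses only the two reported counts; the dictionary keys and variable names are illustrative, not taken from the project's code. It computes each class's share and the accuracy of a trivial classifier that always predicts the majority class.

```python
# Class counts as reported in the dataset overview.
counts = {"no_heart_disease": 15_205, "heart_disease": 1_654}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(shares.values())

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

Roughly 90.2% of records have no heart disease, so if the reported 91.1% is an accuracy score, it sits less than one percentage point above this baseline. Recall and precision on the positive class would be more informative for this kind of data.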
Data Split
Training: 13,487
Testing: 3,372
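The reported sizes are consistent with an 80/20 split of the 16,859 records. The source does not say whether the split was stratified, so the sketch below is an assumption: it shows one way a stratified 80/20 split could be done in plain Python, using placeholder labels that only reproduce the reported class counts. The function name `stratified_split` is hypothetical.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling test_fraction from each class."""
    rng = random.Random(seed)

    # Group record indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Placeholder labels with the reported class counts (not the real records).
labels = [0] * 15_205 + [1] * 1_654
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With these counts the sketch yields 13,487 training and 3,372 testing indices, matching the sizes above, and keeps the share of positive cases nearly identical in both sets.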
Problem Statement
Early prediction of heart disease risk can save lives through preventive interventions. The goal was to build accurate classification models to identify patients at risk based on health indicators and lifestyle factors.
Key Questions:
- What health factors are most predictive of heart disease?
- Can we achieve high accuracy with different ML algorithms?
- How can these insights improve preventive healthcare?
Exploratory Insights
Key Factors Identified:
- Age Group: Strong correlation with heart disease risk
- BMI: Higher BMI is associated with higher risk
- Sleep Quality: Poor sleep patterns are linked to higher risk
- Physical Health: The number of days of poor physical health is a relevant signal
Model Performance
Logistic Regression: 91.1%
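Because only about one record in ten is a positive case, a single score can hide how well at-risk patients are actually detected. The sketch below is not the project's evaluation code; the helper name `summarize_predictions` and the toy labels are purely illustrative. It shows how accuracy, precision, and recall for the positive class can be derived from true and predicted labels.

```python
def summarize_predictions(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted or absent.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels for illustration only (not project results):
# 8 healthy patients, 2 with heart disease, one of whom is missed.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

report = summarize_predictions(y_true, y_pred)
print(report)
```

In this toy case accuracy is 0.9 while recall is only 0.5, which illustrates why recall on the heart disease class is worth reporting alongside the overall score.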
Feature Importance (Top Factors):
- BMI (Body Mass Index)
- Age Group
- Sleep Quality
- Physical Health Days
- Mental Health Days
Business Impact
- Early Intervention: Identify at-risk patients before symptoms appear
- Preventive Recommendations: Personalized lifestyle suggestions
- Healthcare Optimization: Better resource allocation for high-risk patients
Conclusion
The results show that health indicators and lifestyle patterns carry useful signal for early heart disease prediction, which can support preventive recommendations and better healthcare outcomes.