
Sebastijan-Dominis/diabetes-insight


Diabetes Insight

Diabetes Insight is a demonstration project that generates a well-structured PDF report based on user input.
A pretrained machine learning model predicts whether a user is diabetic or estimates their diabetes risk score. SHAP values are calculated and passed, together with the prediction, to gpt-5-mini, which produces a user-friendly, medically styled explanation. This response is then formatted into a downloadable PDF report.
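The step from a prediction and its SHAP values to an LLM prompt can be sketched with the standard library alone. This is an illustrative sketch, not the project's code: `build_prompt`, the feature names, and the prompt wording are hypothetical, and the real backend sends its prompt to gpt-5-mini rather than printing it.

```python
def build_prompt(prediction: str, shap_values: dict[str, float], top_k: int = 3) -> str:
    """Hypothetical helper: summarize a prediction and its SHAP values as an LLM prompt."""
    # Rank features by the size of their contribution, ignoring its sign.
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    lines = []
    for name, value in ranked[:top_k]:
        if value > 0:
            effect = "pushed the prediction up"
        elif value < 0:
            effect = "pushed the prediction down"
        else:
            effect = "had no effect"
        lines.append(f"- {name}: {effect} (SHAP {value:+.3f})")
    return (
        f"Model prediction: {prediction}\n"
        "Most influential features:\n"
        + "\n".join(lines)
        + "\nExplain these results to the user in plain, non-alarming language."
    )


# Hypothetical feature names and values, for illustration only.
example = {"hba1c": 0.82, "age": 0.15, "bmi": -0.40, "glucose": 0.05}
print(build_prompt("diabetic", example))
```

Sorting by absolute value keeps the strongest drivers in the prompt whether they raised or lowered the prediction, which keeps the prompt short without hiding protective factors.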

This is a practice project exploring how traditional machine learning, model explainability, and LLM engineering can work together in a healthcare-style application.
It is not intended for real clinical use.




Technologies Used

  • Frontend: Dash
  • Backend: FastAPI
  • Machine Learning: Custom-trained tree-based models (CatBoost + variants)
  • LLM Integration: gpt-5-mini with prompt engineering
  • PDF Generation: Python libraries for layout and export (e.g. WeasyPrint)
  • Containerization: Docker
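A common way to produce a PDF in Python is to render the generated text into HTML and pass it to WeasyPrint, which the Installation notes list as a dependency. The sketch below only builds the HTML with the standard library; the function name `report_html` and the page layout are hypothetical, and the WeasyPrint export is left as a comment because the project's actual PDF code is not described here.

```python
import html


def report_html(title: str, narrative: str) -> str:
    """Hypothetical helper: wrap LLM-generated text in a minimal HTML page for PDF export."""
    # Escape the model output so stray '<' or '&' characters cannot break the markup.
    paragraphs = [
        f"<p>{html.escape(block.strip())}</p>"
        for block in narrative.split("\n\n")
        if block.strip()
    ]
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title></head><body>"
        f"<h1>{html.escape(title)}</h1>{''.join(paragraphs)}</body></html>"
    )


page = report_html("Diabetes Risk Report", "Summary paragraph.\n\nSecond paragraph with <tags> & symbols.")
# With WeasyPrint installed, the page could then be exported (assumption, not the project's code):
# from weasyprint import HTML
# HTML(string=page).write_pdf("report.pdf")
```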

Features

  • Generate a diabetes diagnosis report based on user input
  • Generate a diabetes risk score report
  • Automatic SHAP explainability
  • Automatic LLM-generated narrative for users
  • Cleanly formatted PDF download

Model Training

The training pipeline includes data cleaning, exploratory analysis, feature processing, model training, and SHAP explainability.
The following models were evaluated:

  • KNN
  • Logistic Regression
  • Decision Tree
  • Voting Classifier
  • Random Forest
  • Gradient Boosting
  • AdaBoost
  • Extra Trees
  • XGBoost
  • LightGBM
  • CatBoost

CatBoost showed slightly better performance and was selected for both final models.
Other tree-based methods performed similarly.
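The selection rule above (take the best-scoring model, note the ones that came close) can be sketched as a small helper. This is a hypothetical illustration: the README does not say which metric or validation scheme was used, so `scores` is assumed to hold per-fold values of some higher-is-better metric, and `tolerance` is an assumed cutoff for "performed similarly".

```python
from statistics import mean


def rank_models(scores: dict[str, list[float]], tolerance: float = 0.01) -> tuple[str, list[str]]:
    """Hypothetical helper: pick the best model and list the ones that scored close to it.

    Assumes `scores` maps each model name to per-fold validation scores of a
    higher-is-better metric; the README does not say which metric was used.
    """
    means = {name: mean(folds) for name, folds in scores.items()}
    best = max(means, key=means.get)
    # Any other model within `tolerance` of the best mean counts as "performed similarly".
    similar = sorted(
        name for name, value in means.items()
        if name != best and means[best] - value <= tolerance
    )
    return best, similar
```

No scores from this project are reproduced here; the helper would be fed with the results of your own evaluation runs.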

Note: Some training notebooks take considerable time to run, even on modern machines (late 2025).


Project Structure

Backend (backend/)

  • models/ — Serialized CatBoost classification models
  • routers/reports.py — Two API endpoints for generating reports: /diagnosis and /risk_score (both under the /reports prefix)
  • limiter.py — Simple rate limiter to prevent abuse
  • main.py — FastAPI initialization + CORS middleware
  • models.py — Pydantic request validation schemas
  • utils.py — Utility functions used in the endpoints
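limiter.py is only described as a simple rate limiter. The idea can be illustrated with a minimal in-memory sliding-window limiter keyed by a client identifier such as an IP address. This is a sketch under assumptions: the class name, the limits, and the injectable clock are hypothetical, and the actual limiter.py may work differently (for example, by using a third-party library).

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Hypothetical limiter: allow at most `max_requests` per `window_seconds` for each key."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the current window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # rejected requests are not recorded
        hits.append(now)
        return True
```

In a FastAPI app, a check like `allow()` could run inside a dependency or middleware and return HTTP 429 (Too Many Requests) when it fails. An in-memory limiter only counts per process, so multiple workers would each keep their own counts.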

Frontend (frontend/)

  • frontend.py — Dash application (UI built with Dash Bootstrap Components & Templates); frontend.ipynb contains a notebook version of the same app

Training (training/)

  • Jupyter notebooks with EDA, preprocessing, and model training
  • models/ — Final models (mirrors backend/models)
  • parquet files — Processed datasets
  • diabetes_dataset.csv — Original Kaggle dataset
  • utils.py — Common helper functions
  • Final_training.ipynb — Re-training and SHAP generation workflow

Environment Variables

Create a .env file with the following variables:

OPENAI_API_KEY=your_api_key
API_URL=http://frontend:8050
BACKEND_URL=http://backend:8000/reports
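Assuming the compose setup passes these values into the containers, the Python code reads them from the environment. A minimal sketch of loading and validating them with the standard library; the `Settings` class and `load_settings` function are hypothetical, not taken from the project:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three variables listed above."""
    openai_api_key: str
    api_url: str
    backend_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the variables from the environment, failing fast if any is missing or empty."""
    env = os.environ if env is None else env
    names = ("OPENAI_API_KEY", "API_URL", "BACKEND_URL")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(env["OPENAI_API_KEY"], env["API_URL"], env["BACKEND_URL"])
```

Failing at startup with a list of missing names is usually easier to debug than an authentication error on the first report request.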

Notes

  • If you use a different API_URL, update the following lines in frontend.py (updating frontend.ipynb as well keeps the two consistent, but is not required):
if __name__ == '__main__':
    app.run(host="0.0.0.0", port=8050)
  • BACKEND_URL currently includes the /reports prefix, since the report endpoints are the only ones the frontend calls in the current version of the app. If you expand the app, remove the prefix from BACKEND_URL and update the relevant calls in the frontend, namely these two:
response = requests.post(f"{BACKEND_URL}/diagnosis", json=input_data)
response = requests.post(f"{BACKEND_URL}/risk_score", json=input_data)
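The hard-coded port and endpoint URLs above could instead be derived from the environment variables, so changing API_URL or BACKEND_URL would not require code edits. A sketch, assuming the variables are set as in the .env example; `frontend_bind` and `report_endpoint` are hypothetical helpers:

```python
from urllib.parse import urlsplit


def frontend_bind(api_url: str) -> tuple[str, int]:
    """Return a (host, port) pair for app.run(), taking the port from API_URL."""
    port = urlsplit(api_url).port or 8050  # fall back to the default port used in this README
    return "0.0.0.0", port


def report_endpoint(backend_url: str, name: str, prefix: str = "") -> str:
    """Join BACKEND_URL, an optional router prefix, and an endpoint name without doubled slashes."""
    parts = [backend_url.rstrip("/")]
    if prefix:
        parts.append(prefix.strip("/"))
    parts.append(name.strip("/"))
    return "/".join(parts)


# Current setup: BACKEND_URL already ends in /reports.
print(report_endpoint("http://backend:8000/reports", "diagnosis"))  # → http://backend:8000/reports/diagnosis
# If the prefix is removed from BACKEND_URL, pass it explicitly instead.
print(report_endpoint("http://backend:8000", "risk_score", prefix="/reports"))  # → http://backend:8000/reports/risk_score
```

In frontend.py the run call could then become `host, port = frontend_bind(os.environ["API_URL"])` followed by `app.run(host=host, port=port)`.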

Installation

Requirements

  • Docker
  • Python 3.13 (optional, for running code outside Docker)

Notes

  • Some Python packages used by the app, such as WeasyPrint, behave differently on different machines. Running the app with Docker is recommended to avoid these issues.
  • Some packages used during training, such as CuPy, depend heavily on the OS and hardware. For example, CuPy requires a compatible GPU, typically an NVIDIA GPU with CUDA drivers. For this reason, the repository does not include a requirements.txt listing all training packages, but the notebooks can still be viewed.

1. Clone the Repository

git clone https://github.com/Sebastijan-Dominis/diabetes-insight
cd diabetes-insight

2. Build the Docker images

docker compose build --no-cache

3. Start the containers

docker compose up

4. Use the app

  • If you used the default ports, the frontend is available at http://localhost:8050 and the backend's interactive API documentation at http://localhost:8000/docs.

Screenshots

Below are examples of how the app looks and what the generated reports contain.

App use

[Screenshots of the app interface]

Reports

[Screenshots of generated PDF reports]


Notes

  • SHAP values are computed on the backend for inference-time explainability.
  • The included dataset is synthetic and the project is for learning/demonstration only.
  • The frontend is intentionally simplified; in production, a React or Vue SPA would be preferable.
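SHAP values are additive: for a single prediction, the explainer's base (expected) value plus the sum of the per-feature SHAP values reproduces the model's raw output. For a binary classifier such as CatBoost, that raw output is by default in log-odds, so a probability is recovered with the logistic function. The sketch below checks this with made-up numbers; the feature names and values are illustrative, not output from the project's models.

```python
import math


def output_from_shap(base_value: float, shap_values: dict[str, float]) -> float:
    """Reconstruct the raw model output (log-odds for a binary classifier) from SHAP values."""
    return base_value + sum(shap_values.values())


def sigmoid(x: float) -> float:
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Illustrative numbers only, not taken from the project's models.
base = -1.5
contributions = {"feature_a": 1.0, "feature_b": 0.75, "feature_c": -0.25}
log_odds = output_from_shap(base, contributions)
print(round(log_odds, 6), round(sigmoid(log_odds), 3))  # → 0.0 0.5
```

This property is what makes it reasonable to hand SHAP values to an LLM as an explanation: the listed contributions account for the whole gap between the baseline and the user's prediction.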

License

  • This repository includes a LICENSE file — please review it for terms of reuse.

Contributing

  • Improvements and bug fixes are welcome. Open an issue or submit a pull request with a clear description of the change.

Author / Contact

About

A dashboard where a pretrained ML model predicts whether a user is diabetic or estimates their diabetes risk score, while gpt-5-mini uses the model's prediction and SHAP values to generate a report, which is then formatted into a downloadable PDF.
