This repository contains the coursework for Machine Learning for Healthcare, taught at ETH Zurich in the Spring Semester 2025.
Group members: Anna Toidze, Gonzalo Cardenal Antolin, Tae Kim
Project 1 centers on modeling ICU patient trajectories using the PhysioNet 2012 Challenge dataset. The goal is to predict in-hospital mortality based on the first 48 hours of multivariate time-series data. The project is structured as follows:
- Data preparation & exploration: converting irregular clinical measurements into a consistent temporal representation, handling missingness, and examining variable distributions.
- Supervised modeling: training a range of models including classic ML approaches, LSTMs, bidirectional RNNs, and Transformer-based architectures.
- Representation learning: experimenting with self-supervised techniques, contrastive objectives, and evaluating embeddings via linear probes and visualization tools.
- Foundation models: applying both small LLMs (via text-based summaries) and time-series foundation models such as Chronos for downstream prediction and embedding generation.
We conclude by summarizing model behavior, the trade-offs between methods, and the insights gained from working with real ICU data.
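As an illustration of the data preparation step, the sketch below bins irregular measurements of a single variable into hourly slots, forward-fills gaps, and keeps a mask of observed hours. It is a minimal sketch and not the code used in this repository: the function name `to_hourly_grid`, the hourly resolution, the per-hour averaging, and forward filling are all assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple


def to_hourly_grid(
    events: List[Tuple[float, float]],
    n_hours: int = 48,
) -> Tuple[List[Optional[float]], List[int]]:
    """Bin (time_in_hours, value) events of one variable into hourly slots.

    Hypothetical helper: returns forward-filled hourly values and a 0/1 mask
    that marks the hours in which at least one measurement was observed.
    """
    sums = [0.0] * n_hours
    counts = [0] * n_hours
    for t, v in events:
        # Ignore measurements outside the observation window.
        if t < 0 or t >= n_hours:
            continue
        hour = int(t)
        sums[hour] += v
        counts[hour] += 1

    values: List[Optional[float]] = []
    mask: List[int] = []
    last: Optional[float] = None  # stays None until the first observation
    for h in range(n_hours):
        if counts[h] > 0:
            last = sums[h] / counts[h]  # average repeated measurements
            mask.append(1)
        else:
            mask.append(0)
        values.append(last)  # forward fill from the last observed hour
    return values, mask


# Toy heart-rate events (hours since admission, value); not real patient data.
hr_events = [(0.5, 80.0), (0.75, 84.0), (3.2, 90.0)]
values, mask = to_hourly_grid(hr_events, n_hours=6)
print(values)
print(mask)
```

Keeping the mask alongside the filled values lets a downstream model distinguish measured values from imputed ones, which is one common way to handle missingness in clinical time series.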
Project 2 examines interpretability techniques for both structured and imaging data in clinical settings. It consists of three main components:
- Tabular data analysis: using the Heart Failure Prediction dataset to study logistic models with L1 regularization, MLPs paired with SHAP explanations, and Neural Additive Models (NAMs) for inherently interpretable nonlinear modeling.
- Medical imaging classification: training a CNN on the Chest X-Ray Pneumonia dataset and applying post-hoc attribution methods such as Integrated Gradients and Grad-CAM to understand spatial decision patterns.
- Synthesis & evaluation: comparing explanation methods, assessing their reliability (including via sanity checks), and discussing how well they align with clinical intuition and practical deployment considerations.
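To make the attribution and sanity-check ideas concrete, the sketch below computes Integrated Gradients for a toy logistic model and verifies the completeness property, i.e. that the attributions sum to the difference between the prediction at the input and at the baseline. This is a minimal sketch under stated assumptions, not the project's implementation: the model, the weights, the all-zero baseline, and the midpoint Riemann approximation are illustrative choices, whereas the project applies the method to a CNN on images.

```python
import math
from typing import List


def predict(x: List[float], w: List[float], b: float) -> float:
    """Toy logistic model: sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x: List[float], w: List[float], b: float) -> List[float]:
    """Analytic gradient of the logistic output with respect to the input."""
    p = predict(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(
    x: List[float],
    baseline: List[float],
    w: List[float],
    b: float,
    steps: int = 200,
) -> List[float]:
    """Approximate the path integral of gradients with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        grad = gradient(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the average gradient by the input difference per feature.
    return [(xi - b0) * t / steps for xi, b0, t in zip(x, baseline, total)]


# Illustrative weights and input; the third feature has zero weight.
w, b = [1.5, -2.0, 0.0], 0.1
x, baseline = [1.0, 0.5, 3.0], [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)
# Completeness gap: should be close to zero if the approximation is accurate.
gap = sum(attributions) - (predict(x, w, b) - predict(baseline, w, b))
print(attributions, gap)
```

A large completeness gap usually indicates too few integration steps, so this check is a cheap first sanity test before interpreting any attribution map.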