This project implements a machine learning pipeline to predict Acute Kidney Injury (AKI) from patient data using scikit-learn. The pipeline includes **feature extraction**, **model training**, and **inference**.
## Features
1. Feature extraction from tabular data.
2. Model training using a StackingClassifier with various ensemble models.
3. Inference and prediction export for unseen test datasets.
4. Evaluation with metrics such as the F1 score to assess model performance.
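
As a rough illustration of the F1 metric mentioned above, the sketch below computes it by hand and with scikit-learn's `f1_score`. The label values are invented for the example and are not taken from the project's data.

```python
from sklearn.metrics import f1_score

# Hypothetical labels: 1 = AKI, 0 = no AKI (illustrative values only).
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Count the outcomes for the positive class by hand.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

# F1 is the harmonic mean of precision and recall.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
manual_f1 = 2 * precision * recall / (precision + recall)

# scikit-learn's binary F1 should agree with the manual calculation.
sklearn_f1 = f1_score(y_true, y_pred)
print(manual_f1, sklearn_f1)
```

F1 balances precision and recall, which can matter when positive cases are much rarer than negative ones.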
## Model Selection
The following models were evaluated, with these scores:
- **RandomForestClassifier**: 0.9740
- **AdaBoostClassifier**: 0.9789
- **ExtraTreesClassifier**: 0.9832
- **HistGradientBoostingClassifier**: 0.9796
The final model, a **StackingClassifier**, uses the following as base estimators:
1. AdaBoostClassifier
2. ExtraTreesClassifier
3. HistGradientBoostingClassifier
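The final estimator, `SGDClassifier`, was chosen for its simplicity and efficiency: it fits a linear model on the base estimators' predictions. Stacking models that learn in different ways (adaptive boosting, randomized trees, and histogram-based gradient boosting) is intended to make predictions more robust than relying on any single model.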