SWEMLS_Coursework_1

This project is part of the SWEMLS coursework. It builds a machine learning model that detects Acute Kidney Injury (AKI) from patient blood test data. The model is trained with LightGBM, a gradient boosting framework, with the goal of accurate and efficient predictions suitable for clinical use.

Features

  • Data Preprocessing: Handles missing values, encodes categorical variables, and standardizes features.
  • Feature Engineering: Generates interaction features, statistical features, and date-based transformations.
  • Imbalanced Data Handling: Balances the dataset by oversampling the minority class with SMOTE (Synthetic Minority Over-sampling Technique).
  • Model Training: Implements the LightGBM framework with optimized hyperparameters and class weighting.
  • Custom Thresholding: Applies a custom classification threshold for predictions to prioritize reducing false negatives.
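
The custom thresholding step can be illustrated with a minimal, dependency-free sketch. The function names, the sample probabilities, and the 0.3 threshold below are hypothetical and chosen only for illustration; they are not taken from the project's code. The sketch shows how lowering the threshold below the usual 0.5 trades extra false positives for fewer false negatives.

```python
from typing import List, Sequence


def apply_threshold(probabilities: Sequence[float], threshold: float) -> List[int]:
    """Label a sample positive (1) when its predicted probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def count_false_negatives(y_true: Sequence[int], y_pred: Sequence[int]) -> int:
    """Count positive cases that were predicted as negative."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)


def count_false_positives(y_true: Sequence[int], y_pred: Sequence[int]) -> int:
    """Count negative cases that were predicted as positive."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 0, 1]
probs = [0.80, 0.20, 0.40, 0.35, 0.10]

default_pred = apply_threshold(probs, 0.5)  # conventional cut-off
lenient_pred = apply_threshold(probs, 0.3)  # hypothetical lower cut-off

print("FN at 0.5:", count_false_negatives(y_true, default_pred))
print("FN at 0.3:", count_false_negatives(y_true, lenient_pred))
print("FP at 0.3:", count_false_positives(y_true, lenient_pred))
```

In practice the probabilities would come from the trained model, and the threshold would be tuned on held-out data rather than fixed by hand.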

Requirements

The project relies on the following Python packages, specified in requirements.txt:

pandas==2.2.3             # Essential for data manipulation and analysis, widely used and stable.
argparse==1.4.0           # PyPI backport; argparse already ships with the Python standard library (3.2+), so this pin is optional.
lightgbm==4.5.0           # Fast and efficient gradient boosting framework, well-tested in production systems.
scikit-learn==1.4.0       # Versatile machine learning library, known for reliability and extensive community support.
imbalanced-learn==0.12.4  # Useful for handling imbalanced datasets, widely trusted in the ML community.
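
Because every dependency is pinned to an exact version, it can be useful to confirm that the active environment matches the pins. The following is a minimal sketch using only the standard library; the helper names (`parse_pins`, `installed_version`, `find_mismatches`) are hypothetical and not part of the project.

```python
from importlib import metadata
from typing import Dict, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Extract 'name==version' pins, ignoring blank lines and trailing comments."""
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the comment part
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (wanted, installed)."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in pins.items():
        found = installed_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Two of the pins listed above, inlined so the sketch runs without a file.
sample = """
pandas==2.2.3             # data manipulation
lightgbm==4.5.0           # gradient boosting
"""
pins = parse_pins(sample)
for name, (wanted, found) in find_mismatches(pins).items():
    print(f"{name}: wanted {wanted}, found {found}")
```

To check the real file, the contents of requirements.txt could be read and passed to `parse_pins` in place of the inlined sample.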