Chen, Wenqi authored
SWEMLS_Coursework_1
This project is part of the SWEMLS coursework. It builds a machine learning model that detects Acute Kidney Injury (AKI) from patient blood test data. The model is built with LightGBM, a gradient boosting framework, and aims for accurate, efficient prediction suited to clinical use.
Features
- Data Preprocessing: Handles missing values, encodes categorical variables, and standardizes features.
- Feature Engineering: Generates interaction features, statistical features, and date-based transformations.
- Imbalanced Data Handling: Balances the dataset using the SMOTE algorithm.
- Model Training: Implements the LightGBM framework with optimized hyperparameters and class weighting.
- Custom Thresholding: Applies a custom classification threshold for predictions to prioritize reducing false negatives.
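As a rough illustration of the custom thresholding idea, the sketch below shows how lowering the decision threshold can trade false negatives for false positives. The function names, the probabilities, and the 0.3 threshold are all hypothetical and chosen for illustration; they are not the project's actual values or results.

```python
def classify(probabilities, threshold=0.5):
    """Label a case positive (1) when its predicted probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def count_false_negatives(y_true, y_pred):
    """Count true positive cases that were predicted as negative."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)


# Illustrative values only; these are not outputs of the project's model.
y_true = [1, 0, 1, 1, 0, 0]
probabilities = [0.62, 0.10, 0.41, 0.35, 0.45, 0.05]

default_pred = classify(probabilities)                 # conventional 0.5 cut-off
lowered_pred = classify(probabilities, threshold=0.3)  # hypothetical lower cut-off

print("false negatives at 0.5:", count_false_negatives(y_true, default_pred))
print("false negatives at 0.3:", count_false_negatives(y_true, lowered_pred))
```

With the lower cut-off, fewer AKI cases are missed, at the cost of flagging some negative cases as positive. Where to set the threshold depends on how the two error types are weighed.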
Requirements
The project relies on the following Python packages, specified in requirements.txt:
pandas==2.2.3 # Essential for data manipulation and analysis, widely used and stable.
argparse==1.4.0 # Explicitly listed for clarity and dependency management; argparse also ships with the Python standard library.
lightgbm==4.5.0 # Fast and efficient gradient boosting framework, well-tested in production systems.
scikit-learn==1.4.0 # Versatile machine learning library, known for reliability and extensive community support.
imbalanced-learn==0.12.4 # Useful for handling imbalanced datasets, widely trusted in the ML community.