Decision Tree Learning methods
Five methods contribute to the automatic construction of the decision tree.
- load_txt_data: Loads the data from the specified .txt file (file_name). The file can be located anywhere inside the repository directory.
- calculate_entropy: Calculates the entropy (entropy) of the label attribute column (label_attribute).
- calculate_info_gain: Calculates the information gain (info_gain) given the label attribute column (label_attribute), an input attribute column (input_attribute), and a threshold on that input attribute (threshold).
- find_split: Finds the split threshold (threshold) that maximises the information gain (info_gain), together with the input attribute column it applies to.
- decision_tree_learning: Builds the decision tree recursively and stores the whole tree in a single dictionary. The logic and the dictionary keys are explained further in the code comments.
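
The following is a minimal pure-Python sketch of how the entropy, information gain, split search, and recursion described above could fit together. The function names follow the list, but the signatures, the choice of `<=` for the left branch, the midpoint candidate thresholds, and the dictionary keys (`leaf`, `label`, `attribute`, `threshold`, `left`, `right`) are assumptions made for illustration and may differ from the repository's code.

```python
import math
from collections import Counter


def calculate_entropy(label_attribute):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(label_attribute)
    if total == 0:
        return 0.0
    counts = Counter(label_attribute)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def calculate_info_gain(label_attribute, input_attribute, threshold):
    """Entropy reduction from splitting on input_attribute <= threshold."""
    pairs = list(zip(input_attribute, label_attribute))
    left = [y for x, y in pairs if x <= threshold]
    right = [y for x, y in pairs if x > threshold]
    total = len(label_attribute)
    # Weighted average entropy of the two subsets after the split.
    remainder = (len(left) / total) * calculate_entropy(left) \
        + (len(right) / total) * calculate_entropy(right)
    return calculate_entropy(label_attribute) - remainder


def find_split(rows, labels):
    """Return (attribute_index, threshold, info_gain) for the best split,
    or (None, None, 0.0) if no split has positive gain."""
    best = (None, None, 0.0)
    for attribute in range(len(rows[0])):
        column = [row[attribute] for row in rows]
        values = sorted(set(column))
        # Assumed candidates: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            gain = calculate_info_gain(labels, column, threshold)
            if gain > best[2]:
                best = (attribute, threshold, gain)
    return best


def decision_tree_learning(rows, labels):
    """Recursively build the tree as nested dictionaries (keys are hypothetical)."""
    attribute, threshold, _ = find_split(rows, labels)
    if attribute is None:
        # No useful split: make a leaf that predicts the majority label.
        return {"leaf": True, "label": Counter(labels).most_common(1)[0][0]}
    left = [(r, y) for r, y in zip(rows, labels) if r[attribute] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[attribute] > threshold]
    return {
        "leaf": False,
        "attribute": attribute,
        "threshold": threshold,
        "left": decision_tree_learning([r for r, _ in left], [y for _, y in left]),
        "right": decision_tree_learning([r for r, _ in right], [y for _, y in right]),
    }


# Tiny made-up dataset: two numeric attributes per row, one label per row.
rows = [[1.0, 7.0], [2.0, 3.0], [3.0, 8.0], [4.0, 2.0]]
labels = ["a", "a", "b", "b"]
tree = decision_tree_learning(rows, labels)
```

On this toy data the first attribute separates the two labels perfectly, so the sketch produces a root node with two leaves. Requiring a strictly positive gain before splitting is one simple stopping rule; the actual implementation may use a different one.
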
Next steps: implement 10-fold cross-validation on both the clean and noisy datasets, add accuracy metrics, and add tree pruning.