Decision Tree Learning methods
Five methods are implemented here; together they construct the decision tree automatically.
- load_txt_data: This method loads the data from the specified .txt file (file_name). The file can be located anywhere inside the repository directory.
- calculate_entropy: This method calculates the entropy (entropy) of the label attribute column (label_attribute).
- calculate_info_gain: This method calculates the information gain (info_gain) given the label attribute column (label_attribute), the input attribute column (input_attribute) and the threshold of the input attribute (threshold).
- find_split: This method finds the split threshold (threshold) that maximises the information gain (info_gain), together with its corresponding input attribute column.
- decision_tree_learning: This method builds the decision tree recursively and stores the whole tree in a single dictionary. The logic and the dictionary keys are explained further in the code comments.
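The way these methods fit together can be sketched as follows. This is a minimal, dependency-free illustration and not the repository's implementation. The function names follow the list above, but several details are assumptions: rows are plain Python lists with the label in the last position, candidate thresholds are midpoints between consecutive distinct values, a row goes left when its value is less than or equal to the threshold, and the dictionary keys (leaf, label, attribute, threshold, left, right, depth) are hypothetical. The file-loading step is omitted.

```python
import math
from collections import Counter


def calculate_entropy(label_attribute):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(label_attribute)
    if total == 0:
        return 0.0
    counts = Counter(label_attribute)
    # -sum(p * log2(p)), written as sum(p * log2(1 / p)) with p = c / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def calculate_info_gain(label_attribute, input_attribute, threshold):
    """Entropy reduction from splitting at input_attribute <= threshold."""
    pairs = list(zip(input_attribute, label_attribute))
    left = [y for x, y in pairs if x <= threshold]
    right = [y for x, y in pairs if x > threshold]
    total = len(label_attribute)
    remainder = (len(left) / total) * calculate_entropy(left) \
        + (len(right) / total) * calculate_entropy(right)
    return calculate_entropy(label_attribute) - remainder


def find_split(rows):
    """Return (attribute_index, threshold, info_gain) for the best split.

    Returns (None, None, 0.0) when no split has positive information gain.
    """
    labels = [row[-1] for row in rows]
    best = (None, None, 0.0)
    for attr in range(len(rows[0]) - 1):
        column = [row[attr] for row in rows]
        values = sorted(set(column))
        # Assumption: thresholds are midpoints between distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            gain = calculate_info_gain(labels, column, threshold)
            if gain > best[2]:
                best = (attr, threshold, gain)
    return best


def decision_tree_learning(rows, depth=0):
    """Recursively build a tree stored as nested dictionaries.

    The keys used here are hypothetical, chosen for illustration only.
    """
    labels = [row[-1] for row in rows]
    attr, threshold, _ = find_split(rows)
    if attr is None:
        # No split improves purity: emit a leaf with the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return {"leaf": True, "label": majority, "depth": depth}
    left_rows = [row for row in rows if row[attr] <= threshold]
    right_rows = [row for row in rows if row[attr] > threshold]
    return {
        "leaf": False,
        "attribute": attr,
        "threshold": threshold,
        "depth": depth,
        "left": decision_tree_learning(left_rows, depth + 1),
        "right": decision_tree_learning(right_rows, depth + 1),
    }


if __name__ == "__main__":
    # Toy data: two numeric attributes followed by the class label.
    toy_rows = [[1.0, 5.0, 0], [2.0, 3.0, 0], [3.0, 4.0, 1], [4.0, 6.0, 1]]
    print(decision_tree_learning(toy_rows))
```

Stopping as soon as no single split has positive gain keeps the sketch short, but it can end the recursion early on data where only a combination of splits separates the classes.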
Next steps: implement 10-fold cross-validation on both the clean and noisy datasets, add accuracy metrics, and implement tree pruning.
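Cross-validation is not implemented yet. As a rough idea of the index bookkeeping it would need, the sketch below splits row indices into k folds. The function name k_fold_indices, its parameters and the fixed seed are hypothetical and not part of the repository.

```python
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_rows))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

Each row index appears in exactly one test fold, so every row is used for evaluation once across the k rounds.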