
Decision Tree Learning methods

Anuchitanukul, Atijit requested to merge decision_tree_learning into master

This merge request adds five methods that together construct the decision tree automatically.

  1. load_txt_data: This method loads the data from the specified .txt file (file_name). The file can be located anywhere inside the repository directory.

  2. calculate_entropy: This method calculates the value of entropy (entropy) given the label attribute column (label_attribute).

  3. calculate_info_gain: This method calculates the information gain (info_gain) given the label attribute column (label_attribute), input attribute column (input_attribute) and the threshold of the input attribute (threshold).

  4. find_split: This method finds the input attribute column and the split threshold (threshold) that together maximise the information gain (info_gain).

  5. decision_tree_learning: This method builds the decision tree recursively and stores the whole tree in a single dictionary. The logic and the dictionary keys are explained further in the code comments.
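For reviewers who want a quick picture of how methods 2 to 5 fit together, here is a minimal sketch. It is not the code in this merge request: it uses plain Python lists instead of whatever array type the branch uses, the `rows`/`labels` arguments and the dictionary keys (`leaf`, `label`, `attribute`, `threshold`, `left`, `right`) are assumptions made for illustration, and candidate thresholds are assumed to be midpoints between adjacent distinct values.

```python
import math
from collections import Counter


def calculate_entropy(label_attribute):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(label_attribute)
    if total == 0:
        return 0.0
    counts = Counter(label_attribute)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def calculate_info_gain(label_attribute, input_attribute, threshold):
    """Entropy reduction from splitting the rows at input_attribute < threshold."""
    pairs = list(zip(input_attribute, label_attribute))
    left = [y for x, y in pairs if x < threshold]
    right = [y for x, y in pairs if x >= threshold]
    total = len(label_attribute)
    remainder = (len(left) / total) * calculate_entropy(left) \
        + (len(right) / total) * calculate_entropy(right)
    return calculate_entropy(label_attribute) - remainder


def find_split(rows, labels):
    """Return (column index, threshold, info_gain) of the best split.

    The column index is None when no split gives a positive gain.
    """
    best = (None, None, 0.0)
    for col in range(len(rows[0])):
        column = [row[col] for row in rows]
        values = sorted(set(column))
        # Assumption: candidate thresholds are midpoints of adjacent values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            gain = calculate_info_gain(labels, column, threshold)
            if gain > best[2]:
                best = (col, threshold, gain)
    return best


def decision_tree_learning(rows, labels):
    """Recursively build the tree as nested dictionaries (hypothetical keys)."""
    col, threshold, _ = find_split(rows, labels)
    if col is None:
        # Pure node, or no split helps: emit a leaf with the majority label.
        return {"leaf": True, "label": Counter(labels).most_common(1)[0][0]}
    left = [i for i, row in enumerate(rows) if row[col] < threshold]
    right = [i for i, row in enumerate(rows) if row[col] >= threshold]
    return {
        "leaf": False,
        "attribute": col,
        "threshold": threshold,
        "left": decision_tree_learning([rows[i] for i in left],
                                       [labels[i] for i in left]),
        "right": decision_tree_learning([rows[i] for i in right],
                                        [labels[i] for i in right]),
    }


rows = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]
labels = ["a", "a", "b", "b"]
tree = decision_tree_learning(rows, labels)
```

In this toy example only the first column varies, so the root splits on it at 2.5 and both children are leaves. The sketch stops splitting as soon as no threshold yields a positive gain, which is one possible stopping rule among several.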

Next steps: Implement 10-fold cross-validation on both the clean and noisy datasets, add accuracy metrics, and implement tree pruning.
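As a starting point for the cross-validation step, the fold bookkeeping could look roughly like the sketch below. Nothing here exists in the branch yet: `k_fold_indices` and `accuracy` are hypothetical helper names, and the fixed shuffle seed is an assumption made so that runs are repeatable.

```python
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Strided slices give folds whose sizes differ by at most one row.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


splits = list(k_fold_indices(25, k=10))
```

Each row lands in exactly one test fold, so averaging the per-fold accuracy uses every sample for evaluation once.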
