What is a Decision Tree?
A decision tree is a supervised learning algorithm that can be used for both classification and regression, although in most cases it is used for classification. It is a tree-structured classifier in which internal nodes represent features of the dataset, branches represent decision rules, and leaf nodes represent outcomes. The tree contains two kinds of nodes: decision nodes and leaf nodes. A decision node tests a feature and splits the data further, so it can have child nodes; a leaf node holds the final output and cannot be split any further. The algorithm is called a decision tree because, like a tree, it starts from a root node and grows outward through its branches.
In the figure above, the data enters the tree at the root node, decision rules are applied at each decision node, and the outputs are produced at the leaf nodes.
Terminology:
- Root Node: The node where the decision tree starts. It represents the entire dataset, which is then divided into sub-datasets that are more homogeneous in nature.
- Leaf Node: A terminal node where the algorithm stops and produces an output; it cannot be split any further.
- Splitting: The process of dividing a node's data into sub-datasets according to a given condition.
- Pruning: The process of removing unwanted parts of the tree that were formed unnecessarily.
How does a Decision Tree work?
Building a decision tree starts with the complete dataset placed at the root node. The algorithm then finds the best attribute using an attribute selection measure (ASM) and divides the dataset into subsets according to the possible values of that attribute. A decision node containing the chosen attribute is created, and the same procedure of selecting the best attribute and splitting is repeated recursively on each subset until no useful split remains and the final leaf nodes are reached.
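For readers who want to see these steps in code, here is a minimal sketch using scikit-learn's DecisionTreeClassifier; the Iris dataset, the parameter values, and the train/test split are assumptions chosen only for illustration, not part of the description above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small example dataset (Iris is used here purely for illustration).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Build the tree. criterion="entropy" makes the splits use information gain,
# while the default "gini" uses the Gini index.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Each prediction walks from the root node, through decision nodes, to a leaf.
print("test accuracy:", clf.score(X_test, y_test))
```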
At each split, the best attribute is selected using an attribute selection measure. Some common measures are described below:
- Information Gain: The measurement of the change in entropy after the dataset is segmented on an attribute. In simpler words, it tells us how much information about the class we gain by segregating the dataset into subsets: the larger the drop in entropy after the split, the higher the information gain. (A worked example appears after this list.)
Its formula is:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
where entropy is a metric that measures the impurity of an attribute and quantifies the randomness in the data. Its formula is:
Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)
where S is the set of samples, P(yes) is the probability of "yes", and P(no) is the probability of "no".
- Gini Index: A measure of the impurity or purity of the data used while building the decision tree. When choosing a binary split, an attribute with a low Gini index is preferred over one with a high Gini index, so that the split keeps the resulting tree as pure as possible. (A worked example appears after this list.)
Its formula is:
Gini Index = 1 − ∑j (Pj)², where Pj is the proportion of samples belonging to class j
- Pruning: The process of removing unnecessary nodes from the decision tree in order to obtain an optimal decision tree. (A small sketch appears after this list.)
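To make the formulas above concrete, the following sketch implements entropy, information gain, and the Gini index in plain Python for a toy binary-label split; the sample data and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum(p * log2(p)) over the class proportions in S."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gini(labels):
    """Gini Index = 1 - sum(Pj ** 2) over the class proportions."""
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the weighted average entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Toy example: 14 samples split on a hypothetical attribute into two branches.
parent = ["yes"] * 9 + ["no"] * 5
left = ["yes"] * 6 + ["no"] * 1
right = ["yes"] * 3 + ["no"] * 4

print("entropy(parent):  ", round(entropy(parent), 3))
print("gini(parent):     ", round(gini(parent), 3))
print("information gain: ", round(information_gain(parent, [left, right]), 3))
```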
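Pruning itself is usually delegated to the library rather than done by hand. The sketch below is an assumed example using scikit-learn's cost-complexity pruning parameter ccp_alpha on the Iris dataset; the alpha value is arbitrary and only meant to show that pruning shrinks the tree.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A fully grown tree versus the same tree with cost-complexity pruning applied.
full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

# Pruning removes unnecessary nodes, so the pruned tree is smaller.
print("nodes before pruning:", full.tree_.node_count)
print("nodes after pruning: ", pruned.tree_.node_count)
```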
Advantages of Decision Tree:
- It is helpful for solving decision-based problems.
- It encourages thinking through all the possible outcomes of a problem.
- It is a simple algorithm that follows the same approach a human would take when making a decision.
Disadvantages of Decision Tree:
- The tree can become complex because of the many layers of nodes it may contain.
- It is prone to overfitting.
- As the number of classes in the dataset grows, the computational complexity of the decision tree increases.