As you may know, the "scikit-learn" library in Python cannot build a decision tree directly from categorical data: you have to convert categorical features to numerical values before passing them to the classifier. In addition, the resulting decision tree is always binary, although a decision tree does not need to be binary.
Here, we provide a library that builds a visual decision tree directly from categorical data. You can read more about decision trees here.
The main algorithm is ID3, with the following features:
- Information gain based on entropy
- Information gain based on Gini impurity
- Some pruning capabilities:
  - Minimum number of samples
  - Minimum information gain
- The resulting tree does not have to be binary
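To make the features above concrete, here is a minimal sketch of how entropy, Gini impurity, and information gain can be computed for a categorical feature with a multiway split (one branch per distinct value). It is not taken from this library's source: the function names, the toy data, and the exact way the two pruning thresholds are compared (`>` for the gain, `<` for the sample count) are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(values, labels, impurity=entropy):
    """Impurity reduction of a multiway split: one branch per distinct value."""
    branches = {}
    for value, label in zip(values, labels):
        branches.setdefault(value, []).append(label)
    total = len(labels)
    weighted = sum(len(b) / total * impurity(b) for b in branches.values())
    return impurity(labels) - weighted


def should_split(values, labels, gain_threshold=0.0, minimum_samples=0,
                 impurity=entropy):
    """Hypothetical pruning check: refuse to split a node that has too few
    samples or whose best gain does not exceed the threshold (assumed rule)."""
    if len(labels) < minimum_samples:
        return False
    return information_gain(values, labels, impurity) > gain_threshold


# Toy data invented for this sketch (not the contents of playtennis.csv).
outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
play = ["no", "no", "yes", "yes", "no", "yes"]

print("Entropy of labels:", entropy(play))
print("Gain (entropy):", information_gain(outlook, play))
print("Gain (gini):", information_gain(outlook, play, impurity=gini))
print("Split allowed:", should_split(outlook, play, gain_threshold=0.1, minimum_samples=4))
```

Because each distinct value gets its own branch, a feature with three values produces three children, which is why the resulting tree is not restricted to binary splits.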
All requirements are listed in the "requirements.txt" file and can be installed with the following command:
- pip install -r requirements.txt
To view the visual tree, you also need to install the Graphviz package. Here you can find the right package for your operating system.
from p_decision_tree.DecisionTree import DecisionTree
import pandas as pd
#Reading CSV file as data set by Pandas
data = pd.read_csv('playtennis.csv')
columns = data.columns
#All columns except the last one are descriptive by default
descriptive_features = columns[:-1]
#The last column is considered as label
label = columns[-1]
# Converting all the columns to string
for column in columns:
    data[column] = data[column].astype(str)
data_descriptive = data[descriptive_features].values
data_label = data[label].values
# Calling DecisionTree constructor (the last parameter is the criterion, which can also be "gini")
decisionTree = DecisionTree(data_descriptive.tolist(), descriptive_features.tolist(), data_label.tolist(), "entropy")
# Here you can pass pruning parameters (gain_threshold and minimum_samples)
decisionTree.id3(0, 0)
# Visualizing the decision tree with Graphviz
dot = decisionTree.print_visualTree(render=True)
# When using Jupyter
# display(dot)
print("System entropy: ", format(decisionTree.entropy))
print("System gini: ", format(decisionTree.gini))