library(ISLR)
library(tree)
attach(OJ)
set.seed(1013)
# Split the 1070 observations: 800 for training, the rest for testing
train = sample(dim(OJ)[1], 800)
OJ.train = OJ[train, ]
OJ.test = OJ[-train, ]
# Fit a classification tree predicting Purchase from all other predictors
oj.tree = tree(Purchase ~ ., data = OJ.train)
summary(oj.tree)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = OJ.train)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff"
## Number of terminal nodes: 7
## Residual mean deviance: 0.752 = 596 / 793
## Misclassification error rate: 0.155 = 124 / 800
The tree uses only two variables, \( \tt{LoyalCH} \) and \( \tt{PriceDiff} \), and has \( 7 \) terminal nodes. The training misclassification error rate is \( 0.155 \) (\( 124/800 \)).
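These figures can also be extracted programmatically rather than read off the printout; a minimal sketch, assuming the component names documented in ?summary.tree:
oj.sum = summary(oj.tree)
oj.sum$used      # variables actually used in the splits
oj.sum$size      # number of terminal nodes
oj.sum$misclass  # c(misclassified, n); their ratio is the 0.155 above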
oj.tree
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 800 1000 CH ( 0.60 0.40 )
## 2) LoyalCH < 0.5036 359 400 MM ( 0.28 0.72 )
## 4) LoyalCH < 0.276142 170 100 MM ( 0.11 0.89 ) *
## 5) LoyalCH > 0.276142 189 300 MM ( 0.42 0.58 )
## 10) PriceDiff < 0.05 79 80 MM ( 0.19 0.81 ) *
## 11) PriceDiff > 0.05 110 100 CH ( 0.59 0.41 ) *
## 3) LoyalCH > 0.5036 441 300 CH ( 0.87 0.13 )
## 6) LoyalCH < 0.764572 186 200 CH ( 0.75 0.25 )
## 12) PriceDiff < -0.165 29 30 MM ( 0.28 0.72 ) *
## 13) PriceDiff > -0.165 157 100 CH ( 0.83 0.17 )
## 26) PriceDiff < 0.265 82 100 CH ( 0.73 0.27 ) *
## 27) PriceDiff > 0.265 75 30 CH ( 0.95 0.05 ) *
## 7) LoyalCH > 0.764572 255 90 CH ( 0.96 0.04 ) *
Let's pick the node labeled “10)”. The splitting variable at this node is \( \tt{PriceDiff} \) and the split value is \( 0.05 \). There are \( 79 \) observations in this region, and their total deviance is \( 80 \). The * at the end of the line denotes that this is a terminal node. The prediction at this node is \( \tt{Purchase} \) = \( \tt{MM} \): about \( 19 \)% of the observations here have \( \tt{Purchase} \) = \( \tt{CH} \), and the remaining \( 81 \)% have \( \tt{Purchase} \) = \( \tt{MM} \).
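The same numbers live in the tree's frame, whose row names are the node labels; a quick check, assuming the frame columns var, n, dev, yval, and yprob as stored by the tree package:
oj.tree$frame["10", c("var", "n", "dev", "yval")]  # leaf with n = 79, dev ≈ 80, yval = MM
oj.tree$frame["10", "yprob"]                       # class proportions: ~0.19 CH, ~0.81 MM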
plot(oj.tree)
text(oj.tree, pretty = 0)  # pretty = 0 keeps full category names in the split labels
\( \tt{LoyalCH} \) is the most important variable in the tree; in fact, the top three splits all use \( \tt{LoyalCH} \). If \( \tt{LoyalCH} < 0.27 \), the tree predicts \( \tt{MM} \). If \( \tt{LoyalCH} > 0.76 \), the tree predicts \( \tt{CH} \). For intermediate values of \( \tt{LoyalCH} \), the decision also depends on the value of \( \tt{PriceDiff} \); the full rule is transcribed below.
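To make the plotted rule explicit, here is the fitted tree hand-coded as a plain R function, with the split values copied from the printout above (predict_oj is a hypothetical helper for illustration only; predict() remains the real interface):
# Decision rule of oj.tree, transcribed from the printed splits
predict_oj = function(LoyalCH, PriceDiff) {
  if (LoyalCH < 0.5036) {
    if (LoyalCH < 0.276142) return("MM")
    if (PriceDiff < 0.05) return("MM") else return("CH")
  }
  if (LoyalCH < 0.764572 && PriceDiff < -0.165) return("MM")
  "CH"
}
predict_oj(LoyalCH = 0.6, PriceDiff = -0.3)  # intermediate loyalty: "MM"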
# Predict on the held-out test set and tabulate against the truth
oj.pred = predict(oj.tree, OJ.test, type = "class")
table(OJ.test$Purchase, oj.pred)
## oj.pred
## CH MM
## CH 152 19
## MM 32 67
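The test error rate follows directly from this confusion matrix, since the off-diagonal counts are the mistakes:
conf = table(OJ.test$Purchase, oj.pred)
1 - sum(diag(conf))/sum(conf)  # (19 + 32)/270 ≈ 0.1889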
cv.oj = cv.tree(oj.tree, FUN = prune.tree)  # 10-fold CV (the default), guided by deviance
plot(cv.oj$size, cv.oj$dev, type = "b", xlab = "Tree Size", ylab = "Deviance")
A tree of size 6 gives the lowest cross-validation deviance.
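The minimizing size can also be read off the object rather than the plot; a one-liner, assuming no ties in cv.oj$dev:
cv.oj$size[which.min(cv.oj$dev)]  # should agree with the plot: 6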
oj.pruned = prune.tree(oj.tree, best = 6)  # prune back to the CV-chosen size
summary(oj.pruned)
##
## Classification tree:
## snip.tree(tree = oj.tree, nodes = 13L)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff"
## Number of terminal nodes: 6
## Residual mean deviance: 0.769 = 610 / 794
## Misclassification error rate: 0.155 = 124 / 800
The misclassification error rate of the pruned tree is exactly the same as that of the original tree, \( 0.155 \). This makes sense: pruning snipped the split at node 13, whose two children (nodes 26 and 27) both predict \( \tt{CH} \), so merging them raises the deviance slightly but changes no fitted class.
# Test error rate of the unpruned tree
pred.unpruned = predict(oj.tree, OJ.test, type = "class")
misclass.unpruned = sum(OJ.test$Purchase != pred.unpruned)
misclass.unpruned/length(pred.unpruned)
## [1] 0.1889
# Test error rate of the pruned tree
pred.pruned = predict(oj.pruned, OJ.test, type = "class")
misclass.pruned = sum(OJ.test$Purchase != pred.pruned)
misclass.pruned/length(pred.pruned)
## [1] 0.1889
The pruned and unpruned trees have the same test error rate of \( 0.189 \).
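That equality is no coincidence: since the only snipped split separated two \( \tt{CH} \) leaves, the two trees should classify every test observation identically, which is easy to verify:
all(pred.unpruned == pred.pruned)  # expected TRUE: only a CH/CH split was merged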