Chapter 8: Exercise 9

a

library(ISLR)
set.seed(1013)

# Split the 1070 observations of OJ into 800 training rows and 270 test rows
train = sample(dim(OJ)[1], 800)
OJ.train = OJ[train, ]
OJ.test = OJ[-train, ]

b

library(tree)
# Fit a classification tree for Purchase using all other variables as predictors
oj.tree = tree(Purchase ~ ., data = OJ.train)
summary(oj.tree)
## 
## Classification tree:
## tree(formula = Purchase ~ ., data = OJ.train)
## Variables actually used in tree construction:
## [1] "LoyalCH"   "PriceDiff"
## Number of terminal nodes:  7 
## Residual mean deviance:  0.752 = 596 / 793 
## Misclassification error rate: 0.155 = 124 / 800

The tree uses only two of the predictors: \( \tt{LoyalCH} \) and \( \tt{PriceDiff} \). It has \( 7 \) terminal nodes, and its training (misclassification) error rate is \( 0.155 \).
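
As a hedged cross-check, the training error rate can be pulled out of the summary object directly; this assumes the \( \tt{tree} \) package's \( \tt{summary.tree} \), which stores the counts as a vector \( \tt{c(errors, n)} \):

# Misclassification counts stored by summary.tree: c(misclassified, n)
err = summary(oj.tree)$misclass
err[1]/err[2]
## [1] 0.155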

c

oj.tree
## node), split, n, deviance, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 800 1000 CH ( 0.60 0.40 )  
##    2) LoyalCH < 0.5036 359  400 MM ( 0.28 0.72 )  
##      4) LoyalCH < 0.276142 170  100 MM ( 0.11 0.89 ) *
##      5) LoyalCH > 0.276142 189  300 MM ( 0.42 0.58 )  
##       10) PriceDiff < 0.05 79   80 MM ( 0.19 0.81 ) *
##       11) PriceDiff > 0.05 110  100 CH ( 0.59 0.41 ) *
##    3) LoyalCH > 0.5036 441  300 CH ( 0.87 0.13 )  
##      6) LoyalCH < 0.764572 186  200 CH ( 0.75 0.25 )  
##       12) PriceDiff < -0.165 29   30 MM ( 0.28 0.72 ) *
##       13) PriceDiff > -0.165 157  100 CH ( 0.83 0.17 )  
##         26) PriceDiff < 0.265 82  100 CH ( 0.73 0.27 ) *
##         27) PriceDiff > 0.265 75   30 CH ( 0.95 0.05 ) *
##      7) LoyalCH > 0.764572 255   90 CH ( 0.96 0.04 ) *

Let's pick the terminal node labeled “10)”. The splitting variable at this node is \( \tt{PriceDiff} \), and the splitting value is \( 0.05 \). There are \( 79 \) observations in this node, and the deviance of the points it contains is \( 80 \). The * at the end of the line denotes that this is a terminal node. The prediction at this node is \( \tt{Purchase} \) = \( \tt{MM} \): about \( 19 \)% of the observations in this node have \( \tt{CH} \) as the value of \( \tt{Purchase} \), and the remaining \( 81 \)% have \( \tt{MM} \).
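
As a cross-check, the statistics quoted above can also be read from the tree's \( \tt{frame} \) component, whose row names are the node labels (this relies on the \( \tt{tree} \) package's internal representation):

# Node 10's size, deviance, and predicted class
oj.tree$frame["10", c("n", "dev", "yval")]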

d

plot(oj.tree)
text(oj.tree, pretty = 0)

Plot: the unpruned classification tree.

\( \tt{LoyalCH} \) is the most important variable in the tree; the top three splits all involve \( \tt{LoyalCH} \). If \( \tt{LoyalCH} < 0.276 \), the tree predicts \( \tt{MM} \). If \( \tt{LoyalCH} > 0.765 \), the tree predicts \( \tt{CH} \). For intermediate values of \( \tt{LoyalCH} \), the decision also depends on the value of \( \tt{PriceDiff} \), as the sketch below illustrates.
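
A minimal illustration of this interaction, using a few real test customers in the intermediate-loyalty band (the band boundaries are rounded from the splits above):

# For customers with intermediate loyalty, the prediction tracks PriceDiff
mid = subset(OJ.test, LoyalCH > 0.276 & LoyalCH < 0.765)
head(data.frame(LoyalCH = mid$LoyalCH, PriceDiff = mid$PriceDiff,
    pred = predict(oj.tree, mid, type = "class")))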

e

oj.pred = predict(oj.tree, OJ.test, type = "class")
table(OJ.test$Purchase, oj.pred)
##     oj.pred
##       CH  MM
##   CH 152  19
##   MM  32  67

The test error rate on these \( 270 \) observations is \( (19 + 32) / 270 \approx 0.189 \).

f

cv.oj = cv.tree(oj.tree, FUN = prune.tree)  # pruning sequence scored by deviance
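
Note that \( \tt{FUN = prune.tree} \) scores the pruning sequence by deviance. A reasonable alternative (not run here) is to cross-validate on the classification error rate instead:

# Alternative: guide cross-validation by misclassification error
cv.oj.misclass = cv.tree(oj.tree, FUN = prune.misclass)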

g

plot(cv.oj$size, cv.oj$dev, type = "b", xlab = "Tree Size", ylab = "Deviance")

Plot: cross-validated deviance versus tree size.

h

A tree size of \( 6 \) gives the lowest cross-validation error.
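
Rather than reading the size off the plot, it can be extracted programmatically; on this seed the following should return \( 6 \):

# Tree size with the lowest cross-validated deviance
cv.oj$size[which.min(cv.oj$dev)]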

i

oj.pruned = prune.tree(oj.tree, best = 6)  # prune to the 6-node tree chosen by CV

j

summary(oj.pruned)
## 
## Classification tree:
## snip.tree(tree = oj.tree, nodes = 13L)
## Variables actually used in tree construction:
## [1] "LoyalCH"   "PriceDiff"
## Number of terminal nodes:  6 
## Residual mean deviance:  0.769 = 610 / 794 
## Misclassification error rate: 0.155 = 124 / 800

The misclassification error rate of the pruned tree is exactly the same as that of the original tree, \( 0.155 \), even though the residual mean deviance is slightly higher (\( 0.769 \) versus \( 0.752 \)).

k

pred.unpruned = predict(oj.tree, OJ.test, type = "class")
misclass.unpruned = sum(OJ.test$Purchase != pred.unpruned)
misclass.unpruned/length(pred.unpruned)  # test error rate, unpruned tree
## [1] 0.1889
pred.pruned = predict(oj.pruned, OJ.test, type = "class")
misclass.pruned = sum(OJ.test$Purchase != pred.pruned)
misclass.pruned/length(pred.pruned)  # test error rate, pruned tree
## [1] 0.1889

The pruned and unpruned trees have the same test error rate of \( 0.189 \), so pruning to \( 6 \) terminal nodes simplifies the tree without any loss in test accuracy.
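
As an aside, the same error rates can be computed more idiomatically with \( \tt{mean} \); this is just a restatement of the calculation above:

mean(OJ.test$Purchase != pred.unpruned)
## [1] 0.1889
mean(OJ.test$Purchase != pred.pruned)
## [1] 0.1889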