library(ISLR)
set.seed(1)
summary(Wage$maritl)
## 1. Never Married 2. Married 3. Widowed 4. Divorced
## 648 2074 19 204
## 5. Separated
## 55
summary(Wage$jobclass)
## 1. Industrial 2. Information
## 1544 1456
par(mfrow = c(1, 2))
plot(Wage$maritl, Wage$wage)
plot(Wage$jobclass, Wage$wage)
It appears a married couple makes more money on average than other groups. It also appears that Informational jobs are higher-wage than Industrial jobs on average.
fit = lm(wage ~ maritl, data = Wage)
deviance(fit)
## [1] 4858941
fit = lm(wage ~ jobclass, data = Wage)
deviance(fit)
## [1] 4998547
fit = lm(wage ~ maritl + jobclass, data = Wage)
deviance(fit)
## [1] 4654752
Unable to fit splines on categorical variables.
library(gam)
## Loading required package: splines Loaded gam 1.09
fit = gam(wage ~ maritl + jobclass + s(age, 4), data = Wage)
deviance(fit)
## [1] 4476501
Without more advanced techniques, we cannot fit splines to categorical
variables (factors). maritl and jobclass do add statistically significant
improvements to the previously discussed models.