Chapter 6: Exercise 5

a

The general form of the Ridge regression optimization problem is

Minimize: $$\sum\limits_{i=1}^n {(y_i - \hat{\beta}_0 - \sum\limits_{j=1}^p {\hat{\beta}_jx_{ij}} )^2} + \lambda \sum\limits_{j=1}^p \hat{\beta}_j^2$$

In this case, $$\hat{\beta}_0 = 0$$ and $$n = p = 2$$. So, the optimization looks like:

Minimize: $$(y_1 - \hat{\beta}_1x_{11} - \hat{\beta}_2x_{12})^2 + (y_2 - \hat{\beta}_1x_{21} - \hat{\beta}_2x_{22})^2 + \lambda (\hat{\beta}_1^2 + \hat{\beta}_2^2)$$

b

Now we are given that $$x_{11} = x_{12} = x_1$$ and $$x_{21} = x_{22} = x_2$$. Taking derivatives of the above expression with respect to both $$\hat{\beta}_1$$ and $$\hat{\beta}_2$$ and setting them equal to zero, we find that $$\hat{\beta^*}_1 = \frac{x_1y_1 + x_2y_2 - \hat{\beta^*}_2(x_1^2 + x_2^2)}{\lambda + x_1^2 + x_2^2}$$ and $$\hat{\beta^*}_2 = \frac{x_1y_1 + x_2y_2 - \hat{\beta^*}_1(x_1^2 + x_2^2)}{\lambda + x_1^2 + x_2^2}$$

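These two expressions are symmetric in $$\hat{\beta^*}_1$$ and $$\hat{\beta^*}_2$$. Multiplying each through by its denominator and subtracting one from the other gives $$\lambda (\hat{\beta^*}_1 - \hat{\beta^*}_2) = 0$$, so for any $$\lambda > 0$$ we have $$\hat{\beta^*}_1 = \hat{\beta^*}_2$$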
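As a quick numerical check of this result, the sketch below solves the two stationarity equations from part b as a 2x2 linear system for one made-up data set. The function name and the numbers are illustrative assumptions, not part of the exercise. Substituting $$\hat{\beta^*}_1 = \hat{\beta^*}_2$$ back into either equation gives the common value $$\frac{x_1y_1 + x_2y_2}{\lambda + 2(x_1^2 + x_2^2)}$$, which the code also compares against.

```python
def ridge_two_identical_features(x1, x2, y1, y2, lam):
    """Solve the ridge stationarity equations for two identical predictors.

    Hypothetical helper: both columns of the design equal (x1, x2),
    there is no intercept, and lam > 0.
    """
    a = x1 * x1 + x2 * x2   # x_1^2 + x_2^2
    c = x1 * y1 + x2 * y2   # x_1 y_1 + x_2 y_2
    # Linear system from part b:
    #   (a + lam) * b1 + a * b2 = c
    #   a * b1 + (a + lam) * b2 = c
    det = (a + lam) ** 2 - a ** 2        # equals lam * (2a + lam), nonzero for lam > 0
    b1 = (c * (a + lam) - a * c) / det   # Cramer's rule
    b2 = ((a + lam) * c - a * c) / det
    return b1, b2


# Made-up data satisfying x1 + x2 = 0 and y1 + y2 = 0
x1, x2, y1, y2, lam = 2.0, -2.0, 3.0, -3.0, 1.0
b1, b2 = ridge_two_identical_features(x1, x2, y1, y2, lam)
common_value = (x1 * y1 + x2 * y2) / (lam + 2 * (x1 ** 2 + x2 ** 2))
print(b1, b2, common_value)
```

For this data both coefficients come out equal to the common value, consistent with the argument above.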

c

As with Ridge regression, we set $$\hat{\beta}_0 = 0$$ and $$n = p = 2$$, so the Lasso optimization looks like:

Minimize: $$(y_1 - \hat{\beta}_1x_{11} - \hat{\beta}_2x_{12})^2 + (y_2 - \hat{\beta}_1x_{21} - \hat{\beta}_2x_{22})^2 + \lambda (| \hat{\beta}_1 | + | \hat{\beta}_2 |)$$

d

Here is a geometric interpretation of the solutions to the problem in c above. We use the alternate, constrained form of the Lasso, in which the penalty term is replaced by the constraint $$| \hat{\beta}_1 | + | \hat{\beta}_2 | \leq s$$.

When plotted, this constraint region takes the familiar shape of a diamond centered at the origin $$(0, 0)$$. Next consider the residual sum of squares being minimized, $$(y_1 - \hat{\beta}_1x_{11} - \hat{\beta}_2x_{12})^2 + (y_2 - \hat{\beta}_1x_{21} - \hat{\beta}_2x_{22})^2$$. We use the facts $$x_{11} = x_{12}$$, $$x_{21} = x_{22}$$, $$x_{11} + x_{21} = 0$$, $$x_{12} + x_{22} = 0$$ and $$y_1 + y_2 = 0$$ to simplify it to

Minimize: $$2(y_1 - (\hat{\beta}_1 + \hat{\beta}_2)x_{11})^2$$.
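The simplification can be spelled out as follows. Write $$u = \hat{\beta}_1 + \hat{\beta}_2$$ (a shorthand introduced here for convenience, not part of the exercise). Using $$x_{12} = x_{11}$$, $$x_{21} = x_{22} = -x_{11}$$ and $$y_2 = -y_1$$, the two residuals become

$$y_1 - \hat{\beta}_1x_{11} - \hat{\beta}_2x_{12} = y_1 - u x_{11}$$

$$y_2 - \hat{\beta}_1x_{21} - \hat{\beta}_2x_{22} = -y_1 + u x_{11} = -(y_1 - u x_{11})$$

The two residuals differ only in sign, so their squares are equal and together they contribute twice the first squared residual.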

This optimization problem has a simple solution: $$\hat{\beta}_1 + \hat{\beta}_2 = \frac{y_1}{x_{11}}$$. This is a line parallel to the edge of the Lasso-diamond $$\hat{\beta}_1 + \hat{\beta}_2 = s$$. The contours of the function $$(y_1 - (\hat{\beta}_1 + \hat{\beta}_2)x_{11})^2$$ are likewise lines of constant $$\hat{\beta}_1 + \hat{\beta}_2$$, all parallel to that edge. When the constraint is binding (that is, when $$s < | \frac{y_1}{x_{11}} |$$), the solutions to the original Lasso optimization problem are the points where the lowest-valued contour touches the Lasso-diamond. Because the contours are parallel to the edge, for $$\frac{y_1}{x_{11}} > 0$$ that contour meets the edge $$\hat{\beta}_1 + \hat{\beta}_2 = s$$ along its whole length rather than at a single point. As a result, every point on the edge $$\hat{\beta}_1 + \hat{\beta}_2 = s$$ is a solution to the Lasso optimization problem!

A similar argument applies to the opposite Lasso-diamond edge, $$\hat{\beta}_1 + \hat{\beta}_2 = -s$$, when $$\frac{y_1}{x_{11}} < 0$$.

Thus, the Lasso problem does not have a unique solution. Depending on the sign of $$\frac{y_1}{x_{11}}$$, the solution set is one of two line segments:

$$\hat{\beta}_1 + \hat{\beta}_2 = s; \hat{\beta}_1 \geq 0; \hat{\beta}_2 \geq 0$$ (when $$\frac{y_1}{x_{11}} > 0$$) or $$\hat{\beta}_1 + \hat{\beta}_2 = -s; \hat{\beta}_1 \leq 0; \hat{\beta}_2 \leq 0$$ (when $$\frac{y_1}{x_{11}} < 0$$)
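The non-uniqueness can be illustrated numerically. The sketch below evaluates the penalized Lasso objective from part c at several points along a diamond edge, using made-up values for $$x_{11}$$, $$y_1$$, $$\lambda$$ and $$s$$ (all assumptions chosen for illustration, with $$\frac{y_1}{x_{11}} > s > 0$$). The function name is hypothetical.

```python
def lasso_objective(b1, b2, x11, y1, lam):
    """Penalized Lasso objective from part c for the special data set.

    Assumes x12 = x11, x21 = x22 = -x11 and y2 = -y1, as in part d.
    """
    x21, y2 = -x11, -y1
    rss = (y1 - b1 * x11 - b2 * x11) ** 2 + (y2 - b1 * x21 - b2 * x21) ** 2
    return rss + lam * (abs(b1) + abs(b2))


# Made-up values: y1 / x11 = 1.5, so s = 1.0 makes the constraint binding
x11, y1, lam, s = 2.0, 3.0, 1.0, 1.0

# Points on the edge b1 + b2 = s with b1 >= 0 and b2 >= 0
edge_values = [
    lasso_objective(t * s, (1 - t) * s, x11, y1, lam)
    for t in (0.0, 0.25, 0.5, 0.75, 1.0)
]

# A point with the same sum b1 + b2 = s but coefficients of opposite sign
off_edge_value = lasso_objective(1.5, -0.5, x11, y1, lam)

print(edge_values)
print(off_edge_value)
```

Every point on the edge gives the same objective value, because both the residual sum of squares and the penalty depend only on $$\hat{\beta}_1 + \hat{\beta}_2$$ when the two coefficients share a sign. The opposite-sign point has the same fit but a larger penalty, so it is strictly worse.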