Chapter 6: Exercise 5

a

A general form of Ridge regression optimization looks like

Minimize: \( \sum\limits_{i=1}^n {(y_i - \hat{\beta}_0 - \sum\limits_{j=1}^p {\hat{\beta}_jx_{ij}} )^2} + \lambda \sum\limits_{j=1}^p \hat{\beta}_j^2 \)

In this exercise, \( \hat{\beta}_0 = 0 \) and \( n = p = 2 \), so the optimization problem becomes:

Minimize: \( (y_1 - \hat{\beta}_1x_{11} - \hat{\beta}_2x_{12})^2 + (y_2 - \hat{\beta}_1x_{21} - \hat{\beta}_2x_{22})^2 + \lambda (\hat{\beta}_1^2 + \hat{\beta}_2^2) \)
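For concreteness, this specialized objective is easy to write down in code. Below is a minimal Python sketch; the data values X, y and the penalty lam are made up purely for illustration and are not part of the exercise.

```python
import numpy as np

def ridge_objective(beta, X, y, lam):
    """Ridge objective with no intercept: RSS plus lambda times the sum of squared coefficients."""
    residuals = y - X @ beta
    return np.sum(residuals ** 2) + lam * np.sum(beta ** 2)

# Made-up data with n = p = 2 (chosen so that x11 = x12 and x21 = x22, as in part b)
X = np.array([[2.0, 2.0],
              [-2.0, -2.0]])
y = np.array([3.0, -3.0])
print(ridge_objective(np.array([0.5, 0.5]), X, y, lam=1.0))  # 2.5
```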

b

Now we are given that \( x_{11} = x_{12} = x_1 \) and \( x_{21} = x_{22} = x_2 \), so the objective reduces to \( (y_1 - (\hat{\beta}_1 + \hat{\beta}_2)x_1)^2 + (y_2 - (\hat{\beta}_1 + \hat{\beta}_2)x_2)^2 + \lambda (\hat{\beta}_1^2 + \hat{\beta}_2^2) \). Taking partial derivatives with respect to \( \hat{\beta}_1 \) and \( \hat{\beta}_2 \) and setting them equal to zero (for example, \( \partial/\partial\hat{\beta}_1 \) gives \( -2x_1(y_1 - (\hat{\beta}_1 + \hat{\beta}_2)x_1) - 2x_2(y_2 - (\hat{\beta}_1 + \hat{\beta}_2)x_2) + 2\lambda\hat{\beta}_1 = 0 \)), we find that \( \hat{\beta}_1^* = \frac{x_1y_1 + x_2y_2 - \hat{\beta}_2^*(x_1^2 + x_2^2)}{\lambda + x_1^2 + x_2^2} \) and \( \hat{\beta}_2^* = \frac{x_1y_1 + x_2y_2 - \hat{\beta}_1^*(x_1^2 + x_2^2)}{\lambda + x_1^2 + x_2^2} \)

These expressions are symmetric in \( \hat{\beta}_1^* \) and \( \hat{\beta}_2^* \); subtracting one from the other gives \( \lambda(\hat{\beta}_1^* - \hat{\beta}_2^*) = 0 \), so for any \( \lambda > 0 \) we have \( \hat{\beta}_1^* = \hat{\beta}_2^* \).
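As a quick numerical sanity check (not part of the derivation), the standard closed-form Ridge solution \( \hat{\beta} = (X^TX + \lambda I)^{-1}X^Ty \) (with no intercept) can be evaluated on made-up data satisfying \( x_{11} = x_{12} \) and \( x_{21} = x_{22} \); the two coefficients come out equal, as the argument above predicts.

```python
import numpy as np

# Made-up data satisfying x11 = x12 and x21 = x22
X = np.array([[2.0, 2.0],
              [-2.0, -2.0]])
y = np.array([3.0, -3.0])
lam = 1.0

# Closed-form Ridge solution (no intercept): beta = (X'X + lambda*I)^{-1} X'y
beta = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
print(beta)                          # both coordinates are equal (12/17 here)
print(np.isclose(beta[0], beta[1]))  # True
```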

c

Proceeding as for Ridge regression, the Lasso optimization problem in this setting is:

Minimize: \( (y_1 - \hat{\beta}_1x_{11} - \hat{\beta}_2x_{12})^2 + (y_2 - \hat{\beta}_1x_{21} - \hat{\beta}_2x_{22})^2 + \lambda (| \hat{\beta}_1 | + | \hat{\beta}_2 |) \)
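In code, only the penalty term changes relative to the Ridge objective sketched in part (a): the squared coefficients become absolute values. A minimal sketch (the function name is just for illustration):

```python
import numpy as np

def lasso_objective(beta, X, y, lam):
    """Lasso objective with no intercept: RSS plus lambda times the sum of absolute coefficients."""
    residuals = y - X @ beta
    return np.sum(residuals ** 2) + lam * np.sum(np.abs(beta))
```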

d

Here is a geometric interpretation of the solutions of the Lasso problem in (c). We use the equivalent constrained form of the Lasso, with constraint \( | \hat{\beta}_1 | + | \hat{\beta}_2 | \leq s \).

The constraint region \( | \hat{\beta}_1 | + | \hat{\beta}_2 | \leq s \), when plotted, takes the familiar shape of a diamond centered at the origin \( (0, 0) \). Next, consider the squared-error part of the objective, \( (y_1 - \hat{\beta}_1x_{11} - \hat{\beta}_2x_{12})^2 + (y_2 - \hat{\beta}_1x_{21} - \hat{\beta}_2x_{22})^2 \). Using the given facts \( x_{11} = x_{12} \), \( x_{21} = x_{22} \), \( x_{11} + x_{21} = 0 \), \( x_{12} + x_{22} = 0 \) and \( y_1 + y_2 = 0 \), it simplifies to

Minimize: \( 2(y_1 - (\hat{\beta}_1 + \hat{\beta}_2)x_{11})^2 \).

Ignoring the constraint, this simplified problem is minimized whenever \( \hat{\beta}_1 + \hat{\beta}_2 = \frac{y_1}{x_{11}} \), which is a line parallel to the Lasso-diamond edge \( \hat{\beta}_1 + \hat{\beta}_2 = s \). More generally, the contours of the function \( (y_1 - (\hat{\beta}_1 + \hat{\beta}_2)x_{11})^2 \) are lines of the form \( \hat{\beta}_1 + \hat{\beta}_2 = \text{constant} \), all parallel to that edge. The solution of the constrained Lasso problem lies on the lowest such contour that still touches the diamond; assuming \( \frac{y_1}{x_{11}} > s \), that contour is \( \hat{\beta}_1 + \hat{\beta}_2 = s \) itself, which touches the diamond not at a single point but along a whole edge. As a result, every point on the edge \( \hat{\beta}_1 + \hat{\beta}_2 = s \) is a solution to the Lasso optimization problem!

A similar argument applies to the opposite Lasso-diamond edge \( \hat{\beta}_1 + \hat{\beta}_2 = -s \) (when \( \frac{y_1}{x_{11}} < -s \)).

Thus, the Lasso problem does not have a unique solution. The general form of the solution is given by two line segments:

\( \hat{\beta}_1 + \hat{\beta}_2 = s; \hat{\beta}_1 \geq 0; \hat{\beta}_2 \geq 0 \) and \( \hat{\beta}_1 + \hat{\beta}_2 = -s; \hat{\beta}_1 \leq 0; \hat{\beta}_2 \leq 0 \)
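As a numerical illustration of this non-uniqueness (not a proof), the sketch below uses made-up data satisfying \( x_{11} = x_{12} \), \( x_{21} = x_{22} \), \( x_{11} + x_{21} = 0 \) and \( y_1 + y_2 = 0 \). A coarse grid search picks the best value of \( s = \hat{\beta}_1 + \hat{\beta}_2 \), and every point on the segment \( \hat{\beta}_1 + \hat{\beta}_2 = s \), \( \hat{\beta}_1, \hat{\beta}_2 \geq 0 \) then attains (to numerical precision) the same objective value.

```python
import numpy as np

def lasso_objective(beta1, beta2, X, y, lam):
    """Lasso objective with no intercept for the n = p = 2 case."""
    residuals = y - X @ np.array([beta1, beta2])
    return np.sum(residuals ** 2) + lam * (abs(beta1) + abs(beta2))

# Made-up data satisfying x11 = x12, x21 = x22, x11 + x21 = 0, y1 + y2 = 0
X = np.array([[2.0, 2.0],
              [-2.0, -2.0]])
y = np.array([3.0, -3.0])
lam = 1.0

# On the segment beta1 + beta2 = s with beta1, beta2 >= 0, the objective depends
# only on s, so a 1-D grid search over s (evaluated at (s, 0)) finds the optimum.
s_grid = np.linspace(0.0, 3.0, 10001)
vals = [lasso_objective(s, 0.0, X, y, lam) for s in s_grid]
s_star = s_grid[int(np.argmin(vals))]

# Every point on the segment beta1 + beta2 = s_star attains the same value.
for b1 in np.linspace(0.0, s_star, 5):
    b2 = s_star - b1
    print(round(b1, 4), round(b2, 4), lasso_objective(b1, b2, X, y, lam))
```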