The coefficient of determination is usually evaluated on the training set. If you evaluate it on test data, weird things can happen (especially when the test set is small). Let’s consider the $ R^2 $ for one fold that contains two observations, where the $ y $ values by pure chance are VERY close to each other.
Using the notation from Wikipedia:
\[ R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} \]
$ SS_{\text{tot}} $ is calculated by
\[ \sum_{i=1}^{2} (y_i - \bar{y})^2 \]
where \(\bar{y}\) is the mean of those two observations (at least this is how it is done in scikit-learn). Since the two observations are VERY close to each other, $ SS_{\text{tot}} $ gets VERY small, while the test residuals (and hence $ SS_{\text{res}} $) might still be moderately high. This way, $ R^2 $ can be VERY low. By putting the two $ y $-values arbitrarily close to each other, you can make the $ R^2 $ for this test fold go to \(-\infty\). Note that those $ y $-values are not outliers, just very close to each other.
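To make the divergence explicit, here is a short derivation under an assumption not stated above: the model predicts the same value $ p $ for both test observations (as a mean estimator would). Write the two observations as $ y_1 = c $ and $ y_2 = c + \delta $, and let $ e = c - p $ (the symbols $ c $, $ \delta $, $ p $ and $ e $ are introduced here only for illustration). Then
\[ SS_{\text{tot}} = \frac{\delta^2}{2}, \qquad SS_{\text{res}} = e^2 + (e + \delta)^2 \]
\[ R^2 = 1 - \frac{2\left(e^2 + (e + \delta)^2\right)}{\delta^2} = -\left(1 + \frac{2e}{\delta}\right)^2 \]
so for any fixed $ e \neq 0 $, $ R^2 \to -\infty $ as $ \delta \to 0 $.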
Minimal example:
$ n=4 $, $ k=2 $, estimator = mean, $ \hat{y} $ are the predictions from the other fold using the mean
fold | $ y $ | $ \hat{y} $ |
---|---|---|
1 | 4 | 5.0005 |
1 | 7 | 5.0005 |
2 | 5 | 5.5 |
2 | 5.001 | 5.5 |
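Consider fold 2, where \(\bar{y} = 5.0005\) is the mean of the two observations in that fold:
\[ SS_{\text{res}} = 0.5^2 + 0.499^2 = 0.499001 \]
\[ SS_{\text{tot}} = (5 - 5.0005)^2 + (5.001 - 5.0005)^2 = 5 \times 10^{-7} \]
\[ R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - \frac{0.499001}{5 \times 10^{-7}} = -998001 \]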
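The fold-2 arithmetic can be checked with a few lines of plain Python. This is only a sketch: the helper name `r_squared` is made up for illustration, and it follows the definition used above (with $ SS_{\text{tot}} $ centred on the mean of the test-fold $ y $ values).

```python
# Fold 2 of the minimal example: true values and the constant prediction
# (the mean of fold 1, i.e. (4 + 7) / 2 = 5.5).
y_test = [5.0, 5.001]
y_pred = [5.5, 5.5]


def r_squared(y_true, y_hat):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot centred on the mean of y_true."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_hat))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


r2_fold2 = r_squared(y_test, y_pred)
print(r2_fold2)  # close to -998001, up to floating-point rounding
```

Moving the second value in `y_test` even closer to 5 makes the result more negative still, while the residuals barely change.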
Alternatives:
- Use the (R)MSE
- Use the MAE
- “pseudo R-squared”: 1 - mse / Var(y) (cf. the R implementation of randomForest)
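As a rough illustration, the alternatives can be computed on the minimal example in plain Python. The function names are hypothetical, and the pseudo R-squared below makes two assumptions that are not taken from the randomForest source: it pools all out-of-fold predictions, and it uses the population variance (dividing by $ n $).

```python
# All four observations from the minimal example with their
# out-of-fold predictions (the mean of the other fold).
y = [4.0, 7.0, 5.0, 5.001]
y_hat = [5.0005, 5.0005, 5.5, 5.5]


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def pseudo_r2(y_true, y_pred):
    """1 - mse / Var(y); Var(y) is the population variance (an assumption)."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    var_y = sum((a - y_bar) ** 2 for a in y_true) / n
    return 1 - mse(y_true, y_pred) / var_y


# Per-fold error metrics for fold 2 (the last two observations).
fold2_mse = mse(y[2:], y_hat[2:])
fold2_mae = mae(y[2:], y_hat[2:])

# Pseudo R-squared pooled over both folds.
pooled_pseudo_r2 = pseudo_r2(y, y_hat)

print(fold2_mse, fold2_mae, pooled_pseudo_r2)
```

With these numbers the fold-2 MSE and MAE stay moderate (about 0.25 and 0.5), and the pooled pseudo R-squared is only mildly negative (about -0.16), because the denominator is the variance of all $ y $ values rather than of one tiny fold.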