I performed lasso and then leave-one-out cross-validation:

cv <- cv.glmnet(as.matrix(df[, names(df) != "Price"]), df$Price, nfolds = 1500)  # x must be a numeric matrix that excludes the response

When I plot cv I get the following: [image not included]

I also noticed that I get 2 different lambdas: lambda.min and lambda.1se

  • What is the difference between these lambdas?
  • What can I understand from the above plot in general (what are these confidence intervals about, what are the two dotted lines etc)?

If I change to nfolds=10 to perform 10-fold cross-validation, I get a different lambda.1se and different coefficients for this lambda. By what criterion should I choose the one that is best for me?

Have you tried looking here: web.stanford.edu/~hastie/glmnet/glmnet_alpha.html – ilanman 7 hours ago
    
@ilanman That is great, thank you ! But still which lambda should I prefer? My intuition would say lambda.min but I see that lambda.1se is usually suggested.. – Elemar 7 hours ago

This isn't really about statistics, just reading the documentation.

  • The two values reflect two common choices for $\lambda$. $\lambda_{\min}$ is the value that minimizes the cross-validated (out-of-sample) loss. $\lambda_{1se}$ is the largest $\lambda$ whose cross-validated loss is still within one standard error of that minimum loss. One line of reasoning favors $\lambda_{1se}$ because it hedges against overfitting: it selects a larger $\lambda$, and hence a more heavily penalized and typically sparser model, than the minimum does. Which choice is best is context-dependent.
  • The error bars represent the uncertainty in the estimated loss: each red dot is the mean cross-validated loss at that $\lambda$, and the bar spans one standard error above and below it, computed from the variation across folds. The vertical dotted lines mark the positions of $\lambda_{\min}$ and $\lambda_{1se}$. The numbers across the top are the number of nonzero coefficient estimates at each $\lambda$.
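
To make the two definitions concrete, here is a minimal sketch in R that recomputes both values from the cross-validation curve stored in the fitted object. The simulated x and y and the names cvfit, i_min, threshold, lambda_min and lambda_1se are made up for illustration; lambda, cvm, cvsd, nzero, lambda.min and lambda.1se are components of the object returned by cv.glmnet.

```r
library(glmnet)

# Simulated data, purely for illustration (not the asker's data)
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 2 * x[, 2] + rnorm(n)

cvfit <- cv.glmnet(x, y, nfolds = 10)

# lambda.min: the lambda with the smallest mean cross-validated loss
i_min <- which.min(cvfit$cvm)
lambda_min <- cvfit$lambda[i_min]

# lambda.1se: the largest lambda whose mean loss is at most
# (minimum loss + one standard error of that minimum)
threshold <- cvfit$cvm[i_min] + cvfit$cvsd[i_min]
lambda_1se <- max(cvfit$lambda[cvfit$cvm <= threshold])

# Compare with the values glmnet reports
c(lambda_min, cvfit$lambda.min)
c(lambda_1se, cvfit$lambda.1se)

# Number of nonzero coefficients at lambda.1se (the numbers along the top of the plot)
cvfit$nzero[match(cvfit$lambda.1se, cvfit$lambda)]

# Coefficients at either choice
coef(cvfit, s = "lambda.min")
coef(cvfit, s = "lambda.1se")
```

Note that when nfolds is smaller than the number of observations, the folds are assigned at random, so lambda.min and lambda.1se can change from run to run unless the seed or the foldid argument is fixed.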