Research Article

The Effect of the Second Stage Estimator on Model Performance in Post-LASSO Method

Year 2023, Volume: 18, Issue: 2, 319 - 330, 01.09.2023
https://doi.org/10.55525/tjst.1244925

Abstract

Penalized linear regression methods are used to predict new observations accurately and to obtain interpretable models. The performance of these methods depends on the properties of the true coefficient vector. The LASSO method is a penalized regression method that performs coefficient shrinkage and variable selection simultaneously in a continuous process. Depending on the structure of the dataset, different estimators have been proposed to overcome the problems faced by LASSO. In the post-LASSO two-stage regression method, proposed as an alternative to LASSO, the estimator used in the second stage has a considerable effect on model performance.
In this study, the performance of post-LASSO with different estimation methods in its second stage is compared with that of the classical penalized regression methods ridge, LASSO, elastic net, and adaptive LASSO. In addition, the effect of the magnitude and position of the signal values in the true coefficient vector on the performance of the models obtained by these methods is analyzed. The mean squared error and standard deviation of the predictions calculated on the test set are used to compare the prediction performance of the models, while the active set sizes are used to compare their performance in variable selection. According to the findings obtained from the simulation studies, the choice of the second-stage estimator and the structure of the true coefficient vector significantly affect the success of the post-LASSO method compared to the other methods.
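The two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustration with simulated sparse data, using scikit-learn and plain ordinary least squares as the second-stage estimator (the study itself compares several second-stage estimators); the data dimensions and signal magnitudes below are illustrative assumptions, not the paper's simulation design.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Simulate a sparse linear model: only the first 5 of 50 coefficients are signals.
n, p = 200, 50
beta = np.zeros(p)
beta[:5] = 2.0
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

# Stage 1: LASSO with a cross-validated penalty selects the active set.
lasso = LassoCV(cv=5).fit(X_train, y_train)
active = np.flatnonzero(lasso.coef_)

# Stage 2: refit on the selected variables only, here with OLS,
# to remove the shrinkage bias of the first-stage coefficients.
ols = LinearRegression().fit(X_train[:, active], y_train)

# Compare test-set MSE and the active set size, the metrics used in the study.
mse_lasso = mean_squared_error(y_test, lasso.predict(X_test))
mse_post = mean_squared_error(y_test, ols.predict(X_test[:, active]))
print(f"active set size: {active.size}")
print(f"test MSE  LASSO: {mse_lasso:.3f}  post-LASSO: {mse_post:.3f}")
```

Swapping `LinearRegression` for another estimator in the second stage (e.g. ridge on the active set) gives the variants whose effect on model performance the study investigates.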

References

  • Montgomery DC, Runger GC, Hubele NF. Engineering Statistics. New York: John Wiley & Sons; 2009.
  • Bzovsky S, Phillips MR, Guymer RH, Wykoff CC, Thabane L, Bhandari M, Chaudhary V. The clinician’s guide to interpreting a regression analysis. Eye 2022; 36(9):1715-1717.
  • Venkateshan SP. Mechanical Measurements. New York: John Wiley & Sons; 2015.
  • Hoerl AE, Kennard RW. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970; 12(1):55-67.
  • Liu K. Using Liu-type estimator to combat collinearity. Commun Stat - Theory Methods 2003; 32(5):1009-1020.
  • Rao CR, Toutenburg H. Linear Models. Springer; 1995.
  • Sarkar N. A new estimator combining the ridge regression and the restricted least squares methods of estimation. Commun Stat - Theory Methods 1992; 21(7):1987-2000.
  • Breiman L. Better subset regression using the nonnegative garrote. Technometrics 1995; 37(4):373-384.
  • Frank LE, Friedman JH. A statistical view of some chemometrics regression tools. Technometrics 1993; 35(2):109-135.
  • Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol 1996; 58(1):267-288.
  • Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B Methodol 2005; 67(2):301-320.
  • Zou H. The adaptive lasso and its oracle properties. J Am Stat Assoc 2006; 101(476):1418-1429.
  • Belloni A, Chernozhukov V. Least squares after model selection in high-dimensional sparse models. Bernoulli 2013; 19(2):521-547.
  • Ahrens A, Bhattacharjee A. Two-step lasso estimation of the spatial weights matrix. Econometrics 2015; 3(1):128-155.
  • De Mol C, Mosci S, Traskine M, Verri A. A regularized method for selecting nested groups of relevant genes from microarray data. J Comput Biol 2009; 16(5):677-690.
  • Urminsky O, Hansen C, Chernozhukov V. Using double-lasso regression for principled variable selection. SSRN Working Paper No. 273374. 2016.
  • Shahriari S, Faria S, Gonçalves AM. Variable selection methods in high-dimensional regression-A simulation study. Commun Stat - Simul Comput 2015; 44(10):2548-2561.
  • Ahmed SE, Kim H, Yıldırım G, Yüzbaşı B. High-Dimensional Regression Under Correlated Design: An Extensive Simulation Study. International Workshop on Matrices and Statistics, Springer. 2016:145-175.
  • Genç M. Bir Simülasyon Çalışması ile Cezalı Regresyon Yöntemlerinin Karşılaştırılması. Bilecik Şeyh Edebali Üniv Fen Bilim Derg 2022; 9(1):80-91.
  • Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York: Springer; 2001.
  • Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Ann Stat 2004; 32(2):407-499.
  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 2011; 3(1):1-122.
  • Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010; 33(1):1-22.
  • Chang L, Roberts S, Welsh A. Robust lasso regression using Tukey's biweight criterion. Technometrics 2018; 60(1):36-47.
  • Chong IG, Jun CH. Performance of some variable selection methods when multicollinearity is present. Chemom Intell Lab Syst 2005; 78(1-2):103-112.
  • Hussami N, Tibshirani RJ. A component lasso. Can J Stat 2015; 43(4):624-646.


Details

Primary Language: English
Subjects: Statistical Theory
Section: TJST
Authors

Murat Genç 0000-0002-6335-3044

Ömer Özbilen 0000-0001-6110-1911

Publication Date: September 1, 2023
Submission Date: January 30, 2023
Published Issue: Year 2023, Volume: 18, Issue: 2

How to Cite

APA Genç, M., & Özbilen, Ö. (2023). The Effect of the Second Stage Estimator on Model Performance in Post-LASSO Method. Turkish Journal of Science and Technology, 18(2), 319-330. https://doi.org/10.55525/tjst.1244925
AMA Genç M, Özbilen Ö. The Effect of the Second Stage Estimator on Model Performance in Post-LASSO Method. TJST. September 2023;18(2):319-330. doi:10.55525/tjst.1244925
Chicago Genç, Murat, and Ömer Özbilen. “The Effect of the Second Stage Estimator on Model Performance in Post-LASSO Method”. Turkish Journal of Science and Technology 18, no. 2 (September 2023): 319-30. https://doi.org/10.55525/tjst.1244925.
EndNote Genç M, Özbilen Ö (September 1, 2023) The Effect of the Second Stage Estimator on Model Performance in Post-LASSO Method. Turkish Journal of Science and Technology 18 2 319–330.
IEEE M. Genç and Ö. Özbilen, “The Effect of the Second Stage Estimator on Model Performance in Post-LASSO Method”, TJST, vol. 18, no. 2, pp. 319–330, 2023, doi: 10.55525/tjst.1244925.
ISNAD Genç, Murat - Özbilen, Ömer. “The Effect of the Second Stage Estimator on Model Performance in Post-LASSO Method”. Turkish Journal of Science and Technology 18/2 (September 2023), 319-330. https://doi.org/10.55525/tjst.1244925.
JAMA Genç M, Özbilen Ö. The Effect of the Second Stage Estimator on Model Performance in Post-LASSO Method. TJST. 2023;18:319–330.
MLA Genç, Murat and Ömer Özbilen. “The Effect of the Second Stage Estimator on Model Performance in Post-LASSO Method”. Turkish Journal of Science and Technology, vol. 18, no. 2, 2023, pp. 319-30, doi:10.55525/tjst.1244925.
Vancouver Genç M, Özbilen Ö. The Effect of the Second Stage Estimator on Model Performance in Post-LASSO Method. TJST. 2023;18(2):319-30.