A "Function-Weighted Elastic Network" framework to overcome the "dimensional curse" in predictive modeling

Bassam Fayyad Kanaan Abdul-Jabbar

doi:10.31185/bsj.Vol21.Iss35.1276

Authors

Bassam Fayyad Kanaan Abdul-Jabbar, University of Mohaghegh Ardabili, Faculty of Mathematics, Department of Mathematical Statistics.

DOI:

https://doi.org/10.31185/bsj.Vol21.Iss35.1276

Keywords:

Regularization, elastic net, high-dimensional data, predictive modeling, feature engineering.

Abstract

Traditional predictive models for high-dimensional data, such as the elastic net, share a shortcoming: they ignore valuable external information that is available about the predictor variables. This paper addresses that shortcoming by introducing a novel framework called the "feature-weighted elastic net" (fwelnet). The framework integrates feature metadata systematically and directly into the estimation process by assigning each variable an adaptive penalty weight based on prior information.
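The idea of per-variable penalty weights can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the weight formula, so the softmax-style mapping in `penalty_weights`, the fixed value of `theta` (which a full method would estimate from the data), the squared-error loss, and the toy data are all assumptions made for illustration. Plain Python is used to keep the example self-contained.

```python
import math


def penalty_weights(z, theta):
    """Map one metadata score per feature to a penalty weight.

    Assumed form: w_j = mean_l exp(theta * z_l) / exp(theta * z_j), so a
    feature with a higher score gets a smaller penalty when theta > 0.
    """
    scores = [math.exp(theta * z_j) for z_j in z]
    mean_score = sum(scores) / len(scores)
    return [mean_score / s for s in scores]


def soft_threshold(x, t):
    """Shrink x toward zero by t; return zero when |x| <= t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def weighted_elastic_net(X, y, weights, lam, alpha=0.5, n_sweeps=100):
    """Cyclic coordinate descent for the objective

        (1/2n) * ||y - X b||^2
            + lam * sum_j w_j * (alpha * |b_j| + (1 - alpha) / 2 * b_j**2)

    X is a list of rows; columns and y are assumed centered (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, kept up to date
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # (1/n) * x_j . (partial residual that excludes feature j)
            rho = sum(row[j] * r for row, r in zip(X, resid)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha * weights[j]) / (
                col_sq[j] + lam * (1 - alpha) * weights[j]
            )
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * row[j] for row, r in zip(X, resid)]
                beta[j] = new
    return beta


# Toy data (hypothetical): feature 0 is strong, feature 1 is weak.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]
w = penalty_weights(z=[1.0, 0.0], theta=2.0)  # metadata favors feature 0
beta_weighted = weighted_elastic_net(X, y, w, lam=0.1)
beta_uniform = weighted_elastic_net(X, y, [1.0, 1.0], lam=0.1)
```

On this toy data the metadata-derived weights penalize feature 1 heavily enough to set its coefficient exactly to zero, while uniform weights (the standard elastic net) keep a small nonzero coefficient for it. Feature 0, which the metadata favors, is shrunk less than under uniform weights.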

To demonstrate the effectiveness of the proposed framework, it was applied to complex genomic data to predict the response of non-small cell lung cancer (NSCLC) patients to immunotherapy; the study included n = 1,500 patients and p = 18,500 genes. The results showed a tangible statistical advantage for fwelnet, which achieved high predictive accuracy with a mean area under the curve (AUC) of 0.91, compared with 0.82 for the standard elastic net. Beyond the gain in accuracy, the model identified a concise, biologically coherent gene signature and revealed new high-confidence biomarkers, most notably the gene CXCL13, which demonstrates the framework's ability to generate precise scientific hypotheses and to handle noisy data robustly.
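For readers unfamiliar with the metric, the sketch below shows one way a mean AUC could be computed. It assumes the mean is taken over resampling folds, which the abstract does not specify, and the labels and scores are hypothetical toy values, not data from the study.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count one half)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the AUC over (labels, scores) pairs, one pair per fold."""
    return sum(auc(labels, scores) for labels, scores in folds) / len(folds)


# Hypothetical held-out predictions for two folds (1 = responder).
folds = [
    ([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]),
    ([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]),
]
print(mean_auc(folds))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of responders from non-responders, which is the scale on which the reported 0.91 and 0.82 should be read.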

