Evaluation of Deep Learning and Traditional Models for Index-Level and Stock-Level Financial Forecasting

Tao Yu

Keywords:

Financial Time Series Forecasting, Stock Index Prediction, Stock Price Prediction, Deep Learning Models, Traditional Forecasting Models, Benchmark Evaluation

Abstract

In financial market forecasting, numerous methods have been proposed for predicting both stock indices and individual stock prices. However, systematic evaluations comparing these models across different prediction tasks remain limited. To address this gap, this study conducts a unified comparative analysis of three deep learning models (MLP, LSTM, and Transformer) and two traditional benchmark models (ARIMA and SVR). All models are evaluated under a consistent experimental framework that uses the same dataset, input window lengths, and prediction horizons, and all are tested with a rolling forecasting mechanism. Model performance is assessed using mean absolute error (MAE), root mean square error (RMSE), and prediction accuracy based on relative error thresholds. The results show the following. (1) Performance differences among forecasting models are largely determined by their structural characteristics, and conclusions derived from index-level forecasting cannot be directly generalized to stock-level prediction tasks. (2) Among deep learning models, LSTM demonstrates the most stable overall performance across prediction settings, as its gated recurrent structure enables robust modeling of temporal dependencies in non-stationary financial time series. Transformer is more effective at capturing long-term trends but less capable of modeling local fluctuations under high volatility, while MLP is highly sensitive to prediction configurations such as input window length and forecasting horizon. (3) Among traditional models, ARIMA is stable in short-term index forecasting and maintains relatively low prediction errors across settings, whereas SVR degrades significantly when longer historical input windows are used, highlighting the limitations of static kernel-based regression in modeling high-dimensional temporal features. (4) At the stock level, the higher heterogeneity of individual stock price series further amplifies performance differences among models. For example, on NVIDIA stock, most models achieve prediction accuracies below 50% under the ±10% tolerance, indicating that stock-level forecasting requires models tailored to specific prediction targets.
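
The evaluation protocol named in the abstract (rolling forecasts scored by MAE, RMSE, and accuracy within a relative error threshold) can be illustrated with a minimal sketch. The abstract does not give exact definitions, so the following are assumptions: threshold accuracy is taken to be the share of predictions whose relative error is within the tolerance, and the function names, the `rolling_windows` helper, the sample prices, and the naive last-value forecaster are all hypothetical stand-ins rather than the study's implementation.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def threshold_accuracy(actual, predicted, tolerance=0.10):
    """Share of predictions with relative error within `tolerance`.

    Assumed definition: |predicted - actual| <= tolerance * |actual|.
    """
    hits = sum(
        1 for a, p in zip(actual, predicted) if abs(p - a) <= tolerance * abs(a)
    )
    return hits / len(actual)

def rolling_windows(series, window, horizon):
    """Yield (input_window, target) pairs for a rolling forecast.

    The target lies `horizon` steps after the last value of the input window.
    """
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        yield series[start:end], series[end + horizon - 1]

# Hypothetical price series, scored with a naive last-value forecaster
# standing in for any of the compared models.
prices = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0, 95.0, 97.0]
actual, predicted = [], []
for inputs, target in rolling_windows(prices, window=3, horizon=1):
    predicted.append(inputs[-1])  # naive forecast: repeat the last observation
    actual.append(target)

print("MAE:", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("Accuracy within ±10%:", threshold_accuracy(actual, predicted, 0.10))
```

Replacing the naive forecaster with a fitted model while keeping the window length and horizon fixed is what allows different models to be compared on equal terms.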
