Turning Weak Signals into Strong Predictions
Why some machine learning models unlock economic forecasting potential
- October 25, 2024
- CBR - Finance
In fields including computer science and data science, it is common practice when predicting outcomes such as customer churn or image recognition to focus on variables with the highest predictive power. This often involves identifying a few “strong” signals—such as user engagement metrics for churn prediction or edge detection features in image recognition—while discarding “weak” variables that contribute less to overall model accuracy.
But making accurate predictions in financial markets is notoriously challenging because the most easily exploitable opportunities for abnormal returns (alpha) have already been identified and capitalized on by sophisticated investors. This leaves financial datasets with weaker, more subtle signals—such as minor price inefficiencies or anomalies in trading patterns—which offer smaller potential gains and are far more difficult to detect.
Chicago Booth PhD student Zhouyu Shen and Booth’s Dacheng Xiu suggest that these weak signals provide an important opportunity and that discovering how to best make use of them has become critical for anyone looking to improve predictive accuracy. A commonly used prediction method can struggle with them, their research finds—while an older, less-used model outperformed in their tests.
Weak signals are prevalent in economic data. For example, changes to personal income, the unemployment rate, or corporate bond spreads may not seem relevant, taken individually, to someone trying to predict a move in industrial production. But such data could be helpful in combination, the researchers explain. After all, personal income changes are tied to consumer demand. Corporate bond spreads signal shifts in business borrowing costs. The unemployment rate provides a read on labor dynamics. Together, these variables could start to paint a more comprehensive picture of the factors influencing industrial production.
A prediction model that works for strong signals might not necessarily work for a data set full of subtle signals, however. In this case, which machine learning models can best capture faint patterns in high-dimensional data sets (those with a lot of variables)?
The common approach of focusing on strong signals and eliminating most weak signals to build predictive models has an advantage: It helps avoid overfitting, which occurs when a model becomes too tailored to its training data and loses the crucial ability to generalize to new, unseen data. However, when signals are weak, this selective process can lead to errors, undermining the benefits of a parsimonious (essentially simple) model by potentially excluding subtle yet valuable information or relying on incorrectly chosen signals.
To discover which ML methods remain effective at making use of subtle signals, the researchers employed an approach that combined theoretical work, simulations, and empirical analysis.
Regression is a popular technique for economic and financial forecasting, especially the least absolute shrinkage and selection operator (LASSO) model, which automatically weeds out weaker variables. Shen and Xiu compared LASSO with Ridge regression, an older method that has fallen somewhat out of fashion. They then extended their analysis to include tree-based ML models (random forest and gradient-boosted regression trees) and neural networks.
LASSO works well when there is a mix of strong and weak signals, but it struggles with data sets that consist mostly of faint signals, as is often the case in economics and finance. In fact, the researchers find that its performance can be worse than ignoring the signals altogether. Ridge regression, on the other hand, tends to do a better job of leveraging the cumulative power of less prominent signals, according to the research.
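The contrast can be illustrated with a toy simulation. The sketch below is not the researchers' code or data: it assumes a simplified setting with uncorrelated (orthonormal) predictors, where each coefficient is estimated from a single noisy observation, LASSO reduces to soft-thresholding, and Ridge reduces to proportional shrinkage. The function names, signal size, noise level, and penalty grid are all hypothetical choices made for illustration, and the penalty is picked with knowledge of the true coefficients, which flatters both methods.

```python
import math
import random


def soft_threshold(z, lam):
    """LASSO estimate of one coefficient under an orthonormal design."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    """Ridge estimate of one coefficient under an orthonormal design."""
    return z / (1.0 + lam)


def mean_squared_error(estimates, truth):
    """Average squared gap between estimated and true coefficients."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)


def compare(p=5000, signal=0.5, noise_sd=1.0, seed=0):
    """Compare Ridge, LASSO, and an all-zero forecast on dense weak signals."""
    rng = random.Random(seed)
    # Every coefficient is non-zero but small relative to the noise.
    beta = [signal * rng.choice((-1.0, 1.0)) for _ in range(p)]
    # Noisy observation of each coefficient (hypothetical data).
    z = [b + rng.gauss(0.0, noise_sd) for b in beta]
    # Hypothetical penalty grid; math.inf shrinks everything to zero.
    grid = [0.25, 0.5, 1.0, 2.0, 3.0, 4.0, 8.0, math.inf]
    ridge_best = min(
        mean_squared_error([ridge_shrink(v, lam) for v in z], beta)
        for lam in grid
    )
    lasso_best = min(
        mean_squared_error([soft_threshold(v, lam) for v in z], beta)
        for lam in grid
    )
    zero_error = mean_squared_error([0.0] * p, beta)
    return {"ridge": ridge_best, "lasso": lasso_best, "zero": zero_error}


if __name__ == "__main__":
    for name, err in compare().items():
        print(f"{name:>5}: {err:.4f}")
```

With these defaults, Ridge should come out clearly ahead, while LASSO should barely improve on a forecast that sets every coefficient to zero. Because the grid includes an infinite penalty, LASSO cannot do worse than zero here. The case the researchers describe, in which LASSO underperforms ignoring the signals altogether, arises when the penalty has to be chosen from the data.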
To validate their theoretical findings, the researchers performed simulations and empirical analyses that applied the methods to six real-world datasets from finance, macroeconomics, and microeconomics. These included datasets used to predict equity returns (for both individual stocks and the broader market), forecast industrial production growth and global economic growth, and analyze crime rates and pro-plaintiff decisions.
Ridge regression consistently provided predictions with higher accuracy than LASSO in data sets dominated by weak signals. This suggests Ridge regression is a more reliable tool for economic and financial prediction in these scenarios, the researchers write. Ridge keeps all variables in the model but ensures that less relevant details don’t dominate the prediction, whereas LASSO eliminates the less impactful variables altogether. This resulted in LASSO missing the subtle yet collectively significant weak signals.
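In standard textbook notation (not taken from the working paper), the two methods differ only in the penalty they add to the least-squares fit. The symbols below are assumptions introduced here for illustration: y is the outcome, X the matrix of predictors, β the coefficients, and λ ≥ 0 the penalty strength.

```latex
\hat\beta^{\text{ridge}} = \arg\min_{\beta}\; \tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \tfrac{\lambda}{2}\lVert \beta\rVert_2^2,
\qquad
\hat\beta^{\text{lasso}} = \arg\min_{\beta}\; \tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \lambda\lVert \beta\rVert_1 .

% Special case of uncorrelated, standardized predictors: X^\top X = I, with z = X^\top y
\hat\beta^{\text{ridge}}_j = \frac{z_j}{1+\lambda},
\qquad
\hat\beta^{\text{lasso}}_j = \operatorname{sign}(z_j)\,\max\bigl(\lvert z_j\rvert - \lambda,\; 0\bigr).
```

In that special case, with λ = 0.1, a strong estimate z = 0.9 becomes about 0.818 under Ridge and 0.8 under LASSO, while a weak estimate z = 0.08 becomes about 0.073 under Ridge and exactly 0 under LASSO. Ridge scales every coefficient down by the same factor, so many small coefficients can still add up. LASSO sets any coefficient smaller than the threshold to zero.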
The researchers’ findings highlight that in scenarios where all signals are weak, Ridge regression delivers more accurate predictions than models such as LASSO that are focused on pruning datasets down to only the strongest signals.
Random forest was the better of the tree-based methods when signals were weak, outperforming gradient-boosted regression trees. Neural networks, which avoid overfitting by applying certain penalties, performed better when these penalties prevented any single part of the model from having too much influence. This approach worked more effectively than penalties of the kind LASSO uses, which eliminate the influence of many model components entirely.
The research suggests that in a landscape where the obvious signals have been fully exploited, the real advantage lies in uncovering and utilizing the subtle, often overlooked patterns within the data. Shen and Xiu’s work finds that by embracing weak signals, researchers and practitioners alike can gain a more nuanced and comprehensive understanding of economic dynamics. Finding the appropriate ML method for a dataset is a gateway to recognizing the hidden value within seemingly inconsequential data points.
Zhouyu Shen and Dacheng Xiu, “Can Machines Learn Weak Signals?” Working paper, March 2024.