In fields including computer science and data science, it is common practice when building predictive models, whether to forecast customer churn or to recognize images, to focus on the variables with the highest predictive power. This often involves identifying a few “strong” signals—such as user engagement metrics for churn prediction or edge-detection features in image recognition—while discarding “weak” variables that contribute less to overall model accuracy.

But making accurate predictions in economics and finance is notoriously challenging, largely because the easily exploitable alpha—or opportunities for abnormal returns—from the strongest and most obvious signals has already been found and capitalized on. As a result, economic and financial datasets are left with weaker, subtler signals that offer smaller potential gains in alpha and are not immediately apparent or easy to capture.

Chicago Booth PhD student Zhouyu Shen and Booth’s Dacheng Xiu suggest that these weak signals provide an important opportunity and that learning how best to make use of them has become critical for anyone looking to improve predictive accuracy. A commonly used prediction method can struggle with them, their research finds, while an older, less-used model performed better in their tests.

Weak signals are prevalent in economic data. For example, changes in personal income, the unemployment rate, or corporate bond spreads may not seem relevant, on their own, to someone trying to predict a move in industrial production. But such data could be helpful in combination, the researchers explain. After all, personal income changes are tied to consumer demand. Corporate bond spreads signal shifts in business borrowing costs. The unemployment rate provides a read on labor dynamics. Together, these variables could start to paint a more comprehensive picture of the factors influencing industrial production.

A prediction model that works for strong signals will not necessarily work for a data set full of subtle ones, however. In this case, which machine learning models can best capture faint patterns in high-dimensional data sets (those with a lot of variables)?

The common approach of focusing on strong signals and eliminating most weak signals to build predictive models has an advantage: It helps avoid overfitting, which occurs when a model becomes too tailored to its training data and loses the crucial ability to generalize to new, unseen data. However, when signals are weak, this selective process can lead to errors, undermining the benefits of a parsimonious (essentially simple) model by potentially excluding subtle yet valuable information or relying on incorrectly chosen signals.

To discover which ML methods remain effective at making use of subtle signals, the researchers employed an approach that combined theoretical work, simulations, and empirical analysis.

Regression is a popular technique for economic and financial forecasting, and an especially common variant is the least absolute shrinkage and selection operator, or LASSO, which automatically weeds out weaker variables. Shen and Xiu compared LASSO with Ridge regression, an older method that has fallen somewhat out of fashion. They then extended their analysis to include tree-based ML models (random forest and gradient-boosted regression trees) and neural networks.
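For readers who want the mechanics, the two estimators differ only in their penalty term; the display below is the standard textbook formulation rather than notation taken from the paper. LASSO’s absolute-value (L1) penalty can set coefficients exactly to zero, while Ridge’s squared (L2) penalty shrinks every coefficient toward zero without eliminating any of them.

```latex
\hat{\beta}^{\mathrm{LASSO}}
  = \arg\min_{\beta}\ \sum_{i=1}^{n}\bigl(y_i - x_i^{\top}\beta\bigr)^{2}
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert,
\qquad
\hat{\beta}^{\mathrm{Ridge}}
  = \arg\min_{\beta}\ \sum_{i=1}^{n}\bigl(y_i - x_i^{\top}\beta\bigr)^{2}
    + \lambda \sum_{j=1}^{p} \beta_j^{2}
```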

LASSO works well when there is a mix of strong and weak signals, but it struggles with data sets that consist mostly of faint signals, as is often the case in economics and finance. In fact, the researchers find that its performance can be worse than ignoring the signals altogether. Ridge regression, on the other hand, tends to do a better job of leveraging the cumulative power of less prominent signals, according to the research.

To validate their theoretical findings, the researchers performed simulations and empirical analysis that applied the methods to six real-world data sets from finance, macroeconomics, and microeconomics. These data sets included ones used to predict equity market returns, forecast industrial production growth, and analyze crime rates and economic outcomes.

Ridge regression consistently provided more accurate predictions than LASSO in data sets dominated by weak signals, which suggests it is a more reliable tool for economic and financial prediction in these scenarios, the researchers write. Ridge keeps all variables in the model but ensures that less relevant ones don’t dominate the prediction, whereas LASSO eliminates the less impactful variables altogether, causing it to miss subtle yet collectively significant signals.
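A minimal simulation sketch, purely illustrative and not the researchers’ design, makes the contrast concrete: every one of 200 coefficients below is tiny, so no predictor is worth keeping on its own, yet together they carry real signal. On data like this, Ridge typically posts a higher out-of-sample R² and LASSO zeroes out most of the coefficients, though the exact numbers depend on the random draw.

```python
# Illustrative simulation, not the researchers' design: 200 weak signals,
# none strong enough to matter on its own, compared out of sample.
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 500, 200
beta = rng.normal(0.0, 0.05, size=p)      # every coefficient is tiny: all signals are "weak"
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)         # noise swamps any individual predictor

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

lasso = LassoCV(cv=5).fit(X_tr, y_tr)                            # L1 penalty: drops most variables
ridge = RidgeCV(alphas=np.logspace(-2, 4, 60)).fit(X_tr, y_tr)   # L2 penalty: shrinks all of them

print("LASSO out-of-sample R^2:", round(r2_score(y_te, lasso.predict(X_te)), 3))
print("Ridge out-of-sample R^2:", round(r2_score(y_te, ridge.predict(X_te)), 3))
print("Nonzero LASSO coefficients:", int(np.sum(lasso.coef_ != 0)), "of", p)
```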

The researchers’ findings highlight that in scenarios where all signals are weak, Ridge regression delivers more accurate predictions than models such as LASSO that are focused on pruning data sets down to only the strongest signals.

Random forest was the better of the two tree-based methods when signals were weak, outperforming gradient-boosted regression trees. Neural networks, which guard against overfitting by penalizing their weights, performed better when those penalties kept any single part of the model from having too much influence. This shrinkage-style approach worked more effectively than penalties that, as in LASSO, eliminate the influence of many model components entirely.
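The sketch below, again illustrative rather than anything from the paper, extends the same kind of all-weak-signal setup to those nonlinear methods; in scikit-learn, the neural network’s alpha parameter is an L2 (ridge-style) penalty on the weights, the kind of shrinkage described above, and the model settings are arbitrary choices for demonstration.

```python
# Illustrative sketch, not the researchers' implementation: all-weak-signal data
# fed to random forest, gradient-boosted trees, and a small neural network whose
# `alpha` parameter is an L2 (ridge-style) penalty on its weights.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
n, p = 500, 200
beta = rng.normal(0.0, 0.05, size=p)      # all-weak linear signals, as before
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "random forest": RandomForestRegressor(n_estimators=300, random_state=1),
    "gradient boosting": GradientBoostingRegressor(random_state=1),
    "L2-penalized neural net": MLPRegressor(hidden_layer_sizes=(32,), alpha=1.0,
                                            max_iter=2000, random_state=1),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: out-of-sample R^2 = {r2_score(y_te, model.predict(X_te)):.3f}")
```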

The research suggests that in a landscape where the obvious signals have been fully exploited, the real advantage lies in uncovering and utilizing the subtle, often overlooked patterns within the data. Shen and Xiu’s work finds that by embracing weak signals, researchers and practitioners alike can gain a more nuanced and comprehensive understanding of economic dynamics. Finding the appropriate ML method for a data set is a gateway to recognizing the hidden value within seemingly inconsequential data points.
