What AI Sees in the Market (That You Might Not)
Ten ways investors are, or should be, using large language models
Before ChatGPT was released in November 2022, it might have been difficult to imagine artificial intelligence and large language models performing tasks that require human qualities such as empathy or judgment. But the technology has moved so quickly that medical LLMs now outshine doctors in diagnostic accuracy and bedside manner when texting with patients, and an LLM generates better startup ideas than MBA students do, according to the University of Pennsylvania’s Ethan Mollick. LLMs may also excel at financial statement analysis, suggests research by Chicago Booth PhD student Alex G. Kim and Booth’s Maximilian Muhn and Valeri Nikolaev.
Financial statement analysis, the process of gauging a company’s health from its financial statements, requires quantitative analysis followed by reasoning, critical thinking, and complex judgments, all areas where one might expect a human to outperform AI. But when asked to analyze a company’s balance sheet and income statement and then predict the direction of its future earnings, GPT-4 Turbo outperformed professional analysts. Its results were also on par with those of a sophisticated machine-learning model that, unlike GPT-4, was specifically trained to predict earnings.
LLMs are strong at textual analysis but have been known to perform poorly at mathematical calculations. That makes the AI’s performance all the more impressive, because the researchers stripped out every textual clue, including the management discussion and analysis sections of annual reports and any reference to the year or the company name in the financial statements. The statements were drawn from the Compustat database and cover the years 1968 to 2021. Analyst forecasts, from 1983 to 2021, came from the I/B/E/S Estimates database.
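To make the anonymization step concrete, here is a minimal Python sketch, assuming a statement is stored as a dictionary of line items keyed by fiscal year. The record layout, the function name `anonymize`, and the figures are hypothetical, and replacing calendar years with relative labels such as "t" and "t-1" is one plausible way to hide the year, not necessarily the researchers’ exact procedure.

```python
from typing import Dict


def anonymize(record: dict) -> Dict[str, Dict[str, float]]:
    """Drop the company name and relabel calendar years relative to the latest one.

    Hypothetical input layout: {"company": str, "data": {fiscal_year: {line_item: value}}}.
    The newest year becomes "t", the year before it "t-1", and so on, so the
    statements carry no name or date the model could match against memory.
    """
    data = record["data"]
    relabeled: Dict[str, Dict[str, float]] = {}
    for offset, year in enumerate(sorted(data, reverse=True)):  # newest year first
        label = "t" if offset == 0 else f"t-{offset}"
        relabeled[label] = dict(data[year])  # copy the line items unchanged
    # Only the relabeled figures are returned; record["company"] is never copied.
    return relabeled


# Usage with made-up figures for a hypothetical firm.
raw = {
    "company": "Example Corp",
    "data": {
        2020: {"revenue": 100.0, "net_income": 8.0},
        2021: {"revenue": 110.0, "net_income": 9.5},
    },
}
print(anonymize(raw))
# {'t': {'revenue': 110.0, 'net_income': 9.5}, 't-1': {'revenue': 100.0, 'net_income': 8.0}}
```

Relative labels keep the ordering of the years, which the analysis needs for spotting trends, while removing the calendar dates themselves.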
Kim, Muhn, and Nikolaev prompted GPT-4 to follow a thought process similar to that of human analysts, who generally look for trends in financial statements, compute financial ratios, and synthesize those findings to predict earnings. When the researchers showed the model standardized, anonymized financial statements it had not seen before and asked it to predict the direction of earnings, it was right about 60 percent of the time, compared with about 53 percent for professional analysts.
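The analyst-style workflow and the accuracy comparison can both be sketched in a few lines of Python. The ratio set below is an illustrative assumption (the study’s prompt is not reproduced here), the line-item names and toy labels are hypothetical, and "accuracy" simply means the share of predictions whose direction matches what actually happened.

```python
from typing import Dict, List


def compute_ratios(cur: Dict[str, float], prev: Dict[str, float]) -> Dict[str, float]:
    """A few textbook ratios and year-over-year trends (an illustrative selection)."""
    return {
        "current_ratio": cur["current_assets"] / cur["current_liabilities"],  # liquidity
        "operating_margin": cur["operating_income"] / cur["revenue"],  # profitability
        "return_on_assets": cur["net_income"] / cur["total_assets"],  # efficiency
        "revenue_growth": cur["revenue"] / prev["revenue"] - 1.0,  # trend
        "earnings_change": cur["net_income"] - prev["net_income"],  # trend
    }


def directional_accuracy(predicted: List[str], actual: List[str]) -> float:
    """Share of cases where the predicted direction ("increase"/"decrease") was right."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal in length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Toy labels only: 3 of 5 directions match, so accuracy is 0.6.
print(directional_accuracy(
    ["increase", "decrease", "increase", "increase", "decrease"],
    ["increase", "increase", "increase", "decrease", "decrease"],
))
# 0.6
```

Because the target is only the direction of earnings, a coin flip would score about 50 percent, which is the natural yardstick for figures like 53 and 60 percent.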
The researchers also compared the results with those of a logistic regression model and an artificial neural network (ANN), both trained to predict the direction of earnings. Both models used 59 financial variables, such as the ratio of book value to price. The logistic regression model’s predictions were accurate about 53 percent of the time, while the ANN’s accuracy was close to 60 percent, similar to GPT-4’s.
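For readers unfamiliar with the logistic regression baseline, the sketch below fits one by plain batch gradient descent and classifies each firm-year as an earnings increase (1) or decrease (0). It is a toy under stated assumptions: a single made-up feature stands in for the study’s 59 financial variables, and the learning rate and number of epochs are arbitrary choices.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Batch gradient descent on the average log loss; returns (weights, bias)."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # p - y
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """1 = earnings predicted to increase, 0 = predicted to decrease."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Made-up training data: one feature per firm-year, label 1 if earnings rose.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X])
# [0, 0, 0, 1, 1, 1]
```

Adding more columns to each row of `X` is all it takes to move toward a multi-variable baseline; in practice one would standardize the features and use a library implementation rather than this hand-rolled loop.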
The researchers suggest that GPT-4 likely drew on its understanding of economic reasoning to interpret the financial ratios and trends it identified in the statements. To guide the LLM, they used a prompting technique known as chain-of-thought, which walks the model through intermediate reasoning steps to enable more complex analysis.
Without chain-of-thought prompting, GPT-4’s accuracy fell to about 52 percent. The LLM’s predictions were also less likely to be accurate when the company it was evaluating was small, had a higher leverage ratio, had recorded a loss, or had volatile earnings, the research finds.
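The difference between chain-of-thought and direct prompting is easy to see in code. The sketch below assembles both kinds of prompt from an anonymized statement and pulls the predicted direction out of the model’s reply. The prompt wording, the step list, and the helper names are hypothetical illustrations, not the researchers’ actual prompt, and no particular LLM API is called.

```python
from typing import Dict, Optional

# Hypothetical reasoning steps modeled on the analyst workflow described in the article.
COT_STEPS = [
    "1. Identify notable changes in line items between periods t-1 and t.",
    "2. Compute key financial ratios and explain what each one implies.",
    "3. Combine these observations into an overall view of the company's health.",
]


def format_statement(statement: Dict[str, Dict[str, float]]) -> str:
    """Render {period: {line_item: value}} as plain text for the prompt."""
    lines = []
    for period, items in statement.items():
        lines.append(f"[{period}]")
        lines.extend(f"  {name}: {value:,.1f}" for name, value in items.items())
    return "\n".join(lines)


def build_prompt(statement: Dict[str, Dict[str, float]], chain_of_thought: bool = True) -> str:
    """Build a prompt that either walks through intermediate steps or asks directly."""
    parts = [
        "You are a financial analyst. Below are anonymized financial statements.",
        format_statement(statement),
    ]
    if chain_of_thought:
        parts.append("Reason through these steps before answering:\n" + "\n".join(COT_STEPS))
    parts.append(
        "Will earnings increase or decrease next period? "
        "Finish with one line: 'Answer: increase' or 'Answer: decrease'."
    )
    return "\n\n".join(parts)


def parse_direction(reply: str) -> Optional[str]:
    """Return 'increase' or 'decrease' from the last 'Answer:' line, else None."""
    for line in reversed(reply.strip().splitlines()):
        text = line.strip().lower()
        if text.startswith("answer:"):
            answer = text[len("answer:"):].strip().rstrip(".")
            if answer in ("increase", "decrease"):
                return answer
    return None


print(parse_direction("Margins widened and liquidity improved.\nAnswer: Increase"))
# increase
```

The only change between the two prompt variants is whether the intermediate steps are included, which mirrors the with-and-without comparison the researchers ran. Pinning the final answer to a fixed "Answer:" line makes replies easy to score automatically.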
Kim, Muhn, and Nikolaev also evaluated whether GPT-4’s earnings forecasts could be put to economic use by informing a trading strategy. Using those forecasts, they formed a long-short portfolio that, in backtesting (testing a trading strategy on historical data), delivered high risk-adjusted returns relative to a benchmark.
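A long-short strategy of this kind can be sketched simply: buy the stocks whose earnings are predicted to rise, short those predicted to fall, and track the spread between the two groups’ returns over time. The version below is a simplification under explicit assumptions: it uses equal weights, ignores trading costs, and substitutes a plain, unannualized Sharpe ratio (mean over standard deviation of period returns) for the study’s benchmark-relative risk adjustment. The tickers and returns are made up.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def long_short_return(predictions: Dict[str, str], next_returns: Dict[str, float]) -> float:
    """Equal-weighted return of predicted risers minus that of predicted fallers."""
    longs = [next_returns[s] for s, d in predictions.items() if d == "increase"]
    shorts = [next_returns[s] for s, d in predictions.items() if d == "decrease"]
    if not longs or not shorts:
        raise ValueError("need at least one long and one short position")
    return mean(longs) - mean(shorts)


def backtest(periods: List[Tuple[Dict[str, str], Dict[str, float]]]) -> Dict[str, float]:
    """Average long-short return and a simple per-period Sharpe ratio."""
    rets = [long_short_return(preds, nxt) for preds, nxt in periods]
    result = {"mean_return": mean(rets)}
    if len(rets) > 1 and stdev(rets) > 0:
        result["sharpe"] = mean(rets) / stdev(rets)
    return result


# Hypothetical history: (predicted directions, realized next-period returns) per period.
periods = [
    ({"AAA": "increase", "BBB": "decrease", "CCC": "increase"},
     {"AAA": 0.05, "BBB": -0.02, "CCC": 0.01}),
    ({"AAA": "decrease", "BBB": "increase"},
     {"AAA": 0.01, "BBB": 0.02}),
]
print(backtest(periods))
```

Holding both long and short positions means the portfolio profits from getting the relative direction right rather than from the market rising overall.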
LLMs have generally been viewed as tools that support an analyst’s work. “Broadly, our analysis suggests that LLMs can take a more central place in decision-making,” the researchers write. That’s not to say that LLMs should replace analysts.
They note that humans still provide valuable insights that cannot be gleaned from financial statements. Professional analysts may bring a nuanced understanding of a company, the market, regulations, and more—but an LLM is stronger when it comes to predictions. The abilities of LLMs and analysts complement each other, the research concludes.
Alex G. Kim, Maximilian Muhn, and Valeri Nikolaev, “Financial Statement Analysis with Large Language Models,” Working paper, May 2024.