A single word in a news report—a well-placed “undervalue,” for example—can drive a company’s stock price up or down. Investors can benefit if they can figure out which words matter within a few days, research suggests.
Investors and researchers have suspected for decades that text could be used to predict markets, and some have tried and failed. But by applying machine-learning techniques developed by computer scientists, Harvard’s Zheng Tracy Ke, Yale’s Bryan T. Kelly, and Chicago Booth’s Dacheng Xiu have built a model that in early tests outperformed a similar strategy based on scores from RavenPack, the leading vendor of news-sentiment scores.
Traditionally, finance researchers and market practitioners have relied on accounting data and fundamentals to predict where the market is headed. But quarterly reports arrive slowly for a market moving at warp speed, which has led researchers and traders to look for other sources of predictive information, including news. To find out if news reports could be used to predict stock prices, Ke, Kelly, and Xiu borrowed machine-learning techniques used by computer scientists, who are increasingly training machines to understand text.
Efforts to predict market direction by parsing financial journalism date back to 1933, when economist and businessman Alfred Cowles III classified pieces in the Wall Street Journal as bullish, bearish, or neutral to inform trading strategies. That didn’t necessarily work—Cowles’s theoretical portfolio would have underperformed the market by more than 3 percent a year from 1902 to 1929, the researchers note—but other people have continued to pursue the idea of extracting useful information from text. Among them, Northwestern’s Scott R. Baker, Stanford’s Nicholas Bloom, and Chicago Booth’s Steven J. Davis analyzed years of newspaper articles to identify words associated with economic uncertainty, and have used those words to inform dozens of uncertainty-related indexes.
Some efforts to assess sentiment in text rely on preexisting dictionaries created for other purposes—such as the Harvard-IV Dictionary, a manually selected list of positive and negative psychosocial words, and the Loughran-McDonald Master Dictionary, developed to highlight meaningful words in financial texts and the sentiment associated with those words. The latter starts with word lists and uses US Securities and Exchange Commission filings to add terms relevant to the finance sector. For example, the dictionary added Scholes for the Black-Scholes modeling tool used with financial derivatives.
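To make the dictionary approach concrete, here is a minimal Python sketch of how a fixed word list can be turned into a sentiment score. The tiny word lists and the function names are hypothetical stand-ins chosen for illustration; they are not drawn from the Harvard-IV or Loughran-McDonald dictionaries, which contain thousands of entries.

```python
import re

# Hypothetical toy word lists, for illustration only.
POSITIVE = {"gain", "strong", "improve", "profit"}
NEGATIVE = {"loss", "weak", "decline", "lawsuit"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_sentiment(text):
    """Return (positive hits - negative hits) / total words; 0.0 if no words."""
    words = tokenize(text)
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)
```

A score built this way is only as good as its word list, and the list does not change with context. That limitation is what a learned dictionary tries to address.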
Ke, Kelly, and Xiu created a model that, in effect, automatically generates a dictionary of relevant words and allows for contextually specific sentiment scores. Using supervised machine learning and a method that required only a laptop and basic statistical capabilities, the researchers analyzed more than 22 million articles published from 1989 to 2017 by Dow Jones Newswires. Classifying words as either positive or negative, the researchers generated article-level sentiment scores to highlight how news likely to be perceived as positive or negative would affect stock prices.
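The general idea of learning a dictionary from data, rather than writing one by hand, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the researchers' model: it assumes each article is paired with the stock return that followed it, labels a word positive or negative according to how often it appears before a price rise, and scores a new article by averaging the labels of the words it contains. The function names and the `min_count` and `margin` thresholds are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of distinct lowercase words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def learn_word_tones(articles, returns, min_count=2, margin=0.25):
    """Label words +1 or -1 based on the returns that followed them.

    articles: list of article texts.
    returns:  list of floats, the return observed after each article.
    A word is kept only if it appears in at least min_count articles and
    its share of appearances before a positive return is at least
    `margin` away from 0.5.
    """
    seen = Counter()      # number of articles containing each word
    seen_up = Counter()   # number of those followed by a positive return
    for text, ret in zip(articles, returns):
        for word in tokenize(text):
            seen[word] += 1
            if ret > 0:
                seen_up[word] += 1

    tones = {}
    for word, n in seen.items():
        if n < min_count:
            continue
        share_up = seen_up[word] / n
        if share_up >= 0.5 + margin:
            tones[word] = 1
        elif share_up <= 0.5 - margin:
            tones[word] = -1
    return tones


def article_score(text, tones):
    """Average tone of the learned words in the article; 0.0 if none appear."""
    hits = [tones[w] for w in tokenize(text) if w in tones]
    return sum(hits) / len(hits) if hits else 0.0
```

Because the word labels come from the returns themselves, the resulting list is specific to the market context it was trained on, which is the property the article attributes to the researchers' approach.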