Embeddings

The world of finance produces vast amounts of data, yet many types – such as text, audio, and image data – have historically been underutilized in financial modeling. Traditionally, stock prices are predicted using standard financial metrics, leaving valuable information like shareholder letters, Federal Reserve meeting minutes, and government regulations untapped. However, recent advances in machine learning have made it possible to analyze this data systematically.

Asset Embeddings in Finance

A notable example is the work of Booth faculty member Dr. Ralph S.J. Koijen and his co-authors, whose research explores harnessing this underutilized data to generate new insights in finance. By adapting the same underlying mechanisms as ChatGPT, the team developed an asset embedding model that extends to traditional use cases, including firm valuations, return co-movement, and asset substitution patterns, paving the way for new approaches to financial modeling.

The foundations of their work lie in a transformer-based model known as Bidirectional Encoder Representations from Transformers (BERT), a neural network that generates “embeddings” – continuous vector representations of unstructured data, such as text or speech, in high-dimensional spaces.
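To make the idea concrete, here is a minimal sketch of pulling sentence embeddings out of a pre-trained BERT model with the Hugging Face transformers library. The model name, pooling choice, and example sentences are illustrative; this is not the pipeline used in the research discussed below.

```python
# Minimal sketch: sentence embeddings from a pre-trained BERT model.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = [
    "The Federal Reserve held interest rates steady.",
    "The central bank left its policy rate unchanged.",
]
inputs = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool token vectors into one embedding per sentence,
# masking out padding tokens.
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)

# Semantically similar sentences land close together in the space.
similarity = torch.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {similarity:.3f}")
```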


While ChatGPT uses embeddings to analyze the semantic relationships between words, Koijen et al. adapt this logic to find patterns and relationships in financial information, developing a methodology to derive asset embeddings directly from portfolio holdings data.

“Just as documents arrange words that can be used to uncover word structures via embeddings, investors organize assets in portfolios that can be used to uncover firm characteristics that investors deem important via asset embeddings.” 
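One way to picture the analogy: treat each investor's portfolio as a "sentence" whose "words" are the assets it holds, and an off-the-shelf word-embedding model will already recover co-holding structure. The sketch below runs gensim's word2vec over hypothetical holdings; it illustrates the intuition only, and the paper's actual estimation approach differs.

```python
# Illustration of the portfolios-as-documents analogy via word2vec.
from gensim.models import Word2Vec

# Hypothetical holdings: each portfolio is a "sentence" of tickers.
portfolios = [
    ["AAPL", "MSFT", "GOOGL", "NVDA"],
    ["AAPL", "MSFT", "AMZN"],
    ["XOM", "CVX", "COP"],
    ["XOM", "CVX", "SLB"],
    ["JPM", "BAC", "GS", "MSFT"],
]

model = Word2Vec(portfolios, vector_size=16, window=10,
                 min_count=1, sg=1, epochs=200, seed=0)

# Assets that co-occur in similar portfolios get similar embeddings,
# surfacing substitution patterns (the energy names should cluster).
print(model.wv.most_similar("XOM", topn=3))
```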

Traditionally, financial modeling has focused on a limited set of “factors” or “characteristics” that explain differences in average returns between stocks. Asset embeddings, however, offer a more comprehensive approach to prediction, significantly outperforming these traditional metrics.

Building on the asset embeddings work, Kim and Nikolaev explored how contextual data can enhance the analysis of a broader range of financial information. In their study, "Context-Based Interpretation of Financial Information", they utilized the same foundational BERT model to encode narrative text and assess its interaction with numerical disclosures. The researchers compared the performance of fully connected and partially connected artificial neural network (ANN) models:

Figure 1-a: Fully connected model

Each neuron from the textual input interacts fully with the numeric neurons, allowing complex relationships to emerge. This full interaction outputs the logit probability of a directional change in the target variable (e.g., an earnings increase or decrease in the subsequent period).

Figure 1-b: Partially connected model

Maintaining the same layer structure as Figure 1-a, this model restricts interaction between the two input streams, limiting cross-stream influence. In this example, the contextual information vectors are not involved in the activation of neurons stemming from the accounting information vector, and vice versa.
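As a rough illustration of the architectural difference, the PyTorch sketch below contrasts the two designs. It reflects one reading of the figures rather than the authors' released code, and the layer sizes are arbitrary.

```python
# One reading of Figures 1-a and 1-b as PyTorch modules.
import torch
import torch.nn as nn

text_dim, num_dim, hidden = 768, 32, 64  # illustrative sizes

class FullyConnected(nn.Module):
    """Figure 1-a: text and numeric neurons interact freely."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim + num_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # logit of a directional change
        )

    def forward(self, text, num):
        return self.net(torch.cat([text, num], dim=-1))

class PartiallyConnected(nn.Module):
    """Figure 1-b: each input stream is processed in its own tower,
    and the streams only combine at the final output layer."""
    def __init__(self):
        super().__init__()
        self.text_tower = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.num_tower = nn.Sequential(nn.Linear(num_dim, hidden), nn.ReLU())
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, text, num):
        return self.out(torch.cat([self.text_tower(text),
                                   self.num_tower(num)], dim=-1))

text, num = torch.randn(8, text_dim), torch.randn(8, num_dim)
print(FullyConnected()(text, num).shape)      # torch.Size([8, 1])
print(PartiallyConnected()(text, num).shape)  # torch.Size([8, 1])
```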

The difference in accuracy between the fully and partially connected models can be attributed to the contextuality of accounting information. The findings revealed that incorporating narrative context significantly improves accuracy in predicting key financial outcomes (e.g., directional changes in earnings). Fully connected models showed a 16% improvement over numeric-only models, highlighting the value of contextual data in interpreting numeric financial data, especially in periods of economic uncertainty or when numeric data alone is less reliable.

 

Fine-Tuned Embeddings: Enhanced Predictive Power

The “Asset Embeddings” study concluded with a recommendation to fine-tune text-based embeddings for enhanced predictive capacity. Around the same time, Li et al. introduced FLAME (Faithful Latent Feature Mining for Predictive Model Enhancement), a framework that leverages LLMs to infer latent features. By augmenting observed features with these inferred latent features, FLAME can enhance the predictive power of machine learning (ML) models in downstream tasks.

Traditional ML models often struggle to incorporate factors that are critical but unobserved. FLAME addresses this challenge in domains where the collected features are only weakly correlated with outcomes, or where additional field collection is constrained by ethical or practical difficulties. By formulating latent feature mining as text-to-text propositional logical reasoning, the framework aims to improve predictive modeling across industries by incorporating contextual information unique to specific domains. Case studies in criminal justice and healthcare demonstrated its efficacy in improving prediction accuracy under these constraints.
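Schematically, the workflow resembles the sketch below: an LLM reasons over each case narrative to produce a latent feature, which is then appended to the observed features of a standard classifier. The query_llm call and the feature itself are hypothetical stand-ins, not FLAME's actual prompts or outputs.

```python
# Schematic of latent-feature augmentation (not FLAME's implementation).
import numpy as np
from sklearn.linear_model import LogisticRegression

def query_llm(prompt: str) -> float:
    # Hypothetical stand-in for a real LLM call.
    raise NotImplementedError("replace with an actual LLM API call")

def infer_latent_feature(case_text: str) -> float:
    """Prompt an LLM to reason over the narrative and return a latent
    feature (e.g., an unobserved risk factor rated 0 to 1)."""
    prompt = f"Given this case record, rate the latent factor 0-1:\n{case_text}"
    return query_llm(prompt)

rng = np.random.default_rng(0)
# Observed features that are only weakly correlated with the outcome...
X_observed = rng.random((100, 4))
# ...augmented with LLM-inferred latent features (simulated here).
latent = rng.random((100, 1))
X_augmented = np.hstack([X_observed, latent])

y = rng.integers(0, 2, size=100)
clf = LogisticRegression().fit(X_augmented, y)  # downstream classifier
```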

The authors validated their framework with two case studies:

Figure: Case study results in criminal justice and healthcare

Their results showed that the inferred latent features align well with ground-truth labels and significantly enhance the downstream classifier.

Fine-Tuned Embeddings: Improving Marketing Communication

Another application of fine-tuned embeddings was demonstrated by Lee et al., who developed a domain-specific model to optimize marketing communication. Their research, “Causal Alignment: Augmenting Language Models with A/B Tests”, showcased a framework that enhances email marketing performance by generating and ranking content suggestions. When deployed, the model proposes improvements to human-written marketing content.

Figure 2: Generating, evaluating, and selecting candidate decisions

The framework involves three key steps: (1) generating candidate decisions, done by a language model; (2) evaluating them, done by a predictive model; and (3) selecting among them, done by a human. Traditionally, generation was also done by a human, which is challenging in the email marketing copywriting setting.

For a new email campaign, the user enters a subject line, a key phrase, or both. The user can optionally supply emotion tags; if none are provided, the model infers the best emotion(s). The fine-tuned model generates many candidates (50-100), and those with a predicted score below 0.4 on a 0-1 scale are removed. The predictive model ranks the remaining candidates, and the top 10 are presented to the user, who makes the final selection.
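In code, the flow might look like the sketch below, where generate_candidates and predict_score are hypothetical stand-ins for the fine-tuned language model and the predictive model.

```python
# Sketch of the generate -> filter -> rank -> human-select flow.
import random

def generate_candidates(seed_text, emotions, n):
    """Hypothetical stand-in for the fine-tuned language model."""
    return [f"{seed_text} (variant {i})" for i in range(n)]

def predict_score(candidate):
    """Hypothetical stand-in for the predictive model (0-1 scale)."""
    return random.random()

def shortlist_subject_lines(seed_text, emotions=None,
                            n_candidates=100, cutoff=0.4, top_k=10):
    # (1) Generate: the LM proposes 50-100 candidates, inferring
    # emotion tags when the user supplies none.
    candidates = generate_candidates(seed_text, emotions, n_candidates)
    # (2) Evaluate: score each candidate; drop those below the cutoff.
    kept = [(c, predict_score(c)) for c in candidates]
    kept = [(c, s) for c, s in kept if s >= cutoff]
    # (3) Select: rank the survivors; the top 10 go to a human.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]

for line, score in shortlist_subject_lines("Last chance: summer sale"):
    print(f"{score:.2f}  {line}")
```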

Rather than replacing human input entirely, the model performed best as a complement to human expertise, refining initial ideas. Specifically, the research leverages a smaller, fine-tuned LM to transform lower-performing content into optimized alternatives, validated through a large-scale field experiment on email marketing subject lines spanning 283 million impressions across 36 campaigns.

In essence, Lee et al. found that fine-tuned models with fewer parameters could outperform a large general-purpose model (GPT-3.5-turbo), demonstrating the efficiency of smaller, domain-specific models for marketing content optimization. Their framework thus offers a template for integrating AI into marketing while maintaining human oversight and aligning AI output with business objectives, suggesting broader applications in optimizing unstructured content across industries.

Quality Control: Reducing Toxicity in AI

Ensuring reliability, appropriateness, and factual accuracy in AI-generated content is a top priority. Bradford et al. address this through “BeanCounter”, a large-scale, low-toxicity dataset specific to business contexts. With over 159 billion tokens sourced from corporate disclosures, BeanCounter applies quality-control measures (e.g., accuracy filtering, diversity checks, and toxicity detection) to reduce toxicity while enhancing domain-specific knowledge for finance-specific tasks, such as sentiment classification and named entity recognition.

What is toxicity in the context of LLMs?

  1. Discriminatory or Biased Content: Prejudiced or derogatory language targeted toward specific demographics, such as gender, race, religion, nationality, or sexual orientation
  2. Hateful or Hostile Language: Text that incites hate, hostility, or aggression, often targeting particular groups or individuals
  3. Profane or Inappropriate Content: Vulgar, obscene, or unprofessional language unsuitable for public or professional use
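
To give a flavor of what dataset-level toxicity filtering involves, the sketch below scores documents with the open-source Detoxify classifier and keeps those under an illustrative cutoff. BeanCounter's actual quality-control pipeline is more involved, and the threshold here is not theirs.

```python
# Minimal sketch of a toxicity filter using the Detoxify classifier.
from detoxify import Detoxify

scorer = Detoxify("original")
THRESHOLD = 0.5  # illustrative cutoff, not BeanCounter's

documents = [
    "Revenue increased 12% year over year, driven by services growth.",
    "Management expects margins to compress in the fourth quarter.",
]

# Keep only documents whose predicted toxicity falls below the cutoff.
clean = [doc for doc in documents
         if scorer.predict(doc)["toxicity"] < THRESHOLD]
print(f"kept {len(clean)} of {len(documents)} documents")
```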

Models trained on BeanCounter demonstrated an 18-33% reduction in toxic content generation, along with improved demographic representation and sensitivity to bias, making them better suited for use cases where careful language is paramount, e.g., customer-facing interactions. Models trained on BeanCounter also maintain diversity in demographic mentions while lowering the toxicity associated with sensitive descriptors: the demographic terms used by these models were 59-89% less toxic than in traditional web-based datasets, which has the potential to mitigate bias that could negatively affect model outputs in professional settings. On finance-related tasks specifically, LLMs trained with BeanCounter outperformed baseline models by 1-4%. By reducing toxicity while maintaining demographic representation, BeanCounter provides a benchmark for building safer, more respectful, and less biased content, particularly in professional applications like business communications and customer interactions.

Conclusion

Language models, through diverse applications of embeddings, are driving innovation across industries by both addressing traditional challenges and uncovering new pathways for understanding complex systems. Just as Booth faculty have advanced the field of finance by applying machine learning techniques to solve enduring problems and develop novel frameworks for financial analysis, these models are redefining the landscape in fields such as healthcare and marketing. By leveraging the capabilities of language models, industries are not only enhancing operational efficiency but also transforming their approaches to decision-making, customer interaction, and data-driven insights.

Want to explore generative AI and LLMs further? Check out CAAI’s interactive explanation of LLMs here.