The future of financial analysis: How GPT-4 is disrupting the industry

May 28

Researchers from the University of Chicago have demonstrated that large language models (LLMs) can conduct financial statement analysis with accuracy rivaling and even surpassing that of professional analysts. The findings, published in a working paper titled “Financial Statement Analysis with Large Language Models,” could have major implications for the future of financial analysis and decision-making.

The researchers tested the performance of GPT-4, a state-of-the-art LLM developed by OpenAI, on the task of analyzing corporate financial statements to predict future earnings growth. Remarkably, even when provided only with standardized, anonymized balance sheets and income statements devoid of any textual context, GPT-4 was able to outperform human analysts.

“We find that the prediction accuracy of the LLM is on par with the performance of a narrowly trained state-of-the-art ML model,” the authors write. “LLM prediction does not stem from its training memory. Instead, we find that the LLM generates useful narrative insights about a company’s future performance.”

Chain-of-thought prompts emulate human analyst reasoning

A key innovation was the use of “chain-of-thought” prompts that guided GPT-4 to emulate the analytical process of a financial analyst: identifying trends, computing ratios, and synthesizing the information to form a prediction. With these prompts, GPT-4 achieved 60% accuracy in predicting the direction of future earnings, notably higher than the 53-57% range of human analyst forecasts.
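To make the idea concrete, here is a minimal Python sketch of what such a workflow might look like. It is not the prompt or code used in the paper: the `Statement` fields, the three ratios, the prompt wording, and the helper names are all assumptions chosen for illustration. It computes a few ratios, builds a step-by-step prompt from anonymized figures, and scores directional predictions against outcomes.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """Hypothetical, simplified financial statement for one fiscal year."""
    revenue: float
    net_income: float
    total_assets: float
    total_liabilities: float


def ratios(cur: Statement, prev: Statement) -> dict:
    """Compute a few illustrative ratios an analyst might start from."""
    return {
        "revenue_growth": (cur.revenue - prev.revenue) / prev.revenue,
        "net_margin": cur.net_income / cur.revenue,
        "debt_to_assets": cur.total_liabilities / cur.total_assets,
    }


def build_cot_prompt(cur: Statement, prev: Statement) -> str:
    """Assemble a chain-of-thought style prompt (wording is an assumption)."""
    return "\n".join([
        "You are a financial analyst. The company is anonymized.",
        f"Prior year: {prev}",
        f"Current year: {cur}",
        "Step 1: Identify notable trends between the two years.",
        "Step 2: Compute key ratios such as margins and leverage.",
        "Step 3: Interpret what the trends and ratios imply.",
        "Step 4: Predict whether earnings will go 'up' or 'down' next year.",
    ])


def directional_accuracy(predicted: list, actual: list) -> float:
    """Share of predictions whose direction matches the realized outcome."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


if __name__ == "__main__":
    prev = Statement(100.0, 8.0, 500.0, 300.0)
    cur = Statement(120.0, 12.0, 550.0, 310.0)
    print(ratios(cur, prev))
    print(build_cot_prompt(cur, prev))
```

The prompt string would then be sent to a language model of your choice. That call is left out here because the article does not describe the API setup the researchers used.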

“Taken together, our results suggest that LLMs may take a central role in decision-making,” the researchers conclude. They note that the LLM’s advantage likely stems from its vast knowledge base and ability to recognize patterns and business concepts, allowing it to perform intuitive reasoning even with incomplete information.

The findings are all the more remarkable given that numerical analysis has traditionally been a challenge for language models. “One of the most challenging domains for a language model is the numerical domain, where the model needs to carry out computations, perform human-like interpretations, and make complex judgments,” said Alex Kim, one of the study’s co-authors. “While LLMs are effective at textual tasks, their understanding of numbers typically comes from the narrative context and they lack deep numerical reasoning or the flexibility of a human mind.”

Some experts caution that the “ANN” model used as a benchmark in the study may not represent the state-of-the-art in quantitative finance. “That ANN benchmark is nowhere near state of the art,” commented one practitioner on the Hacker News forum. “People didn’t stop working on this in 1989 — they realized they can make lots of money doing it and do it privately.”

Nevertheless, the ability of a general-purpose language model to match the performance of specialized ML models and exceed human experts points to the disruptive potential of LLMs in the financial domain. The authors have also created an interactive web application to showcase GPT-4’s capabilities for curious readers, though they caution that its accuracy should be independently verified.

As AI continues its rapid advance, the role of the financial analyst may be the next to be transformed. While human expertise and judgment are unlikely to be fully replaced anytime soon, powerful tools like GPT-4 could greatly augment and streamline the work of analysts, potentially reshaping the field of financial statement analysis in the years to come.

Daniel Miller
