Transforming Financial Data Extraction With AI for a Global Professional Services Firm

Leveraging Generative AI, we automated the extraction of data from 3,000 to 7,000 financial documents yearly for a professional services firm, saving 3,100 hours annually.

The Challenge

Our client is a global professional services provider operating across numerous industries. They needed to extract and analyze information from a vast collection of financial reports containing both structured and unstructured data, such as footnotes and contracts. The core need was to pull specific fields of interest from these documents, a task the client had been performing manually at a substantial cost in time and resources.

They needed to train and deploy AI models able to extract the right information from every type of document. The first challenge was the lack of a dataset for document classification. In addition, the initial iteration of the system lacked a test dataset, which was necessary for achieving high precision in the extracted fields. Finally, an efficient validation mechanism for the extracted data had to be designed.

Solution

Together with our client’s team, CloudX leveraged Generative AI to build an advanced data pipeline for extracting and analyzing information from financial reports.

Phase 1: Classification

We used Azure OpenAI to generate a classification dataset in which a series of documents were labeled according to their specific types. This dataset was then used to train the classification model, a Random Forest classifier operating on document embeddings. The trained model categorizes new, unseen documents with a high level of confidence.
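To make the embeddings-plus-Random-Forest idea concrete, here is a minimal sketch. It is not the project's code: the `embed` function is a hypothetical stand-in (hashed word counts) for a real embedding model, the labeled examples and class names are invented for illustration, and scikit-learn is assumed to be installed.

```python
import hashlib

from sklearn.ensemble import RandomForestClassifier

DIM = 64  # size of the toy embedding; real embedding vectors are much larger


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag-of-words counts."""
    vector = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vector[int.from_bytes(digest[:4], "big") % DIM] += 1.0
    return vector


# Invented labeled examples; in the case study the labeled dataset was
# generated with Azure OpenAI.
labeled_docs = [
    ("balance sheet total assets liabilities equity", "balance_sheet"),
    ("assets liabilities shareholders equity statement", "balance_sheet"),
    ("lease agreement between lessor and lessee term", "contract"),
    ("service agreement parties term termination clause", "contract"),
    ("revenue cost of sales net income for the year", "income_statement"),
    ("net income operating expenses revenue statement", "income_statement"),
]

X = [embed(text) for text, _ in labeled_docs]
y = [label for _, label in labeled_docs]

# Fixed random_state keeps the sketch reproducible.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Classify a new, unseen document and report the model's confidence,
# taken here as the highest class probability.
new_doc = "agreement between the parties termination clause and term"
probabilities = model.predict_proba([embed(new_doc)])[0]
best = probabilities.argmax()
predicted_label = str(model.classes_[best])
confidence = float(probabilities[best])
print(predicted_label, round(confidence, 2))
```

In a setup like this, the confidence value could be used to decide which documents are classified automatically and which are sent for manual review, although the case study does not say whether such a threshold was used.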

Phase 2: Extraction

We developed a solution using Retrieval-Augmented Generation (RAG) and prompts specifically designed to extract the correct fields from the already classified documents. In an iterative process, both the prompts and the RAG setup were continuously tested and refined to increase the precision of the model’s answers.
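The retrieval-then-prompt pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the project's pipeline: word-overlap cosine similarity stands in for embedding-based retrieval, the helper names (`retrieve`, `build_prompt`) and the sample excerpts are hypothetical, and the call to the language model is left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks, key=lambda c: cosine(Counter(tokenize(c)), q), reverse=True
    )
    return ranked[:k]


def build_prompt(field: str, context: list[str]) -> str:
    """Assemble an extraction prompt from the retrieved excerpts."""
    joined = "\n---\n".join(context)
    return (
        f"Extract the field '{field}' from the excerpts below. "
        "Answer with the value only, or 'NOT FOUND' if it is absent.\n\n"
        f"{joined}"
    )


# Invented document chunks for illustration.
chunks = [
    "Total revenue for fiscal year 2023 was 41.2 million USD.",
    "The lease agreement expires on 30 June 2027.",
    "Footnote 4: figures exclude discontinued operations.",
]
top = retrieve(chunks, "lease agreement expiry date", k=1)
prompt = build_prompt("lease expiry date", top)
print(prompt)
```

Keeping the prompt template in one small function is what makes the iterative tuning practical: each revision of the wording can be rerun against the same retrieved excerpts and compared.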

Phase 3: UAT

In this stage we performed iterative User Acceptance Testing (UAT), testing our solution with its end users: the business team. Their deep business knowledge helped us determine whether the precision of the extraction was high enough or could be improved. We built a user interface (UI) embedded in Microsoft Teams that lets users work with the extracted fields, editing them manually or even modifying the prompts to refine the extraction process. Once the required data is ready, it can be exported to a CSV file for analysis and sharing as needed.
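As a rough illustration of the review-and-export step, the sketch below applies a reviewer's manual corrections on top of the extracted values and writes the result as CSV. The field names, the sample values, and the helpers `apply_edits` and `to_csv` are hypothetical; the actual Teams interface and its export format are not described in this case study.

```python
import csv
import io


def apply_edits(extracted: dict[str, str], edits: dict[str, str]) -> dict[str, str]:
    """Return the extracted fields with reviewer corrections taking precedence."""
    return {**extracted, **edits}


def to_csv(rows: list[dict[str, str]], columns: list[str]) -> str:
    """Serialize reviewed rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented extraction result in which the reviewer fixes one field.
extracted = {"document": "report_001.pdf", "revenue": "41.2", "currency": "US"}
reviewed = apply_edits(extracted, {"currency": "USD"})
csv_text = to_csv([reviewed], ["document", "revenue", "currency"])
print(csv_text)
```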

Results

The implementation of our AI-driven solution marked a significant milestone for our client, addressing a challenge that had persisted for nearly a decade. Remarkably, within just 8 months, we developed a Minimum Viable Product (MVP) that effectively resolved this long-standing issue.

Our solution efficiently processes between 3,000 and 7,000 documents annually, automating the extraction of critical information from financial reports. Previously, this task was performed manually, consuming substantial time and resources. The automation not only streamlined the process but also enhanced accuracy and reliability.

Building on this success, we are now extending the solution's capabilities to include an agreement analyzer. This new application is projected to save approximately 3,100 hours annually, demonstrating the scalability and versatility of our AI solution.
