UK Retail Bank - Automating PPI Claims Operations Use Case (Guest blog by KPMG)

Guest blog by Tao Guo, Director, Data Science & AI at KPMG #AIWeek2023

What is the purpose of the AI application?

Banks faced a massive volume of complaints and queries ahead of the final Payment Protection Insurance (PPI) complaints deadline at the end of August 2019. These complaints were received in a variety of formats and templates, both electronic (e.g., emailed PDFs and XLS files with pre-submission lines) and on paper (e.g., scanned cover letters and letters of authority). KPMG developed a solution that allowed banks to automatically extract key complaint features and other important fields, and to de-duplicate complaints, without the need to review them manually. This accelerated and optimized case allocation and facilitated the development of specific treatment strategies for faster and more effective case resolution.

How was it developed?

The data consisted mainly of pre-submissions, queries and complaints, received through mail, online forms and paper forms (scanned images). The key information to be extracted from these documents included the claims management company (CMC) or customer name, address and date of birth (DOB), along with the complaint itself where one was stated. Because the majority of the data was scanned images, KPMG developed a solution that extracts the text using Optical Character Recognition (OCR) and pre-processes it with advanced Natural Language Processing (NLP) techniques for automatic information extraction. Regex and embedding techniques were also implemented to improve the extraction process, identify the distinct types of complaint, and categorize them to accelerate resolution.
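The regex step described above could look something like the following minimal sketch. It is not KPMG's implementation: the patterns, the field labels ("Customer Name", "DOB") and the function name `extract_fields` are assumptions made for illustration, and the postcode pattern is a deliberately simplified shape rather than a full validator.

```python
import re
from datetime import datetime

# Hypothetical patterns for illustration; real templates would need their own.
NAME_RE = re.compile(
    r"(?:Customer\s+)?Name\s*[:\-]\s*([A-Za-z][A-Za-z .'\-]*)", re.IGNORECASE
)
DOB_RE = re.compile(
    r"(?:DOB|Date\s+of\s+Birth)\s*[:\-]\s*(\d{1,2})[/.\-](\d{1,2})[/.\-](\d{4})",
    re.IGNORECASE,
)
# Simplified UK postcode shape: outward code, optional space, inward code.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)


def extract_fields(ocr_text):
    """Return name, date of birth (ISO format) and postcode, or None where absent."""
    fields = {"name": None, "dob": None, "postcode": None}

    name = NAME_RE.search(ocr_text)
    if name:
        # Collapse repeated whitespace that OCR often introduces.
        fields["name"] = " ".join(name.group(1).split())

    dob = DOB_RE.search(ocr_text)
    if dob:
        day, month, year = (int(part) for part in dob.groups())
        try:
            fields["dob"] = datetime(year, month, day).date().isoformat()
        except ValueError:
            pass  # impossible date such as 31/02/1990: leave the field as None

    postcode = POSTCODE_RE.search(ocr_text)
    if postcode:
        fields["postcode"] = f"{postcode.group(1)} {postcode.group(2)}".upper()

    return fields
```

Returning `None` for a field that cannot be found, rather than guessing, keeps missing values visible to any later human review.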

To what extent does the AI application aim at full automation vs. enhancing human judgment?

KPMG implemented a complete solution that extracts the key information from the PPI forms and, after pre-processing, stores this unstructured data in a structured format. The solution is fully automated; however, KPMG also built a validation tool that compares the output against the actual document. This tool enables a human to monitor model performance over time and to apply their own judgment when a new format or a poor-quality document image is encountered.

What outcomes have been observed so far?

KPMG developed the end-to-end solution and deployed it in a secure AWS environment (AWS Lambda functions and S3). The cloud (AWS) was used for its cost, speed and scalability. KPMG built a new PPI request database from the extracted information, which enabled data enhancement from different sources, de-duplication across channels, and analytics to identify and prioritize customer cohorts.
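As an illustration of de-duplication across channels, the sketch below groups extracted records by a normalized key. The record layout (`name`, `dob`, `postcode`, `channel`) and the choice of key are assumptions for the example, not details from the deployed solution.

```python
def dedup_key(record):
    """Build a channel-independent key from normalized identifying fields."""
    name = " ".join(record["name"].lower().split())
    postcode = record["postcode"].replace(" ", "").upper()
    return (name, record["dob"], postcode)


def deduplicate(records):
    """Keep the first record per key and note every channel it arrived through."""
    merged = {}
    for record in records:
        key = dedup_key(record)
        if key not in merged:
            kept = dict(record)  # copy so the caller's record is not modified
            kept["channels"] = [kept.pop("channel")]
            merged[key] = kept
        elif record["channel"] not in merged[key]["channels"]:
            merged[key]["channels"].append(record["channel"])
    return list(merged.values())
```

Exact matching on a normalized key is the simplest option. Records with OCR errors in a key field would not be merged by this approach and would need fuzzy matching or manual review.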

How is impact monitored and what performance benchmarks have been set?

KPMG built a validation tool that enables the bank's case handlers to review documents and update the extracted information where necessary. It displays all the documents corresponding to a customer, which allows faster processing of unstructured case submissions and a reduction in headcount. The data extraction success rate on key fields (i.e., relevant information from documents) was observed to be 99.6% of the raw data, with a 90% match rate against existing complaint data.
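The post does not define exactly how these two metrics were calculated. One plausible reading, sketched below as an assumption, treats the success rate as the share of key fields that were populated and the match rate as the share of extracted records whose key appears in the existing complaint data. The function and parameter names are hypothetical.

```python
def extraction_success_rate(records, key_fields):
    """Share of key fields, across all records, that hold a non-empty value."""
    total = len(records) * len(key_fields)
    if total == 0:
        return 0.0
    filled = sum(1 for record in records for field in key_fields if record.get(field))
    return filled / total


def match_rate(records, existing_keys, key_fn):
    """Share of records whose key (built by key_fn) is found in existing_keys."""
    if not records:
        return 0.0
    matched = sum(1 for record in records if key_fn(record) in existing_keys)
    return matched / len(records)
```

Here `existing_keys` would be a set built from the bank's existing complaint data, and `key_fn` the same normalization used when building that set.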

How have risks been identified and mitigated?

Automating information extraction carries a risk because the same document can arrive in many different formats. Scanned documents also pose major challenges in terms of image quality and the skew of the text on the page. These were addressed through image processing techniques aimed at enhancing the quality of the scan and, ultimately, the correctness of the text extracted by the OCR.

KPMG also provided a validation tool to compare the extracted data quickly against the actual document and verify data quality. KPMG further recommended monitoring the match rate metric so that appropriate action can be taken if it deteriorates over time.
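Monitoring of this kind could be as simple as a rolling average compared against a threshold, as in the sketch below. The class name, window size and threshold are illustrative assumptions; the post does not say how the match rate was tracked or what level would trigger action.

```python
from collections import deque


class MatchRateMonitor:
    """Flag deterioration when the rolling mean match rate falls below a threshold."""

    def __init__(self, window=4, threshold=0.85):
        # Both defaults are hypothetical values chosen for the example.
        self.threshold = threshold
        self.recent = deque(maxlen=window)

    def update(self, match_rate):
        """Record a new match rate; return True if the full window's mean is too low."""
        self.recent.append(match_rate)
        window_full = len(self.recent) == self.recent.maxlen
        rolling_mean = sum(self.recent) / len(self.recent)
        return window_full and rolling_mean < self.threshold
```

Waiting for a full window before raising a flag avoids reacting to a single poor batch, at the cost of a slower response to a sudden drop.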

What regulation has been considered and how has compliance been ensured?

The documents contained personally identifiable information (PII). The KPMG team strictly followed KPMG policy and process for handling PII data.

What are the barriers to AI deployment in this area? How could these be reduced and what are the potential benefits?

A significant challenge for information extraction AI models is that the volume of unstructured data grows quickly, so it must be mined and analyzed continuously to keep up. The scalability, dimensionality and heterogeneity of unstructured data are the main obstacles to harvesting useful information. As a result, even an efficient AI model that is well trained on a familiar dataset may degrade quickly and stop working as expected. Currently, much effort is going into text generation models such as GPT rather than text extraction; these models might remove this limitation and create a breakthrough in this domain.

Authors

Dr Tao Guo

Director of Data Science & AI, KPMG
