While extraction systems based on optical character recognition (OCR) can work well on highly structured forms that don’t vary, they don’t work nearly as well in environments where the document structure or layout varies, such as with invoice processing, or where data is less structured, as with mortgage document processing.
Intelligent document processing systems can do the work faster and with higher accuracy, simplifying the lives of engineering and finance teams that used to rely on older OCR technology.
For example, an OCR-based system might achieve 90% accuracy on highly structured forms in a lab setting, but that might drop to 70% or lower in the real world. All too often, the system encounters variations in exposure and skew, or foreign objects in the frame, that degrade the digitized image used as input to the OCR system.
An adaptive document processing system, which deploys machine learning models for classification, can achieve much better—and yes, even more “intelligent”—results.
Let’s say your ML model achieves 90% accuracy but you need 99%, a common requirement in financial services. A deep learning system based on both computer vision and natural language models can meet that stricter service-level agreement (SLA) by running selected forms through a human reviewer, in a process known as “human in the loop.”
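The arithmetic behind that claim can be sketched in a few lines. The function and every number below are hypothetical and are not taken from any real system. The sketch also assumes that the model is noticeably more accurate on the documents it is confident about than on its workload as a whole, which is what makes selective review worthwhile.

```python
def blended_accuracy(auto_share, auto_accuracy, reviewer_accuracy):
    """Overall accuracy when one share of fields skips review and the rest is human-verified.

    All three inputs are fractions between 0 and 1 (hypothetical parameters).
    """
    if not 0.0 <= auto_share <= 1.0:
        raise ValueError("auto_share must be between 0 and 1")
    review_share = 1.0 - auto_share
    return auto_share * auto_accuracy + review_share * reviewer_accuracy


# Illustrative numbers only: 80% of fields are accepted automatically at 99.2% accuracy,
# and the remaining 20% go to reviewers who are assumed to be 99.5% accurate.
overall = blended_accuracy(auto_share=0.80, auto_accuracy=0.992, reviewer_accuracy=0.995)
print(f"Overall accuracy: {overall:.4f}")
```

Under these assumed figures the blended result clears 99% even though the model alone would not. The same function can be used to explore how much review effort a given target requires.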
During processing, the system flags specific documents or form fields for human review based on its confidence in its own output. If a document doesn’t meet the confidence threshold, it’s routed to a reviewer, who verifies or corrects it. That verified result loops back into the system as training data, so the model learns to recognize the content in those more difficult documents. In this way, an adaptive document processing system can reach the 99% accuracy the customer requires.
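The routing step described above might look something like the following minimal sketch. The class names, the 0.95 threshold, and the sample invoice values are all assumptions made for illustration. A production system would also need calibrated confidence scores and an actual retraining job, neither of which is shown here.

```python
from dataclasses import dataclass, field


@dataclass
class FieldPrediction:
    document_id: str
    field_name: str
    value: str
    confidence: float  # model's self-reported score in [0, 1]


@dataclass
class ReviewLoop:
    threshold: float = 0.95  # hypothetical cut-off; tune against the SLA
    accepted: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)
    training_examples: list = field(default_factory=list)

    def route(self, prediction):
        """Accept confident predictions; queue the rest for a human reviewer."""
        if prediction.confidence >= self.threshold:
            self.accepted.append(prediction)
            return "accepted"
        self.review_queue.append(prediction)
        return "review"

    def record_review(self, prediction, verified_value):
        """Store the reviewer's answer as a labeled example for later retraining."""
        self.review_queue.remove(prediction)
        self.training_examples.append((prediction, verified_value))
        corrected = FieldPrediction(
            prediction.document_id, prediction.field_name, verified_value, 1.0
        )
        self.accepted.append(corrected)
        return corrected


loop = ReviewLoop(threshold=0.95)
clear = FieldPrediction("inv-001", "total", "1,250.00", 0.99)
blurry = FieldPrediction("inv-002", "total", "1,25O.00", 0.62)  # letter O misread for zero

print(loop.route(clear))
print(loop.route(blurry))
fixed = loop.record_review(blurry, "1,250.00")
print(fixed.value, len(loop.training_examples))
```

Keeping the original prediction alongside the verified value is a deliberate choice here: the pair records both what the model got wrong and what the right answer was, which is the information a retraining step would need.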
Human-in-the-loop QA can be valuable for a number of reasons, not the least of which is liability. For some use cases, even an increasingly rare transcription error can mean a missed delivery, or a package that gets stuck in customs. Transcription and extraction errors, whether at the character level or at the field or document level, can prove very costly to businesses.
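To see why the level at which errors are counted matters, consider this small sketch. It compares a character error rate (edit distance divided by the number of reference characters) with field-level exact-match accuracy. The helper names and the sample shipping record are invented for illustration, and other definitions of both metrics are in use.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def character_error_rate(predicted, truth):
    """Total edit distance across fields divided by total reference characters."""
    total_chars = sum(len(value) for value in truth.values())
    if total_chars == 0:
        raise ValueError("truth must contain at least one character")
    errors = sum(
        edit_distance(predicted.get(name, ""), value) for name, value in truth.items()
    )
    return errors / total_chars


def field_accuracy(predicted, truth):
    """Share of fields whose extracted value matches the reference exactly."""
    if not truth:
        raise ValueError("truth must contain at least one field")
    correct = sum(1 for name, value in truth.items() if predicted.get(name) == value)
    return correct / len(truth)


# Hypothetical shipping label: a single digit of the postal code is misread.
truth = {"tracking_number": "1Z999AA10123456784", "postal_code": "94107", "country": "US"}
predicted = {"tracking_number": "1Z999AA10123456784", "postal_code": "94101", "country": "US"}

print(f"Character error rate: {character_error_rate(predicted, truth):.2%}")
print(f"Field accuracy: {field_accuracy(predicted, truth):.2%}")
```

In this example one wrong character out of 25 looks minor at the character level, yet it invalidates one of the three fields, and that field is the one that decides where the package goes.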
OCR has effectively become commoditized: the popular open-source Tesseract engine, originally developed at Hewlett-Packard and later sponsored by Google for many years, is freely available. Since version 4, Tesseract has included a recognition engine based on long short-term memory (LSTM) networks for more accurate text recognition.
But progress in research and software engineering doesn’t end there: Transformers have paved the way for both document layout detection and form understanding. The future looks bright for ever-greater accuracy in extracting just the right, actionable information from old-school, messy paperwork. And of course, when regulations or quality standards demand it, it’s always helpful to bring humans into the pipeline to catch model errors and, in doing so, further improve the models.