Scale Events
[CANCELED] Using Deep Learning to Understand Documents
LIVESTREAM

***Apologies, but this Tech Talk was canceled. We are scheduling a great lineup of Tech Talks and other events for 2022. Check back soon on our events pages to register.***

Extracting key fields from a variety of document types remains a challenging problem. Services such as AWS and Google Cloud, along with open-source alternatives, provide text extraction that "digitizes" images or PDFs and returns phrases, words, and characters. Processing these outputs does not scale and is error-prone, because varied documents require different heuristics, rules, or models, and new document types are uploaded daily. In addition, a performance ceiling exists because downstream models depend on good yet imperfect OCR algorithms.

We propose an end-to-end solution that uses computer-vision-based deep learning to automatically extract important text fields from documents of various templates and sources. These models produce state-of-the-art classification accuracy and generalizability through training on millions of images. We compare our in-house model's accuracy, processing time, and cost with third-party services and find favorable results for automatically extracting important fields from documents.

Bill.com is working to build a paperless future. We process millions of documents a year, ranging from invoices, contracts, and receipts to a variety of others. Understanding those documents is critical to building intelligent products for our users.

Speakers

Eitan Anzenberg
Chief Data Scientist @ Bill.com

Agenda

From 8:00 PM, GMT
To 8:45 PM, GMT
Tags:
Stage 1
Presentation
Using Deep Learning to Understand Documents

Speakers:
Eitan Anzenberg
Event has finished
January 12, 8:00 PM, GMT
Online