Scale Events
Livestream
[CANCELED] Using Deep Learning to Understand Documents

***Apologies, but this Tech Talk was canceled. We have a great lineup of Tech Talks and other events being scheduled for 2022. Check back on our events pages soon to register.***

Extracting key fields from a variety of document types remains a challenging problem. Services such as AWS and Google Cloud, as well as open-source alternatives, provide text extraction to "digitize" images or PDFs, returning phrases, words, and characters. Processing these outputs does not scale and is error-prone: varied documents require different heuristics, rules, or models, and new document types are uploaded daily. In addition, there is a performance ceiling, because downstream models depend on good yet imperfect OCR algorithms.

We propose an end-to-end solution that uses computer-vision-based deep learning to automatically extract important text fields from documents of various templates and sources. The models achieve state-of-the-art classification accuracy and generalizability through training on millions of images. We compare our in-house model's accuracy, processing time, and cost with third-party services and find favorable results for automatically extracting important fields from documents.

Bill.com is working to build a paperless future. We process millions of documents a year, including invoices, contracts, receipts, and a variety of others. Understanding those documents is critical to building intelligent products for our users.

Speakers
Eitan Anzenberg
Chief Data Scientist @ Bill.com
Agenda
8:00 PM
8:45 PM
Stage 1
Presentation
Using Deep Learning to Understand Documents

Eitan Anzenberg
Event has finished
January 12, 8:00 PM, GMT
Online