Automated accounting process using Machine Learning

Want to automate your bill reconciliation process? Accounting made easy and automatic with data science and computer vision.

OCR

Problem statement

Accounting is an integral part of business finance management. Keeping track of bills manually is exhausting, and errors creep in when large numbers of them are handled. Our client is an accounting company that wanted to automate parts of its bill reconciliation process so that the accounting could be done faster and with fewer errors. The client's customers submit their bills for accounting. The client wanted an OCR system that converts expense receipt stubs, stored as scanned documents and images, into text. To achieve this, we had to extract specific fields from each receipt stub, such as the date, total price, and tax.

Our solution

This project involved automatic OCR conversion of receipt stubs into CSV data. We gathered the client's dataset of receipt scans and performed preliminary data cleanup and grouping. Since all the documents used standard fonts and languages, building the OCR stage was straightforward: no custom OCR model was required, so we used Tesseract's pre-trained LSTM model. The next goal was to add intelligence to the system by automatically identifying specific text fields in the OCR output. We used a natural language processing framework to model the context of the text produced by the OCR engine. We then packaged the solution as a library, which was integrated into the client's existing desktop application. The user loads a collection of scanned receipts, and the OCR engine produces a set of CSV files corresponding to the input files.

Key metrics

This project was developed over 15 weeks. The OCR engine eliminated the time previously spent on manual text conversion, and OCR accuracy was above 95%.

Technology stack

Tesseract, OpenCV, TensorFlow, NLTK
