
Fine-Tuning Microsoft’s LayoutLM Model for Invoice Recognition


In this article, we will fine-tune Microsoft's recently released LayoutLM model on an annotated custom dataset that includes French and English invoices.
Building on my recent tutorial on how to annotate PDFs and scanned images for NLP applications, we will fine-tune Microsoft's recently released LayoutLM model on an annotated custom dataset that includes French and English invoices. While previous tutorials focused on fine-tuning the model on the publicly available FUNSD dataset, here we will walk through the entire process, from annotation and pre-processing to training and inference.

The LayoutLM model is based on the BERT architecture, with two additional types of input embeddings. The first is a 2-D position embedding that denotes the relative position of a token within a document, and the second is an image embedding for scanned token images within a document. The model achieved new state-of-the-art results on several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24), and document image classification (from 93.07 to 94.42). For more information, refer to the original paper.

Thankfully, the model was open-sourced and made available in the huggingface library. Thanks, Microsoft! For this tutorial, we will load the model directly from the huggingface library and fine-tune it on our own dataset. Here is a link to the Google Colab, but first, we need to create the training data. Using a text annotation tool, I have annotated around 50 personal invoices.
