Parsing Digitized Vietnamese Paper Documents

Abstract

The need to exploit digitized document data has grown in recent years. In this paper, we address the problem of parsing digitized Vietnamese paper documents. These documents are mainly scanned images with diverse layouts and special characters, both of which pose many challenges. To address this problem, we first collect UIT-DODV, a novel Vietnamese document image dataset of Vietnamese-language scientific papers drawn from different scientific conferences.

Year of Publication
2021
Conference Name
International Conference on Computer Analysis of Images and Patterns
Date Published
01