By Meredith Alexander Kunz, Adobe Research
In the future, we would all like to do more intelligent things with our documents—to be able to share them in new ways, communicate outside their typical boundaries, automatically repurpose their contents.
But before we can get to these magical tasks, we need to know what’s in our documents. And because, traditionally, many of our texts start as handwritten notes, there’s an important first step: how can we help computers to accurately (and automatically) understand and digitize people’s handwriting?
Curtis Wigington, a research software developer at Adobe Research, has focused on this question, and his experimental work leveraging AI is opening new doors in the quest to transform handwritten pages into digital documents.
Wigington, along with Brian Price, Scott Cohen, and Chris Tensmeyer of Adobe Research, and Brigham Young University collaborators Brian Davis and Professor William Barrett, tackled a tough set of pages: images of archived historical documents from the 18th and 19th centuries that were written in slanting cursive handwriting. The letters and notebooks also often suffered from physical deterioration or bleed-through from ink on the other side.
This dataset proved a worthy challenge for this research team as they developed a holistic new way for computers to read and digitally extract handwriting.
The crux of their work is a model called “Start, Follow, Read,” which was shared in a research paper at ECCV (European Conference on Computer Vision) 2018.
This research pathway began during Wigington’s work as a graduate student at BYU. In 2017, he and fellow researchers took home the top prize in the ICDAR (International Conference on Document Analysis and Recognition) handwriting recognition competition using a more traditional approach. That work inspired him to begin developing a pioneering new method that would, ultimately, significantly outperform the previous one.
The novel approach taps into deep learning and computer vision to automate a computer’s understanding of what it sees written on the page. Prior methods used independent line segmentation (taking a document apart line by line) and line transcription to decipher old handwriting. Wigington and team’s work combined these methods, employing several machine learning networks to work largely without human labels identifying sections of the text—on whole pages of handwriting.
“This is full-page handwriting recognition without the need to do separate line segmentation or text detection,” says Cohen. “It’s very innovative to do the whole process at once.”
The system’s first step is to find the “start of line” of each piece of handwriting. This method searches a document image to identify where each line begins using computer vision techniques and a specific network designed for the task.
Next, the model figures out how to follow along on each line of handwriting, employing a line follower network. The system then de-warps curved lines of text and bends the words into a straight line. At that point, the computer can apply classic handwriting recognition. The system was trained on a set of archived historical documents written in European languages.
Predicting where lines start and following lines of writing on the page were novel ideas that enabled a more holistic approach to a whole sheet of writing. “Our approach merges the ideas of line segmentation and line-level recognition,” Wigington reports.
It’s work that could not only help improve online cultural heritage websites that host archived documents. It also holds promise for deciphering all those notebooks filled with precious ideas, just waiting to burst free from our desks.
Contributors: Curtis Wigington, Chris Tensmeyer, Brian Price, Scott Cohen (Adobe Research)
Brian Davis, William Barrett (Brigham Young University)