Cursive & Recursive: Generating Transcriptions of Archival Documents Using Machine Learning
About the Project
Vanderbilt’s Special Collections has a wealth of handwritten or early modern material that is difficult for computers to read. Optical Character Recognition (OCR) has come a long way but still struggles with these texts. Fellows digitized select manuscripts and learned to produce transcriptions by using machine learning techniques to teach the computer to recognize handwriting. They built a simple web exhibit that displays each digitized manuscript and its transcription side by side. Fellows also learned project management skills, collaboration, and version control with GitHub; how machine learning works and when it doesn’t; and best practices for data management and project documentation.
Abrahan Liddell, Rachel Wei, Kai Malcolm, Indraneel Pai, Nilai Vemula, Sahas Goli, Alfred Prah, and Michelle Lin
Sarah Swanz, Librarian for Digital Media and Publishing
Nathan Jones, Archivist, Manager of Digital Imaging Laboratory