Cursive & Recursive: Generating Transcriptions of Archival Documents Using Machine Learning

About the Project

Vanderbilt’s Special Collections holds a wealth of handwritten and early modern material that is difficult for computers to read. Optical Character Recognition (OCR) has come a long way, but it still struggles with these texts. Fellows digitized select manuscripts and learned to produce transcriptions by using machine learning techniques to teach the computer to recognize handwriting. They built a simple web exhibit that displays each digitized manuscript and its transcription side by side. Along the way, Fellows learned project management, collaboration, and version control with GitHub; how machine learning works and when it doesn’t; and best practices for data management and project documentation.

The Fellows

Abrahan Liddell, Rachel Wei, Kai Malcolm, Indraneel Pai, Nilai Vemula, Sahas Goli, Alfred Prah and Michelle Lin

The Instructors

Sarah Swanz, Librarian for Digital Media and Publishing

Nathan Jones, Archivist, Manager of Digital Imaging Laboratory