Modalities of Textual Analysis: Large-Scale Natural Language Processing on Spark/Databricks


Peixuan "Ancher" Li

Zifeng Liang

Junyan "Johnny" Ou

About the Project

The Modalities of Textual Analysis project explored different approaches to analyzing ProQuest's “British Periodicals Collections” using natural language processing on Databricks/Spark. Students had the opportunity to build custom NLP pipelines for tasks such as correcting optical character recognition (OCR) errors and classifying documents.

The Fellows

Peixuan "Ancher" Li, Zifeng Liang and Junyan "Johnny" Ou

The Instructors

Mark Schoenfield, professor of English, interim director of undergraduate studies, English

Cliff Anderson, associate university librarian for research and digital strategy, interim director