Modalities of Text Mining



About the Project

How can you identify and explore patterns across millions of documents? In this fellowship, Buchanan Library Fellows learned state-of-the-art techniques for text mining at scale. The fellows joined an ongoing research project analyzing constellations of information in ProQuest’s British Periodicals Collections. Depending on their interests, fellows learned to use Apache Spark, a framework for querying distributed data sets; BaseX, a native XML database; or NetsBlox, a block-based programming language. They learned to extract information from large data sets in the humanities, the social sciences, and other fields with relative ease and confidence.

The Fellows

Emma Boldwyn, Shwe Khin, Rohit Khurana, Yuzhe Lu, Erskine Nyoike

The Instructors

Mark Schoenfield, professor of English, interim director of undergraduate studies, English

Cliff Anderson, associate university librarian for research and digital strategy, interim director