Text Mining at Scale

About the Project

This semester, we focused on querying large collections of textual data with two tools. The first is Apache Spark, a framework for processing distributed data sets. The second is Sparqsonic, an emerging query language for Spark based on XQuery. After completing this semester's sessions, students should be able to confidently explore large data sets in the humanities, the social sciences, and other disciplines, and extract information from them.