There is no single definition of text mining. In general, text mining is a subdomain of data mining that deals primarily with textual documents rather than discrete data. Text mining gathers documents into corpora and then applies statistical techniques to find associations or patterns among terms. There is also no single way to mine text; the research goals dictate the methods and technologies used. Steps in a text-mining project may include:
- identifying or building a corpus
- developing a model
- extracting data
- analyzing the data
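As a rough illustration of how these steps can fit together, here is a minimal Python sketch. It assumes a tiny made-up corpus, a hand-picked stopword list, and hypothetical helper names (`tokenize`, `cooccurrence_counts`). Counting how often two terms share a document is only one simple way to look for associations between terms; a real project would choose its model to suit its research goals.

```python
import re
from collections import Counter
from itertools import combinations

# Step 1: identify or build a corpus.
# These three sentences are invented stand-ins for real documents.
corpus = [
    "The library licenses newspapers for text mining.",
    "Text mining finds patterns in newspapers.",
    "The library hosts a working group.",
]

# Step 2: develop a model.
# Assumption: each document is treated as a set of terms, and two terms
# are "associated" when they appear in the same document.
STOPWORDS = {"the", "a", "for", "in"}


def tokenize(document):
    """Lowercase a document and return its set of non-stopword terms."""
    words = re.findall(r"[a-z]+", document.lower())
    return {word for word in words if word not in STOPWORDS}


# Step 3: extract data.
def cooccurrence_counts(documents):
    """Count, for each pair of terms, how many documents contain both."""
    pair_counts = Counter()
    for document in documents:
        terms = sorted(tokenize(document))  # sorted so each pair has one canonical order
        pair_counts.update(combinations(terms, 2))
    return pair_counts


# Step 4: analyze the data.
pair_counts = cooccurrence_counts(corpus)
for (first, second), count in pair_counts.most_common(3):
    print(f"{first} + {second}: appear together in {count} document(s)")
```

On a real corpus the same pattern applies, but the raw counts would usually be normalized (for example, by how common each term is overall) before drawing conclusions.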
Automated, machine-enabled text mining is used across disciplines. There are examples of research using text mining at Vanderbilt.
The library has access to licensed materials for text mining, most notably the LexisNexis Web Services Kit (WSK), which provides API access to the LexisNexis Academic database of licensed and published materials.
For Vanderbilt community members interested in developing programming skills to aid in text-mining projects, the library hosts an XQuery working group.
To learn more, contact:
Clifford Anderson, Director for Scholarly Communications
Hilary Craiglow, Director, Walker Management Library
Your library subject specialist