What Is Data?
Data vs. Statistics
Data are raw ingredients from which statistics are created. Statistics are
useful when you just need a few numbers to support an argument (e.g., in 2003,
98.2% of American households had a television set, according to the Statistical
Abstract of the United States). They are usually presented in tables.
Statistical analysis can be performed on data to show relationships among the
variables collected. Through secondary data analysis, many different researchers
can re-use the same data set for different purposes.
Aggregate/Macro Data vs. Microdata
Aggregate or Macro Data are higher-level data that have been compiled from
smaller units of data. For example, the Census data that you find on American FactFinder
have been aggregated to preserve the confidentiality of individual respondents.
Microdata contain individual cases, usually individual people, or in the case
of Census data, individual households. The Integrated Public Use Microdata
Series (IPUMS) for the Census provides access to the actual survey data from
the Census, but eliminates information that would identify individuals.
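To make the distinction concrete, here is a minimal sketch in Python of how microdata might be compiled into aggregate data. The household records, field names, and the aggregate_by_state function are all invented for illustration; they do not come from the Census or IPUMS.

```python
from collections import Counter

# Hypothetical microdata: one record per household (all values invented).
microdata = [
    {"household_id": 1, "state": "OH", "has_tv": True},
    {"household_id": 2, "state": "OH", "has_tv": False},
    {"household_id": 3, "state": "MI", "has_tv": True},
    {"household_id": 4, "state": "MI", "has_tv": True},
]


def aggregate_by_state(records):
    """Collapse household-level records into per-state counts and percentages."""
    totals = Counter()
    with_tv = Counter()
    for rec in records:
        totals[rec["state"]] += 1
        if rec["has_tv"]:
            with_tv[rec["state"]] += 1
    # The result no longer contains any individual household.
    return {
        state: {
            "households": totals[state],
            "pct_with_tv": 100.0 * with_tv[state] / totals[state],
        }
        for state in totals
    }


aggregate = aggregate_by_state(microdata)
for state, row in sorted(aggregate.items()):
    print(state, row)
```

Once the records are collapsed this way, the individual households cannot be recovered from the totals, which is why aggregation helps preserve confidentiality.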
Data Sets, Studies, and Series
In ICPSR, a data set or study is made up of the raw data file
and any related files, usually the codebook and setup files. The codebook
is your guide to making sense of the raw data. For survey data, the codebook
usually contains the actual questionnaire and the values for the responses
to each question. The setup files help statistical software read the raw data
file; without them, the data will not display properly.
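One way to picture a codebook is as a lookup table from the coded values in the raw data to their meanings. The sketch below assumes a made-up variable (HLTH), question wording, and value labels; a real ICPSR codebook will have its own variables and codes.

```python
# Hypothetical codebook entry: variable name -> question text and value labels.
codebook = {
    "HLTH": {
        "question": "Would you say your health in general is...?",
        "values": {1: "Excellent", 2: "Good", 3: "Fair", 4: "Poor", 9: "Refused"},
    },
}

# Hypothetical raw data: each row stores only the numeric codes.
raw_rows = [{"HLTH": 1}, {"HLTH": 3}, {"HLTH": 9}]


def decode(row, codebook):
    """Replace each numeric code in a row with its label from the codebook."""
    decoded_row = {}
    for variable, code in row.items():
        labels = codebook[variable]["values"]
        # Flag codes the codebook does not document instead of guessing.
        decoded_row[variable] = labels.get(code, f"unlabeled code {code}")
    return decoded_row


decoded = [decode(row, codebook) for row in raw_rows]
for row in decoded:
    print(row)
```

Without the codebook, a value such as 9 in the raw file could easily be mistaken for a real measurement rather than a refusal.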
ICPSR uses the term series to describe collections of studies that
have been repeated over time. For example, the National Health Interview Survey
is conducted annually. In the ICPSR archive, you will find a description of
the series that provides an overview. You will also find individual descriptions
of each study (e.g., National Health Interview Survey, 2004). The study
number in ICPSR refers to the individual survey.
Types of Data
Cross-Sectional describes data that are collected only once.
Time Series describes data that track the same variable over time. The National Health
Interview Survey is an example of time series data because the questions generally
remain the same over time, but the individual respondents vary.
Longitudinal Studies describe surveys that are conducted repeatedly,
in which the same group of respondents is surveyed each time. This allows
for examining changes over the life course. The Project on Human Development
in Chicago Neighborhoods (PHDCN) Series contains a longitudinal component
that tracks changes in the lives of individuals over time through interviews.
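The three types differ mainly in how many times data are collected and whether the same respondents appear each time. The Python sketch below illustrates that rule using this guide's terms. The classify_design function and the respondent IDs are hypothetical, and the rule is a simplification: real longitudinal studies usually lose some respondents between waves.

```python
def classify_design(waves):
    """Classify a study from a dict mapping wave label -> set of respondent IDs.

    Simplified rule of thumb: one wave is cross-sectional; several waves with
    identical respondents are longitudinal; several waves with differing
    respondents are treated as time series data.
    """
    id_sets = list(waves.values())
    if not id_sets:
        raise ValueError("at least one wave is required")
    if len(id_sets) == 1:
        return "cross-sectional"
    if all(ids == id_sets[0] for ids in id_sets):
        return "longitudinal"
    return "time series"


# Invented respondent IDs for illustration only.
one_time_survey = {"2004": {101, 102, 103}}
annual_survey = {"2003": {101, 102, 103}, "2004": {201, 202, 203}}
panel_study = {"wave 1": {101, 102, 103}, "wave 2": {101, 102, 103}}

print(classify_design(one_time_survey))
print(classify_design(annual_survey))
print(classify_design(panel_study))
```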
For more definitions, I highly recommend the Glossary
of Selected Social Science Computing Terms and Social Science Data Terms compiled by Jim Jacobs, Data Services Librarian, UCSD.