This is the first in a series of articles describing the process of creating a comprehensive, canonical source of documentation for the Midlife in the United States (MIDUS) research project. The goal is to take the diverse set of sources that currently document the MIDUS study and create a standardized, DDI 3-based set of documentation that better enables researchers to discover and use the MIDUS data. We chose DDI 3 because it is the only standardized metadata format capable of storing the breadth of information required for the MIDUS study. The project is a joint effort between MIDUS and Colectica.
- Part 1: Taking Inventory
- Part 2: Importing Diverse Documentation Sources into DDI 3
MIDUS is a large, longitudinal study operated by the University of Wisconsin. From the MIDUS web site:
The first MIDUS investigation was conducted in 1994/95 with a sample of over 7000 Americans, aged 25 to 74. It was funded by the John D. and Catherine T. MacArthur Foundation. In 2002, the National Institute on Aging provided a grant to the Institute on Aging at the University of Wisconsin, Madison to carry out a longitudinal follow-up. Data collection for this second wave began in 2004 and was completed in 2009.
A new round of data collection will begin in 2011.
MIDUS data are used by social science researchers throughout the world, and an important goal of the project is to make its data easily accessible and understandable by these researchers.
Existing Documentation and Resources
The first step toward creating a comprehensive set of documentation for the MIDUS study is to take an inventory of the available sources of data and documentation. Some of these sources are available through the MIDUS web site or through the ICPSR archive, while others are internal resources.
Let’s take a look at each type of resource and see what information it provides for our desired documentation.
MIDUS data are made available in SAS, SPSS, and Stata formats both through the MIDUS web site and through ICPSR. These files contain variable and value labels, and of course they can be used to calculate summary statistics and frequencies.
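As a rough illustration of what a labeled data file contributes, the sketch below computes labeled frequencies for a single variable. The value labels and responses are invented for illustration and are not taken from MIDUS data; a real workflow would read them from the SAS, SPSS, or Stata file.

```python
from collections import Counter

# Hypothetical value labels, as they might be stored in an SPSS or Stata file.
value_labels = {1: "Excellent", 2: "Good", 3: "Fair", 4: "Poor"}

# Hypothetical responses for one variable (numeric codes, not labels).
responses = [1, 2, 2, 3, 1, 2, 4, 2]


def labeled_frequencies(codes, labels):
    """Count each code and report the count under its value label."""
    counts = Counter(codes)
    # Fall back to the raw code when no label is defined for it.
    return {labels.get(code, str(code)): counts[code] for code in sorted(counts)}


frequencies = labeled_frequencies(responses, value_labels)
print(frequencies)
```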
Web Documents and Codebooks
The MIDUS web site provides detailed descriptions for each of its sub-projects, funding sources, links to related materials, and codebooks that contain more extended labels than are found in the SPSS data files. We will want to maintain all this information in the new, standardized documentation.
Many of these web documents are automatically generated from DDI 2. The MIDUS DDI 2 describes codebook-level details such as dataset contents, variable labels, summary statistics, and frequencies. It also contains the question text that was used to collect data for each variable.
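To make the DDI 2 content more concrete, here is a minimal sketch that reads variable labels, question text, and category frequencies from a simplified DDI 2 fragment. The fragment is hypothetical and omits the XML namespace that real DDI 2 files often declare, so treat it as an illustration rather than a reader for the actual MIDUS files.

```python
import xml.etree.ElementTree as ET

# Simplified DDI 2 (DDI Codebook) fragment. The variable, its labels, and
# its frequencies are hypothetical, and no XML namespace is declared.
DDI2_FRAGMENT = """
<codeBook>
  <dataDscr>
    <var name="HEALTH1">
      <labl>Self-rated health</labl>
      <qstn><qstnLit>In general, how would you rate your health?</qstnLit></qstn>
      <catgry><catValu>1</catValu><labl>Excellent</labl><catStat type="freq">2</catStat></catgry>
      <catgry><catValu>2</catValu><labl>Good</labl><catStat type="freq">4</catStat></catgry>
    </var>
  </dataDscr>
</codeBook>
"""


def read_variables(xml_text):
    """Return one dict per <var> with its label, question text, and categories."""
    root = ET.fromstring(xml_text)
    variables = []
    for var in root.iter("var"):
        categories = [
            {
                "value": cat.findtext("catValu"),
                "label": cat.findtext("labl"),
                "freq": int(cat.findtext("catStat")),
            }
            for cat in var.findall("catgry")
        ]
        variables.append({
            "name": var.get("name"),
            "label": var.findtext("labl"),  # direct child only, not category labels
            "question": var.findtext("qstn/qstnLit"),
            "categories": categories,
        })
    return variables


variables = read_variables(DDI2_FRAGMENT)
print(variables[0]["name"], "-", variables[0]["question"])
```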
MIDUS staff use several spreadsheets to document their data and internal processes.
- Extended Metadata
These spreadsheets list variables found in the data and provide longer labels, text of source questions, and other related information.
- Concept Mapping
These spreadsheets define concepts studied by MIDUS, and associate the concepts with the related variables found in the MIDUS datasets.
These spreadsheets list variables from the main MIDUS studies along with Japanese language question text for the corresponding variables in the MIDUS Japan project.
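The concept mapping spreadsheets described above amount to a lookup from concepts to variables. The sketch below shows one way such a sheet could be read once exported to CSV; the column names, concepts, and variable names are assumptions made for illustration, not the layout of the actual MIDUS spreadsheets.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export of a concept mapping spreadsheet.
SHEET = """concept,variable
Physical health,HEALTH1
Physical health,HEALTH2
Well-being,SATIS1
"""


def variables_by_concept(csv_text):
    """Group variable names under the concept they are associated with."""
    mapping = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        mapping[row["concept"]].append(row["variable"])
    return dict(mapping)


concepts = variables_by_concept(SHEET)
print(concepts)
```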
Computer Assisted Interviewing Source Code (CASES)
The surveys used to collect the MIDUS data were administered by interviewers using the CASES computer-assisted interviewing system. The CASES source code for these surveys is still available, and it provides the one true representation of the survey that respondents answered, with precise question text and routing instructions.
MIDUS makes available many documents as PDF downloads. These include information about classification systems, methodological decisions, and more.
Putting it all Together
This is a lot of material, and it’s spread widely across different web sites and organizations. There isn’t a single source where MIDUS data users can access everything. The goal of this project is to wrap all this information into one package using a standardized metadata model. Once we have that package, we will generate the types of documentation that researchers find useful, including:
- Web sites
- Codebooks, both printable and interactive
- Cross reference/concordance tables
- User manuals
Besides serving as the source for these types of documentation, the resulting package can be shared with archives like ICPSR. The archive will then have the full set of information necessary for researchers to understand the MIDUS data well into the future. Archives can also ingest the standardized package into their own metadata systems to generate the types of documentation important to their users.
The sources of information about MIDUS map quite well to the DDI 3 elements. The table below summarizes how the existing MIDUS sources will be mapped to DDI 3.
|Source|DDI 3 Mapping|
|---|---|
|Data Files|PhysicalInstance, Variable, VariableStatistics|
|Web documents and Codebooks|StudyUnit, FundingInformation, OtherMaterial|
|DDI 2|PhysicalInstance, Variable, QuestionItem, VariableStatistics|
|CAI Source Code|Instrument, QuestionItem, ControlConstructSequence|
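One way to reason about this mapping is to invert it and see which DDI 3 elements receive content from more than one source, since those are the places where information will need to be reconciled. The sketch below encodes the table as a plain dictionary; the function and variable names are illustrative and not part of any DDI or Colectica API.

```python
# The source-to-element mapping from the table above.
SOURCE_TO_DDI3 = {
    "Data Files": ["PhysicalInstance", "Variable", "VariableStatistics"],
    "Web documents and Codebooks": ["StudyUnit", "FundingInformation", "OtherMaterial"],
    "DDI 2": ["PhysicalInstance", "Variable", "QuestionItem", "VariableStatistics"],
    "CAI Source Code": ["Instrument", "QuestionItem", "ControlConstructSequence"],
}


def sources_by_element(source_map):
    """Invert the mapping: DDI 3 element name -> list of contributing sources."""
    index = {}
    for source, elements in source_map.items():
        for element in elements:
            index.setdefault(element, []).append(source)
    return index


element_index = sources_by_element(SOURCE_TO_DDI3)

# Elements fed by more than one source are candidates for merging.
overlapping = {e: s for e, s in element_index.items() if len(s) > 1}
for element, sources in sorted(overlapping.items()):
    print(f"{element}: {', '.join(sources)}")
```

In this encoding, QuestionItem is fed by both the DDI 2 and the CAI source code, which matches the earlier observation that both contain question text.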
The next article will detail how we created the first draft of a DDI 3 metadata package using the MIDUS project’s existing resources. Subsequent articles will show how we expanded on this initial draft and automatically generated end-user documentation.