Preparing ICPSR Data for Use with SAS
(by Amy Yuen)
Generally speaking, the datasets that the ICPSR makes available for download need at least some processing from
users before they are in an electronic format that is usable with statistical packages. Most of the ICPSR's
holdings have command files that allow users to read the raw data into SPSS, and the process for reading the raw
files into SPSS is usually quite easy (click here for help in reading ICPSR data into SPSS).
Unfortunately, not all ICPSR studies have SPSS command files (and not all ICPSR users want their data in
SPSS). This tutorial is designed for situations where a user wants an ICPSR dataset for which the only available
command files are for reading data into SAS. It is intended for users who have little or no experience with SAS
and wish to work with the data in another statistics package. This guide walks through how to read the raw data
into SAS so that the data can then be converted into other formats using StatTransfer.
(1) Start by downloading the data file and the SAS program file for the study you want from the ICPSR website:

Click on the "Download" link. You will be taken to an "Authorized Download - Emory University" page that
will ask you for the username and password for your ICPSR account - all ICPSR users are required to have such
accounts if they wish to download data, so you will need to set one up if you do not have one already. Enter
your username and password and, once you have been authenticated, you will see something like the following
screen:

The ICPSR presents users with multiple options for downloading data. Generally speaking, the ICPSR will have
files available for different statistical packages. Here, for instance, there are ASCII data and setup files
available for SAS and SPSS as well as an SPSS "portable" file. [You can get additional detail about the files
available for a study by reading the file manifest that is available on the "Description" page.] You can
download just those files for a particular program (e.g. "ASCII Data File and SAS Setup Files") or download all
the available files for a particular study. Whichever files you select will then be added into your "data cart"
for download. If you go the data-cart route, you will be downloading a zipped archive of whatever files you
chose. Alternatively, you can click on the "download individual files" link at the right and download files one at
a time. For our purposes, we only want the raw ASCII data file and the SAS setup file, so we will choose this
last route. Click on the "download individual files" link and you will be taken to this page:

(2) Now we can save the data and the SAS program file. First, click on the "Data" link and choose "Save Target
As...":

Save the data in the desired location on your computer and be sure to add ".txt" to the end of the filename.
The "Save as Type" field should be "Text Document". Here, we'll save the data file as "da3966.txt":

[Note that we are not using the default file names that the ICPSR assigns. Instead, we are using an older ICPSR
convention for naming files, mainly because the resulting file names are shorter. How you wish to name the files
is up to you - just be careful about what file extension you specify.]
Next, we need to save the "SAS setup" file, which is the SAS command file that we will use
to move the raw ICPSR data into SAS. Save the command file by right-clicking on the "SAS setup" link and saving
it to your hard drive. Here, we will name the file "sa3966.sas". Be sure to add ".sas" to the end of the
filename; otherwise, SAS will not recognize the file as a SAS program. The "Save as type" field should be
either "All Files" or "SAS Program" (either option will work - just make sure you attach the .sas extension):

(4). Next, open the SAS program. Open the SAS command file you just saved in your working directory. The
file will open in the Editor window in the bottom right of the SAS window. You may notice that some of the
text in the file is in green. Such text is "commented out," which means that it will be ignored by SAS when
it is processing the command file. Generally, such text contains descriptions of the study and/or some
instructions for the user. Sometimes, however, the ICPSR will also comment out commands that you may need
when reading the data into SAS. Text and commands that are commented out are usually enclosed by a /* at the
start of the commented-out text and a */ at the end. If you remove these characters, the text should no longer
be green.
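As a small illustration of the two comment styles that SAS recognizes, consider the sketch below. The recode statement inside the first comment is hypothetical and is not taken from any ICPSR setup file.

```sas
/* Block comment: everything between the opening and closing
   markers is ignored by SAS, even across several lines.
   IF AGE = 99 THEN AGE = .;
*/

* Statement comment: starts with an asterisk and ends at the semicolon;
```

If the markers around the first block were removed, the hypothetical IF statement would become live code, so it would need to sit inside a DATA step to run.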
(a) Scroll down to the first command, which is likely to be "PROC FORMAT" in dark blue. Just before this command,
you need to add a command to create a library in SAS. In SAS, a "library" is essentially a pointer or shortcut
to a particular location on your computer. If you do not create a library, the data will be stored in SAS's
"work" library and will be purged when your session is over. Creating the library as part of the command file
ensures that (1) the resulting dataset will be stored in a place specified by you, the user, and (2) your hard
work will not be deleted when you close SAS out. To create a library, add the following command to the
command file just above the "PROC FORMAT" command:
LIBNAME libraryname "file path location";
Example: LIBNAME da3966 "C:\data\3966";
The file path in the double quotes should be the location where you want to save your data.
(IMPORTANT - Don't forget to end command lines with a semi-colon.)
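The sketch below repeats the example LIBNAME command and adds an optional check that is not part of the ICPSR file. It assumes that the folder C:\data\3966 already exists, since SAS does not normally create the folder for you, and it relies on the rule that a library name can be at most eight characters long.

```sas
/* Assign the library da3966 to the example folder used in this guide. */
LIBNAME da3966 "C:\data\3966";

/* Optional check: write the library's engine and physical path
   to the SAS log so you can confirm the assignment worked. */
LIBNAME da3966 LIST;
```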
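(b) In the "PROC FORMAT" command itself, insert the following option between the words "PROC FORMAT" and the
semi-colon that ends the command, so that the command reads:
PROC FORMAT LIBRARY=libraryname.formatfilename;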
Example: PROC FORMAT LIBRARY=da3966.for3966;
The PROC FORMAT command creates what is called a catalog file in which value labels for variables are stored.
While SPSS and Stata embed value labels within data files, SAS stores them in a separate file - i.e. the
catalog file. The PROC FORMAT command starts the process of creating the value labels and saves them in the
specified library under the specified file name. In this example, SAS will create a catalog file called
"for3966" and save it in a library titled "da3966" (i.e. at the location "C:\data\3966"). The commands that
follow (VALUE, etc.) define the value labels for each variable. These can be left as they are.
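For orientation, an edited PROC FORMAT section might look roughly like the sketch below. The format names (yesno, gender) and their labels are hypothetical placeholders; the real VALUE commands come from the ICPSR setup file. The sketch also assumes that the library da3966 has already been created with LIBNAME.

```sas
/* Store the value labels in catalog for3966 inside library da3966. */
PROC FORMAT LIBRARY=da3966.for3966;
  /* Hypothetical value labels, for illustration only. */
  VALUE yesno
    0 = "No"
    1 = "Yes";
  VALUE gender
    1 = "Male"
    2 = "Female";
RUN;
```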

(5). The next step is to scroll down to the end of the VALUE commands to the input section. You should see
the word "DATA" in dark blue followed by an "INFILE" command:
DATA;
INFILE "physical-filename" LRECL=439;
INPUT
(a) Above "DATA", add this line:
OPTIONS FMTSEARCH = (libraryname.formatfilename);
Example: OPTIONS FMTSEARCH = (da3966.for3966);
This command instructs SAS to look for the catalog file we created above in the library we created above.
SAS uses this information when attaching value labels to variables later in the command file (i.e. as part of
the "format" command which we will see below).
(b) On the same line as "DATA," add the library name and a new data filename for the dataset:
DATA libraryname.newdatafilename;
Example: DATA da3966.da3966;
A SAS data file titled "da3966" will now be created and saved in library da3966 (i.e. C:\data\3966\ in this
example).
(c) Next, fill in the file path and filename of the original raw data file
in place of "physical-filename," right after the "INFILE" command. This will tell SAS
where to find the raw (*.txt) data file that you want to use:
INFILE "physical-filename" LRECL=439;
Example: INFILE "C:\data\3966\da3966.txt" LRECL=439;
Your command file should then look something like the following:
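A minimal sketch of how the edited OPTIONS, DATA, and INFILE lines fit together is given here. The variable names and column positions in the INPUT statement are hypothetical placeholders; the real INPUT statement in the ICPSR setup file is far longer and should be left as it is.

```sas
/* Tell SAS where to find the catalog file created by PROC FORMAT. */
OPTIONS FMTSEARCH = (da3966.for3966);

/* Create the permanent data file da3966 in library da3966. */
DATA da3966.da3966;
  /* Raw ASCII file; LRECL is the record length given in the setup file. */
  INFILE "C:\data\3966\da3966.txt" LRECL=439;
  /* Hypothetical column layout, for illustration only. */
  INPUT
    respno  1-5
    q1      6
    q2      7-8;
RUN;
```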

(6). As we scroll further down the command file, we should encounter a SAS FORMAT statement, which attaches
the value labels created above to specific variables in the data file:
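To illustrate what a FORMAT statement does, here is a small self-contained sketch that runs entirely in SAS's temporary "work" library. The format name yesno, the variable q1, and the data file name example are hypothetical and are not taken from the ICPSR file.

```sas
/* Define a hypothetical value label set in the default WORK catalog. */
PROC FORMAT;
  VALUE yesno 0 = "No" 1 = "Yes";
RUN;

DATA example;
  INPUT q1;
  /* Attach the yesno format to q1; the trailing period marks
     yesno. as a format name rather than a variable name. */
  FORMAT q1 yesno.;
  DATALINES;
0
1
;
RUN;

/* The values of q1 are now displayed using their labels. */
PROC PRINT DATA=example;
RUN;
```

The stored values of q1 remain 0 and 1; the format changes only how they are displayed.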

(7). At the end of the command file, after the FORMAT statement, you will see the command "RUN". Above that command, add:
PROC FORMAT LIBRARY = libraryname.formatfilename CNTLOUT = libraryname.datafilename;
Example: PROC FORMAT LIBRARY = da3966.for3966 CNTLOUT = da3966.da3966_fmts;
This command reads the contents of the catalog file we created earlier (i.e. for3966) and puts them into
a separate SAS data file in a specified library. This step is necessary to successfully transfer the data
and value labels from SAS into SPSS or Stata, as we shall see
later.
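If you want to confirm that the value labels were exported, a check along the following lines may help. It is a sketch rather than part of the ICPSR file, and it assumes the library and catalog names used in this guide's example. FMTNAME, START, END, and LABEL are standard variables in the data file that CNTLOUT produces.

```sas
/* Export the contents of catalog da3966.for3966 to a regular
   SAS data file named da3966_fmts in the same library. */
PROC FORMAT LIBRARY = da3966.for3966 CNTLOUT = da3966.da3966_fmts;
RUN;

/* Optional check: list the first 20 format names, value ranges, and labels. */
PROC PRINT DATA = da3966.da3966_fmts (OBS = 20);
  VAR FMTNAME START END LABEL;
RUN;
```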

(8). After saving the command (.sas) file, click on the icon of the Running Man in the toolbar, or click on "Run" in the
upper menu and choose "Submit". The data should be read in and successfully
formatted as a SAS file:

(9). To view the data file, click on the file drawer icon labeled 'Libraries'
and then click on the library that you created above (e.g. Da3966 in this tutorial):

Then, click on the icon that looks like a spreadsheet with a red dot in the bottom right corner.
The name of the icon should be the name that you gave the data file; there should not be any suffix.
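As an alternative to browsing with the mouse, the data file can also be inspected with a few commands. This sketch assumes the library, catalog, and data file names from this guide's example. The LIBNAME and OPTIONS lines are repeated so that the commands also work in a fresh SAS session.

```sas
/* Re-assign the library and the format search path for this session. */
LIBNAME da3966 "C:\data\3966";
OPTIONS FMTSEARCH = (da3966.for3966);

/* List the variables in the data file, with their types and formats. */
PROC CONTENTS DATA = da3966.da3966;
RUN;

/* Print the first 10 observations, shown with their value labels. */
PROC PRINT DATA = da3966.da3966 (OBS = 10);
RUN;
```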

(10). In the directory where you created and saved all your work, you should see the following files listed:
(a) the original data file from the ICPSR (e.g. da3966.txt)
(b) the command file from the ICPSR (e.g. sa3966.sas)
(c) a SAS catalog file with a .sas7bcat extension (e.g. for3966.sas7bcat)
(d) a SAS data file with a .sas7bdat extension (e.g. da3966_fmts.sas7bdat); this data file contains the contents
of the catalog file and we will use it later with StatTransfer
(e) another SAS data file with a .sas7bdat extension (e.g. da3966.sas7bdat); this data file contains the actual
data we want to analyze
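The SAS files in this list can also be checked from within SAS. The sketch below assumes that the library da3966 is currently assigned. It lists only SAS members (the catalog file and the two data files), not the raw .txt file or the .sas command file.

```sas
/* List every SAS member (data files and catalogs) of library da3966. */
PROC DATASETS LIBRARY = da3966;
QUIT;
```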

Sample SAS Command Files:
Below are some sample SAS command files that will read raw ICPSR data into SAS and create the files needed for
converting the SAS data into other formats via StatTransfer. The files were downloaded from the ICPSR and edited
by the EDC staff in the manner described in this guide. To save any of these files, right-click on the appropriate
link and choose "Save Target As..." (in Internet Explorer) or "Save Link Target As..." (Netscape) and choose a
location to save the file(s). The files are for illustrative purposes.
SAS Command File for ICPSR #3934 - Afrobarometer: Round I Survey of South Africa, July-August 2000
SAS Command File for ICPSR #3966 - Afrobarometer: Round I Survey of Tanzania 2001
SAS Command File for ICPSR #3975 - World Values Surveys and European Values Surveys, 1999-2001
The ICPSR also has a help guide for working with its SAS program files at
http://webapp.icpsr.umich.edu/cocoon/ICPSR-FAQ/0061.xml that you might find useful. Note that not every SAS
syntax file that the ICPSR distributes follows the exact format of the file we've been looking at here - there is
some variation across studies, especially with older studies that have syntax files written for older
versions of SAS. More generally, the process for reading raw data into SAS is more complicated than is the case for
reading raw data into SPSS because of how SAS handles value labels. If you would like additional assistance, please
feel free to contact the Data Center staff.