Return to UOCC HomeComputing News Home
Header bar

Some Updates on Designing Your Data for Transfer into SAS

Robin High
Statistical Programmer and Consultant
robinh@uoregon.edu

In the last issue of Computing News (http://cc.uoregon.edu/cnews/spring2003/datadesign.html), I described how to best design your data entry in Excel for transfer to a data analysis program like SAS or SPSS.

In the weeks since that article appeared, a few things about communication between Excel and SAS have surfaced, and the purpose of this article is to provide you with some pertinent updates.

Transferring Data from Excel to SAS

CSV and TXT files. One option I described was to create a comma-separated value (CSV) file or a tab-delimited (TXT) file of each individual worksheet, and then have SAS read these text files directly. (Before you save individual Excel worksheets as text files, be sure to save and back up your Excel workbook first, especially if it contains multiple worksheets. When you have finished exporting individual worksheets, exit the workbook without saving it.)

Whether you use Windows or Unix, you'll enjoy some advantages if you create CSV files with Excel and then read them directly with the SAS DATA step (or with DATA LIST in SPSS). It may not seem like this approach would be better than reading the Excel files directly, but it does have advantages if you have large data files, missing data, or inconsistently formatted columns. The DATA step allows you to read and format each variable, instead of leaving it to SAS to try to figure it out. How SAS does this is briefly described below.

Excel can also read comma- or tab-delimited text files directly, as easily as it can write them. By using CSV and TXT files, you will save on the time and disk space required to format every cell, substantially reducing the size of large files while ensuring that the file can be opened directly in either Excel or in SAS. However, keep in mind that problems can occur when importing text files into Excel that contain date variables or character data that Excel will interpret to be dates.

Close the Excel workbook before importing data. Before you attempt to run the PROC IMPORT statements or use the IMPORT wizard with SAS, you need to close the workbook in Excel first. For data integrity reasons, SAS and Excel aren't allowed to have access to the data at the same time. Even though you may have followed all the instructions correctly, SAS will give you an error message and fail to import the data if the file is still open in Excel.

Here are the statements for importing an Excel file into SAS:

PROC IMPORT DATAFILE="c:\<path>\test.xls"
OUT=test DBMS=excel2000 REPLACE;
SHEET="sequence";
RANGE="A15:H35";
GETNAMES=yes;
run;

Importing a portion of your Excel worksheet using the RANGE statement. One option I did not mention in my earlier article is that the IMPORT procedure allows you to read data from a specified portion of your Excel worksheet with the RANGE statement. It's not necessary to use this statement if you want to read the entire worksheet, but to specify only a portion of the worksheet, enter the range of cells between double quotes in a rectangular designation of upper left corner (A15) to lower right corner (e.g., H35), separated by a colon. This range assumes you have entered the variable names in Row 15.

If you are transferring one or more variables formatted as dates in Excel, SAS will read them correctly, but will save them with DATETIME formats (e.g., day/month/year/hour/min/sec) rather than a DATE (e.g., mm/dd/yyyy) format. This particular format may not work as you expect if you want to work only with the month, day, and/or year. To convert the date variable to a different format, use statements like the following:

DATA test1;
SET test;
DROP date;
FORMAT date_t mmddyy10. ;
date_t=DATEPART(date);
RUN;

Cell formatting. A common problem with either the import wizard or the PROC IMPORT statements is that if you haven't been consistent with cell formats in each column, SAS may get confused regarding the type of data it is actually trying to read. SAS scans the first 20 rows of an Excel worksheet to determine data types for each column. If for any particular column the first 20 rows contain all numbers (such as 3.3, 4.92, 2.0, 5, 3.1, etc.) or blank cells formatted as numbers, SAS will assign that variable a numeric format. If any data other than numeric are read in that column after row 20, SAS will set the value of that row and column as missing in the dataset. However, if within the first 20 rows any non-numeric data are found (such as the characters ‘NA,' ‘M,' or a period for missing values) SAS will assign that variable to have a character format, even if the column is defined as numeric.

The number of rows SAS scans when determining data types can be modified from its default value of 20. There is an option called "guessingrows" which can be modified in the registry settings. For details on how to use this option, see http://support.sas.com/techsup/unotes/SN/001/001075.html

Importing multiple Excel files. Suppose you need to import multiple Excel files with one worksheet and place them into one SAS dataset. And suppose these files have unique names like ab.xls, abc.xls, abcd.xls, etc., and you would like to import these files without writing their prefix names directly into multiple versions of the PROC IMPORT code. The corresponding SAS datasets should also be given the same names as the prefix of the Excel files. One solution is to create a list of file prefix names in a macro variable and then use a macro that reads each file listed in the macro variable. An example of this process is available at http://darkwing.uoregon.edu/~robinh/data_transfer.html

All the Excel files should be placed in the same directory. If they all have the same file structure and are to be appended to one another, this can be accomplished in one DATA file step with a SET statement that contains the same list of names from the recently created SAS datasets.

Exporting data from SAS into Excel

With all this focus on importing data from Excel, you may wonder if the reverse process is possible. Yes, there is both an EXPORT wizard and PROC EXPORT that work in an analogous manner to IMPORT. Exporting one SAS dataset to a single worksheet in a new Excel workbook is quite simple to do. However, EXPORT with version 8.2 is not as well-developed as IMPORT when working with multiple worksheets or transferring data formats. This will likely be a topic for a future article, especially when SAS 9.1, which includes features for efficiently exporting data into Excel, is released next year.


Summer 2003 Computing News | Computing Center Home Page