Return to UOCC HomeComputing News Home
Header bar

An Introduction to Appending Two or More SAS Datasets or Merging Them Together

Robin High
Statistical Programmer and Consultant
robinh@uoregon.edu

This article explains how to combine two datasets using SAS. We'll look at both appending datasets (resulting in an increase in the number of cases) and merging datasets (adding variables to existing cases). It is a condensed version of two much longer documents (http://darkwing.uoregon.edu/~robinh/061appl_data.txt and http://darkwing.uoregon.edu/~robinh/062appl_merge.txt).

Methods of Appending Datasets

To illustrate what we mean when we talk about appending two datasets, assume we have dataset alpha with variables subject, age, and weight, and dataset beta also with variables subject, age, and weight.

To use SAS to combine the data from those two datasets, one could simply enter:

DATA combo;
SET alpha beta; RUN;

However, there are other more specialized SAS procedures which can also be used for this task, including:

Choosing Between Using the SAS DATA Step Approach and PROC APPEND

Of the two approaches, the DATA step is the least efficient, although on today's fast systems, efficiency probably isn't a major concern for most reasonably sized datasets. The DATA step does have a distinct advantage in that it is the only method whereby SAS can compute new variables or enter conditional OUTPUT statements if there are any cases you want to delete. The DATA step also allows you to enter an option to keep track of which records came from specific datasets.

PROC APPEND. PROC APPEND also allows you to combine cases with the same variables, just as you can with a DATA step using a SET statement, but with more consistency checking.

By default, PROC APPEND takes special care to ensure that datasets to be combined are strictly congruent. It looks to see if all character variables are defined to be of the same length, and also checks both datasets to see if they have exactly the same set of variables.

You can override those PROC APPEND consistency checks if you want to (see the PROC APPEND FORCE option), but you should be careful when doing so. SAS is trying to help you avoid problems, so it's best not to ignore its efforts.

PROC APPEND comes in handy when you collect the same data over time or the same data from different sources and want to accumulate them into one file. PROC APPEND can also be used within a SAS macro where the same procedure is applied to many datasets and you need to place the results, such as regression coefficients or summary statistics, into one file. Other applications are found with simulations or bootstraps, where the output from many runs produce datasets with the same structure.

For examples and more detailed explanations of the various ways of appending SAS datasets, go to http://darkwing.uoregon.edu/~robinh/061appl_data.txt

How to MERGE Two or More SAS Datasets

When you merge two datasets, your objective is to add new variables from matching observations.

For example, suppose you have a dataset called “states” that has information about each of the 50 states. That “states” dataset might have each state's name, its area in square miles, and its population. A new dataset, called “vehicles,” has each state's name and the number of cars and the number of motorcycles registered in that state. You want to add the two variables from the “vehicles” dataset (the number of cars and the number of motorcycles) to the variables on the original dataset, “states.”

In this case, each dataset has data for all fifty states. Both files should have one or more key variables with the same format. In our example, that would be the state's name. Other examples of common keys are Social Security numbers or subject names. The conservative way to do this is to create a new dataset (we'll call it “widedata”) by entering:

PROC SORT DATA=states;
BY state_name;
PROC SORT DATA=vehicles;
BY state_name;
DATA widedata;
MERGE states vehicles;
BY state_name;
RUN;

This example is comparatively simple, and does not fully reflect all the possible ways that the DATA step with a MERGE statement, or the even more powerful PROC SQL, can be applied.

Why Do You Need to Sort and Use a BY Statement?

If you merge two datasets without a BY statement, the process automatically matches data from row 1 of file1 with data from row 1 of file2, data from row 2 in file1 with the data from row 2 in file2, and so forth.

In some situations when SAS merges files without a BY statement, the process may work correctly. But what happens when you do not include a BY statement in the DATA step when it is really needed? SAS still processes the two files, but the resulting output dataset has incorrectly matched the records.

Because it is very easy to unintentionally merge variables from observations across two datasets that do not match, we recommend that you always first sort the two datasets by their unique identification variables with PROC SORT. You then enter the BY statement into the DATA step following the MERGE statement that lists the names of the two datasets.

If you're concerned that you might forget to include a BY statement when merging files, you can activate a System Option to report when a merge is missing a BY statement (this option will stop data processing):

OPTIONS MergeNoBy=error;

To summarize, you should always perform the following sequence of steps:

  1. Sort the data first.
  2. Use a BY statement following the MERGE.
  3. Flag missing BY statements with a system option.

When merging files, many complications can arise. For example, you might encounter situations where you have multiple records in a file with the same value of the BY variable.

In such situations, the files can be combined, but only if you use very specific procedures. We strongly recommend that you print at least a sample of the resulting dataset to confirm that it merged the way it should. Details on some of the more complicated merge scenarios are available from http://darkwing.uoregon.edu/~robinh/062appl_merge.txt

Online References

  1. “Everything you wanted to know about MERGE but were afraid to ask”
    http://support.sas.com/techsup/technote/ts644.html
  2. “Reading, Combining, and Modifying SAS Data Sets”
    http://sas.uoregon.edu/sashtml/lrcon/z1125856.htm
  3. “The SQL Procedure”
    http://sas.uoregon.edu/sashtml/proc/z0086336.htm
  4. “How MERGE Really Works”
    http://www.pswcrl.ars.usda.gov/Popham%5Cmerge.pdf

Spring 2004 Computing News | Computing Center Home Page